中文摘要 |
Precise phone/syllable boundary labeling of the utterances in a speech corpus plays an important role in constructing a corpus-based TTS (text-to-speech) system. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a suitable labeling method for one language does not necessarily produce desirable results for another language. Hence in this paper, we propose a new procedure for refining the boundaries of utterances in a Mandarin speech corpus. This procedure employs different sets of acoustic features for four different phonetic categories. In addition, a new scheme is proposed to deal with the “periodic voiced + periodic voiced” case, which produced most of the segmentation errors in our experiment. Several experiments were conducted to demonstrate the feasibility of the proposed approach. |