Research on text-to-speech (TTS) conversion for Mandarin Chinese is a much younger enterprise than comparable research for English or other European languages. Nonetheless, impressive progress has been made over the last couple of decades, and Mandarin Chinese systems now exist which approach, or in some ways even surpass, the quality of available systems for English. This article has two goals. The first is to summarize the published literature on Mandarin synthesis, with a view to clarifying the similarities and differences among the various efforts. One property shared by a great many systems is the dependence on the syllable as the basic unit of synthesis. We shall argue that this property stems both from the accidental fact that Mandarin has a small number of syllable types, and from traditional Sinological views of the linguistic structure of Chinese. Despite the popularity of the syllable, though, there are problems with using it as the basic synthesis unit, as we shall show. The second goal is to describe in more detail some specific problems in text-to-speech conversion for Mandarin, namely text analysis, concatenative unit selection, segmental duration, and tone and intonation modeling. We illustrate these topics by describing our own work on Mandarin synthesis at Bell Laboratories. The paper starts with an introduction to some basic concepts in speech synthesis, intended as an aid to readers who are less familiar with this area of research.