English Abstract |
This paper describes the newly developed prosody module of our text-to-speech (TTS) system. We present two main pieces of work: its establishment and its improvement. First, based on the factors that potentially influence the prosody parameters (duration, pitch, and intensity), we build a prosody model as the groundwork of this module; it is superior to the former rule-based model in generating natural prosody. Second, because the current model has a flaw in predicting the pitch contour, we further employ a technique named "Soft Template Mark-up Language" (STEM-ML) to improve the smoothness of intonation, which has a crucial influence on the naturalness of synthetic speech. Evaluation results indicate that the new prosody model is precise enough to predict reliable prosody parameter values and that, with the STEM-ML technique, the prosody module further yields a 14.75% reduction in the root mean square (RMS) error of the predicted pitch contour. |