English Abstract |
Fluency and continuity are essential properties of a synthesized singing voice. To produce smooth, continuous singing, we adopt the hidden Markov model (HMM)-based synthesis approach to build a Mandarin singing voice synthesis system. The system is designed to generate Mandarin songs with arbitrary lyrics and melodies within a certain pitch range. We also build a singing voice database for system training and synthesis, designed according to the phonetic coverage of Mandarin speech. Acoustic features are extracted with the STRAIGHT algorithm so that the vocoded singing voice is of satisfactory quality. This paper describes the construction of the Mandarin singing voice synthesis system, including the definition of the synthesis model and the question set for HMM-based singing voice synthesis. We further implement two techniques, pitch-shift pseudo data extension and vibrato post-processing, to make the synthesized singing voice more natural. The proposed framework consists of two main phases: training and synthesis. In the training phase, excitation, spectral, and aperiodicity factors are extracted from the singing voice database. The lyrics and notes of the songs in the corpus serve as contextual information for generating context-dependent label sequences. The labeled phone segments are then clustered using the context-dependent question set, and context-dependent HMMs are trained on the clustered segments. In the synthesis phase, the input musical score and lyrics are converted into a context-dependent label sequence. The parameter sequence for the given song, consisting of excitation, spectral, and aperiodicity factors, is constructed by concatenating the parameters generated from the context-dependent HMMs. Finally, the generated parameter sequences are passed through a Mel Log Spectrum Approximation (MLSA) filter to synthesize the singing voice. |
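The abstract names pitch-shift pseudo data extension and vibrato post-processing without giving their details. The following is a minimal sketch of what such operations could look like on a frame-level F0 contour. It is not the authors' implementation. The function names, the convention that 0.0 marks an unvoiced frame, the 5 ms frame period, and the vibrato rate and depth are all assumptions made for illustration.

```python
import math


def shift_pitch(f0_hz, semitones):
    """Transpose an F0 contour by a number of semitones.

    Assumption: a value of 0.0 marks an unvoiced frame and is left unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


def add_vibrato(f0_hz, frame_period_s=0.005, rate_hz=5.5, depth_cents=50.0):
    """Superimpose a sinusoidal vibrato on the voiced frames of an F0 contour.

    Assumptions: fixed rate and depth (hypothetical defaults), with the
    sinusoid phase starting at frame 0 of the contour.
    """
    out = []
    for i, f in enumerate(f0_hz):
        if f <= 0:
            out.append(0.0)  # keep unvoiced frames unvoiced
            continue
        t = i * frame_period_s
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        out.append(f * 2.0 ** (cents / 1200.0))
    return out


# Hypothetical contour: one unvoiced frame followed by a flat 440 Hz note.
contour = [0.0] + [440.0] * 200
shifted = shift_pitch(contour, 2)   # pseudo data two semitones higher
vibrato = add_vibrato(contour)      # post-processed contour with vibrato
```

In this sketch, pitch shifting would be applied to the training recordings, with the note labels transposed by the same interval, to widen the pitch range covered by the corpus, whereas vibrato would be applied to the F0 contour generated by the HMMs before waveform synthesis.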