English Abstract
In a text-to-speech (TTS) conversion system based on the time-domain pitch-synchronous overlap-add (TD-PSOLA) method, pitch periods and pitch marks must be estimated accurately so that pitch modification yields synthetic speech of optimal quality. Pitch marking involves two major issues: pitch detection and pitch-mark location. For pitch detection, this paper proposes an adaptable filter that serves as a bandpass filter and transforms voiced speech into a sine-like wave. Based on this sine-like wave, a peak-valley decision method is investigated to determine which part of the voiced speech (positive or negative) should be used for pitch-mark estimation. Within each pitch period, two candidate peaks (or valleys) are searched, and dynamic programming is then performed to select the pitch marks. Experimental results indicate that the proposed method performs very well when the pitch information is estimated correctly.
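The marking stage summarized above (peak-valley decision, two candidates per period, dynamic programming) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this thesis: the adaptable bandpass filter and pitch detection are omitted, and the pitch period (in samples) is assumed to be known already. The functions `choose_polarity`, `candidates_per_period`, and `dp_pitch_marks`, the polarity rule, the fixed non-overlapping period-length frames, and the DP cost (negative amplitude plus a weighted, normalized deviation from the period) are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def choose_polarity(x: Sequence[float]) -> int:
    """Hypothetical peak-valley decision: +1 marks peaks, -1 marks valleys.

    Assumed rule (not from the thesis): use the polarity with the larger extreme amplitude.
    """
    return 1 if max(x) >= -min(x) else -1


def candidates_per_period(x: Sequence[float], period: int, sign: int) -> List[List[int]]:
    """Split x into non-overlapping period-length frames and keep the two
    largest local maxima of sign * x in each frame as pitch-mark candidates."""
    y = [sign * v for v in x]
    frames: List[List[int]] = []
    for start in range(0, len(y) - period + 1, period):
        # Interior samples only, so y[i - 1] and y[i + 1] always exist.
        lo, hi = max(start, 1), min(start + period, len(y) - 1)
        peaks = [i for i in range(lo, hi) if y[i] >= y[i - 1] and y[i] > y[i + 1]]
        peaks.sort(key=lambda i: y[i], reverse=True)
        if peaks:
            frames.append(peaks[:2])
    return frames


def dp_pitch_marks(x, frames, period, sign, weight=1.0):
    """Choose one candidate per frame, minimizing the summed cost
    -amplitude + weight * |spacing - period| / period (hypothetical cost)."""
    if not frames:
        return []
    y = [sign * v for v in x]
    cost = [[-y[c] for c in frames[0]]]   # best cumulative cost per candidate
    back = [[-1] * len(frames[0])]        # backpointers into the previous frame
    for k in range(1, len(frames)):
        row, brow = [], []
        for c in frames[k]:
            totals = [cost[k - 1][j] + weight * abs((c - p) - period) / period
                      for j, p in enumerate(frames[k - 1])]
            j_best = min(range(len(totals)), key=totals.__getitem__)
            row.append(totals[j_best] - y[c])
            brow.append(j_best)
        cost.append(row)
        back.append(brow)
    # Backtrack from the cheapest candidate in the last frame.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    marks = []
    for k in range(len(frames) - 1, -1, -1):
        marks.append(frames[k][j])
        j = back[k][j]
    return marks[::-1]


# Toy "sine-like" input: period of 100 samples, a main peak and a weaker
# secondary peak per period, plus one spurious peak at sample 270.
period = 100
x = [0.0] * 500
for k in range(5):
    x[40 + k * period] = 1.0
    x[70 + k * period] = 0.6
x[270] = 1.2

sign = choose_polarity(x)
frames = candidates_per_period(x, period, sign)
marks = dp_pitch_marks(x, frames, period, sign)
print([f[0] for f in frames])  # greedy: [40, 140, 270, 340, 440]
print(marks)                   # DP:     [40, 140, 240, 340, 440]
```

The `weight` parameter trades peak amplitude against period regularity. With `weight=1.0`, the DP rejects the spurious peak at sample 270 that a per-period greedy choice (largest candidate in each frame) would keep, because that peak breaks the 100-sample spacing on both sides.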