Chinese Abstract |
In a text-to-speech (TTS) conversion system based on the time-domain
pitch-synchronous overlap-add (TD-PSOLA) method, accurate estimation of pitch
periods and pitch marks is necessary for pitch modification to ensure optimal
quality of synthetic speech. In general, there are two major tasks in pitch marking:
pitch detection and location determination. In this paper, an adaptable filter, which
serves as a bandpass filter, is proposed for use in pitch detection to transform
voiced speech into a sine-like wave. The passband of the adaptable filter is
adjusted according to the fundamental frequency. Based on the sine-like wave, a
peak-valley decision method is proposed to determine the appropriate parts
(positive part and negative part) of voiced speech for use in pitch mark estimation.
In each pitch period, two candidate peaks/valleys are located, and dynamic
programming is performed to obtain the pitch marks. Experimental results indicate that
the proposed method performs very well provided that the pitch information is estimated correctly. |
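
The dynamic-programming step described above can be illustrated with a minimal sketch. The abstract does not state the cost function, so the one used here (the deviation of the spacing between consecutive marks from the expected pitch period) is an assumption, as are the function name `select_pitch_marks`, its parameters, and the simplification of a constant pitch period. It shows only how one candidate per period can be chosen from a small candidate set, not the authors' actual formulation.

```python
def select_pitch_marks(candidates, period):
    """Pick one candidate sample index per pitch period (illustrative only).

    candidates: list of lists; candidates[i] holds the candidate sample
                positions (e.g. two peaks or two valleys) in the i-th period.
    period:     expected pitch period in samples (assumed constant here).

    Assumed transition cost: |(cur - prev) - period|, i.e. how far the
    spacing between neighbouring marks deviates from the pitch period.
    """
    if not candidates:
        return []

    # cost[j]: best accumulated cost of any path ending at candidate j
    # of the period currently being processed.
    cost = [0.0] * len(candidates[0])
    # back[i - 1][j]: index of the best predecessor (in period i - 1)
    # for candidate j of period i.
    back = []

    for i in range(1, len(candidates)):
        new_cost, pointers = [], []
        for cur in candidates[i]:
            steps = [cost[k] + abs((cur - prev) - period)
                     for k, prev in enumerate(candidates[i - 1])]
            best = min(range(len(steps)), key=steps.__getitem__)
            new_cost.append(steps[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final candidate to the first period.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()

    return [candidates[i][j] for i, j in enumerate(path)]


# Hypothetical candidates: two per period, expected period of 100 samples.
marks = select_pitch_marks([[12, 50], [95, 150], [250, 290]], period=100)
```

In this hypothetical input, the candidates 50, 150, and 250 are spaced exactly one period apart, so that sequence has zero accumulated cost and is the one selected.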