English Abstract |
Cepstral statistics normalization techniques have been shown to be very successful at improving the noise robustness of speech features. In this paper, we propose a hybrid scheme that estimates the feature statistics used by these techniques more accurately. By properly integrating codebook knowledge with utterance- or segment-level knowledge, the resulting hybrid normalization methods significantly outperform conventional utterance-based, segment-based and codebook-based methods in recognition accuracy. For the Aurora-2 clean-condition training task, the proposed hybrid codebook/segment-based histogram equalization (CS-HEQ) achieves an average recognition accuracy of 90.66%, compared with 87.62% for utterance-based HEQ, 85.92% for segment-based HEQ and 85.29% for codebook-based HEQ. Furthermore, CS-HEQ can be implemented with a short delay and can thus be applied in real-time online systems. Similar improvements are also observed for the hybrid versions of cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), cepstral gain normalization (CGN) and higher-order cepstral moment normalization (HOCMN). |