English Abstract |
The modulation spectra of speech features are often distorted by environmental interference. To reduce this distortion, we apply the minimum variance (MV) criterion to obtain the optimal frequency response of a temporal filter, and then use two approaches, least-squares spectral fitting (LSSF) and magnitude spectrum interpolation (MSI), to obtain the filtered feature sequence. This yields two new temporal processing approaches, named MV-LSSF and MV-MSI, respectively. On the Aurora-2 clean-condition training task, we show that MV-LSSF and MV-MSI give more than 50% relative error rate reduction over the baseline, and provide relative error rate reductions of 8.18% and 2.73% over the conventional LSSF and MSI, respectively. These results indicate that the proposed methods significantly enhance the robustness of speech features in noise-corrupted environments. |