英文摘要 |
In this study, we focus on the issue of noise distortion in speech signals, and develop two novel unsupervised speech enhancement algorithms including temporal lowpass filtering (TLP) and relative-to-maximum masking (RMM). Both of these two algorithms are conducted on the magnitude spectrogram of speech signals. TLP uses a simple moving-average filter to emphasize the low modulation frequencies of speech signals, which are believed to contain richer linguistic information and exhibit higher signal-to-noise ratios (SNR). Comparatively, in RMM we apply a mask that is directly multiplied with the speech spectrogram in a pointwise manner, and the used masking value is directly proportional to the magnitude of each temporal-frequency (T-F) point in the spectrogram. The preliminary experiments conducted on a subset of TIMIT database show that the two novel methods can promote the quality of noisecorrupted speech signals significantly, and both of them can be integrated with a well-known supervised speech enhancement scenario, namely fully convolutional network, to achieve even better perceptual speech quality values. |