A study of enhancing the modulation spectrum of speech signals via nonnegative matrix factorization
Authors: 王緒翔, 鄭至皓, 曹昱, 洪志偉
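In this paper, we apply non-negative matrix factorization (NMF) to enhance the modulation spectrum of speech features. Magnitude modulation-spectrum bases are trained separately for the clean speech and the noise in the training data, and these bases are used to decompose the magnitude modulation spectrum of the test speech. The resulting magnitude is then combined with the original phase and passed through the inverse Fourier transform to obtain a new acoustic spectrum, from which the enhanced speech signal is derived. We also propose two variants that reduce computational complexity: one treats adjacent acoustic frequency bins as a single unit, and the other processes only the low-frequency region of the modulation spectrum. Finally, we compare the proposed method with conventional speech enhancement methods, namely spectral subtraction, Wiener filtering, and minimum mean squared error short-time spectral amplitude (MMSE-STSA) estimation, to verify its feasibility. The experiments use a subset of utterances from the AURORA-2 connected-digit corpus, in which the speech signals are corrupted by additive noise. The results show that, relative to the baseline, the new methods effectively improve the quality of speech in noisy environments as measured by PESQ.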
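A minimal NumPy sketch of the modulation-domain analysis and resynthesis path, assuming the modulation spectrum is the FFT taken along the frame (time) axis of each acoustic-frequency bin of a magnitude spectrogram, and that the modified magnitude is recombined with the original modulation phase. The names `enhance_modulation`, `modify_magnitude`, `frame_rate_hz` and `max_mod_hz` are hypothetical; the 100 Hz frame rate (10 ms frame shift), the optional cutoff that restricts processing to low modulation frequencies, and the final clipping at zero are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np
from typing import Callable, Optional


def enhance_modulation(
    spec_mag: np.ndarray,
    modify_magnitude: Callable[[np.ndarray], np.ndarray],
    frame_rate_hz: float = 100.0,
    max_mod_hz: Optional[float] = None,
) -> np.ndarray:
    """Modify the magnitude modulation spectrum of a magnitude spectrogram.

    spec_mag: nonnegative array of shape (n_frames, n_bins).
    modify_magnitude: maps an (n_rows, n_bins) magnitude array to one of the
        same shape (hypothetical hook, e.g. an NMF-based speech estimate).
    frame_rate_hz: frames per second (assumed 100 Hz, i.e. a 10 ms shift).
    max_mod_hz: if given, only modulation frequencies <= this cutoff are
        passed to modify_magnitude; the rest keep their original magnitude.
    """
    n_frames = spec_mag.shape[0]
    # Modulation spectrum: real FFT along the time (frame) axis, per acoustic bin.
    mod = np.fft.rfft(spec_mag, axis=0)  # shape (n_frames // 2 + 1, n_bins)
    mag, phase = np.abs(mod), np.angle(mod)

    # Choose which modulation-frequency rows are processed.
    if max_mod_hz is None:
        rows = np.ones(mag.shape[0], dtype=bool)
    else:
        mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate_hz)
        rows = mod_freqs <= max_mod_hz  # low-modulation-frequency variant

    new_mag = mag.copy()
    new_mag[rows] = modify_magnitude(mag[rows])

    # Reuse the original modulation phase and invert the FFT along time.
    new_spec = np.fft.irfft(new_mag * np.exp(1j * phase), n=n_frames, axis=0)
    # Clip small negative values so the output is a valid magnitude spectrogram.
    return np.maximum(new_spec, 0.0)
```

Passing the identity function (`lambda m: m`) returns the input spectrogram up to floating-point error, which is a convenient sanity check. With a cutoff, fewer rows reach `modify_magnitude`, which is where the cost saving of the low-frequency variant would come from. Converting the enhanced spectrogram back to a waveform is not shown.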
In this paper, we propose to enhance the modulation spectrum of the spectrograms of speech signals via non-negative matrix factorization (NMF). In the training phase, the clean speech and the noise in the training set are separately transformed into spectrograms and then into modulation spectra, and the magnitude modulation spectra are used to train NMF basis matrices for clean speech and for noise, respectively. In the test phase, the test signal is converted to its modulation spectrum, which is then enhanced via NMF with the basis matrices obtained in the training phase. The updated modulation spectrum is finally transformed back to the time domain to obtain the enhanced signal. In addition, we propose two variants of the new method that reduce its computational complexity: one treats several adjacent acoustic frequencies as a single unit in the subsequent processing, and the other processes only the low modulation-frequency components. These methods are evaluated on a subset of the Aurora-2 noisy connected-digit database. Preliminary experiments indicate that they achieve better signal quality than the baseline in terms of the Perceptual Evaluation of Speech Quality (PESQ) index, and that they outperform several well-known speech enhancement methods, including spectral subtraction (SS), Wiener filtering (WF), and minimum mean squared error short-time spectral amplitude estimation (MMSE-STSA).
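The NMF step can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: KL-divergence multiplicative updates (Lee and Seung), a speech basis and a noise basis learned separately, activations for the test data estimated with both bases held fixed, and a Wiener-style soft mask to extract the speech part. The input matrix `V` is assumed to hold magnitude modulation spectra as nonnegative columns. The function names `kl_update_h`, `nmf` and `enhance_magnitude`, the ranks, and the iteration counts are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_update_h(V, W, H):
    """One Lee-Seung multiplicative update of H for the KL-divergence NMF cost."""
    return H * (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)


def nmf(V, rank, n_iter=200, seed=0):
    """Learn V ~= W @ H (all nonnegative) with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H = kl_update_h(V, W, H)
        W = W * ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def enhance_magnitude(V, W_speech, W_noise, n_iter=200, seed=0):
    """Estimate the speech part of a noisy magnitude matrix V with fixed bases."""
    W = np.hstack([W_speech, W_noise])
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H = kl_update_h(V, W, H)  # bases stay fixed; only activations change
    k = W_speech.shape[1]
    speech = W_speech @ H[:k]
    noise = W_noise @ H[k:]
    # Wiener-style soft mask: the share of the model attributed to speech.
    return V * speech / (speech + noise + EPS)
```

The soft mask keeps the output between zero and the noisy magnitude; using `W_speech @ H[:k]` directly as the speech estimate is an equally plausible reading of the abstract.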
Pages: 181-193
Keywords: non-negative matrix factorization, speech enhancement, modulation spectrum, spectrogram
Journal: ROCLING Proceedings
Issue: 2016
Publisher: The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
