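| Chinese abstract (translated) |
A speech enhancement (SE) system not only improves the perceptual quality of speech but can also be combined with an automatic speech recognition (ASR) system to make ASR more robust in noisy environments. However, single-channel SE may produce artifacts that are harmful to ASR and therefore cause recognition errors. Recent research shows that fine-tuning the model with a new SE loss function, NAaLoss, effectively reduces the artifacts the model produces. This method nevertheless still rests on potentially incorrect assumptions. In this study, we therefore analyze the method in depth and carry out extensive experiments and case studies to find its underlying problems. To address them, we propose an improved loss function, AaWLoss. With these corrections and optimizations, AaWLoss resolves a shortcoming of NAaLoss, which under the same settings may lose its ability to suppress artifacts in noisy conditions. In addition, AaWLoss reaches peak performance in suppressing artifacts under clean conditions, to the point that the enhanced clean speech carries information that benefits ASR. |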
| English abstract |
A speech enhancement (SE) system not only improves the perceptual quality of speech but can also be integrated with an automatic speech recognition (ASR) system to make recognition more robust in noisy environments. However, single-channel SE may introduce artifacts that are detrimental to ASR, leading to recognition errors. Recent research indicates that fine-tuning the SE model with a novel loss function, NAaLoss, effectively reduces the artifacts the model generates. Nonetheless, this approach still rests on potentially incorrect assumptions. In this study, we therefore analyze the method in depth and conduct extensive experiments and case studies to identify its underlying problems. To address them, we propose an improved loss function, AaWLoss. Through these corrections and optimizations, AaWLoss resolves a shortcoming of NAaLoss, which under the same settings may lose its ability to suppress artifacts in noisy conditions. Furthermore, AaWLoss achieves peak performance in suppressing artifacts under clean conditions, to the point that the enhanced clean speech carries information beneficial to ASR. |