An Artifact-aware Weighted Loss Function for Speech Enhancement

En-Lun Yu; Kuan-Hsun Ho; Berlin Chen

Title	用於語音增強之偽影感知加權損失函數 (An Artifact-aware Weighted Loss Function for Speech Enhancement)
Parallel title	AaWLoss: An Artifact-aware Weighted Loss Function for Speech Enhancement
Authors	En-Lun Yu, Kuan-Hsun Ho, Berlin Chen
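Chinese abstract (translated)	Speech enhancement (SE) systems not only improve the perceptual quality of speech but can also be combined with automatic speech recognition (ASR) systems to make ASR more robust in noisy environments. However, single-channel SE may produce artifacts that are detrimental to ASR, leading to recognition errors. Recent research shows that fine-tuning the model with a new SE loss function, NAaLoss, effectively reduces the artifacts the model produces. That method, however, still rests on potentially erroneous assumptions. In this study we therefore analyze the method in depth and carry out extensive experiments and case analyses to find its underlying problems. To address them, we propose an improved loss function, AaWLoss. After correction and optimization, AaWLoss resolves the shortcoming that NAaLoss, under the same settings, may lose its ability to suppress artifacts in noisy conditions. In addition, AaWLoss reaches peak performance in suppressing artifacts under clean conditions, and even gives the enhanced clean speech information that benefits ASR.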
English abstract	A speech enhancement (SE) system not only improves the perceptual quality of speech but also makes automatic speech recognition (ASR) robust in noisy environments when integrated with an ASR system. However, single-channel SE may generate artifacts that are detrimental to ASR, leading to recognition errors. Recent research indicates that fine-tuning the model with the novel SE loss function NAaLoss effectively reduces the generation of artifacts. Nonetheless, the underlying assumptions of this approach still need to be revised. In this study, we therefore analyze the method extensively and conduct numerous experiments and case studies to identify its inconsistencies. To address them, we propose an improved loss function, AaWLoss. Through modifications and optimizations, AaWLoss resolves the potential loss of noise-condition artifact suppression inherent in NAaLoss under the same settings. Furthermore, AaWLoss achieves peak performance in suppressing artifacts under clean conditions, even adding information beneficial to ASR to the enhanced clean speech.
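Pages	71-78
Keywords	single-channel speech enhancement, noise-robust speech recognition, processing artifacts
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	202310 (2023 issue)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)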