A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement

李世光; 王緒翔; 曹昱; 洪志偉

Title (Chinese)	多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究
Parallel title (English)	A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement
Authors	李世光, 王緒翔, 曹昱, 洪志偉
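Chinese abstract (translated)	In today's widespread speech applications, such as speech recognition, spoken information retrieval and voice-controlled robots, speech enhancement techniques that remove noise interference play an important role. Among the many speech enhancement techniques, the denoising auto-encoder (DAE) is one of the methods most widely studied and used in recent years, mainly because it applies popular deep learning techniques to learn the mapping between noisy speech and clean speech. Many studies have shown that the DAE method can effectively reduce the noise component without noticeably distorting the original clean speech; however, its performance still depends on the choice of training data and model architecture. In this paper, we investigate how training data at different signal-to-noise ratios (SNRs) affect the noise-reduction capability of the DAE. The main finding of our preliminary evaluation is that a DAE trained on high-SNR data achieves, on average, significant noise reduction on test speech across a range of SNRs, and outperforms DAEs trained on other kinds of data, including multi-SNR training data and training data whose SNR matches that of the test speech. Although this seems counterintuitive, we offer possible explanations in the paper and point out the efficiency advantages of training the DAE on high-SNR data alone: a relatively smaller amount of training data, a simpler DAE architecture with fewer hidden layers, and the potential to adapt to other types of noise.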
English abstract	Speech enhancement (SE), which reduces the effect of noise, plays an important role in today's widespread audio applications such as speech recognition, speech-based information retrieval and voice control. Among the various speech enhancement techniques, the denoising auto-encoder (DAE) employs the well-known deep learning process to learn the transformation from noisy data to the corresponding clean, noise-free counterpart, and it has been shown to be very effective at reducing the noise component while introducing little speech distortion. In this paper, we primarily investigate how training data with different signal-to-noise ratios (SNRs) influence the SE capability of the resulting DAE. The major finding from our evaluation experiments is that the DAE trained on high-SNR data provides significantly greater improvement in speech quality for noisy test data over a wide range of noise levels than the DAE trained on either multi-SNR data or matched-SNR data. This result somewhat contradicts the common intuition that a model trained on multi-SNR data should perform well on average for test data at an arbitrary noise level, and that a matched-condition model should give the optimal performance. We offer possible explanations for this finding and explore some advantages of using only high-SNR training data to prepare the DAE for speech enhancement. These advantages include a smaller amount of required training data, a simpler DAE structure with fewer hidden layers, and higher adaptability to other noise conditions.
Pages	101-113
Keywords	speech enhancement, spectrogram, denoising auto-encoder
Publication	ROCLING Proceedings
Issue	2017
Publisher	The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
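The abstract compares DAEs trained on high-SNR, multi-SNR and matched-SNR data but does not say how the noisy training material was produced. A common approach, assumed here rather than taken from the paper, is to scale a noise recording so that the ratio of clean-speech power to noise power equals a target SNR, i.e. SNR = 10 log10(P_clean / P_noise) dB, and add it to the clean speech. The plain-Python sketch below shows that mixing step and how the three training-set variants might be assembled as (noisy, clean) pairs for a DAE. The function names and the SNR values in `HIGH_SNR_LEVELS` and `MULTI_SNR_LEVELS` are hypothetical.

```python
import math
import random

# Hypothetical SNR settings (dB); the paper's actual levels are not given in the abstract.
HIGH_SNR_LEVELS = [20.0]
MULTI_SNR_LEVELS = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0]


def power(signal):
    """Mean power (average squared amplitude) of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so that 10*log10(P_clean / P_noise) == snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    # Target noise power is P_clean / 10^(snr_db / 10); amplitudes scale by its square root.
    scale = math.sqrt(power(clean) / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to the reference `clean` signal, in dB."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


def build_training_pairs(utterances, noise, snr_levels):
    """Return (noisy, clean) pairs: one noisy copy of each utterance per SNR level."""
    return [(mix_at_snr(clean, noise, snr), clean)
            for clean in utterances
            for snr in snr_levels]


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins for real recordings: a 440 Hz tone and white Gaussian noise at 16 kHz.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [random.gauss(0.0, 1.0) for _ in range(16000)]

    noisy = mix_at_snr(clean, noise, 5.0)
    print(round(measured_snr_db(clean, noisy), 3))  # 5.0

    high_snr_set = build_training_pairs([clean], noise, HIGH_SNR_LEVELS)
    multi_snr_set = build_training_pairs([clean], noise, MULTI_SNR_LEVELS)
    # A matched-SNR set uses the test condition's SNR, e.g. [5.0] for 5 dB test speech.
    matched_set = build_training_pairs([clean], noise, [5.0])
    print(len(high_snr_set), len(multi_snr_set), len(matched_set))  # 1 6 1
```

Under this setup a high-SNR training set holds one noisy copy per utterance, while a multi-SNR set grows linearly with the number of SNR levels, which is in line with the abstract's remark that high-SNR training needs less data. In practice the pairs would then be converted to spectrogram features before training the DAE.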