Exploiting the Compressed Spectral Loss for the Learning of the DEMUCS Speech Enhancement Network

Chi-En Dai; Qi-Wei Hong; Jeih-Weih Hung

Original title	藉由壓縮性之頻譜損失函數以學習DEMUCS語音強化模型之初步研究 (A Preliminary Study on Learning the DEMUCS Speech Enhancement Model with a Compressive Spectral Loss Function)
English title	Exploiting the compressed spectral loss for the learning of the DEMUCS speech enhancement network
Authors	Chi-En Dai, Qi-Wei Hong, Jeih-Weih Hung
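Abstract (translated from Chinese)	This study improves the well-known DEMUCS speech enhancement model by modifying the loss function used in its training. DEMUCS, developed by a Facebook team, builds its encoder and decoder modules mainly from convolutional layers, with a long short-term memory (LSTM) model between the two that separates or denoises the encoder output. Although DEMUCS is a purely time-domain speech enhancement architecture, its training loss covers both time-domain and frequency-domain features, the latter being the spectrogram obtained from the short-time Fourier transform (STFT) of the signal. We investigate whether compressing the spectral magnitudes in the DEMUCS loss function noticeably changes the performance of the trained model. The compression either raises the spectral magnitude to a positive power less than one or takes its logarithm. In evaluation experiments on the VoiceBank-DEMAND dataset, preliminary results show that, with power-law compression, the loss function yields a DEMUCS model that improves the objective quality and intelligibility metrics (PESQ and STOI) of the test utterances more effectively than the original DEMUCS. This indicates that introducing power-law-compressed spectral magnitudes into the loss function gives a DEMUCS model with better speech enhancement performance. By contrast, logarithmic compression brings no improvement.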
Abstract (English)	This study aims to improve a highly effective speech enhancement technique, DEMUCS, by revising the loss function used in its training. DEMUCS, developed by a Facebook team, is built on Wave-U-Net and consists of convolutional encoding and decoding blocks with an LSTM layer in between. Although DEMUCS processes the input speech utterance purely in the time (wave) domain, the applied loss function consists of a wave-domain L1 distance and a multi-scale short-time Fourier transform (STFT) loss. That is, both time- and frequency-domain features are taken into consideration in the learning of DEMUCS. In this study, we propose revising the STFT loss in DEMUCS by employing the compressed magnitude spectrogram. The compression is done by either a power-law operation with a positive exponent less than one or a logarithmic operation. We evaluate the presented framework on the VoiceBank-DEMAND database and task. The preliminary experimental results suggest that DEMUCS trained with the power-law compressed magnitude spectral loss outperforms the original DEMUCS by providing the test utterances with higher objective quality and intelligibility scores (PESQ and STOI). In contrast, the logarithm-compressed magnitude spectral loss does not benefit DEMUCS. We therefore conclude that DEMUCS can be further improved by properly revising the STFT terms of its loss function.
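Pages	100-106
Keywords	speech enhancement, DEMUCS, STFT, loss function, compressed spectral loss, logarithmic spectral distance, PESQ, STOI
Journal	ROCLING論文集 (ROCLING Proceedings)
Issue	202212 (2022)
Publisher	Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Previous article in this issue	A Preliminary Study of Using Discrete Wavelet Transform Features in the Conv-TasNet Speech Enhancement Model
Next article in this issue	Identifying the Structure of Chinese Civil Judgments with Machine Learning and Rule-Based Methods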
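
The abstract describes a loss that adds a wave-domain L1 distance to a multi-scale STFT term whose magnitudes are compressed, either by a power law with an exponent between 0 and 1 or by a logarithm. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the frame and hop sizes, the exponent 0.5, the `eps` offset, and the plain L1 form of the spectral term are all illustrative assumptions, and a naive DFT from the standard library stands in for a real STFT routine.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len, hop):
    """Magnitude STFT via a naive DFT on Hann-windowed frames (illustration only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        spectrogram.append(magnitudes)
    return spectrogram


def compress(mag, mode="power", exponent=0.5, eps=1e-8):
    """Compress one magnitude value; exponent and eps are assumed values."""
    if mode == "power":
        return mag ** exponent          # power law, 0 < exponent < 1
    if mode == "log":
        return math.log(mag + eps)      # eps avoids log(0)
    if mode == "none":
        return mag                      # uncompressed baseline
    raise ValueError(f"unknown compression mode: {mode}")


def compressed_spectral_l1(estimate, target, frame_len, hop, mode="power", exponent=0.5):
    """Mean absolute difference between compressed magnitude spectrograms."""
    est = magnitude_spectrogram(estimate, frame_len, hop)
    ref = magnitude_spectrogram(target, frame_len, hop)
    total, count = 0.0, 0
    for est_frame, ref_frame in zip(est, ref):
        for e, r in zip(est_frame, ref_frame):
            total += abs(compress(e, mode, exponent) - compress(r, mode, exponent))
            count += 1
    return total / count


def total_loss(estimate, target, resolutions=((32, 8), (64, 16)), mode="power", exponent=0.5):
    """Wave-domain L1 plus the spectral term averaged over several (frame_len, hop) scales."""
    wave_l1 = sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)
    spectral = sum(compressed_spectral_l1(estimate, target, frame_len, hop, mode, exponent)
                   for frame_len, hop in resolutions)
    return wave_l1 + spectral / len(resolutions)


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * 5 * n / 128) for n in range(128)]
    noisy = [c + 0.1 * math.sin(2 * math.pi * 31 * n / 128) for n, c in enumerate(clean)]
    for mode in ("none", "power", "log"):
        print(mode, total_loss(noisy, clean, mode=mode))
```

Compression reduces the dynamic range of the magnitudes, so low-energy time-frequency bins contribute relatively more to the loss than they would uncompressed. In an actual training setup the same structure would be written with a differentiable STFT so that gradients can flow back to the network.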
