A Preliminary Study of Using Discrete Wavelet Transform Features in the Conv-TasNet Speech Enhancement Model

Yan-Tong Chen; Zong-Tai Wu; Jeih-Weih Hung

Title	A Preliminary Study of Using Discrete Wavelet Transform Features in the Conv-TasNet Speech Enhancement Model
Parallel title	A Preliminary Study of the Application of Discrete Wavelet Transform Features in Conv-TasNet Speech Enhancement Model
Authors	Yan-Tong Chen, Zong-Tai Wu, Jeih-Weih Hung
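Chinese abstract (translated)	Current speech enhancement models based on deep neural network architectures often learn their parameters from time-domain features, which, like classic frequency-domain features, allow the resulting models to achieve excellent speech enhancement performance. Building on this idea, this study investigates how to extract information from time-domain speech to create more effective features for speech enhancement. We propose extracting short-time sub-band signals in the time domain and fusing them into a single feature. Specifically, the discrete wavelet transform is applied to each input frame to decompose it into sub-band signals, and these signals undergo a projection fusion process that produces the final wavelet-domain features. The corresponding fusion method is called bi-projection fusion (BPF). We then combine the fused wavelet-domain features obtained through the discrete wavelet transform with the original time-domain features to learn an efficient speech enhancement network, the fully convolutional time-domain audio separation network (Conv-TasNet), which enhances noise-corrupted speech and improves its quality and intelligibility. We conducted evaluation experiments on two speech enhancement datasets, VoiceBank-DEMAND and VoiceBank-QUT. Preliminary results show that the proposed method achieves higher objective speech quality and intelligibility scores than the original Conv-TasNet that uses time-domain features alone, indicating that the fused wavelet-domain features complement the original time-domain features, helping to learn a more effective Conv-TasNet from noisy input speech and to achieve better speech enhancement.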
English abstract	Time-domain features, like frequency-domain features, are now widely used in speech enhancement (SE) networks and achieve excellent performance in removing noise from input utterances. This study investigates how to extract information from time-domain utterances to create more effective features for speech enhancement. We propose taking the time-domain sub-signals that lie in multiple acoustic frequency bands and integrating them into a unified feature set. To do so, we use the discrete wavelet transform (DWT) to decompose each input frame into sub-band signals and then apply a projection fusion process to these signals to create the final features. The fusion strategy is bi-projection fusion (BPF); in short, BPF uses the sigmoid function to create ratio masks for two feature sources. The concatenation of the fused DWT features and the time-domain features serves as the encoder output of a well-known SE framework, the fully convolutional time-domain audio separation network (Conv-TasNet), which estimates a mask and then produces the enhanced time-domain utterances. Evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks. The results show that the proposed method achieves higher speech quality and intelligibility than the original Conv-TasNet, which uses time-domain features only, indicating that fusing DWT features created from the input utterances with the time-domain features helps learn a better Conv-TasNet for speech enhancement.
Pages	92-99
Keywords	speech enhancement, discrete wavelet transform, cross-domain, bi-projection fusion, fully convolutional time-domain audio separation network (Conv-TasNet), temporal speech sequence
Publication	ROCLING Proceedings
Issue	202212 (2022 issue)
Publisher	The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
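Previous article in this issue	Using Grammatical and Semantic Correction Model to Improve Chinese-to-Taiwanese Machine Translation Fluency
Next article in this issue	A Preliminary Study of Learning the DEMUCS Speech Enhancement Model with a Compressive Spectral Loss Function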
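The abstract outlines a data flow: frame the noisy waveform, split each frame into sub-band signals with the DWT, fuse two sub-bands with BPF, and concatenate the result with the time-domain encoder features. The NumPy sketch below illustrates that flow under loudly stated assumptions, none of which come from the paper: a single-level Haar wavelet (the abstract names neither the wavelet nor the decomposition level), a hypothetical BPF form in which each source is scaled by its own sigmoid mask computed from both sources, random untrained weights, and a linear-plus-ReLU stand-in for Conv-TasNet's learned encoder. The names `frame_signal`, `haar_dwt`, and `bi_projection_fusion`, and the toy sizes, are illustrative only.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def haar_dwt(frames):
    """Single-level Haar DWT along the last axis (assumed wavelet).

    Returns (approximation, detail): low-band and high-band sub-signals,
    each half the frame length. Assumes an even frame length.
    """
    even, odd = frames[..., 0::2], frames[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bi_projection_fusion(a, b, w_a, w_b):
    """Hypothetical BPF: one sigmoid ratio mask per source, both computed
    from the concatenated sources, then a masked sum.

    a, b: (n_frames, d) feature sources; w_a, w_b: (2d, d) projections.
    """
    ab = np.concatenate([a, b], axis=-1)  # (n_frames, 2d)
    m_a = sigmoid(ab @ w_a)               # mask for source a, values in (0, 1)
    m_b = sigmoid(ab @ w_b)               # mask for source b, values in (0, 1)
    return m_a * a + m_b * b

rng = np.random.default_rng(0)
frame_len, hop = 16, 8                 # toy sizes, 50% frame overlap
noisy = rng.standard_normal(16000)     # stand-in for 1 s of noisy speech at 16 kHz
frames = frame_signal(noisy, frame_len, hop)   # (n_frames, 16)
low, high = haar_dwt(frames)                   # each (n_frames, 8)

d = frame_len // 2
w_a = rng.standard_normal((2 * d, d)) * 0.1    # hypothetical, untrained weights
w_b = rng.standard_normal((2 * d, d)) * 0.1
dwt_feat = bi_projection_fusion(low, high, w_a, w_b)  # (n_frames, 8)

# Stand-in for Conv-TasNet's learned encoder: linear basis + ReLU per frame.
enc_basis = rng.standard_normal((frame_len, 64)) * 0.1
time_feat = np.maximum(frames @ enc_basis, 0.0)       # (n_frames, 64)

# Concatenated features, playing the role of the encoder output.
encoder_out = np.concatenate([time_feat, dwt_feat], axis=-1)
print(frames.shape, encoder_out.shape)  # → (1999, 16) (1999, 72)
```

Because the Haar transform is orthonormal, the two sub-bands together keep all of each frame's energy, so splitting loses no information before fusion. In a trained model the projection weights and the encoder would be learned jointly with the separator that estimates the mask; here they only demonstrate shapes and data flow.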
