Title
使用離散小波轉換特徵於Conv-TasNet語音強化模型的初步研究
Parallel title
A Preliminary Study of the Application of Discrete Wavelet Transform Features in Conv-TasNet Speech Enhancement Model
Authors: Yan-Tong Chen, Zong-Tai Wu, Jeih-Weih Hung
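Abstract (translated from the Chinese)
Speech enhancement models built on deep neural networks are now often trained on time-domain features, which, like classical frequency-domain features, allow the resulting models to achieve excellent enhancement performance. Building on this idea, this study investigates how to extract information from time-domain speech to create more effective features for speech enhancement. We propose extracting short-time sub-band signals in the time domain and fusing them into a single feature. Specifically, the discrete wavelet transform is applied to each input frame signal to decompose it into sub-band signals, and these signals are then passed through a projection fusion step to create the final wavelet-domain features. The corresponding fusion method is called bi-projection fusion (BPF). The fused wavelet-domain features obtained through the discrete wavelet transform are then combined with the original time-domain features to train an efficient speech enhancement network, the fully-convolutional time-domain audio separation network (Conv-TasNet), in order to enhance noise-corrupted speech signals and improve their quality and intelligibility.
We conducted evaluation experiments on two speech enhancement datasets, VoiceBank-DEMAND and VoiceBank-QUT. Preliminary results show that the proposed method achieves higher objective speech quality and intelligibility scores than the original Conv-TasNet that uses time-domain features alone, indicating that the fused wavelet-domain features can complement the original time-domain features, so that a more effective Conv-TasNet can be learned from noisy input speech and yields better speech enhancement.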
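The frame-wise decomposition into sub-band signals can be illustrated with a minimal sketch. The abstract does not state which wavelet or how many decomposition levels are used, so the one-level Haar wavelet below is an assumption chosen only for simplicity, and the function names `haar_dwt` and `haar_idwt` are hypothetical.

```python
import math


def haar_dwt(frame):
    """One-level Haar DWT (assumed wavelet): split a frame into two sub-bands.

    Returns (approx, detail): the low-frequency and high-frequency
    sub-band signals, each half the length of the input frame.
    """
    if len(frame) % 2 != 0:
        raise ValueError("frame length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(frame[i] + frame[i + 1]) * s for i in range(0, len(frame), 2)]
    detail = [(frame[i] - frame[i + 1]) * s for i in range(0, len(frame), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the frame from its two sub-bands."""
    s = 1.0 / math.sqrt(2.0)
    frame = []
    for a, d in zip(approx, detail):
        frame.append((a + d) * s)
        frame.append((a - d) * s)
    return frame


# A toy 8-sample frame standing in for one short-time speech frame.
frame = [1.0, 3.0, 2.0, 2.0, 0.0, -4.0, 5.0, 1.0]
low, high = haar_dwt(frame)
restored = haar_idwt(low, high)
```

Because the Haar transform is orthonormal, the two sub-bands together carry the same energy as the frame and the frame can be recovered exactly, which is why sub-band signals are a lossless alternative view of the same time-domain input.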
English abstract
Time-domain features are now widely used in speech enhancement (SE) networks and, like frequency-domain features, achieve excellent performance in removing noise from input utterances. This study primarily investigates how to extract information from time-domain utterances to create more effective features for speech enhancement. We propose using the sub-band signals that reside in multiple acoustic frequency bands in the time domain and integrating them into a unified feature set. Specifically, we use the discrete wavelet transform (DWT) to decompose each input frame signal into sub-band signals, and then apply a projection fusion process to these signals to create the final features. The corresponding fusion strategy is bi-projection fusion (BPF). In short, BPF uses the sigmoid function to create ratio masks for two feature sources. The concatenation of the fused DWT features and the time features serves as the encoder output of a well-known SE framework, the fully-convolutional time-domain audio separation network (Conv-TasNet), which estimates the mask and then produces the enhanced time-domain utterances.
The evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks. The experimental results show that the proposed method achieves higher speech quality and intelligibility than the original Conv-TasNet that uses time features only, indicating that fused DWT features created from the input utterances can complement the time features and help learn a superior Conv-TasNet for speech enhancement.
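The fusion step described in the abstract, where a sigmoid produces ratio masks for two feature sources, might look roughly like the sketch below. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the projection layers, so the scalar weights `w_a`, `w_b`, and `bias` are hypothetical stand-ins for learned parameters, and the complementary masks `m` and `1 - m` are one common way to realize a ratio mask over two sources.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bpf_fuse(feat_a, feat_b, w_a=1.0, w_b=-1.0, bias=0.0):
    """Fuse two equal-length feature vectors with sigmoid ratio masks.

    w_a, w_b and bias are hypothetical stand-ins for learned projection
    parameters; a real model would learn them during training.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature sources must have the same length")
    fused = []
    for a, b in zip(feat_a, feat_b):
        m = sigmoid(w_a * a + w_b * b + bias)  # mask for source A, in (0, 1)
        fused.append(m * a + (1.0 - m) * b)    # source B gets the mask 1 - m
    return fused


# Toy sub-band features (e.g. low and high DWT bands) and time features.
low_band = [0.5, -1.0, 2.0, 0.0]
high_band = [0.1, 0.3, -0.2, 0.0]
time_feat = [1.0, 0.0, -1.0, 0.5]

fused_dwt = bpf_fuse(low_band, high_band)
# Concatenate time features with the fused DWT features, as the abstract
# describes for the encoder output passed on to the mask estimator.
encoder_out = time_feat + fused_dwt
```

Since the two masks sum to one, each fused value is a convex combination of the two sources, so the fusion can only interpolate between them and never amplifies either one.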
Pages: 92-99
Keywords: 語音強化; 離散小波轉換; 跨域; 雙投影融合; 全卷積時頻分離網路; speech enhancement; discrete wavelet transform; cross-domain; temporal speech sequence; Conv-TasNet; bi-projection fusion
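Journal: ROCLING論文集 (ROCLING Proceedings)
Issue: 202212 (2022)
Publisher: 中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)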
Previous article in this issue: Using Grammatical and Semantic Correction Model to Improve Chinese-to-Taiwanese Machine Translation Fluency
Next article in this issue: 藉由壓縮性之頻譜損失函數以學習DEMUCS語音強化模型之初步研究
 
