Title: 基於多視角注意力機制語音增強模型於強健性自動語音辨識
English title: Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition
Authors: Fu-An Chao (趙福安), Jeih-weih Hung (洪志偉), Berlin Chen (陳柏琳)
Chinese abstract (translated):
Relying on advances in deep learning, many recent studies have found that phase information is crucial in speech enhancement (SE). Researchers have also found that time-domain single-channel speech enhancement can effectively remove noise and thereby significantly improve speech recognition accuracy. Inspired by this, this study investigates two SE techniques that account for phase information, in the time domain and the frequency domain respectively, and proposes a multi-view attention-based speech enhancement model that fuses time-domain and frequency-domain features for use in speech recognition. We evaluate these SE techniques on the Aishell-1 Mandarin corpus, using various noise sources to simulate different noise conditions for training and testing, and verify that the proposed method outperforms the other time-domain and frequency-domain methods. Specifically, when tested at signal-to-noise ratios of -5 dB, 5 dB, and 15 dB, the proposed method with a retrained acoustic model (AM) reduces the relative character error rate by 3.4%, 2.5%, and 1.6% on the test set with known noise, and by 3.8%, 4.8%, and 2.2% on the test set with unknown noise, compared with the time-domain method.
English abstract:
Recently, many studies have found that phase information is crucial in speech enhancement (SE), and time-domain single-channel SE techniques have proven effective for noise suppression and robust automatic speech recognition (ASR). Inspired by this, this research investigates two recently proposed SE methods that consider phase information in the time domain and the frequency domain of speech signals, respectively. Going one step further, we propose a novel multi-view attention-based speech enhancement model, which harnesses the synergistic power of the aforementioned time-domain and frequency-domain SE methods and applies equally well to robust ASR. To evaluate the effectiveness of the proposed method, we use various noise datasets to create synthetic test data and conduct extensive experiments on the Aishell-1 Mandarin speech corpus. The evaluation results show that the proposed method is superior to several current state-of-the-art time-domain and frequency-domain SE methods. Specifically, compared with the time-domain method, our method achieves relative character error rate (CER) reductions of 3.4%, 2.5%, and 1.6% at three signal-to-noise ratios (SNRs), -5 dB, 5 dB, and 15 dB, respectively, on the test set with known noise scenarios, while the corresponding CER reductions on the test set with unknown noise scenarios are 3.8%, 4.8%, and 2.2%, respectively.
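The abstract's core idea, fusing a time-domain and a frequency-domain feature view with attention weights, can be illustrated with a minimal sketch. This is not the paper's model: the function `multi_view_attention_fusion`, the shared scoring vector `w`, and the toy feature shapes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_attention_fusion(time_feat, freq_feat, w):
    """Fuse two feature views (frames x dims) with per-frame attention
    weights computed from a shared scoring vector w (illustrative only)."""
    # Stack the views: shape (num_views, frames, dims)
    views = np.stack([time_feat, freq_feat], axis=0)
    # Score each view per frame, then normalize across views
    scores = views @ w                  # (num_views, frames)
    alpha = softmax(scores, axis=0)     # attention over the two views
    # Attention-weighted sum across views -> fused representation
    return (alpha[..., None] * views).sum(axis=0)

# Toy usage: 4 frames, 8-dimensional features per view
rng = np.random.default_rng(0)
t = rng.standard_normal((4, 8))
f = rng.standard_normal((4, 8))
w = rng.standard_normal(8)
fused = multi_view_attention_fusion(t, f, w)
print(fused.shape)  # (4, 8)
```

Because the weights for each frame sum to one, every fused frame is a convex combination of the corresponding time-domain and frequency-domain frames.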
Pages: 1-16
Keywords: Speech Enhancement; Automatic Speech Recognition; Deep Learning; Single-Channel Speech Enhancement; Retraining; Acoustic Models
Journal: ROCLING論文集 (ROCLING Proceedings)
Issue: 2020
Publisher: 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)
Previous article in this issue: 探究文本提示於端對端發音訓練系統之應用 (Exploring the Use of Text Prompts in an End-to-End Pronunciation Training System)
Next article in this issue: Lectal Variation of the Two Chinese Causative Auxiliaries