Investigating Modulation Spectrum Factorization Techniques for Robust Speech Recognition (調變頻分解技術於強健語音辨識之研究)

張庭豪; 洪孝宗; 陳冠宇; 王新民; 陳柏琳

Title	調變頻分解技術於強健語音辨識之研究
Parallel title	Investigating Modulation Spectrum Factorization Techniques for Robust Speech Recognition
Authors	張庭豪, 洪孝宗, 陳冠宇 (Guan-Yu Chen), 王新民, 陳柏琳
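Abstract (translated from Chinese)	Automatic speech recognition (ASR) systems often suffer severe performance degradation under environmental variation, so the development of speech robustness techniques has long been an important and active research area. This paper investigates robustness techniques, aiming to obtain more robust speech features through effective processing of the modulation spectra of speech features. To this end, we use nonnegative matrix factorization (NMF) and several extensions of it to normalize the magnitude components of the modulation spectrum. The paper makes the following contributions. First, by incorporating the notion of sparseness, we seek NMF basis vectors that capture localized information in the modulation spectrum and overlap less with one another. Second, based on the notion of local invariance, we encourage the modulation spectrum magnitudes of utterances with similar spoken content to have closer vector representations in the NMF space, so that the relationships among utterances are preserved. Third, at test time the NMF encoding vectors are normalized to further improve the robustness of the speech features. Finally, we also combine the three NMF extensions above. All experiments are conducted on the internationally used benchmark corpus, the Aurora-2 connected-digit database. The results show that, compared with the baseline experiment that uses only Mel-frequency cepstral features, all of the proposed extensions significantly reduce the speech recognition error rate. In addition, we compare and combine the proposed extensions with several well-known feature robustness techniques to verify their practicality.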
Abstract (English)	The performance of an automatic speech recognition (ASR) system often deteriorates sharply due to interference from varying environmental noise. As such, the development of effective and efficient robustness techniques has long been a challenging research subject in the ASR community. In this article, we attempt to obtain noise-robust speech features through modulation spectrum processing of the original speech features. To this end, we explore the use of nonnegative matrix factorization (NMF) and its extensions on the magnitude modulation spectra of speech features so as to distill the most important and noise-resistant information cues that can benefit ASR performance. The main contributions include three aspects: 1) we leverage the notion of sparseness to obtain more localized and parts-based representations of the magnitude modulation spectra with fewer basis vectors; 2) the prior knowledge of the similarities among training utterances is taken into account as an additional constraint during the NMF derivation; and 3) the resulting encoding vectors of NMF are normalized so as to further enhance their robustness of representation. A series of experiments conducted on the Aurora-2 benchmark task demonstrates that our methods can deliver remarkable improvements over the baseline NMF method and achieve performance on par with or better than several widely used robustness methods.
Pages	87-105
Keywords	Speech Recognition, Noise, Robustness, Modulation Spectrum, Nonnegative Matrix Factorization, Language Model, Concept Information, Model Adaptation
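Journal	中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue	December 2015 (Vol. 20, No. 2)
Publisher	中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)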
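Previous article in this issue	Extractive Spoken Document Summarization Using Representation Learning Techniques
Next article in this issue	An Automated Scoring System for Couple Interaction Behavior Rating Scales in Marital Therapy, Built from Speech Features with a Stacked Sparse Autoencoder Algorithm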
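
The baseline idea described in the abstracts, factorizing the magnitude modulation spectra of speech feature trajectories with NMF and rebuilding the features from the result, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstracts do not give the update rule, the FFT length, or the number of basis vectors, so the plain Frobenius-norm multiplicative updates, the reuse of the original phase at reconstruction, the toy random-walk data, and every name here (`magnitude_modulation_spectrum`, `nmf`, `encode`, `enhance`, `n_fft`, `rank`) are assumptions. The sparseness, utterance-similarity, and encoding-normalization extensions are not included.

```python
import numpy as np

def magnitude_modulation_spectrum(traj, n_fft):
    """FFT of one feature trajectory along the frame axis (assumes len(traj) <= n_fft)."""
    spec = np.fft.rfft(traj, n=n_fft)
    return np.abs(spec), np.angle(spec)

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF, V ~= W @ H, via multiplicative updates for the Frobenius norm."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # basis vectors (columns)
    H = rng.random((rank, V.shape[1])) + eps   # encoding vectors (columns)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def encode(V, W, n_iter=200, eps=1e-9, seed=0):
    """Test-time step: keep the learned basis W fixed and update only H."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def enhance(traj, W, n_fft):
    """Replace the magnitude spectrum by its NMF reconstruction, keep the phase."""
    mag, phase = magnitude_modulation_spectrum(traj, n_fft)
    H = encode(mag[:, None], W)
    new_mag = (W @ H)[:, 0]
    new_traj = np.fft.irfft(new_mag * np.exp(1j * phase), n=n_fft)
    return new_traj[: len(traj)]

# Toy data (assumption): random walks stand in for cepstral feature trajectories.
rng = np.random.default_rng(1)
n_fft = 64
train = [np.cumsum(rng.standard_normal(50)) for _ in range(20)]
V = np.stack([magnitude_modulation_spectrum(t, n_fft)[0] for t in train], axis=1)
W, H = nmf(V, rank=5)                      # learn basis vectors from training data
noisy = train[0] + 0.5 * rng.standard_normal(50)
enhanced = enhance(noisy, W, n_fft)        # processed trajectory, same length as input
```

Each column of `V` holds the magnitude modulation spectrum of one training trajectory, so the columns of `W` play the role of the NMF basis vectors and the columns of `H` the encoding vectors mentioned in the abstracts. In a real system the trajectories would come from the feature stream of each utterance rather than from synthetic data.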
