Title (Chinese)
結合鑑別式訓練聲學模型之類神經網路架構及優化方法的改進
Parallel Title (English)
Leveraging Discriminative Training and Improved Neural Network Architecture and Optimization Method
Authors 趙偉成、張修瑞、羅天宏、陳柏琳
Chinese Abstract (translated)
This paper investigates how improvements to acoustic modeling affect Mandarin large vocabulary continuous speech recognition. For training the baseline acoustic models, instead of the cross-entropy criterion conventionally used as the objective function of deep neural networks in speech recognition, we adopt lattice-free maximum mutual information (LF-MMI) as the objective for sequence discriminative training. LF-MMI allows the forward-backward computation to be carried out quickly on a graphics processing unit (GPU) and the posterior probabilities of all competing paths to be obtained, eliminating the word-lattice generation step that conventional discriminative training requires beforehand. Under this training scheme, the neural-network component is usually implemented as a time-delay neural network (TDNN), which already achieves good recognition performance. Building on the TDNN model, this paper therefore deepens the network and uses semi-orthogonal low-rank matrix factorization to make the training of the deeper network more stable. In addition, to increase the generalization ability of the model, we employ the backstitch optimization method. Results on a Mandarin broadcast news transcription task show that combining these two improvements yields a substantial reduction in the character error rate (CER) of the TDNN-LF-MMI model.
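For readers unfamiliar with the criterion, the maximum mutual information (MMI) objective that LF-MMI optimizes can be sketched as follows; the notation is a standard formulation assumed for illustration, not reproduced from the paper itself:

\[
\mathcal{F}_{\mathrm{MMI}}(\theta) = \sum_{u=1}^{U} \log \frac{p_\theta(\mathbf{O}_u \mid \mathcal{M}_{W_u})\, P(W_u)}{\sum_{W} p_\theta(\mathbf{O}_u \mid \mathcal{M}_{W})\, P(W)}
\]

where \(\mathbf{O}_u\) is the acoustic observation sequence of utterance \(u\), \(W_u\) its reference transcription, \(\mathcal{M}_W\) the HMM composed for word sequence \(W\), and \(P(W)\) the language-model prior. In the lattice-free variant, the denominator sum over all competing word sequences is evaluated with GPU forward-backward passes over a phone-level denominator graph, which is what removes the need to pre-generate word lattices.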
English Abstract
This paper sets out to investigate the effect of acoustic modeling on Mandarin large vocabulary continuous speech recognition (LVCSR). In order to obtain more discriminative baseline acoustic models, we adopt the recently proposed lattice-free maximum mutual information (LF-MMI) criterion as the objective for sequential training of component neural networks in place of the conventional cross-entropy criterion. LF-MMI brings the benefit of efficient forward-backward statistics accumulation on the graphics processing unit (GPU) for all hypothesized word sequences, without the need for an explicit word-lattice generation process. Paired with LF-MMI, acoustic models whose component neural networks are implemented with the so-called time-delay neural network (TDNN) often deliver impressive performance. In view of the above, we explore an integration of two novel extensions of acoustic modeling. One is to conduct semi-orthogonal low-rank matrix factorization on the TDNN-based acoustic models with deeper network layers to increase their robustness. The other is to integrate the backstitch mechanism into the update process of acoustic models to promote generalization. Extensive experiments carried out on a Mandarin broadcast news transcription task reveal that the integration of these two novel extensions of acoustic modeling can yield considerable improvements over the baseline LF-MMI system in terms of character error rate (CER) reduction.
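To make the backstitch step mentioned in both abstracts concrete, below is a minimal Python sketch of the two-step update rule (a small step against the usual descent direction, followed by a larger step with it, both computed on the same minibatch), applied to a toy least-squares problem. The toy model, hyperparameter values, and variable names are illustrative assumptions; they are not the paper's Kaldi-based acoustic-model setup.

import numpy as np

# Toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=256)

def grad(w, idx):
    # Gradient of mean-squared error on the chosen minibatch.
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(8)
eta, alpha = 0.05, 0.3  # learning rate and backstitch scale (assumed values)
for step in range(500):
    idx = rng.choice(len(X), size=32, replace=False)
    # 1) "Backstitch" step: move a short distance against the usual descent direction.
    w = w + alpha * eta * grad(w, idx)
    # 2) Larger corrective step from the displaced point, on the same minibatch.
    w = w - (1.0 + alpha) * eta * grad(w, idx)

print("parameter error:", np.linalg.norm(w - true_w))

The intended effect, as described in the backstitch literature, is to counteract the bias each minibatch introduces into its own update and thereby improve generalization.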
Pages 35-46
Keywords 中文大詞彙連續語音辨識、聲學模型、鑑別式訓練、矩陣分解、來回針法; Mandarin Large Vocabulary Continuous Speech Recognition; Acoustic Model; Discriminative Training; Matrix Factorization; Backstitch
Journal 中文計算語言學期刊
Issue December 2018 (Vol. 23, No. 2)
Publisher 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)
Previous article in this journal 結合鑑別式訓練與模型合併於半監督式語音辨識之研究
Next article in this journal Supporting Evidence Retrieval for Answering Yes/No Questions
 
