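Exploring Combinations of Various Deep Neural Network based Acoustic Models and Classification Techniques for Mandarin Mispronunciation Detection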

許曜麒; 楊明翰; 洪孝宗; 熊玉雯; 宋曜廷; 陳柏琳

Title	融合多種深層類神經網路聲學模型與分類技術於華語錯誤發音檢測之研究
Parallel title	Exploring Combinations of Various Deep Neural Network based Acoustic Models and Classification Techniques for Mandarin Mispronunciation Detection
Authors	許曜麒, 楊明翰, 洪孝宗, 熊玉雯, 宋曜廷 (Yao-Ting Sung), 陳柏琳
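Chinese abstract (translated)	Mispronunciation detection is a key component of computer assisted pronunciation training (CAPT) research; its purpose is to tell language learners whether a passage they read aloud contains mispronunciations. In general, the mispronunciation detection process has two parts: 1) a front-end feature extraction module, which compares the phones or speech segments uttered by the learner against an acoustic model to extract the corresponding discriminative pronunciation detection features; and 2) a back-end classification module, which uses those features to decide which class (correct or incorrect pronunciation) each phone or speech segment belongs to. This paper continues this line of research and makes three main contributions: 1) we compare and combine state-of-the-art acoustic models based on deep neural networks (DNN) and convolutional neural networks (CNN) to produce more discriminative pronunciation detection features; 2) we compare and combine different classification methods in order to achieve better detection performance; and 3) we carry out a broad and in-depth series of experiments, with analysis and discussion, on the modules that make up mispronunciation detection. Experimental results on a large corpus of Mandarin Chinese learned as a second language show that the proposed approach, which fuses multiple deep neural network acoustic models and classification techniques, yields a significant performance improvement over the baseline methods.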
English abstract	Automatic mispronunciation detection plays a crucial role in a computer assisted pronunciation training (CAPT) system. The main purpose of mispronunciation detection is to judge whether the pronunciations of a non-native speaker are correct or not. In general, the process of mispronunciation detection can be divided into two parts: 1) a front-end feature extraction module that generates pronunciation detection features based on an input speech segment and its associated reference acoustic models; and 2) a back-end classification module that determines the correctness of the pronunciation of the speech segment according to the output of a classifier that takes the pronunciation detection features of the segment as the input. The main contributions of this work are three-fold. First, we investigate the use of two state-of-the-art acoustic models, based respectively on deep neural networks (DNN) and convolutional neural networks (CNN), and compare their effectiveness for the extraction of discriminative pronunciation detection features. Second, we experiment with different types of classification methods and propose a novel integration of DNN- and CNN-based decision scores at the back-end. Third, we provide an extensive set of empirical evaluations of these two modules and their associated methods, based on a recently compiled corpus for learning Mandarin Chinese as a second language. The experimental results show that our approach improves on several existing baselines.
Pages	103-120
Keywords	Mispronunciation detection, Automatic Speech Recognition, Deep Neural Networks, Convolutional Neural Networks
Journal	ROCLING論文集 (ROCLING Proceedings)
Issue	2015
Publisher	中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)
Previous article in this journal	調變頻譜分解之改良於強健性語音辨識
Next article in this journal	透過語音特徵建構基於堆疊稀疏自編碼器演算法之婚姻治療中夫妻互動行為量表自動化評分系統
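
The two-stage process described in the abstracts (front-end feature extraction, then back-end classification with fused DNN and CNN decision scores) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the feature is a simple average log posterior of the canonical phone (a goodness-of-pronunciation style score), the fusion is a linear interpolation, and the names `gop_feature`, `fuse_scores`, `is_mispronounced`, `weight`, and `threshold` are hypothetical.

```python
import math


def gop_feature(frame_posteriors, target_phone):
    """Front end (assumed form): average log posterior of the canonical
    phone over the frames of one segment.

    frame_posteriors is a list of dicts mapping phone label -> posterior,
    one dict per frame, as an acoustic model might produce.
    """
    floor = 1e-10  # avoid log(0) when the phone receives no probability mass
    total = sum(math.log(max(frame[target_phone], floor))
                for frame in frame_posteriors)
    return total / len(frame_posteriors)


def fuse_scores(dnn_score, cnn_score, weight=0.5):
    """Back end, step 1 (assumed form): linear interpolation of the scores
    obtained from the DNN-based and CNN-based acoustic models."""
    return weight * dnn_score + (1.0 - weight) * cnn_score


def is_mispronounced(score, threshold):
    """Back end, step 2 (assumed form): flag the segment when the fused
    score falls below a tuned threshold."""
    return score < threshold


# Toy posteriors for a two-frame segment whose canonical phone is "a".
dnn_frames = [{"a": 0.9, "o": 0.1}, {"a": 0.8, "o": 0.2}]
cnn_frames = [{"a": 0.7, "o": 0.3}, {"a": 0.6, "o": 0.4}]

fused = fuse_scores(gop_feature(dnn_frames, "a"),
                    gop_feature(cnn_frames, "a"))
print(is_mispronounced(fused, threshold=-1.5))
```

In practice the threshold and interpolation weight would be tuned on held-out data, and the paper's back end compares several classifiers rather than a single threshold; this sketch only shows how the two modules connect.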