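Chinese Abstract |
Mispronunciation detection is an important component of computer-assisted pronunciation training (CAPT) research; its purpose is to tell language learners whether mispronunciations occur in a passage they read aloud. In general, the mispronunciation detection process can be divided into two parts: 1) a front-end feature extraction module, which compares the phonemes or utterance segments read by the learner against acoustic models to extract the corresponding discriminative pronunciation detection features; and 2) a back-end classification module, which uses the resulting pronunciation detection features to decide the class (correct pronunciation or mispronunciation) to which each phoneme or utterance segment belongs. This thesis continues research on mispronunciation detection and makes three main contributions: 1) we compare and combine current state-of-the-art acoustic models based on deep neural networks (DNN) and convolutional neural networks (CNN) to produce more discriminative pronunciation detection features; 2) we compare and combine different classification methods in order to achieve better detection performance; and 3) we carry out a broad and in-depth series of experimental analyses and discussions on the modules involved in mispronunciation detection. Experimental results on a large corpus for learning Mandarin Chinese as a second language show that the proposed method, which fuses multiple deep neural network acoustic models and classification techniques, does achieve a significant performance improvement over the baseline methods. |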
English Abstract |
Automatic mispronunciation detection plays a crucial role in a computer-assisted pronunciation training (CAPT) system. Its main purpose is to judge whether the pronunciations of a non-native speaker are correct. In general, the process of mispronunciation detection can be divided into two parts: 1) a front-end feature extraction module that generates pronunciation detection features from an input speech segment and its associated reference acoustic models; and 2) a back-end classification module that takes the pronunciation detection features of the segment as input to a classifier and determines from the classifier output whether the segment is pronounced correctly. The main contributions of this work are three-fold. First, we investigate two state-of-the-art acoustic models, based respectively on deep neural networks (DNN) and convolutional neural networks (CNN), and compare their effectiveness for extracting discriminative pronunciation detection features. Second, we experiment with different types of classification methods and propose a novel integration of DNN- and CNN-based decision scores at the back-end. Third, we provide an extensive set of empirical evaluations of these two modules and their associated methods on a recently compiled corpus for learning Mandarin Chinese as a second language. The experimental results show that our approach yields a significant performance improvement over several existing baselines. |
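
As a rough illustration of the back-end step described above, the following sketch combines a DNN-based and a CNN-based decision score for each speech segment and thresholds the result. The abstract does not specify how the scores are integrated, so the linear interpolation used here, the names `SegmentScores`, `fuse_scores`, and `detect_mispronunciations`, and the `weight` and `threshold` values are all assumptions made for illustration, not the method of the thesis. Scores are assumed to lie in [0, 1], with higher values meaning "more likely correct".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentScores:
    """Hypothetical per-segment decision scores (higher = more likely correct)."""
    dnn_score: float  # score from a classifier fed with DNN-based features
    cnn_score: float  # score from a classifier fed with CNN-based features


def fuse_scores(seg: SegmentScores, weight: float = 0.5) -> float:
    """Linearly interpolate the two decision scores (an assumed fusion rule)."""
    return weight * seg.dnn_score + (1.0 - weight) * seg.cnn_score


def detect_mispronunciations(
    segments: List[SegmentScores],
    weight: float = 0.5,
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each segment whose fused score falls below the threshold."""
    return [fuse_scores(seg, weight) < threshold for seg in segments]


if __name__ == "__main__":
    # Made-up scores for two segments, for demonstration only.
    demo = [SegmentScores(0.9, 0.8), SegmentScores(0.2, 0.4)]
    print(detect_mispronunciations(demo))
```

In practice, the interpolation weight and the decision threshold would be tuned on held-out data; other integration schemes, such as feeding both scores to a second-stage classifier, would fit the same interface.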