Chinese Abstract
"近年來,電腦輔助發音訓練(Computer assisted pronunciation training, CAPT)系統的需求日益上升。然而,現階段基於端對端(End-to-End)類神經網路架構之系統在錯誤發音檢測(Mispronunciation detection)的效能仍未臻完美,其原因是此類系統的內部模型本質上仍是屬於自動語音辨識(Automatic speech recognition, ASR)模型。ASR目的是儘量正確地辨識出語者所說內容,縱使其發音是有偏誤的;而CAPT目的恰巧相反,是要能儘量正確地偵測出語者的錯誤發音。有鑒於此,本論文基於CAPT任務通常會有文本提示的特殊性,嘗試將文本提示資訊融入於端對端模型架構。我們研究使用兩個編碼器(Encoders)分別處理發音特徵以及文本特徵,並以分層式注意力機制(Hierarchical attention mechanism, HAN)來動態地結合不同編碼器產生特徵表示。本論文在一套華語學習者語料庫進行一系列實驗;透過不同評估準則所獲得結果顯示,我們所提出的方法較現有方法有較佳的錯誤發音檢測效能。" |
English Abstract
In recent years, there has been a growing demand for computer assisted pronunciation training (CAPT) systems, which can be leveraged to automatically assess the pronunciation quality of L2 learners. However, current CAPT systems built on end-to-end (E2E) neural network architectures still fall short of expectations in detecting mispronunciations. This is partly because most of their model components are designed and optimized for automatic speech recognition (ASR) rather than specifically tailored for CAPT. Unlike ASR, which aims to recognize the utterance of a given speaker (even when poorly pronounced) as correctly as possible, CAPT instead seeks to detect pronunciation errors as accurately as possible. In view of this, we develop an E2E neural CAPT method that employs two disparate encoders to generate embeddings of an L2 speaker's test utterance and of the canonical pronunciations in the corresponding text prompt, respectively. The outputs of the two encoders are fed into a decoder through a hierarchical attention mechanism (HAM), so as to enable the decoder to focus more on detecting mispronunciations. A series of experiments conducted on an L2 Mandarin Chinese speech corpus demonstrates the effectiveness of our method in terms of different evaluation metrics, compared with several state-of-the-art E2E neural CAPT methods.
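The abstract describes, at a high level, a two-encoder architecture whose outputs are fused by hierarchical attention before reaching the decoder. The exact formulation is not given here, so the following is only a minimal NumPy sketch of the general idea, assuming simple dot-product attention: a decoder query first attends within each encoder's output separately, and a second attention layer then weighs the two resulting context vectors against each other, letting the decoder dynamically balance acoustic evidence against the canonical text prompt.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys):
    """Dot-product attention: weight each key vector by its match to the
    query and return the weighted sum (context vector) and the weights."""
    scores = keys @ query              # (T,)
    weights = softmax(scores)          # (T,), sums to 1
    return weights @ keys, weights     # context: (d,)

def hierarchical_attention(query, speech_enc, text_enc):
    """Two-level (hierarchical) attention, as a sketch of the HAM idea:
    first attend within each encoder's output sequence, then attend over
    the two resulting context vectors to fuse them dynamically."""
    c_speech, _ = attend(query, speech_enc)      # speech encoder context
    c_text, _ = attend(query, text_enc)          # text-prompt encoder context
    contexts = np.stack([c_speech, c_text])      # (2, d)
    fused, level2_weights = attend(query, contexts)
    return fused, level2_weights
```

Here `speech_enc` (shape `(T_speech, d)`) and `text_enc` (shape `(T_text, d)`) stand in for the two encoders' output sequences, and `query` for a decoder state; all names are hypothetical. The second-level weights indicate, per decoding step, how much the decoder relies on the observed speech versus the canonical pronunciation sequence.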