On the Use of Speaker-Aware Language Model Adaptation Techniques for Meeting Speech Recognition

陳映文; 羅天宏; 張修瑞; 趙偉成; 陳柏琳

Title	會議語音辨識使用語者資訊之語言模型調適技術
Parallel title	On the Use of Speaker-Aware Language Model Adaptation Techniques for Meeting Speech Recognition
Authors	陳映文, 羅天宏, 張修瑞, 趙偉成, 陳柏琳
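Abstract (translated from Chinese)	This paper seeks to mitigate the problems that differences in speakers' language use cause in meeting speech recognition. The presence of multiple speakers may imply multiple patterns of language use; moreover, people do not strictly follow grammar when speaking, and their speech often contains hesitations, pauses, personal idioms, and other idiosyncratic ways of speaking. Nevertheless, the language models used in meeting speech recognition have mostly not been adjusted for individual speakers. Instead, they assume that all speakers share the same patterns of language use, and the transcripts of multiple speakers are merged into one training set to train a single language model. To move beyond this assumption, this study provides additional speaker-specific information for language model training and prediction, that is, speaker adaptation of the language model. We consider two test-time scenarios, "known speakers" and "unknown speakers," propose a speaker feature extraction method for each, and investigate how speaker features can assist language model training. We conduct a series of speaker adaptation experiments for language models on Mandarin and English meeting speech recognition tasks. The results show that the proposed language models perform well in both the known-speaker and unknown-speaker scenarios and outperform the baseline neural network language model.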
English abstract	This paper embarks on alleviating the problems caused by the multiple-speaker situation that occurs frequently in meetings, with the aim of improving automatic speech recognition (ASR). Speakers in a multiple-speaker situation express themselves in a wide variety of ways. That is to say, people do not strictly follow grammar when speaking, tend to stutter, and often use personal idioms and other unique ways of speaking. Nevertheless, the existing language models employed in automatic transcription of meeting recordings rarely account for these facts and instead assume that all speakers participating in a meeting share the same speaking style or word-usage behavior. In turn, a single language model is built from the manual transcripts of utterances compiled from multiple speakers, which are taken holistically as the training set. To relax this assumption, we endeavor to incorporate additional information cues into the training phase and the prediction phase of language modeling to accommodate the variety of speaker-related characteristics, through the process of speaker adaptation for language modeling. To this end, two disparate scenarios for the prediction phase, i.e., "known speakers" and "unknown speakers," are taken into consideration in developing methods that extract speaker-related information cues to aid in the training of language models. Extensive experiments carried out on automatic transcription of Mandarin and English meeting recordings show that the proposed language models, along with different mechanisms for speaker adaptation, achieve good performance gains relative to the baseline neural network based language model compared in this study.
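Pages	46-60
Keywords	meeting speech recognition, speech recognition, language modeling, speaker adaptation, recurrent neural networks
Journal	ROCLING論文集 (ROCLING Proceedings)
Issue	2018
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)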