Taiwanese Voice Conversion System Combining Speech Recognition and Synthesis Modules

許文漢; 廖元甫; 王文俊; 潘振銘

Title	Taiwanese Voice Conversion System Combining Speech Recognition and Synthesis Modules
Parallel title	Taiwanese Voice Conversion based on Cascade ASR and TTS Framework
Authors	許文漢, 廖元甫, 王文俊, 潘振銘
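Chinese abstract	Taiwanese has been listed by the United Nations as an endangered language and urgently needs to be passed on. This paper therefore studies how to build a Taiwanese speech synthesis system that can synthesize any Taiwanese sentence in anyone's voice. To this end, we first (1) built Taiwanese Across Taiwan (TAT), a large-scale Taiwanese speech corpus with 204 speakers and about 140 hours of speech, including two male and two female speakers who each recorded about 10 hours of speech specifically for speech synthesis. We then (2) built a Chinese-text-to-Taiwanese speech synthesis system based on the Tacotron2 architecture, with a front-end module that converts Chinese characters to Tâi-lô romanization and a back-end WaveGlow real-time speech generator. Finally, (3) we built a Taiwanese voice conversion system by cascading Taiwanese speech recognition and speech synthesis, supporting two functions: same-language (Taiwanese-to-Taiwanese) and cross-language (Mandarin-to-Taiwanese) voice conversion. To evaluate the system, we openly recruited 29 participants online for two rating tasks, same-language and cross-language conversion, and collected subjective MOS ratings of "naturalness" and "similarity". In the same-language task, with 10 minutes, 3 minutes, and 30 seconds of target-speaker speech, the average naturalness MOS was 3.45, 3.02, and 2.23, and the average similarity MOS was 3.38, 2.99, and 2.10, respectively. In the cross-language task, with 6 minutes and 3 minutes of target-speaker speech, the average naturalness MOS was 2.90 and 2.70, and the average similarity MOS was 2.84 and 2.54, respectively. These results show that we have taken a first step toward a Taiwanese speech synthesis system that can synthesize any Taiwanese sentence in anyone's voice.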
English abstract	Taiwanese has been listed as an endangered language by the United Nations and urgently needs to be passed on. This study therefore investigates how to build a Taiwanese speech synthesis system that can synthesize any Taiwanese sentence in anyone's voice. To achieve this goal, we first (1) built a large-scale Taiwanese Across Taiwan (TAT) corpus with a total of 204 speakers and about 140 hours of speech. Among those speakers, two men and two women each recorded about 10 hours of speech specifically for speech synthesis. We then (2) established a Chinese text-to-Taiwanese speech synthesis system based on the Tacotron2 speech synthesis architecture, with a front-end sequence-to-sequence machine translation module that converts Chinese characters to Taiwan Minnanyu Luomazi Pinyin (abbreviated as Tâi-lô) and a back-end WaveGlow real-time speech generator. Finally, (3) we constructed a Taiwanese voice conversion system based on a cascaded speech recognition and speech synthesis framework, implementing two voice conversion functions: (a) same-language, Taiwanese-to-Taiwanese voice conversion, and (b) cross-language, Mandarin Chinese-to-Taiwanese voice conversion. To evaluate the Taiwanese voice conversion system, we publicly recruited 29 subjects over the Internet for two scoring tasks, same-language and cross-language voice conversion, and carried out subjective mean opinion score (MOS) evaluations of "naturalness" and "similarity" for each. In the same-language task, the average naturalness MOS was 3.45, 3.02, and 2.23, and the average similarity MOS was 3.38, 2.99, and 2.10, when using 10 minutes, 3 minutes, and 30 seconds of target speech, respectively. In the cross-language task, the average naturalness MOS was 2.90 and 2.70, and the average similarity MOS was 2.84 and 2.54, when using 6 minutes and 3 minutes of target speech, respectively. These results show that the proposed system can indeed synthesize any Taiwanese sentence in anyone's voice.
Pages	89-137
Keywords	Taiwanese speech corpus, Taiwanese Across Taiwan, Taiwanese Speech Synthesis, Taiwanese Voice Conversion
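Journal	International Journal of Computational Linguistics and Chinese Language Processing (中文計算語言學期刊)
Issue	December 2022 (Vol. 27, No. 2)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)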
Previous article in this journal	A Chinese Dimensional Valence-Arousal-Irony Detection on Sentence-level and Context-level Using Deep Learning Model
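
The abstract describes voice conversion as a cascade: speech recognition, then text-to-Tâi-lô conversion, then Tacotron2 synthesis, then WaveGlow vocoding. The following is a minimal sketch of how such a cascade might be wired together, not the authors' implementation. The class name, the stage names (`recognize`, `to_tailo`, `synthesize`, `vocode`), the exact ordering of stages, and the toy stand-in functions are all assumptions made for illustration; a real system would plug trained models into each stage.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types for illustration only.
Waveform = List[float]
Mel = List[List[float]]


@dataclass
class CascadeVoiceConverter:
    """Chains four stages; every stage is a placeholder callable (assumption)."""
    recognize: Callable[[Waveform], str]    # ASR: source speech -> text
    to_tailo: Callable[[str], str]          # front end: text -> Tâi-lô romanization
    synthesize: Callable[[str, str], Mel]   # Tacotron2-style: (Tâi-lô, speaker id) -> mel frames
    vocode: Callable[[Mel], Waveform]       # WaveGlow-style: mel frames -> waveform

    def convert(self, source_audio: Waveform, target_speaker: str) -> Waveform:
        text = self.recognize(source_audio)
        tailo = self.to_tailo(text)
        mel = self.synthesize(tailo, target_speaker)
        return self.vocode(mel)


# Toy stand-ins so the sketch runs end to end; they do no real signal processing.
def toy_recognize(audio: Waveform) -> str:
    return "你好"


def toy_to_tailo(text: str) -> str:
    # One-entry lookup table standing in for a trained conversion model.
    return {"你好": "lí hó"}.get(text, text)


def toy_synthesize(tailo: str, speaker: str) -> Mel:
    # One fake two-value "frame" per syllable, tagged with the speaker id length.
    return [[float(len(speaker))] * 2 for _ in tailo.split()]


def toy_vocode(mel: Mel) -> Waveform:
    # Flatten the frames into a fake waveform.
    return [value for frame in mel for value in frame]


converter = CascadeVoiceConverter(toy_recognize, toy_to_tailo, toy_synthesize, toy_vocode)
output = converter.convert([0.0, 0.1], target_speaker="spk_A")
```

Because the text between recognition and synthesis carries no speaker identity, the target voice in this sketch is determined only by the `target_speaker` argument passed to the synthesis stage, which is the property that makes a cascade usable for voice conversion.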