Research on Hakka Word Segmentation Processes in Chinese-to-Hakka Text-to-Speech System

黃豐隆; 余明興; 林昕緯; 林義証

Title (Chinese)	中文轉客文文轉音系統中的客語斷詞處理之研究
Title (English)	Research on Hakka Word Segmentation Processes in Chinese-to-Hakka Text-to-Speech System
Authors	黃豐隆, 余明興, 林昕緯, 林義証
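Abstract (translated from Chinese)	Language is the primary tool for passing on and promoting culture, especially for the languages of minority groups such as Hakka and the indigenous languages of Taiwan. The Hakka make up about one seventh of Taiwan's population, making them the second-largest ethnic group after the Minnan-speaking group. Recent surveys of Hakka usage in Taiwan report that the main obstacle to passing on the language is that people cannot speak it well. Because of the learning environment in Taiwan, even children from Hakka families are rarely able to speak or converse in Hakka; the share of people who can understand and speak Hakka declines every year, the Hakka-speaking population has shrunk sharply, and Taiwan faces a crisis in which the Hakka language falls silent and Hakka culture is lost. To build an online e-learning system for Hakka, we have developed a Chinese-to-Hakka text-to-speech system (Hakka Text-to-Speech, HTTS) for the Sixian (四縣) and Hailu (海陸) accents of Hakka, based on a large inventory of synthesis units, together with related applications such as an online Mandarin-Hakka bilingual talking dictionary and a Mandarin-Hakka bilingual talking-map community system. The system is intended mainly for users who speak little or no Hakka and want to use or learn the language. Its input is therefore a Chinese sentence and its output is Hakka speech. With this design, learners do not need to learn a Hakka input method or Hakka romanization; they can learn Hakka through the system using only Chinese, the language they know best. To further improve the quality of the text-to-speech output, this paper focuses on improving Hakka word segmentation in the system's Hakka text analysis module. After the user enters a Chinese sentence, our proposed Hakka word segmentation method converts the Chinese sentence into a Hakka sentence together with its word segmentation and part-of-speech tags. The improved segmentation and part-of-speech tagging yield better text analysis and improve how faithfully the synthesized speech conveys the meaning of the text, for example in determining the prosodic hierarchy, pause types, and pronunciations. This paper proposes a Hakka word segmentation method that uses a hybrid N-gram sequence scoring algorithm in combination with a Chinese word segmentation module and a dynamic programming algorithm. On a Hakka corpus with severe data sparseness, the Chinese-to-Hakka word segmentation results reach a precision of 80.78%, a considerable improvement over the traditional method of translating Chinese words directly into Hakka words.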
Abstract (English)	Language is a major tool for cultural inheritance, especially for minority groups; examples in Taiwan are Hakka and the aboriginal languages. The Hakka are the second-largest ethnic group in Taiwan after the Minnan-speaking group and account for one seventh of the population. According to recent surveys of Hakka usage in Taiwan, the main difficulty in passing on Hakka culture is the loss of spoken Hakka: the learning environment has led to a declining population that communicates in Hakka, which puts the cultural inheritance of Hakka at risk. We have therefore developed a text-to-speech method and system for the Hakka language, with the goal of building an environment for learning Hakka, along with applications such as “Web Hakka Phonetic Dictionary” and “Blogging System of Bilingual Language by Integrating Mobile Cells and Google Map”. The system is intended for users who are interested in Hakka: they input Chinese text and the system outputs Hakka speech, so they do not need to learn Hakka typing or phonetic writing and can learn Hakka through a familiar language. To further improve Hakka text-to-speech, this article focuses on word segmentation of Hakka text. When a user enters Chinese text, our proposed method converts it to Hakka text and assigns a part of speech to each Hakka text segment. Better segmentation and part-of-speech tagging in turn improve the Hakka text analysis module. We propose a hybrid N-gram sequence score combined with a Chinese word segmentation module and a dynamic programming algorithm; on a Hakka corpus affected by data sparseness, the accuracy of Chinese-to-Hakka word segmentation is 80.78%.
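Pages	58-77
Keywords	Hakka Text-to-Speech, Hakka Word Segmentation, Dynamic Programming, Hakka Text Analysis
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	2014
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)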
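
The abstract names an N-gram sequence score combined with dynamic programming but gives no formula, so the following is only a minimal sketch of that general idea, not the authors' method. It assumes a linearly interpolated bigram/unigram score (one possible reading of "hybrid"), a hypothetical weight LAMBDA, and toy counts invented for illustration. It segments an already converted Hakka string and omits the Chinese segmentation, Chinese-to-Hakka conversion, and part-of-speech tagging steps.

```python
import math

# Toy counts invented for illustration; NOT taken from the paper's corpus.
UNIGRAM = {"今晡日": 5, "今": 2, "晡": 1, "日": 3, "天氣": 4,
           "天": 2, "氣": 1, "當": 3, "好": 4, "當好": 1}
BIGRAM = {("<s>", "今晡日"): 3, ("今晡日", "天氣"): 2,
          ("天氣", "當"): 2, ("當", "好"): 2}

TOTAL = sum(UNIGRAM.values())
CONTEXT = {}  # total bigram count observed for each left context
for (prev, _), count in BIGRAM.items():
    CONTEXT[prev] = CONTEXT.get(prev, 0) + count

LAMBDA = 0.7   # hypothetical interpolation weight (assumption)
MAX_LEN = 3    # longest lexicon entry, in characters


def score(prev, word):
    """Log of an interpolated bigram/unigram probability (assumed scoring)."""
    p_uni = UNIGRAM[word] / TOTAL
    ctx = CONTEXT.get(prev, 0)
    p_bi = BIGRAM.get((prev, word), 0) / ctx if ctx else 0.0
    return math.log(LAMBDA * p_bi + (1 - LAMBDA) * p_uni)


def segment(text):
    """Return the highest-scoring split of text into lexicon words."""
    n = len(text)
    # best[i][w] = (score, previous word) for paths whose last word w ends at i
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            word = text[j:i]
            if word not in UNIGRAM:
                continue
            for prev, (prev_score, _) in best[j].items():
                cand = prev_score + score(prev, word)
                if word not in best[i] or cand > best[i][word][0]:
                    best[i][word] = (cand, prev)
    if not best[n]:
        raise ValueError("no segmentation possible with this lexicon")
    # Backtrack from the best final word to the start symbol.
    word = max(best[n], key=lambda w: best[n][w][0])
    words, i = [], n
    while word != "<s>":
        words.append(word)
        prev = best[i][word][1]
        i -= len(word)
        word = prev
    return words[::-1]


print(segment("今晡日天氣當好"))
```

With these toy counts, the bigram evidence favours the path 今晡日 / 天氣 / 當 / 好 over alternatives such as 今晡日 / 天氣 / 當好. The unigram term keeps unseen word pairs from scoring zero, which is one common way to cope with the kind of data sparseness the abstract mentions.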
