
Research on Hakka Word Segmentation Processes in Chinese-to-Hakka Text-to-Speech System
Authors 黃豐隆, 余明興, 林昕緯, 林義証
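Language is the primary vehicle for passing on and promoting culture, and this is especially true of minority languages such as Hakka and the indigenous languages of Taiwan. The Hakka make up about one seventh of Taiwan's population and are the second-largest group after Minnan speakers. According to recent surveys of Hakka usage in Taiwan, the main obstacle to passing on Hakka is that people cannot speak it well. Because of the learning environment in Taiwan, even children from Hakka families seldom speak or converse in Hakka. The number of people who can understand and speak Hakka falls year by year, the Hakka-speaking population has shrunk sharply, and Taiwan faces the loss of both the Hakka language and Hakka culture. To build an online e-learning system for Hakka, we have developed a Chinese-to-Hakka text-to-speech system (Hakka Text-to-Speech, HTTS) for the Sixian and Hailu dialects, based on a large inventory of synthesis units, together with related applications such as an online Mandarin-Hakka bilingual talking dictionary and a Mandarin-Hakka bilingual talking-map community system. The system is intended mainly for users who speak little or no Hakka, so its input is Chinese text and its output is Hakka speech. With this design, learners do not need to learn a Hakka input method or Hakka romanization; they can learn Hakka using Chinese, the language they know best. To further improve text-to-speech quality, this paper focuses on Hakka word segmentation in the system's Hakka text analysis module. After the user enters Chinese text, our Hakka word segmentation method converts the Chinese sentence into a Hakka sentence with word boundaries and part-of-speech tags. The improved segmentation and tagging give better text analysis and preserve the intended meaning more accurately in synthesis, for example when deriving the prosodic hierarchy, pause types, and pronunciations. We propose a Hakka word segmentation method that combines a hybrid N-gram sequence scoring scheme with a Chinese word segmentation module and a dynamic programming algorithm. On a severely sparse Hakka corpus, Chinese-to-Hakka word segmentation reaches a precision of 80.78%, a clear improvement over the traditional approach of translating Chinese words directly into Hakka words.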
Language is a major tool for cultural inheritance, especially for minority groups; Hakka and the aboriginal languages of Taiwan are examples. The Hakka are the second-largest ethnic group in Taiwan after Minnan speakers and make up one seventh of the population. According to recent surveys of Hakka usage in Taiwan, the main difficulty in passing on Hakka culture lies in spoken Hakka: the learning environment has led to a declining number of people who communicate in Hakka, which puts the cultural inheritance of Hakka at risk. We have therefore developed a text-to-speech method and system for the Hakka language, with the goal of building an environment for learning Hakka. Our applications include the “Web Hakka Phonetic Dictionary” and the “Blogging System of Bilingual Language by Integrating Mobile Cells and Google Map”. The system is intended for users who are interested in Hakka: they enter Chinese text and the system outputs Hakka speech, so they need not learn Hakka typing or phonetic writing and can learn Hakka through a familiar language. To further improve Hakka text-to-speech, this article focuses on word segmentation of Hakka text. When a user enters Chinese text, our proposed method converts it to Hakka text and assigns a part of speech to each Hakka segment. Better segmentation and part-of-speech tagging in turn improve the Hakka text analysis module. We propose a hybrid N-gram sequence score, combined with a Chinese word segmentation module and a dynamic programming algorithm. On a sparse Hakka corpus, the accuracy of Chinese-to-Hakka word segmentation is 80.78%.
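The abstract names the ingredients of the segmenter (an N-gram sequence score and dynamic programming) but does not give the scoring formula. The following is a minimal sketch of how such a search can work, under stated assumptions: the "hybrid" score is taken here to be a linear interpolation of bigram and unigram probabilities, the counts are toy values over placeholder tokens rather than Hakka data, and the names `hybrid_prob`, `segment`, and `LAMBDA` are hypothetical. It omits the Chinese-to-Hakka word conversion and part-of-speech tagging, and it should not be read as the authors' actual method.

```python
import math

# Toy counts over placeholder tokens (hypothetical; not Hakka data).
UNIGRAMS = {"ab": 4, "a": 2, "b": 2, "cd": 5, "c": 1, "d": 1, "abc": 1}
BIGRAMS = {("<s>", "ab"): 3, ("ab", "cd"): 3, ("<s>", "abc"): 1}
TOTAL = sum(UNIGRAMS.values())
LAMBDA = 0.7        # weight of the bigram estimate (assumed value)
MAX_WORD_LEN = 3    # longest candidate word considered

# How often each word was seen as a bigram history.
CONTEXT = {}
for (prev_word, _), count in BIGRAMS.items():
    CONTEXT[prev_word] = CONTEXT.get(prev_word, 0) + count


def unigram_prob(word):
    # Add-one smoothing keeps unseen words at a small non-zero probability.
    return (UNIGRAMS.get(word, 0) + 1) / (TOTAL + len(UNIGRAMS) + 1)


def hybrid_prob(prev, word):
    # Assumed hybrid score: interpolate bigram and unigram estimates.
    # With an unseen history the bigram term is zero and the unigram
    # term carries the score, which matters on sparse data.
    seen = CONTEXT.get(prev, 0)
    bigram = BIGRAMS.get((prev, word), 0) / seen if seen else 0.0
    return LAMBDA * bigram + (1 - LAMBDA) * unigram_prob(word)


def segment(text):
    # best[(i, w)] = (log score, previous state) for the best
    # segmentation of text[:i] whose last word is w.
    best = {(0, "<s>"): (0.0, None)}
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if len(word) > 1 and word not in UNIGRAMS:
                continue  # only single characters may be out of vocabulary
            for (pos, prev), (score, _) in list(best.items()):
                if pos != start:
                    continue
                cand = score + math.log(hybrid_prob(prev, word))
                key = (end, word)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (pos, prev))
    # Choose the best state that covers the whole input, then backtrack.
    finals = [(s, k) for k, (s, _) in best.items() if k[0] == len(text)]
    score, key = max(finals)
    words = []
    while key[1] != "<s>":
        words.append(key[1])
        key = best[key][1]
    return words[::-1], score


if __name__ == "__main__":
    print(segment("abcd"))
```

With these toy counts, `segment("abcd")` picks the split `ab | cd`, because the observed bigram outweighs the alternatives. Keeping the last word in the dynamic-programming state is what makes the bigram search exact; a unigram-only score would need only the end position as state.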
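Pages 58-77
Keywords Hakka Text-to-Speech; Hakka Word Segmentation; Dynamic Programming; Hakka Text Analysis
Journal ROCLING論文集 (ROCLING Proceedings)
Issue 2014
Publisher 中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)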



