Design of an Input Method for Taiwanese Hokkien using Unsupervized Word Segmentation for Language Modeling
Author Pierre Magistry
This paper presents the challenges and the methodology followed in the design of a new Input Method (IME) for the Taiwanese (Hokkien) language. We first describe the context, the motivations, and some of the main issues related to entering Taiwanese text on modern computer systems and mobile devices. We then present the available resources on which our system is based and describe its overall architecture. Since the cornerstone of a modern IME is the Language Model (LM), the main Natural Language Processing issue on which we focus in this paper is the estimation of an LM for this under-resourced language. The solution we propose relies on unsupervised word segmentation that preserves some degree of ambiguity.
Keywords Unsupervized Word Segmentation; Language Modeling; Input Method; Taiwanese
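Journal ROCLING Proceedings (ROCLING論文集)
Issue 2016
Publisher The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)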
