英文摘要 |
This paper presents the challenges and the methodology followed in the design of a new Input Method (IME) for the Taiwanese (Hokkien) language. We first describe the context, the motivations and some of the main issues related to the input of text in Taiwanese on modern computer systems and mobile devices. Then we present the available resources which our system is based on. We will describe the whole architecture of our system. But since the cornerstone of modern IME is the Language Model (LM), the main Natural Language Processing issue on which we will focus in this paper is the estimation of a LM in the case of this under-resourced language. The solution we propose to rely on unsupervised word segmentation which preserves some degree of ambiguity. |