English Abstract |
LangGeh orthography is a new writing style proposed by. For Han-family languages that use Chinese characters, such as Taiwanese and Mandarin, LangGeh proposes writing with spaces between units, where each unit is a short, simple phrase. This contrasts with the word-based orthography of English and the sentence-based orthography of traditional written Mandarin, which puts no spaces inside a sentence. Because the spaces are easy to add, LangGeh reduces ambiguity, makes text easier to read, and simplifies the processing of text written in Chinese characters. Using the LangGeh orthography, we produce a parallel corpus in Taiwanese and Mandarin of about 150 thousand characters in each language. We then explore extracting a “phrase dictionary” from the parallel corpus and begin a study of statistical translation between Taiwanese and Mandarin. |
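To illustrate why space-delimited phrases simplify processing, here is a minimal Python sketch of phrase dictionary extraction. It assumes sentence-aligned LangGeh text in which phrases are separated by whitespace, so tokenization is a plain `split()`. It scores each source/target phrase pair with the Dice coefficient over sentence co-occurrence; this scoring is an illustrative, swapped-in technique and not necessarily the method used in this work. The names `phrases`, `dice_dictionary`, and `PAIRS`, and the three sentence pairs themselves, are hypothetical toy data, not taken from the corpus.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: (Taiwanese, Mandarin) sentence pairs in LangGeh style.
PAIRS = [
    ("我 欲去 學校", "我 要去 學校"),
    ("伊 欲去 市場", "他 要去 市場"),
    ("我 食飽矣", "我 吃飽了"),
]


def phrases(line: str) -> list[str]:
    # LangGeh separates phrase units with spaces, so splitting on whitespace
    # recovers the units without any word segmenter.
    return line.split()


def dice_dictionary(pairs, threshold: float = 0.5) -> dict[str, list[tuple[str, float]]]:
    """Map each source phrase to target phrases whose Dice score >= threshold."""
    src_count: Counter = Counter()  # number of sentences containing each source phrase
    tgt_count: Counter = Counter()  # number of sentences containing each target phrase
    joint: Counter = Counter()      # number of sentence pairs containing both
    for src, tgt in pairs:
        s, t = set(phrases(src)), set(phrases(tgt))
        src_count.update(s)
        tgt_count.update(t)
        joint.update(product(s, t))

    result: dict[str, list[tuple[str, float]]] = {}
    for (s, t), c in joint.items():
        dice = 2 * c / (src_count[s] + tgt_count[t])
        if dice >= threshold:
            result.setdefault(s, []).append((t, dice))
    # Best candidates first; ties broken alphabetically for a stable order.
    for s in result:
        result[s].sort(key=lambda item: (-item[1], item[0]))
    return result


if __name__ == "__main__":
    for src, candidates in dice_dictionary(PAIRS, threshold=0.8).items():
        print(src, candidates)
```

Sentence-level co-occurrence is noisy on small data: a phrase seen only once scores 1.0 with every phrase in its partner sentence (here 伊 ties with both 他 and 市場), so a practical extractor would need far more data or an alignment step.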