Strategies of Processing Japanese Names and Character Variants in Traditional Chinese Text

Chuan-Jie Lin; Jia-Cheng Zhan; Yen-Heng Chen; Chien-Wei Pao

Title	Strategies of Processing Japanese Names and Character Variants in Traditional Chinese Text
Authors	Chuan-Jie Lin, Jia-Cheng Zhan, Yen-Heng Chen, Chien-Wei Pao
Abstract	This paper proposes an approach to identifying word candidates that are not Traditional Chinese, including Japanese names (written in Japanese Kanji or in Traditional Chinese characters) and word variants, during word segmentation of Traditional Chinese text. For personal names, a probability model of name formats is introduced. We also propose a method for mapping Japanese Kanji to the corresponding Traditional Chinese characters; the same method can be used to detect words written in character variants. After integrating generation rules for various types of special words, together with their probability models, the F-measure of our word segmentation system rises from 94.16% to 96.06%. Another experiment shows that 83.18% of the 862 Japanese names in a set of 109 human-annotated documents can be successfully detected.
Pages	87-108
Keywords	Semantic Chinese Word Segmentation, Japanese Name Identification, Character Variants
Journal	中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue	September 2012 (Vol. 17, No. 3)
Publisher	中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)
Previous article in this issue	Enhancement of Feature Engineering for Conditional Random Field Learning in Chinese Word Segmentation Using Unlabeled Data
Next article in this issue	Evaluation of TTS Systems in Intelligibility and Comprehension Tasks: a Case Study of HTS-2008 and Multisyn Synthesizers
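
The abstract mentions mapping Japanese Kanji to the corresponding Traditional Chinese characters and reusing that mapping to detect words written in character variants. The record gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. The five-entry table, the function names `to_traditional` and `find_variant_words`, the sample lexicon, and the `max_len` limit are all assumptions made for illustration.

```python
# Hypothetical, hand-picked sample of Japanese Kanji -> Traditional Chinese pairs.
# A usable table would be far larger; this one exists only to illustrate the idea.
KANJI_TO_TRADITIONAL = {
    "沢": "澤",
    "広": "廣",
    "国": "國",
    "桜": "櫻",
    "辺": "邊",
}


def to_traditional(text: str) -> str:
    """Replace every character that has a table entry; leave the rest unchanged."""
    return "".join(KANJI_TO_TRADITIONAL.get(ch, ch) for ch in text)


def find_variant_words(text: str, lexicon: set, max_len: int = 4) -> list:
    """Find substrings that match a lexicon word only after character mapping.

    Returns a list of (start_index, surface_form, normalized_form) tuples.
    """
    hits = []
    for start in range(len(text)):
        last_end = min(len(text), start + max_len)
        for end in range(start + 1, last_end + 1):
            surface = text[start:end]
            normalized = to_traditional(surface)
            # Keep only candidates that the mapping actually changed.
            if normalized != surface and normalized in lexicon:
                hits.append((start, surface, normalized))
    return hits


if __name__ == "__main__":
    sample_lexicon = {"廣島", "國家"}  # assumed toy lexicon
    print(find_variant_words("他去広島", sample_lexicon))
```

In a segmenter, candidates found this way would presumably be added to the word lattice alongside ordinary dictionary words and scored by the system's probability models; how the paper weights them is not stated in this record.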