Abstract
In this paper we report results of a supervised machine-learning approach to
Chinese word segmentation. A maximum entropy tagger is trained on manually
annotated data to automatically assign to Chinese characters, or hanzi, tags that
indicate the position of a hanzi within a word. The tagged output is then converted
into segmented text for evaluation. Preliminary results show that this approach is
competitive with other supervised machine-learning segmenters reported in
previous studies: trained on a 237K-word training set, it achieves precision and
recall of 95.01% and 94.94%, respectively.
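
The conversion from tagged hanzi to segmented text, and the word-level scoring used for evaluation, can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the four-tag set (B, M, E, S), the function names, and the span-matching definition of a correct word are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# Hypothetical tag set (an assumption, not taken from the paper):
#   B = first hanzi of a multi-hanzi word, M = middle hanzi,
#   E = last hanzi, S = hanzi that forms a word on its own.

def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Convert per-hanzi position tags into a list of words."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        # A word is closed whenever the tagger marks a word-final position.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        # Flush an unfinished word left by an inconsistent tag sequence.
        words.append(current)
    return words

def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent each word by its (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans

def precision_recall(predicted: List[str], gold: List[str]) -> Tuple[float, float]:
    """Word-level precision and recall; a word counts as correct only if
    both of its boundaries match the gold segmentation."""
    pred_spans = word_spans(predicted)
    gold_spans = word_spans(gold)
    correct = len(pred_spans & gold_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall

if __name__ == "__main__":
    chars = list("我喜欢北京")
    gold = tags_to_words(chars, ["S", "B", "E", "B", "E"])
    predicted = tags_to_words(chars, ["S", "B", "E", "S", "S"])
    print(gold, predicted, precision_recall(predicted, gold))
```

Scoring on character-offset spans rather than on word strings keeps a repeated word from being credited at the wrong position in the sentence.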