English abstract |
Previous approaches to Chinese word segmentation include maximal matching heuristics, morphological rules, and POS tag statistics. This paper proposes estimating word occurrence probabilities with 'unlikelihood' scores based only on word lengths. The problem of maximizing likelihood is also shown to be equivalent to a shortest-path problem on a graph whose edges stand for words, weighted by their corresponding unlikelihood scores. |
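
The shortest-path formulation can be illustrated with a minimal sketch. Character boundaries are graph nodes, each candidate word is an edge between two boundaries, and the cheapest path from the first boundary to the last gives the segmentation. If the unlikelihood score is read as a negative log-probability (an assumption here), minimizing the summed scores is the same as maximizing the product of word probabilities. The function names `length_cost` and `segment`, the `max_len` limit, the toy lexicon, and above all the cost formula are hypothetical; the abstract does not give the paper's actual scores.

```python
from typing import Callable, List, Set


def length_cost(length: int) -> float:
    # Hypothetical unlikelihood score, NOT taken from the paper.
    # It depends only on word length and makes one long word cheaper
    # than several short words covering the same span.
    return 1.0 + 1.0 / length


def segment(text: str, lexicon: Set[str],
            cost: Callable[[int], float] = length_cost,
            max_len: int = 4) -> List[str]:
    """Segment text by a shortest path over character boundaries."""
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[j]: cheapest cost to reach boundary j
    back = [0] * (n + 1)             # back[j]: start of the last word on that path
    best[0] = 0.0
    # Edges only go forward, so one left-to-right pass finds the shortest path.
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            word = text[i:j]
            # Single characters are always allowed so that a path exists.
            if j - i == 1 or word in lexicon:
                candidate = best[i] + cost(j - i)
                if candidate < best[j]:
                    best[j] = candidate
                    back[j] = i
    # Walk the back-pointers from the end to recover the words.
    words: List[str] = []
    j = n
    while j > 0:
        i = back[j]
        words.append(text[i:j])
        j = i
    return words[::-1]


if __name__ == "__main__":
    toy_lexicon = {"研究", "研究生", "生命", "起源"}
    print(segment("研究生命起源", toy_lexicon))
```

With this toy lexicon and cost, the path 研究 / 生命 / 起源 costs 4.5, while 研究生 / 命 / 起源 costs about 4.83, so the first segmentation is chosen. A different length-based cost could change that choice.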