The output of Chinese word segmentation varies with the linguistic definition
of a word and with engineering requirements, and no single standard can
satisfy all linguists and all computer applications. Most disagreements among
segmentation standards arise from the segmentation of morphologically derived
words (MDWs). This paper presents a system that can be conveniently customized
to meet various user-defined standards for segmenting MDWs. In this system,
each MDW is represented as a word tree whose root node corresponds to the
maximal word and whose leaf nodes correspond to minimal words. Each
non-terminal node in the tree is associated with a resolution parameter that
determines whether its daughters are displayed as a single word or as separate
words. Different segmentation outputs are then obtained from different cuts of
the tree, which the user specifies by choosing a combination of values for
these resolution parameters. A single system can thus be customized to meet
different segmentation specifications.
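The word-tree mechanism can be illustrated with a short Python sketch. This is a minimal illustration under our own assumptions, not the system described in the paper. The `WordNode` class, the `segment` function, the choice to key resolution parameters by node text, and the example tree for 研究生院 ("graduate school") are all hypothetical. In this sketch a resolution parameter of `True` splits a node into its daughters, and `False` keeps the node as one word.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordNode:
    """A node in a hypothetical MDW word tree.

    The root is the maximal word; leaves are minimal words.
    """
    text: str
    children: list[WordNode] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def segment(node: WordNode, split: dict[str, bool]) -> list[str]:
    """Return the words produced by the tree cut that `split` selects.

    `split` maps a non-terminal node's text to its resolution parameter
    (hypothetical keying scheme): True displays the daughters as separate
    words, False displays the node as a single word. Nodes absent from
    `split` default to False (an assumption of this sketch).
    """
    if node.is_leaf() or not split.get(node.text, False):
        # Keep this node whole; parameters below it are never consulted.
        return [node.text]
    words: list[str] = []
    for child in node.children:
        words.extend(segment(child, split))
    return words


# Hypothetical word tree: 研究生院 -> [研究生, 院], 研究生 -> [研究, 生]
tree = WordNode("研究生院", [
    WordNode("研究生", [WordNode("研究"), WordNode("生")]),
    WordNode("院"),
])

if __name__ == "__main__":
    print(segment(tree, {}))                                  # ['研究生院']
    print(segment(tree, {"研究生院": True}))                  # ['研究生', '院']
    print(segment(tree, {"研究生院": True, "研究生": True}))  # ['研究', '生', '院']
```

Because a node kept as a single word hides its whole subtree, the parameter on 研究生 takes effect only when the root is split. Each combination of parameter values therefore selects exactly one cut of the tree, and distinct combinations may yield the same cut.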