English Abstract |
This paper further investigates ambiguity in the automatic identification of determinative-measure compounds (DMs) in Chinese. Chinese DMs are known to be identifiable by regular expression rules. However, rule matching only partially resolves structural and lexical ambiguities. In this paper, we conduct an in-depth analysis based on corpus data. Through a detailed analysis of identification errors and the disambiguation of DM compounds, we classify the ambiguities into three types: structural, sense, and functional. We also propose resolution principles to eliminate these problems and thereby improve word segmentation and part-of-speech (POS) tagging. |