English Abstract |
This paper addresses the identification of determinative-measure compounds (DMs) in parsing Mandarin Chinese. The number of possible DMs is infinite, so they cannot be listed exhaustively in a lexicon. However, the set of DMs can be described by regular expressions and recognized by a finite automaton. We therefore propose to identify DMs by regular expressions before parsing. An investigation of a large body of linguistic data shows that DMs are formed compositionally and hierarchically from simpler constituents. Based on this observation, we construct grammar rules that combine determinatives and measures, and we implement a parser that applies these rules. With this approach, almost all unlisted DMs are recognized. However, when the DM recognition procedure is applied on its own, it also produces many ambiguous results. Combining it with our word segmentation process greatly reduces these ambiguities. |
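As a rough illustration of the idea that DMs can be described by a regular expression, the following Python sketch matches a small, hand-picked subset of determinatives, numerals, and measures. The character inventories, the pattern, and the names `DM_PATTERN` and `find_dms` are assumptions made for this example only; they are not the grammar rules or the parser described in the paper.

```python
import re

# Tiny illustrative inventories (assumed for this sketch, far from complete).
DEMONSTRATIVES = "這那每某"
NUMERALS = "零一二兩三四五六七八九十百千萬"
MEASURES = "個本張隻條次分"

# A DM here is: (demonstrative + optional numerals | numerals | Arabic digits)
# followed by exactly one measure character.
DM_PATTERN = re.compile(
    rf"(?:[{DEMONSTRATIVES}][{NUMERALS}]*|[{NUMERALS}]+|\d+)[{MEASURES}]"
)


def find_dms(text: str) -> list[str]:
    """Return all non-overlapping DM candidates found in text, left to right."""
    return [m.group(0) for m in DM_PATTERN.finditer(text)]


if __name__ == "__main__":
    # Compositional DMs built from simpler constituents.
    print(find_dms("我買了三本書和這兩張票"))
    # An ambiguous candidate: here 十分 is the adverb "very", not "ten points".
    print(find_dms("他十分高興"))
```

The second call shows why pattern matching alone overgenerates: the string 十分 fits the pattern even where it functions as an adverb, which is the kind of ambiguity that a separate word segmentation step would have to resolve.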