英文摘要 |
The handling of out-of-vocabulary (OOV) words is one of the key points to a high performance lexical analysis in natural language processing. Among all OOV words, named entities (NE) are the most productive ones. They generally constitute the most meaningful parts of sentences (persons, affairs, time, places, and objects). In this paper, we propose a three-phase“generation, filtering, and recovery”system to address the NER problem. A set of stochastic models is first used to generate all possible NE candidates. Then we treat candidate filtering as an ambiguity resolution problem. To resolve ambiguities, we adopt a maximal-matching-rule-driven lexical analyzer. Last, a pattern matching method is applied to detect and recover abnormalities in the results of the previous two phases. Pure lexical information is exploited in our system. We get a high recall of 96% with personal names (PER), satisfiable recall of 88%, 89%, and 80% with transliteration names (TRA), location names (LOC), and organization names (ORG), respectively. The overall precision and excluding rate is over 90% and 99%. |