Chinese Abstract
Anaphora is a common phenomenon in discourse and an important research issue in
natural language processing applications. In this paper, anaphora resolution is
achieved using the WordNet ontology and heuristic rules. The proposed system
identifies both intra-sentential and inter-sentential antecedents of anaphors.
Animacy information is obtained by analyzing the hierarchical relations of nouns
and verbs in the surrounding context. Identifying animate entities and pleonastic
uses of “it” in English discourse improves resolution accuracy.
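To make the pleonastic-it idea concrete, here is a minimal pattern-based detector. It is a sketch under stated assumptions, not the paper's method: the abstract does not list its pleonastic-it rules, so the word lists (`MODAL_ADJECTIVES`, `RAISING_VERBS`, `WEATHER_PREDICATES`, and so on) and the three patterns below are hypothetical choices for illustration.

```python
import re

# Hypothetical word lists for illustration only; not the rule set used in the paper.
BE_FORMS = {"is", "was", "be", "been"}
MODAL_ADJECTIVES = {"necessary", "possible", "certain", "likely", "important",
                    "good", "useful", "easy", "difficult", "clear", "obvious",
                    "unlikely", "impossible"}
COMPLEMENTIZERS = {"that", "to", "for", "whether"}
RAISING_VERBS = {"seems", "seemed", "appears", "appeared", "happens", "happened"}
WEATHER_VERBS = {"rains", "rained", "snows", "snowed"}
WEATHER_PREDICATES = {"raining", "snowing", "sunny", "cloudy", "dark", "late"}


def tokenize(sentence: str) -> list[str]:
    """Lowercase, expand "it's" to "it is", and split into word tokens."""
    text = re.sub(r"\bit's\b", "it is", sentence.lower())
    return re.findall(r"[a-z']+", text)


def is_pleonastic_it(tokens: list[str], i: int) -> bool:
    """Heuristically decide whether tokens[i] == "it" is non-referential."""
    if tokens[i] != "it":
        return False
    rest = tokens[i + 1:]
    # Pattern 1: "it is/was <modal adjective> that/to ...", e.g. "it is necessary that"
    if (len(rest) >= 3 and rest[0] in BE_FORMS
            and rest[1] in MODAL_ADJECTIVES and rest[2] in COMPLEMENTIZERS):
        return True
    # Pattern 2: "it seems/appears/happens that/to ..."
    if len(rest) >= 2 and rest[0] in RAISING_VERBS and rest[1] in COMPLEMENTIZERS:
        return True
    # Pattern 3: weather and time expressions, e.g. "it rains", "it is raining"
    if rest and rest[0] in WEATHER_VERBS:
        return True
    if len(rest) >= 2 and rest[0] in BE_FORMS and rest[1] in WEATHER_PREDICATES:
        return True
    return False


def find_pleonastic(sentence: str) -> list[int]:
    """Return the token indices of pleonastic "it" in a sentence."""
    tokens = tokenize(sentence)
    return [i for i in range(len(tokens)) if is_pleonastic_it(tokens, i)]
```

For example, `find_pleonastic("It is necessary that we leave early.")` flags token 0, while in "The dog dropped it because it was broken." neither "it" is flagged, so both remain candidates for resolution. Filtering such tokens out before antecedent search keeps the resolver from forcing a referent onto a non-referential pronoun.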
Traditionally, anaphora resolution systems have relied on syntactic, semantic,
or pragmatic clues to identify the antecedent of an anaphor. Our method uses the
WordNet ontology to identify animate entities as well as essential gender
information. In the animacy agreement module, animacy is determined by the
hypernym relation between an entity and the unique beginners defined in WordNet.
In addition, the verb associated with the entity is an important clue for
reducing uncertainty. We conducted an experiment on a balanced corpus to resolve
pronominal anaphora. The methods proposed in [Lappin and Leass, 94] and
[Mitkov, 01] focus on corpora containing only inanimate pronouns such as “it”
and “its”, so the distribution of intra-sentential and inter-sentential anaphora
in their corpora differs from ours. In an experiment on the Brown corpus, we
found that about 60% of anaphors are intra-sentential. Our system applies seven
heuristic rules: five preference rules and two constraint rules. These rules are
derived from syntactic, semantic, and pragmatic conventions and from analysis of
the training data. A relative measurement indicates that applying the heuristic
module eliminates about 30% of the errors.
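The animacy agreement module and the split between constraint and preference rules can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system described here: a tiny hand-written dictionary (`HYPERNYM`) stands in for WordNet's noun hierarchy, `ANIMATE_SUBJECT_VERBS` is a hypothetical verb clue list, and the two constraints and the weights in `score` are invented for illustration rather than taken from the paper's seven rules.

```python
from dataclasses import dataclass
from typing import Optional

# Toy, heavily abridged stand-in for WordNet's noun hypernym hierarchy (hypothetical).
# Each chain ends at a unique beginner (a top-level category).
HYPERNYM = {
    "teacher": "educator", "educator": "professional", "professional": "adult",
    "adult": "person",
    "dog": "canine", "canine": "carnivore", "carnivore": "animal",
    "book": "publication", "publication": "work", "work": "product",
    "product": "artifact",
    "car": "motor_vehicle", "motor_vehicle": "vehicle", "vehicle": "artifact",
}
UNIQUE_BEGINNERS = {"person", "animal", "artifact", "location", "time"}
ANIMATE_BEGINNERS = {"person", "animal"}
# Hypothetical verb clue: subjects of these verbs are assumed to be animate.
ANIMATE_SUBJECT_VERBS = {"said", "thought", "believed", "laughed", "decided"}
# Pronoun features: (animate, gender, plural); None means "unconstrained".
PRONOUNS = {
    "he": (True, "m", False), "she": (True, "f", False),
    "it": (False, None, False), "they": (None, None, True),
}


@dataclass
class Candidate:
    head: str                     # head noun of the candidate noun phrase
    sentence: int                 # sentence index in the discourse
    position: int                 # token index within that sentence
    is_subject: bool = False
    verb: str = ""                # governing verb, used as a fallback animacy clue
    gender: Optional[str] = None  # "m", "f", or None if unknown
    plural: bool = False


def unique_beginner(noun: str) -> Optional[str]:
    """Follow hypernym links upward until a unique beginner (or a dead end)."""
    node, seen = noun.lower(), set()
    while node in HYPERNYM and node not in seen:
        seen.add(node)
        node = HYPERNYM[node]
    return node if node in UNIQUE_BEGINNERS else None


def candidate_animacy(c: Candidate) -> Optional[bool]:
    """Animacy from the hierarchy; fall back to the verb clue for unknown nouns."""
    beginner = unique_beginner(c.head)
    if beginner is not None:
        return beginner in ANIMATE_BEGINNERS
    if c.is_subject and c.verb.lower() in ANIMATE_SUBJECT_VERBS:
        return True
    return None


def passes_constraints(pronoun: str, sent: int, pos: int, c: Candidate) -> bool:
    animate, gender, plural = PRONOUNS[pronoun]
    if (c.sentence, c.position) >= (sent, pos):  # antecedent must precede the pronoun
        return False
    if c.plural != plural:                       # constraint 1: number agreement
        return False
    cand_animate = candidate_animacy(c)          # constraint 2: animacy/gender agreement
    if animate is not None and cand_animate is not None and animate != cand_animate:
        return False
    if gender is not None and c.gender is not None and gender != c.gender:
        return False
    return True


def score(sent: int, c: Candidate) -> float:
    """Hypothetical preference weights; higher is better."""
    s = 0.0
    if c.sentence == sent:
        s += 2.0                    # prefer intra-sentential antecedents
    if c.is_subject:
        s += 1.0                    # prefer grammatical subjects
    s -= 0.5 * (sent - c.sentence)  # prefer more recent sentences
    return s


def resolve(pronoun: str, sent: int, pos: int,
            candidates: list[Candidate]) -> Optional[Candidate]:
    survivors = [c for c in candidates if passes_constraints(pronoun, sent, pos, c)]
    if not survivors:
        return None
    # Ties go to the candidate closest to the pronoun.
    return max(survivors, key=lambda c: (score(sent, c), c.sentence, c.position))
```

For the discourse "The teacher dropped the book. She picked it up.", with candidates `Candidate("teacher", 0, 1, is_subject=True, verb="dropped")` and `Candidate("book", 0, 4)`, `resolve("she", 1, 0, ...)` keeps only the teacher (the book's chain ends at "artifact") and `resolve("it", 1, 2, ...)` keeps only the book. Constraints run first and discard incompatible candidates outright; preferences only rank the survivors, so an animacy mismatch can never be outvoted by recency or subject role.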