Abstract
In this paper we describe a method for classifying facts (pieces of information) into categories, or levels, where each level signifies a different degree of difficulty in extracting the fact from a piece of text that contains it. Based on this classification mechanism, we propose a method of evaluating a domain by assigning it a “domain number” derived from the levels of a set of standard facts present in the articles of that domain. We then use the classification mechanism to analyze the performance of three MUC systems (BBN, NYU, and SRI) according to their ability to extract a set of standard facts, at different levels, from two different MUC domains. We extend this analysis to examine the role of coreference resolution in the performance of message understanding systems. Evaluating a domain by its “domain number” is a significant improvement over earlier methods, which relied on measures such as vocabulary size, average sentence length, and the number of sentences per document. Moreover, using the classification mechanism to analyze message understanding systems provides deeper insight into these systems than their precision and recall statistics alone.