English Abstract |
Semantic Role Labeling (SRL) has a significant impact on many applications, such as Machine Translation, Information Extraction, Question Answering, Text Summarization, and Text Data Mining. Research on SRL is therefore important for natural language understanding, and a number of algorithms, mostly statistical, have been proposed in this field. Statistical algorithms must deal with the problem of data sparseness. In our initial study, we found that most words appear only a few times in the training set and that others are absent from it entirely; only a small number of frequent words supply sufficient training data. To address this problem, we developed a backoff model based on HowNet. In this study, we demonstrate the benefit of applying knowledge from HowNet to Semantic Role Labeling through experiments on four selected Chinese words. Our system employs a statistical approach and was trained on 208 sentences and tested on 89 sentences. We extracted various lexical and syntactic features, including the phrase type of each constituent, its headword, its position relative to the predicate, its distance from the predicate, and voice. Comparing the results obtained with HowNet knowledge to those obtained without it, we found a distinct improvement when HowNet was used. The study also shows that the system can be further improved by applying more information from HowNet, introducing full parsing information, enriching the feature set, and using a more appropriate probability estimation model. |