English Abstract |
Although pattern matching cannot handle complex errors, it is still widely adopted in grammar-checking systems. Compared with full-scale parsing, it detects local errors efficiently, using much less computation time and memory. However, the error patterns are usually hand-tuned, so correlations among them are handled poorly: patterns may conflict or overlap with each other. To tackle these problems, this paper proposes an automatic rule selection method called Sequential Forward Selection (SFS). SFS uses objective performance measures to search automatically for a suboptimal rule set among all possible combinations of rules. With SFS, the effectiveness of each rule can be measured, and problematic patterns can be identified systematically for the linguist to fine-tune, so the error patterns can be revised efficiently. In our tests on a corpus of 1956 sentences, the false rate decreases by 11.8% (from 26.4% to 14.8%) when the suboptimal rule set of 81 rules selected by SFS is adopted instead of the whole set of 127 rules. With this suboptimal rule set, the recognition rate decreases by only 3.9% (from 38.9% to 35%). |
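
The greedy search that the abstract describes can be illustrated with a minimal sketch. The abstract does not specify the performance measure or the stopping rule, so everything below is an assumption made for illustration: the function name `sequential_forward_selection`, the toy rule names, the `HITS` and `FALSE_ALARMS` data, and the score (correct detections minus false alarms) are hypothetical and are not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def sequential_forward_selection(
    rules: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedily grow a rule set, adding the rule that raises `score` most."""
    remaining = list(rules)
    selected: List[str] = []
    best = score(frozenset())
    while remaining:
        # Score every one-rule extension of the current selection.
        candidates = [(score(frozenset(selected + [r])), r) for r in remaining]
        top_score, top_rule = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            break  # assumed stopping rule: no remaining rule improves the score
        selected.append(top_rule)
        remaining.remove(top_rule)
        best = top_score
    return selected, best


# Hypothetical toy data: sentence ids each rule flags correctly or wrongly.
HITS = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {5}}
FALSE_ALARMS = {"r1": {10}, "r2": {11, 12, 13}, "r3": set()}


def toy_score(rule_set: FrozenSet[str]) -> float:
    """Assumed measure: sentences correctly flagged minus sentences wrongly flagged."""
    hits = set().union(*(HITS[r] for r in rule_set))
    false_alarms = set().union(*(FALSE_ALARMS[r] for r in rule_set))
    return len(hits) - len(false_alarms)


selected, best = sequential_forward_selection(HITS, toy_score)
print(selected, best)
```

In this toy run the rule `r2` is left out because its false alarms outweigh the one new error it catches, which is the kind of problematic pattern a linguist might then revise. Because the search is greedy and never removes a rule once chosen, the result is in general suboptimal rather than the best of all possible combinations.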