English Abstract |
This paper describes our ongoing project on grammatical inference for Chinese. We focus here on the design of our sem-syn initial grammar, a set of stochastic context-free rules whose probabilistic parameters will be iteratively re-estimated by a corpus-based inference technique. Manually developing and maintaining a grammar for an NLP system has long been regarded as a painful and endless job. Moreover, this conventional approach usually results in a grammar with limited coverage. With large bodies of text corpora now available on computers, corpus-based grammatical inference (GI) techniques seem to offer a promising solution to these problems. An initial grammar is one of the important components of GI techniques; its function is to facilitate the inference process. In this paper, we describe the design of our sem-syn initial grammar and how it corresponds to the information given in the Sinica Corpus, on which our inference system is based. We also give a brief introduction to our Chinese grammatical inference system, showing how the system will use the sem-syn initial grammar to generalize structure from the corpus. |