English abstract |
In the segmentation of Chinese text, two competing approaches have often been used separately: the rule-based approach and the statistical approach. Each has its advantages and disadvantages. In this paper we describe a hybrid approach that unifies them in a single flexible segmentation process, in which items stored in the dictionary or identified by heuristic rules are assigned a default probability. By varying the default probability value, the hybrid approach can cover a wide range of approaches, from the purely statistical to the purely rule-based. Our experiments on two corpora show that, with a proper setting of the default probability, the hybrid approach gives much better results than either the statistical or the rule-based approach alone. A text retrieval system is then adapted to the segmented Chinese texts, and preliminary results of the retrieval system are reported. |