English Abstract |
This paper proposes a statistical language model that captures both remote and local dependencies. The model takes into account the relationship between the predicted word and its preceding words without considering the order of the preceding words. Two primary parameters, the reliability coefficient and the combination factor, are introduced to improve the performance of the language model. The reliability coefficients quantify the reliability of each remote dependency with respect to the predicted word. The combination factor weights the combination of the local dependency and the remote dependency. The language model was tested on the task of word clustering and compared with the traditional N-gram language model. A large corpus of 5 million words, provided by Academia Sinica, Taiwan, was used for training and testing. The experimental results show that, for large N, the proposed model requires less computation and achieves better performance than the traditional N-gram language model. |