Abstract
We propose a novel statistical translation model to improve translation selection for
collocations. In the statistical approach widely applied to
translation selection, bilingual corpora are used to train the translation model.
However, there exists a formidable bottleneck in acquiring large-scale bilingual
corpora, in particular for language pairs involving Chinese. In this paper, we
propose a new approach to training the translation model by using unrelated
monolingual corpora. First, a Chinese corpus and an English corpus are each parsed
with a dependency parser, producing two dependency triple databases.
Then, the similarity between a Chinese word and an English word can
be estimated using the two monolingual dependency triple databases with the help
of a simple Chinese-English dictionary. This cross-language word similarity is
used to simulate the word translation probability. Finally, the resulting translation
model is combined with a language model trained on the English
dependency database to translate Chinese collocations into English. To
demonstrate the effectiveness of this method, we performed experiments on
verb-object collocation translation. The experiments produced promising
results.
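
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy triples, the counts, the dictionary entries, the relation label "vo", the function names, and the Jaccard-style overlap used as the similarity measure are all assumptions made for illustration. The sketch maps the dependency contexts of a Chinese word into English through the dictionary, compares them with the contexts of a candidate English word, and then ranks candidate verb-object pairs by the product of the English triple's relative frequency (standing in for the language model) and the two word similarities (standing in for the translation probabilities).

```python
from collections import Counter
from itertools import product

# Toy data, invented for illustration only.
# Triples are (head, relation, dependent); "vo" is a hypothetical verb-object label.
ZH_TRIPLES = [("看", "vo", "书"), ("看", "vo", "电影"), ("看", "vo", "报纸")]
EN_TRIPLES = Counter({
    ("read", "vo", "book"): 8,
    ("read", "vo", "newspaper"): 5,
    ("see", "vo", "film"): 6,
    ("watch", "vo", "film"): 4,
    ("see", "vo", "book"): 1,
})
# A simple Chinese-English dictionary: each Chinese word maps to candidate translations.
DICTIONARY = {
    "看": {"read", "see", "watch"},
    "书": {"book"},
    "电影": {"film"},
    "报纸": {"newspaper"},
}


def contexts(word, triples):
    """Collect the dependency contexts in which `word` occurs."""
    ctx = set()
    for head, rel, dep in triples:
        if head == word:
            ctx.add((rel, "head", dep))
        if dep == word:
            ctx.add((rel, "dep", head))
    return ctx


def similarity(zh_word, en_word):
    """Jaccard overlap of dictionary-mapped Chinese contexts and English contexts.

    The overlap measure is an assumption; the abstract does not specify one.
    """
    zh_ctx = contexts(zh_word, ZH_TRIPLES)
    en_ctx = contexts(en_word, EN_TRIPLES)  # iterating a Counter yields its keys
    mapped = {
        (rel, role, en_other)
        for rel, role, zh_other in zh_ctx
        for en_other in DICTIONARY.get(zh_other, ())
    }
    union = mapped | en_ctx
    return len(mapped & en_ctx) / len(union) if union else 0.0


def translate(zh_verb, zh_obj, rel="vo"):
    """Pick the English verb-object pair with the highest combined score."""
    total = sum(EN_TRIPLES.values())
    best, best_score = None, 0.0
    candidates = product(DICTIONARY.get(zh_verb, ()), DICTIONARY.get(zh_obj, ()))
    for en_verb, en_obj in candidates:
        lm = EN_TRIPLES[(en_verb, rel, en_obj)] / total  # language model score
        tm = similarity(zh_verb, en_verb) * similarity(zh_obj, en_obj)  # translation score
        if lm * tm > best_score:
            best, best_score = (en_verb, en_obj), lm * tm
    return best


print(translate("看", "书"))
```

On this toy data the verb 看 has three dictionary translations, and the English triple counts together with the context overlap select "read" for the object 书. No parallel corpus is consulted at any point; only the two monolingual triple sets and the dictionary are used.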