English Abstract |
In this paper, we propose a new method for extracting bilingual collocations from a parallel corpus to provide a phrasal translation memory. The method integrates statistical and linguistic information for effective extraction of collocations. The linguistic information includes parts of speech, chunks, and clauses. With an implementation of the method, we first obtain an extended list of collocations from monolingual corpora such as the British National Corpus (BNC). Subsequently, we use the list to identify English collocations in the Sinorama Parallel Corpus (SPC). Finally, we use word alignment techniques to retrieve the translation equivalents of the English collocations from the bilingual corpus, so as to provide a phrasal translation memory for machine translation systems. By relying on chunk and clause analyses, we are able to extract a large number of collocations and translations with much less time and effort than N-gram analysis or full parsing requires. Furthermore, we also consider longer collocation patterns, such as verb-noun (VN) collocations that involve a preposition. In the future, we plan to extend the method to other types of collocations. |
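The idea of combining linguistic filtering with a statistical score can be illustrated with a minimal sketch. It is not the implementation described in the abstract: the abstract does not name the association measure, so pointwise mutual information (PMI) is used here as a stand-in, and the chunk and clause analyses are replaced by a much simpler part-of-speech window. The function names `vn_candidates` and `score_collocations`, the `max_gap` and `min_count` parameters, and the convention that verb tags start with `V` and noun tags with `N` are all assumptions made for illustration.

```python
import math
from collections import Counter


def vn_candidates(tagged_sentence, max_gap=3):
    """Yield (verb, noun) pairs: each verb with the nearest noun that
    follows it within max_gap tokens.

    Assumes tags starting with "V" mark verbs and tags starting with "N"
    mark nouns. This window is a stand-in for real chunk/clause analysis.
    """
    for i, (word, tag) in enumerate(tagged_sentence):
        if not tag.startswith("V"):
            continue
        for other, other_tag in tagged_sentence[i + 1 : i + 1 + max_gap]:
            if other_tag.startswith("N"):
                yield (word.lower(), other.lower())
                break  # keep only the nearest noun


def score_collocations(tagged_sentences, min_count=2):
    """Score verb-noun candidates by PMI (an assumed measure, not
    necessarily the one used in the paper)."""
    pair_counts = Counter()
    verb_counts = Counter()
    noun_counts = Counter()
    for sentence in tagged_sentences:
        for verb, noun in vn_candidates(sentence):
            pair_counts[(verb, noun)] += 1
            verb_counts[verb] += 1
            noun_counts[noun] += 1

    total = sum(pair_counts.values())
    scores = {}
    for (verb, noun), count in pair_counts.items():
        if count < min_count:
            continue  # drop rare pairs, whose PMI is unreliable
        expected = verb_counts[verb] * noun_counts[noun] / total
        scores[(verb, noun)] = math.log2(count / expected)
    return scores


# Toy POS-tagged corpus (invented for illustration only).
corpus = [
    [("take", "VB"), ("a", "DT"), ("look", "NN")],
    [("take", "VB"), ("a", "DT"), ("look", "NN")],
    [("make", "VB"), ("a", "DT"), ("decision", "NN")],
    [("make", "VB"), ("a", "DT"), ("decision", "NN")],
    [("take", "VB"), ("the", "DT"), ("decision", "NN")],
]
scores = score_collocations(corpus)
```

Because the window skips intervening tokens, a pattern such as "rely on support" still yields the pair (rely, support); recording the skipped preposition would be one way to approach the longer VN patterns mentioned above. Retrieving translation equivalents would additionally require word alignment over the parallel corpus, which this sketch does not attempt.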