Abstract
This paper describes the development of EXEC, a web-based system that automatically extracts English collocations, together with Chinese-English bilingual example sentences, from parallel corpora. The system draws on statistics, dependency parsing, and Chinese-English parallel corpora of more than 13 million English words and 27 million Chinese characters. The user supplies a keyword, its part of speech, and the part of speech of the desired collocate. From this input, the system automatically generates collocation candidates using syntactic dependency relations and three association statistics: mutual information, t-score, and log-likelihood ratio. Combined with a Chinese-English bilingual concordancer, it then retrieves English sentences containing the identified collocations along with their Chinese translations. Our evaluation shows that both extraction accuracy and dictionary coverage exceed 80%, and that extraction is efficient. EXEC can facilitate the automatic compilation of bilingual collocation dictionaries and help Chinese learners of English overcome the L2 language barrier.
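The abstract names the three association statistics but does not give EXEC's formulas or say how co-occurrences are counted. The sketch below is a minimal illustration that uses the textbook definitions: pointwise mutual information, the t-score with expected count $E_{11} = R_1 C_1 / N$, and Dunning's log-likelihood ratio $G^2$ over a 2×2 contingency table. It assumes that counts are taken over dependency-linked word pairs. `PairCounts` and the function names are hypothetical and do not come from EXEC.

```python
import math
from dataclasses import dataclass


@dataclass
class PairCounts:
    """Hypothetical counts for one (keyword, collocate) candidate.

    Assumption: counts are taken over dependency-linked word pairs, so
    `joint` is how often the two words are linked, `key` and `coll` are each
    word's marginal count, and `total` is the number of pairs considered.
    Scoring requires joint > 0 (candidates must co-occur at least once).
    """
    joint: int  # O11
    key: int    # R1, keyword with any collocate
    coll: int   # C1, collocate with any keyword
    total: int  # N


def mutual_information(c: PairCounts) -> float:
    """Pointwise mutual information: log2(O11 * N / (R1 * C1))."""
    return math.log2(c.joint * c.total / (c.key * c.coll))


def t_score(c: PairCounts) -> float:
    """t-score: (O11 - E11) / sqrt(O11), where E11 = R1 * C1 / N."""
    expected = c.key * c.coll / c.total
    return (c.joint - expected) / math.sqrt(c.joint)


def log_likelihood(c: PairCounts) -> float:
    """Dunning's G2 = 2 * sum(O * ln(O / E)) over the 2x2 contingency table."""
    observed = [
        c.joint,                               # keyword with this collocate
        c.key - c.joint,                       # keyword, other collocates
        c.coll - c.joint,                      # collocate, other keywords
        c.total - c.key - c.coll + c.joint,    # neither word
    ]
    row_totals = [c.key, c.key, c.total - c.key, c.total - c.key]
    col_totals = [c.coll, c.total - c.coll, c.coll, c.total - c.coll]
    g2 = 0.0
    for o, r, k in zip(observed, row_totals, col_totals):
        if o > 0:  # the 0 * ln(0) term is taken as 0
            g2 += o * math.log(o / (r * k / c.total))
    return 2.0 * g2


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = PairCounts(joint=30, key=100, coll=200, total=100_000)
    print(mutual_information(counts), t_score(counts), log_likelihood(counts))
```

The three measures are often combined because they behave differently. Mutual information tends to favour rare pairs, whereas the t-score and log-likelihood ratio give more weight to pairs with solid frequency evidence.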
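This paper describes the design of EXEC, an online system for automatically compiling English-Chinese bilingual collocations. EXEC is built from a Chinese-English parallel corpus of 13 million English words and 27 million Chinese characters, and it combines English collocation retrieval with Chinese-English bilingual concordancing. EXEC extracts English collocations using statistics and an English syntactic parser that produces dependency relations. To run a query, the user enters a keyword, the keyword's part of speech, and the part of speech of the collocates being sought. The program then automatically extracts candidate English collocations. It bases this extraction on the dependency relations from the parser and on statistics such as mutual information, t-score, and log-likelihood ratio. Each candidate is linked to English example sentences containing the collocation, together with their Chinese translations. Experiments show that EXEC exceeds 80% in both extraction accuracy and dictionary coverage, and that it can efficiently and automatically extract English collocations, example sentences, and Chinese translations from parallel corpora.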
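The abstract describes a query that filters dependency relations by keyword and part of speech, then links each collocation to aligned English-Chinese sentence pairs. The sketch below illustrates that step under several assumptions. Parsed sentences are reduced to lemmatized (head, dependent) triples tagged with Universal Dependencies part-of-speech labels. Each triple records the index of its aligned sentence pair. Raw frequency stands in for the statistical ranking. `Triple`, `find_collocations`, `bilingual_examples`, and the sample data are all hypothetical and are not EXEC's actual interface.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical lemmatized dependency link from one parsed English sentence."""
    head: str
    head_pos: str
    dep: str
    dep_pos: str
    sent_id: int  # index into the aligned (English, Chinese) sentence pairs


def find_collocations(triples, keyword, key_pos, coll_pos, min_freq=1):
    """Return (collocate, count) pairs for `keyword`, most frequent first.

    A triple qualifies when the keyword (with POS `key_pos`) is linked to a
    word with POS `coll_pos`, whether the keyword is the head or the
    dependent. Frequency is a placeholder for MI / t-score / LLR ranking.
    """
    counts = Counter()
    for t in triples:
        if t.head == keyword and t.head_pos == key_pos and t.dep_pos == coll_pos:
            counts[t.dep] += 1
        elif t.dep == keyword and t.dep_pos == key_pos and t.head_pos == coll_pos:
            counts[t.head] += 1
    return [(w, n) for w, n in counts.most_common() if n >= min_freq]


def bilingual_examples(triples, pairs, keyword, collocate):
    """Return aligned sentence pairs whose parse links `keyword` and `collocate`."""
    ids = sorted({t.sent_id for t in triples if {t.head, t.dep} == {keyword, collocate}})
    return [pairs[i] for i in ids]


# Tiny made-up parallel corpus and its (assumed) parser output.
SAMPLE_PAIRS = [
    ("He made a decision.", "他做了一個決定。"),
    ("They made a quick decision.", "他們很快做出了決定。"),
    ("She reached a decision.", "她做出了決定。"),
]
SAMPLE_TRIPLES = [
    Triple("make", "VERB", "decision", "NOUN", 0),
    Triple("make", "VERB", "decision", "NOUN", 1),
    Triple("decision", "NOUN", "quick", "ADJ", 1),
    Triple("reach", "VERB", "decision", "NOUN", 2),
]

if __name__ == "__main__":
    # Query: verbs that take "decision" (a noun) as a dependent.
    for verb, freq in find_collocations(SAMPLE_TRIPLES, "decision", "NOUN", "VERB"):
        print(verb, freq)
        for en, zh in bilingual_examples(SAMPLE_TRIPLES, SAMPLE_PAIRS, "decision", verb):
            print("   ", en, "|", zh)
```

Keeping the sentence index on each triple lets the concordance step reuse the parse results directly. The alternative would be a second string search over the corpus.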