Aligning Parallel Bilingual Corpora Statistically with Punctuation Criteria

Chuang, Thomas C.; Yeh, Kevin C.

Abstract	We present a new approach to aligning sentences in bilingual parallel corpora based on punctuation, targeting English and Chinese in particular. Although the length-based approach achieves high sentence-alignment accuracy on clean parallel corpora written in two Western languages, such as French-English or German-English, it does not work as well for parallel corpora that are noisy or written in two disparate languages, such as Chinese-English. Cognates can be used on top of the length-based approach to increase alignment accuracy. However, cognates do not exist between two disparate languages, which limits the applicability of the cognate-based approach. In this paper, we examine the feasibility of exploiting the statistically ordered matching of punctuation marks in two languages to achieve high-accuracy sentence alignment. We have experimented with an implementation of the proposed method on parallel corpora, namely the Chinese-English Sinorama Magazine Corpus and Scientific American Magazine articles, with satisfactory results. Compared with the length-based method, the proposed method achieves higher precision in our experiments. A highly promising improvement was observed when the punctuation-based and length-based methods were combined within a common statistical framework. We also demonstrate that the method can be applied to other language pairs, such as English-Japanese, with minimal additional effort.
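Pages	95-122
Keywords	Sentence alignment, Cognate alignment, Machine translation
Journal	中文計算語言學期刊 (International Journal of Computational Linguistics & Chinese Language Processing)
Issue	March 2005 (Vol. 10, No. 1)
Publisher	中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)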
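The abstract describes matching punctuation marks in order across two languages and combining that evidence with a length-based score in one statistical framework, but it gives no formulas. The following is therefore only a minimal Python sketch of that general idea under stated assumptions, not the authors' model. Everything named here is hypothetical: the punctuation classes in PUNCT_CLASS, the bead penalties in BEADS, the helpers punct_seq, lcs_len, punct_cost, length_cost and align, and the weights w_len and w_punc. "Ordered matching" is approximated by the longest common subsequence of punctuation classes, normalized as a Dice score. The length term is a Gale-Church-style squared z-score; s2 = 6.8 is the Gale-Church variance for European-language character lengths, and c = 3.0 (English characters per Chinese character) is a rough guess. A dynamic program then picks the cheapest sequence of 1-1, 2-1, 1-2, 1-0 and 0-1 beads.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical punctuation classes: full-width Chinese marks and their
# ASCII English counterparts share one class so they can be matched.
PUNCT_CLASS: Dict[str, str] = {
    ",": ",", "，": ",", "、": ",",
    ".": ".", "。": ".",
    "?": "?", "？": "?",
    "!": "!", "！": "!",
    ";": ";", "；": ";",
    ":": ":", "：": ":",
}

# Bead types (Chinese sentences, English sentences) -> assumed fixed penalty.
BEADS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 4.0, (0, 1): 4.0,
}


def punct_seq(text: str) -> List[str]:
    """Ordered sequence of punctuation classes found in text."""
    return [PUNCT_CLASS[ch] for ch in text if ch in PUNCT_CLASS]


def lcs_len(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length: an order-preserving match count."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def punct_cost(zh: str, en: str) -> float:
    """1 minus a Dice score over ordered punctuation matches (0 = perfect)."""
    a, b = punct_seq(zh), punct_seq(en)
    if not a and not b:
        return 0.5  # no punctuation evidence; assumed neutral value
    return 1.0 - 2.0 * lcs_len(a, b) / (len(a) + len(b))


def length_cost(zh: str, en: str, c: float = 3.0, s2: float = 6.8) -> float:
    """Gale-Church-style squared z-score of the character-length ratio.
    c (English chars per Chinese char) and s2 (variance) are assumed values."""
    l1, l2 = max(len(zh), 1), len(en)
    delta = (l2 - c * l1) / math.sqrt(s2 * l1)
    return delta * delta / 2.0


def align(zh: List[str], en: List[str], w_len: float = 1.0,
          w_punc: float = 2.0) -> List[Tuple[List[int], List[int]]]:
    """Minimum-cost sentence alignment by dynamic programming over beads."""
    n, m = len(zh), len(en)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty
                if di and dj:  # both sides non-empty: score the content
                    z, e = "".join(zh[i:ni]), " ".join(en[j:nj])
                    step += w_len * length_cost(z, e) + w_punc * punct_cost(z, e)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from (n, m) to recover the bead sequence.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

On toy data, align(["一二三，四五。", "六七八？"], ["abcdefghi, jklmnopqr. abcdefghijk?"]) returns [([0, 1], [0])]: the two Chinese sentences merge into one 2-1 bead because their combined punctuation sequence (comma, period, question mark) and length match the single English sentence. Keeping the length and punctuation terms as separately weighted costs mirrors, in spirit, the abstract's observation that combining both kinds of evidence helps; the actual weighting in the paper is not given here.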