Automatic Extraction of Chinese-English Term Pairs from a Bilingual Patent Corpus

曾元顯; 劉昭麟; 莊則敬

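Title (Chinese, translated)	Automatic Extraction of Chinese-English Term Pairs from a Bilingual Patent Corpus
Parallel title (English)	Automatic Term Pair Extraction from Bilingual Patent Corpus
Authors	曾元顯 (Yuen-Hsien Tseng), 劉昭麟, 莊則敬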
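Chinese abstract (translated)	For a Chinese-English bilingual patent corpus of more than 500,000 records, this paper proposes two schemes for automatically extracting translation term pairs: one precision-oriented and one recall-oriented. In the precision-oriented scheme, we propose a term extraction method and compare six term alignment approaches. We validate them on real data and draw practical lessons from the results. We find that the EM (Expectation Maximization) method performs best, but it is also the most time-consuming and has difficulty finding many-to-many synonymous translations. Even MI (mutual information), the worst-performing method, ranks correct term pairs near the top that differ from those found by EM. It can therefore serve as a supplementary term pair extraction method, which opens the way for future research on merging or mixing several alignment methods. In the recall-oriented scheme, we propose a simple idea with an effective implementation that recovers a large number of new term pairs from a bilingual aligned corpus for later use. These new pairs enlarge an existing bilingual lexicon of more than one million entries by about 20%.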
English abstract	This paper proposes two approaches to extracting translation term pairs from a Chinese-English bilingual corpus of more than 500,000 patents. One approach is precision-oriented; in it, we compare six term alignment methods. Our experiments show that the EM (Expectation Maximization) method performs best. However, it is time-consuming and has difficulty extracting many-to-many translations of the same concept. Although the MI (mutual information) method performs worst, the term pairs it extracts can be entirely different from those extracted by EM. This may inspire subsequent research on hybrid term alignment methods. The other approach is recall-oriented and is based on a simple idea. With an efficient implementation, it expanded by 20% an existing bilingual lexicon that already held more than one million term pairs merged from several sources.
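The abstract reports that EM performed best among the six alignment methods and MI performed worst, but it does not give the formulas the authors used. As a rough illustration only, the Python sketch below shows one common form of each technique. The first is pointwise mutual information computed from sentence-level co-occurrence. The second is an IBM Model 1 style EM estimate of translation probabilities t(e | c), with no NULL word. The input format is an assumption: aligned sentence pairs already reduced to candidate terms. The function names `mi_scores` and `em_model1` and the toy data are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical input: one item per aligned sentence pair,
# already reduced to candidate terms (Chinese terms, English terms).
Pair = tuple[list[str], list[str]]


def mi_scores(pairs: list[Pair]) -> dict[tuple[str, str], float]:
    """Pointwise mutual information from sentence-level co-occurrence (assumed form)."""
    n = len(pairs)
    zh_df, en_df, joint_df = Counter(), Counter(), Counter()
    for zh, en in pairs:
        zh_set, en_set = set(zh), set(en)
        zh_df.update(zh_set)
        en_df.update(en_set)
        joint_df.update((c, e) for c in zh_set for e in en_set)
    # PMI(c, e) = log(P(c, e) / (P(c) * P(e))), each P estimated as a fraction of pairs
    return {
        (c, e): math.log(k * n / (zh_df[c] * en_df[e]))
        for (c, e), k in joint_df.items()
    }


def em_model1(pairs: list[Pair], iterations: int = 10) -> dict[tuple[str, str], float]:
    """IBM Model 1 style EM estimate of t(e | c), keyed by (c, e); no NULL word."""
    en_vocab = {e for _, en in pairs for e in en}
    t = defaultdict(lambda: 1.0 / len(en_vocab))  # uniform initial t(e | c)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for zh, en in pairs:
            for e in en:
                # E-step: share e's alignment mass among the Chinese terms in this pair
                norm = sum(t[(c, e)] for c in zh)
                for c in zh:
                    frac = t[(c, e)] / norm
                    count[(c, e)] += frac
                    total[c] += frac
        # M-step: renormalise expected counts so each t(. | c) sums to 1
        t = defaultdict(float, {(c, e): k / total[c] for (c, e), k in count.items()})
    return dict(t)


pairs = [
    (["專利", "語料庫"], ["patent", "corpus"]),
    (["專利"], ["patent"]),
    (["語料庫"], ["corpus"]),
]
mi = mi_scores(pairs)
t = em_model1(pairs)
print(sorted(mi, key=mi.get, reverse=True)[:2])
print(round(t[("專利", "patent")], 4))
```

On this toy data, both methods rank (專利, patent) and (語料庫, corpus) above the cross pairings. MI scores each pair independently from counts. EM instead lets the terms in a sentence pair compete for alignment mass, which is one plausible reason for the paper's observation that the two methods surface different correct pairs.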
Pages	279-292
Keywords	Patent Corpus, Machine Translation, Cross-lingual Patent Analysis, Term Alignment, Term Extraction
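Publication	ROCLING Proceedings (ROCLING論文集)
Issue	2009
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)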
