
Automatic Term Pair Extraction from Bilingual Patent Corpus
Authors: 曾元顯 (Yuen-Hsien Tseng), 劉昭麟, 莊則敬
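For a Chinese-English bilingual corpus of more than 500,000 patents, this paper proposes two schemes for automatically extracting translation term pairs: one precision-oriented and one recall-oriented. In the precision-oriented scheme, we propose a term extraction method and compare six term alignment approaches, validating them on real data to obtain empirical findings for reference. We find that the EM (Expectation Maximization) method performs best, but it is also the most time-consuming and has difficulty finding many-to-many synonymous translations. Even with the worst-performing method, MI (mutual information), the correct term pairs ranked near the top differ from those of EM, so MI can serve as a supplementary term pair extraction method, which opens up the possibility of later research on merging or mixing multiple alignment approaches. In the recall-oriented scheme, we propose a simple idea and an efficient implementation that recall a large number of new term pairs from a bilingual aligned corpus for downstream use, adding about 20% new term pairs to an existing bilingual lexicon of more than one million entries.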
This paper proposes two approaches to extracting translation term pairs from a Chinese-English bilingual corpus of more than 500,000 patents. One approach is precision-oriented, in which we compare six term alignment methods. Based on our experiments, we find that the EM (Expectation Maximization) method performs best. However, it is time-consuming and has difficulty extracting many-to-many translations for the same concept. Although the MI (mutual information) method performs worst, the term pairs it extracts may be totally different from those extracted by EM. This may inspire subsequent research into hybrid term alignment methods. The other approach is recall-oriented, in which a simple idea is proposed. With an efficient implementation, 20% more term pairs were extracted relative to an existing bilingual lexicon that already has more than one million term pairs merged from several sources.
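The abstract does not give the formulas behind the MI method, so the following is only a minimal sketch of one common variant, pointwise mutual information (PMI), computed over co-occurrence counts in aligned segments. The function name `pmi_term_pairs`, the input layout, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def pmi_term_pairs(aligned_segments, min_count=1):
    """Rank (source_term, target_term) pairs by pointwise mutual information.

    aligned_segments: iterable of (source_terms, target_terms) tuples, one per
    aligned segment. This layout is an assumption for the sketch, not the
    paper's data format.
    """
    n = 0
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned_segments:
        n += 1
        # Count each term at most once per segment.
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))

    scores = {}
    for (s, t), c in pair_count.items():
        if c < min_count:
            continue
        # PMI = log( P(s, t) / (P(s) * P(t)) ), with probabilities
        # estimated as segment frequencies divided by n.
        scores[(s, t)] = math.log((c * n) / (src_count[s] * tgt_count[t]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: three aligned segments with candidate terms.
segments = [
    (["半導體", "基板"], ["semiconductor", "substrate"]),
    (["半導體", "電極"], ["semiconductor", "electrode"]),
    (["基板", "電極"], ["substrate", "electrode"]),
]
ranked = pmi_term_pairs(segments)
for pair, score in ranked[:3]:
    print(pair, round(score, 3))
```

On this toy input the three pairs that always co-occur receive positive scores and the remaining pairs receive negative ones. An EM-based aligner would instead re-estimate translation probabilities iteratively, which fits the abstract's remark that EM is more accurate but slower.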
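Pages: 279-292
Keywords: Patent Corpus, Machine Translation, Cross-lingual Patent Analysis, Term Alignment, Term Extraction
Journal: ROCLING Proceedings (ROCLING論文集)
Issue: 2009
Publisher: Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)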



