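Chinese Abstract
Propositional terms express the key concepts of an article and guide readers through the development of its argument. This thesis extracts propositional terms from academic research abstracts. It integrates two methods, conditional random fields (CRFs) and the combined association measure (CAM), and draws on two broad kinds of information: the inner cohesion of lexical units and their context. The extracted propositional terms are no longer restricted to noun phrases and may consist of a single word or of multiple words. Propositional term extraction is treated as a sequence labeling task, and the IOB encoding scheme is used to identify term boundaries. CRFs, which take into account multi-level features of propositional terms, perform the initial term detection; CAM then computes lexical cohesion to further confirm the term boundaries. Experimental results show that the proposed method performs noticeably better than previous term detection methods: CRFs markedly increase the recall of terms with imperfect boundaries (imperfect hits), and CAM effectively corrects term boundaries.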
English Abstract
Propositional terms in a research abstract (RA) generally convey the most important information, allowing readers to quickly glean the contribution of a research article. This paper treats propositional term extraction from RAs as a sequence labeling task using the IOB (Inside, Outside, Beginning) encoding scheme. Conditional random fields (CRFs) are first used to detect propositional terms, and the combined association measure (CAM) is then applied to adjust the term boundaries. By combining multi-level features with inner lexical cohesion, the method can extract propositional terms that are not limited to noun phrases (NPs). Experimental results show that CRFs significantly increase the recall of terms with imperfect boundaries, and that CAM further improves the term boundaries.
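To make the sequence labeling formulation concrete, the sketch below shows how term spans in a tokenized sentence can be encoded as IOB tags and decoded back into spans. It is an illustration only, not the thesis's implementation: it assumes the common variant in which every term starts with `B` (often called IOB2), which may differ from the exact variant used here, and the function names `spans_to_iob` and `iob_to_spans` and the example sentence are hypothetical. A CRF tagger would predict tag sequences like these, and a boundary-adjustment step such as CAM would operate on the decoded spans.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating IOB encoding (IOB2 variant assumed:
# every term begins with "B", continues with "I", and non-term tokens are "O").

def spans_to_iob(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Tag tokens given term spans as (start, end) token offsets, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"              # first token of a term
        for i in range(start + 1, end):
            tags[i] = "I"              # remaining tokens of the same term
    return tags

def iob_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Recover (start, end) term spans from an IOB tag sequence."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        # A "B" always opens a new term; a stray "I" with no open term is
        # treated leniently as the start of a term.
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:              # close a term that runs to the end
        spans.append((start, len(tags)))
    return spans

if __name__ == "__main__":
    tokens = ["We", "propose", "conditional", "random", "fields",
              "for", "term", "extraction", "."]
    tags = spans_to_iob(tokens, [(2, 5), (6, 8)])
    print(list(zip(tokens, tags)))
    print(iob_to_spans(tags))
```

Because a single-word term is just a lone `B` tag, the same encoding covers both single-word and multi-word terms, which matches the abstract's claim that extracted terms need not be multi-word noun phrases.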