Title
基於知網的中文語義論元標注
Parallel title
Semantic Role Labeling in Chinese Using HowNet
Author Xia Wang
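Chinese abstract (English translation)
Semantic role labeling has a major impact on many application systems, such as machine translation, information extraction, question answering, and data mining, so its importance to natural language understanding is evident. To date, researchers in this field have proposed a variety of algorithms, most of them statistical. Statistical algorithms must deal with the problem of data sparseness. In our preliminary study, we found that most words are low-frequency words, and some do not appear in the training corpus at all; only a very small number of high-frequency words have enough data for training. To solve this problem, we adopted a backoff model based on HowNet. We selected four Chinese verbs for the experiment, which used a training corpus of 208 sentences and a test corpus of 89 sentences. From the training text we extracted various lexical and syntactic features, including the phrase type of the argument, its headword, and the argument's position and distance relative to the predicate. The experimental results show that applying HowNet knowledge to semantic role labeling clearly improves labeling accuracy.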
English abstract
Semantic Role Labeling (SRL) has a significant impact on many application systems, such as Machine Translation, Information Extraction, Question Answering, Text Summarization, and Text Data Mining. Research on SRL is therefore important for natural language understanding, and a number of algorithms, mostly statistical, have been proposed in this field. Statistical algorithms must deal with the problem of data sparseness. In our initial study, we found that most words appear only a small number of times, and some are absent from the training set altogether. Only a small number of frequent words supply sufficient data for training. To solve this problem, we developed a backoff model based on HowNet. In this study, we demonstrate the benefit of applying knowledge from HowNet to Semantic Role Labeling by experimenting with four selected Chinese words. Our system employs a statistical approach, trained on 208 sentences and tested on 89 sentences. We extracted various lexical and syntactic features, including the phrase type of each constituent, its headword, its position relative to the predicate, its distance from the predicate, and voice. Comparing the results obtained with HowNet knowledge to those obtained without it, we found a distinct improvement when using HowNet. The study also reveals that the system can be improved by applying more information from HowNet, introducing full parsing information, enriching the feature set, and using a more appropriate probability estimation model.
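The abstract does not spell out how the backoff model is built, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a count-based labeler that first looks up the exact headword, then backs off to a semantic class when the headword was never seen in training, and finally to phrase type and position alone. The `SEMANTIC_CLASS` dictionary is a hypothetical stand-in for a HowNet lookup, and the class name, feature values, and role names are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical headword -> semantic class map, standing in for a HowNet lookup.
SEMANTIC_CLASS = {
    "老師": "human",
    "學生": "human",
    "醫生": "human",
    "書": "publication",
    "報紙": "publication",
}


class BackoffRoleLabeler:
    """Picks the most frequent role, backing off to coarser features when needed."""

    def __init__(self):
        # One count table per backoff level, most specific first.
        self.tables = [defaultdict(Counter) for _ in range(3)]

    @staticmethod
    def _keys(headword, phrase_type, position):
        sem_class = SEMANTIC_CLASS.get(headword)
        return [
            (headword, phrase_type, position),                           # level 0: exact headword
            (sem_class, phrase_type, position) if sem_class else None,   # level 1: semantic class
            (phrase_type, position),                                     # level 2: no lexical info
        ]

    def train(self, examples):
        """examples: iterable of (headword, phrase_type, position, role)."""
        for headword, phrase_type, position, role in examples:
            keys = self._keys(headword, phrase_type, position)
            for table, key in zip(self.tables, keys):
                if key is not None:
                    table[key][role] += 1

    def predict(self, headword, phrase_type, position):
        """Return (role, backoff_level), or (None, None) if nothing matches."""
        keys = self._keys(headword, phrase_type, position)
        for level, (table, key) in enumerate(zip(self.tables, keys)):
            if key is not None and key in table:
                role, _count = table[key].most_common(1)[0]
                return role, level
        return None, None


labeler = BackoffRoleLabeler()
labeler.train([
    ("老師", "NP", "before", "agent"),
    ("學生", "NP", "before", "agent"),
    ("書", "NP", "after", "patient"),
])

# "醫生" never occurs in training, but shares the class "human" with seen headwords.
print(labeler.predict("醫生", "NP", "before"))
```

In this toy setup an unseen headword can still be labeled at level 1 as long as its semantic class occurred in training, which is the kind of help against data sparseness that the abstract attributes to HowNet.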
Pages 449-461
Keywords semantic role labeling; HowNet; statistical approach
Journal Language and Linguistics (語言暨語言學)
Issue April 2008 (Vol. 9, No. 2)
Publisher Institute of Linguistics, Academia Sinica