Title
Mitigating Impacts of Word Segmentation Errors on Collocation Extraction in Chinese
Authors: Yongfu Liao, Shu-Kai Hsieh
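Abstract (Chinese version, translated)
With the spread of the web, large-scale corpora that are automatically segmented and tagged have become increasingly common. Automation inevitably introduces some segmentation and tagging errors, which may adversely affect downstream tasks. Automatic collocation extraction is one task affected by segmentation quality. This paper explores methods that attempt to mitigate the impact of word segmentation errors on Chinese collocation extraction. We tried a simple linear model that combines multiple co-occurrence measures, aiming to reduce the number of segmentation errors among the extracted collocations. The experimental results show that this model cannot distinguish whether a collocation was caused by a segmentation error. We therefore carried out another case study using information from FastText word vectors. The results show that false collocations produced by segmentation errors and true collocations have different characteristics in terms of the semantic similarity between their components. Future research could try incorporating word vector information into collocation extraction.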
Abstract (English)
The prevalence of the web has brought about the construction of many large-scale, automatically segmented and tagged corpora. Automation inevitably introduces errors into such corpora, and these errors are likely to have negative impacts on downstream tasks. Collocation extraction from Chinese corpora is one such task, and it is profoundly influenced by the quality of word segmentation. This paper explores methods to mitigate the negative impacts of word segmentation errors on collocation extraction in Chinese. In particular, we experimented with a simple model that linearly combines several association measures, with the aim of avoiding the retrieval of false collocations resulting from word segmentation errors. The results of the experiment show that this simple model could not differentiate between true collocations and false collocations resulting from word segmentation errors. We also conducted an ad hoc case study incorporating information from FastText word vectors. The results show that collocates resulting from correct and erroneous word segmentation have different profiles in terms of the semantic similarity between the collocates. We suggest incorporating word vector information to differentiate between true and false collocations in future work.
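The abstract names two ingredients: a simple linear model over several association measures, and semantic similarity between collocates computed from word vectors. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. The specific measures (PMI, t-score, Dice), the restriction to adjacent word pairs, min-max normalisation, the weights, the toy corpus, and the function names `association_measures`, `combine`, and `cosine` are all hypothetical choices for illustration. Vectors are plain lists here; with the official `fasttext` Python package they could instead be read from a model loaded with `fasttext.load_model` via its `get_word_vector` method.

```python
import math
from collections import Counter


def association_measures(sentences):
    """Score every adjacent word pair in already-segmented sentences.

    Returns {(w1, w2): {"pmi": ..., "t": ..., "dice": ...}}.
    Assumption: only window-1 (adjacent) pairs are treated as candidates.
    """
    unigrams = Counter(w for sent in sentences for w in sent)
    bigrams = Counter(
        (sent[i], sent[i + 1]) for sent in sentences for i in range(len(sent) - 1)
    )
    n = sum(unigrams.values())  # corpus size in tokens
    scores = {}
    for (w1, w2), f_xy in bigrams.items():
        f_x, f_y = unigrams[w1], unigrams[w2]
        expected = f_x * f_y / n  # expected pair frequency if w1 and w2 were independent
        scores[(w1, w2)] = {
            "pmi": math.log2(f_xy / expected),
            "t": (f_xy - expected) / math.sqrt(f_xy),
            "dice": 2 * f_xy / (f_x + f_y),
        }
    return scores


def combine(scores, weights):
    """Min-max normalise each measure across candidates, then take a weighted sum."""
    lo = {m: min(s[m] for s in scores.values()) for m in weights}
    hi = {m: max(s[m] for s in scores.values()) for m in weights}

    def norm(m, value):
        span = hi[m] - lo[m]
        return (value - lo[m]) / span if span else 0.0

    return {
        pair: sum(w * norm(m, s[m]) for m, w in weights.items())
        for pair, s in scores.items()
    }


def cosine(u, v):
    """Cosine similarity between two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, pre-segmented corpus (illustrative only, not data from the paper).
corpus = [["我", "喜歡", "喝", "咖啡"], ["他", "喝", "咖啡"], ["我", "喝", "茶"]]
scores = association_measures(corpus)
# Hypothetical weights; a real linear model would fit these on labelled pairs.
ranked = combine(scores, {"pmi": 0.4, "t": 0.3, "dice": 0.3})
best = max(ranked, key=ranked.get)
```

In a setup along these lines, each candidate pair (w1, w2) could also carry `cosine(vec(w1), vec(w2))` as an extra feature. The abstract reports only that true and false collocations differ in this similarity profile, not a decision threshold, so none is suggested here.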
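Pages: 1-13
Keywords: Collocation Extraction; Chinese Word Segmentation; Word Vector
Publication: ROCLING Proceedings (ROCLING論文集)
Issue: 2020
Publisher: The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)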