Building a Bracketed Corpus Using Φ² Statistics

Lee, Yue-shi; Chen, Hsin-hsi

Title	Building a Bracketed Corpus Using Φ² Statistics
Authors	Lee, Yue-shi; Chen, Hsin-hsi
Abstract	Research based on treebanks is ongoing for many natural language applications. However, building a large-scale treebank is laborious and time-consuming, so speeding up the process has become an important task. This paper proposes two versions of a probabilistic chunker to aid the development of a bracketed corpus. The basic version partitions part-of-speech sequences into chunk sequences, which form a partially bracketed corpus. The recursive version applies the chunking action recursively to generate a fully bracketed corpus. Rather than a treebank, the training corpus is one tagged with part-of-speech information only. The experimental results show that the probabilistic chunker achieves a correct rate of more than 94% in producing a partially bracketed corpus and also gives very encouraging results in generating a fully bracketed corpus. Both versions are simple but effective and can also be applied to many natural language applications.
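Pages	1-23
Keywords	natural language applications, computer language
Journal	Computational Linguistics and Chinese Language Processing (中文計算語言學期刊)
Issue	August 1997 (Vol. 2, No. 2)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
Next article in this issue	Longest Tokenization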