A Bayesian Approach to Determine Move Tags in Corpus (基於貝氏定理自動分析語料庫與標定文步)

Chiung-Wen Chang (張瓊文); Jia-Lien Hsu (徐嘉連); 張俊盛

Title	Automatic Corpus Analysis and Move Tagging Based on Bayes' Theorem (基於貝氏定理自動分析語料庫與標定文步)
Parallel title	A Bayesian Approach to Determine Move Tags in Corpus
Authors	Chiung-Wen Chang (張瓊文), Jia-Lien Hsu (徐嘉連), 張俊盛
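Chinese abstract (translated)	Using technology to support language learning is an important research topic. English is today's main language of communication, and for countries outside the English-speaking world, learning English (from listening and reading to writing) is difficult. Writing is especially hard: because English and Chinese grammar differ, learners often confuse the structures that make up a sentence. English academic writing differs from general writing in that papers usually follow a clear structure of sections such as "Introduction", "Related Work", "Method", and "Results"; this structure is called "moves" (文步). Academic writing also differs somewhat from general writing in word choice. To help students who need to write academic papers, we therefore draw on the move structure of academic papers and design a move classifier that trains a language model and extracts the words and phrases used in specific moves. In language processing, researchers have proposed automated analyses based on move structure, but training the language model usually requires a large amount of manually annotated data. To reduce manual annotation, we take vocabulary compiled by experts and learn from it through machine learning and bootstrapping, then use the trained language model to predict the move of each sentence in an article. In this study we propose a system that performs move analysis with a Bayesian approach. The system has two parts: a training phase and a testing phase. In the training phase, a learning model is built from a large corpus: a corpus of academic paper introductions collected from CiteSeerX and a set of initial patterns serve as the basis for analysis, the Bayesian approach determines the move of each sentence in every introduction in the corpus, and once the sentences have been tagged with moves, the Bayesian model is updated iteratively so that it learns. In the testing phase, the model obtained from training is given a new introduction and predicts its moves with the same Bayesian approach. In testing, we obtained a move-prediction accuracy of 56%.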
English abstract	English of Academic Writing (EAW) is essential to the research community for sharing knowledge. Research documents written in EAW, especially their abstracts and introductions, tend to follow simple and succinct organizational patterns, called moves. This paper introduces a method for computational analysis of move structures (here, Background-Purpose-Method-Result-Conclusion) in the abstracts and introductions of research documents, replacing time-consuming and labor-intensive manual analysis. In our approach, each sentence in a given abstract or introduction is automatically analyzed and labeled with a specific move (i.e., B, P, M, R, or C) to reveal its rhetorical function. Such an automatic analysis tool is expected to help non-native speakers and novice writers become aware of appropriate move structures and internalize the relevant knowledge to improve their writing. We propose a Bayesian approach to determine move tags for research articles. The approach consists of two phases, a training phase and a testing phase. In the training phase, we build a Bayesian model from a couple of given initial patterns and a corpus, a subset of CiteSeerX. At the start, the prior probabilities of the Bayesian model rely solely on the initial patterns. We then process the corpus one document at a time: extract features, determine tags, and update the Bayesian model iteratively. In the testing phase, we compare our results with tags manually assigned by experts. In our experiments, the proposed approach reaches an accuracy of 56%.
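Pages	87-99
Keywords	Academic English Writing, Assisted Writing, Move Tag Analysis (學術英文寫作、輔助寫作、文步分析)
Publication	Proceedings of ROCLING (ROCLING論文集)
Issue	2015
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)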
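
The training loop described in the abstracts (start from initial patterns, tag each sentence, then fold the tagged sentence back into the model) can be illustrated with a small sketch. This record does not give the paper's features, initial patterns, or update rule, so everything below is an assumption made for illustration: a bag-of-words naive Bayes model with add-one smoothing stands in for the Bayesian model, and `SEED_PATTERNS`, `MoveTagger`, and `bootstrap` are hypothetical names, not the authors' implementation.

```python
import math
from collections import Counter

MOVES = ["B", "P", "M", "R", "C"]  # Background, Purpose, Method, Result, Conclusion

# Hypothetical seed phrases; the paper's real initial patterns are not listed here.
SEED_PATTERNS = {
    "B": ["has been", "widely"],
    "P": ["we propose", "this paper"],
    "M": ["we use", "based on"],
    "R": ["results show", "accuracy"],
    "C": ["we conclude", "future work"],
}


def tokenize(sentence):
    """Lowercase, drop commas and periods, split on whitespace."""
    return sentence.lower().replace(",", " ").replace(".", " ").split()


class MoveTagger:
    """Naive Bayes over words (an assumed stand-in for the paper's model)."""

    def __init__(self, seed_patterns, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant
        self.word_counts = {m: Counter() for m in MOVES}
        self.move_counts = Counter()
        self.vocab = set()
        # At the start, the model's counts come solely from the seed patterns.
        for move, patterns in seed_patterns.items():
            for pattern in patterns:
                self.update(pattern, move)

    def update(self, sentence, move):
        """Add one sentence, tagged with `move`, to the counts."""
        tokens = tokenize(sentence)
        self.word_counts[move].update(tokens)
        self.move_counts[move] += 1
        self.vocab.update(tokens)

    def log_scores(self, sentence):
        """Unnormalised log posterior log P(move) + sum log P(word | move)."""
        known = [t for t in tokenize(sentence) if t in self.vocab]
        total = sum(self.move_counts.values())
        scores = {}
        for m in MOVES:
            n_m = sum(self.word_counts[m].values())
            score = math.log(
                (self.move_counts[m] + self.alpha)
                / (total + self.alpha * len(MOVES))
            )
            for t in known:
                score += math.log(
                    (self.word_counts[m][t] + self.alpha)
                    / (n_m + self.alpha * len(self.vocab))
                )
            scores[m] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(MOVES, key=lambda m: scores[m])


def bootstrap(tagger, documents):
    """Tag every sentence with the current model, then update the model with it."""
    for doc in documents:
        for sentence in doc:
            tagger.update(sentence, tagger.predict(sentence))
    return tagger


tagger = MoveTagger(SEED_PATTERNS)
print(tagger.predict("In this paper we propose a new tagger."))  # prints P
```

Because each sentence is tagged by the model and then used to update it, an early tagging mistake can reinforce itself in later iterations; a practical system would likely only accept predictions above some confidence threshold, which this sketch does not attempt.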