Transliteration Extraction from Classical Chinese Buddhist Literature Using Conditional Random Fields with Language Models

Yu-Chun Wang; Karol Chia-Tien Chang; Richard Tzong-Han Tsai; Jieh Hsiang

Title	Transliteration Extraction from Classical Chinese Buddhist Literature Using Conditional Random Fields with Language Models
Authors	Yu-Chun Wang, Karol Chia-Tien Chang, Richard Tzong-Han Tsai, Jieh Hsiang
Abstract	Extracting plausible transliterations from historical literature is a key issue in historical linguistics and other research fields. In Chinese historical literature, the characters used to transliterate the same loanword may vary across translation eras and with translators' differing Chinese language preferences. To assist historical linguists and digital humanities researchers, this paper proposes a transliteration extraction method based on conditional random fields, with features drawn from language models and from the characteristics of the Chinese characters used in transliterations. To evaluate the method, we compiled an evaluation set from two Buddhist texts, the Samyuktagama and the Lotus Sutra. We also constructed a baseline that combines suffix-array-based extraction with a phonetic similarity measure. Our method significantly outperforms the baseline, achieving a recall of 0.9561 and a precision of 0.9444. The results show that the method is very effective for extracting transliterations from classical Chinese texts.
Pages	25-38
Keywords	Transliteration Extraction, Classical Chinese, Buddhist Literature, Language Model, Conditional Random Fields, CRF
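Journal	中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue	September 2014 (Vol. 19, No. 3)
Publisher	中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)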
Previous article in this issue	BCCWJ-TimeBank: Temporal and Event Information Annotation on Japanese Text
Next article in this issue	Modeling Human Inference Process for Textual Entailment Recognition