Chinese Characters Conversion System based on Lookup Table and Language Model

李民祥; 吳世弘; 楊秉哲; 谷圳

Title (original)	基於對照表以及語言模型之簡繁字體轉換
Parallel title	Chinese Characters Conversion System based on Lookup Table and Language Model
Authors	李民祥, 吳世弘, 楊秉哲, 谷圳
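Chinese abstract (translated)	The scripts used in mainland China and Taiwan are both Chinese, but they are split into simplified and traditional characters. In recent years, mainland China and Taiwan have exchanged a large amount of information through Chinese-language books and the Internet. Because of reading habits, text must be converted between simplified and traditional characters before readers on either side can read it comfortably. Conventional simplified-traditional conversion suffers from two problems: the ambiguity of one simplified character mapping to several traditional characters, and differences in term usage between the two sides. This study therefore designs an extensible simplified-traditional conversion system that extracts new lookup-table entries from Wikipedia to address the differences in term usage, and uses a language model to resolve the ambiguity of one simplified character mapping to several traditional characters. The system can reduce the cost of manual proofreading after simplified-traditional conversion of Chinese e-books of all kinds. Its flexible architecture allows the system to be extended and improved continually.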
English abstract	The character sets used in China and Taiwan are both Chinese, but they are divided into simplified and traditional Chinese characters. There is a large amount of information exchange between China and Taiwan through books and the Internet. To provide readers with a convenient reading environment, character conversion between simplified and traditional Chinese is necessary. The conversion between simplified and traditional Chinese characters has two problems: one-to-many ambiguity and differences in term usage. Because many traditional Chinese characters share a single corresponding simplified character, a system converting simplified Chinese into traditional Chinese faces one-to-many ambiguity. In addition, many terms are used differently in the two Chinese-speaking societies. This paper focuses on designing an extensible conversion system that takes advantage of community knowledge by accumulating lookup tables from Wikipedia to tackle the term usage problem, and that integrates a language model to resolve the one-to-many ambiguity. The system can reduce the cost of proofreading after character conversion for books, e-books, or online publications. The extensible architecture makes it easy to improve the system with new training data.
Pages	113-127
Keywords	Chinese character conversion, Language model, Wikipedia, Lookup table
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	2010
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
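
The two-stage idea in the abstract (a term lookup table for usage differences, then a language model for one-to-many characters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the tiny `TERM_TABLE`, `CHAR_TABLE`, and `CORPUS` data, the character-bigram model with add-one smoothing, the Viterbi search, and all function names are hypothetical. The abstract does not state which language model or search procedure the authors used.

```python
import math
from collections import Counter

# Hypothetical term table (simplified -> Taiwan usage); the paper builds its
# table from Wikipedia, these two entries are only illustrative.
TERM_TABLE = {"软件": "軟體", "鼠标": "滑鼠"}

# Hypothetical character table: one simplified character may map to
# several traditional candidates.
CHAR_TABLE = {"发": ["發", "髮"], "头": ["頭"], "现": ["現"],
              "后": ["後", "后"], "面": ["面", "麵"]}

# Toy traditional-Chinese corpus used to estimate bigram counts (assumption).
CORPUS = ["理髮", "頭髮", "發現", "後面", "皇后"]


def train_bigrams(corpus):
    """Count character unigrams and bigrams, with a start symbol."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence)
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def log_prob(prev, cur, unigrams, bigrams, vocab_size):
    """Add-one smoothed log P(cur | prev)."""
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))


def apply_term_table(text, table):
    """Longest-match term replacement; returns (segment, is_fixed) pairs."""
    out, i = [], 0
    max_len = max(map(len, table), default=1)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in table:
                out.append((table[piece], True))
                i += n
                break
        else:
            out.append((text[i], False))
            i += 1
    return out


def convert(text, term_table, char_table, unigrams, bigrams):
    """Term lookup first, then Viterbi search over character candidates."""
    vocab_size = len(unigrams) + 1
    lattice = [[seg] if fixed else char_table.get(seg, [seg])
               for seg, fixed in apply_term_table(text, term_table)]
    best = {"<s>": (0.0, [])}  # candidate -> (score, path ending in it)
    for candidates in lattice:
        new_best = {}
        for cand in candidates:
            new_best[cand] = max(
                (score + log_prob(prev if prev == "<s>" else prev[-1],
                                  cand[0], unigrams, bigrams, vocab_size),
                 path + [cand])
                for prev, (score, path) in best.items())
        best = new_best
    return "".join(max(best.values())[1])


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
print(convert("理发", TERM_TABLE, CHAR_TABLE, UNIGRAMS, BIGRAMS))
print(convert("发现", TERM_TABLE, CHAR_TABLE, UNIGRAMS, BIGRAMS))
```

In this sketch the term table is applied before character-level conversion so that region-specific vocabulary is fixed first, and the language model only chooses among candidates for the remaining ambiguous characters. Extending either table or the training corpus changes the output without touching the search code, which mirrors the extensibility the abstract emphasizes.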
