Automatic Lexical Analysis of Tang and Song Poetry and Its Applications

Yu Shiwen (俞士汶); Hu Junfeng (胡俊峰)

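Title	Automatic Lexical Analysis of Tang and Song Poetry and Its Applications
Parallel title	Word-based Statistical Analysis of Chinese Ancient Poetry
Authors	Yu Shiwen (俞士汶), Hu Junfeng (胡俊峰)
Chinese abstract	This paper presents techniques for automatically extracting and analyzing the vocabulary of Tang and Song poetry, together with several examples of their application in computer-assisted research on classical Chinese poetry. It focuses on the specific algorithm and steps used to acquire vocabulary from a Tang and Song poetry corpus by statistical methods. It gives methods for computing statistical parameters such as "co-occurrence degree" and "binding strength" and compares them with the traditional mutual-information method. After completing word segmentation and part-of-speech tagging of the Quan Tangshi (Complete Tang Poems, 4.81 million characters) and of poems by selected major Song poets (1.6 million characters), the system performed a statistical analysis of word usage in Tang and Song poetry. It extracted several kinds of statistical information, including word co-occurrence, word parallelism in antithetical couplets (對仗), author-specific lexical features, and the distribution of word borrowing across historical periods. On this basis, the authors further explored research topics such as similarity retrieval of poetic lines, retrieval by poetic style, and imagery indexing.
English abstract	This paper is concerned with the automatic extraction of multi-character words from a corpus of ancient Chinese poetry and with some applications at the word level. It gives a detailed description of the word-extraction algorithm and compares it with the mutual-information method. The study is based on a 4.8-million-character corpus of Tang Dynasty poetry and a 1.6-million-character corpus of Song Dynasty poetry. The statistical analyses carried out to date include collocation and word-to-author information, among others. Further research would include sentence-similarity retrieval and a semantic index.
Pages	631-647
Keywords	automatic word extraction, collocation, sentence similarity retrieval, semantic index (imagery index, 意象索引), ancient Chinese poetry
Journal	Language and Linguistics (語言暨語言學)
Issue	Vol. 4, No. 3 (July 2003)
Publisher	Institute of Linguistics, Academia Sinica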
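The abstract says the authors' own "co-occurrence degree" and "binding strength" measures were compared against the traditional mutual-information method, but this record does not give their formulas. For reference only, here is a minimal Python sketch of that mutual-information baseline: pointwise mutual information (PMI) scored over adjacent character pairs. It assumes an unsegmented corpus of poem lines with punctuation already stripped. The function name `pmi_bigrams`, the `min_count` cutoff, and the toy lines are hypothetical illustrations. This is not the authors' algorithm.

```python
import math
from collections import Counter


def pmi_bigrams(lines, min_count=2):
    """Rank adjacent character pairs by pointwise mutual information (PMI).

    Hypothetical helper: `lines` is an iterable of poem lines given as strings
    of Chinese characters with punctuation already removed. Returns a list of
    (pair, count, pmi) tuples, highest PMI first.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for line in lines:
        char_counts.update(line)
        # Pairs are taken within one line only, never across a line break.
        pair_counts.update(line[i:i + 2] for i in range(len(line) - 1))

    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())
    scored = []
    for pair, count in pair_counts.items():
        if count < min_count:  # PMI estimates are unreliable for very rare pairs
            continue
        p_xy = count / n_pairs
        p_x = char_counts[pair[0]] / n_chars
        p_y = char_counts[pair[1]] / n_chars
        # PMI(x, y) = log2( P(xy) / (P(x) * P(y)) )
        scored.append((pair, count, math.log2(p_xy / (p_x * p_y))))

    # Sort by descending PMI; break ties by the pair itself for stable output.
    scored.sort(key=lambda item: (-item[2], item[0]))
    return scored


if __name__ == "__main__":
    # Toy input for illustration only, not the paper's corpus.
    toy_lines = ["明月松間照", "清泉石上流", "明月來相照", "松間明月光"]
    for pair, count, score in pmi_bigrams(toy_lines):
        print(pair, count, round(score, 3))
    # 松間 2 3.644
    # 明月 3 3.059
```

In this toy run the rarer pair 松間 outranks the more frequent 明月. This illustrates a well-known tendency of PMI to favor low-frequency pairs, which is why a minimum-count cutoff is usually applied before ranking candidates.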