Abstract |
Electronic texts with simple information retrieval functions are an emerging trend in the clinical application of medical literature (for example, in evidence-based medicine). Ancient physicians left behind a large body of Traditional Chinese Medicinal Literature (TCML) that survives to this day; digitizing these texts and equipping them with simple retrieval and ranking functions offers a large benefit at low cost. In this study, the source electronic documents were taken from the Traditional Chinese Medicinal Electronic Text (TCMET) database, which our laboratory began building in 1997 and which has since operated as an eTCML retrieval system. A set of category thesauri was constructed from the TCM concepts of Principle-Method-Recipe-Medicine (PMRM) to serve as the comparison terms. Using the conventional term frequency (tf) and inverse document frequency (idf) measures, we indexed and weighted the relationship between TCML documents and keywords and compiled the results into tables. The resulting PMRM thesaurus of 89,462 keywords, together with their frequency weights, has been posted on the TCMET website for reference. Finally, a query of four keywords was used to test document retrieval and ranking, with keyword weights computed from the inverted file; the ranked results indicate that term-frequency weighting does help in screening for the PMRM concepts of TCM and in ruling out unrelated ones.
A further objective of this study is to establish a pilot platform that integrates the TCM thesauri with the TCML full-text database built up over the past several years, supporting our eTCML projects and serving as a foundation for a more advanced TCMET retrieval system in the coming years. |
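The tf and idf weighting and the inverted-file ranking described in the abstract can be illustrated with a minimal sketch. This is not the TCMET implementation: the function names `build_inverted_file` and `rank`, the three toy documents, and the three-keyword thesaurus are all hypothetical, and the idf form log(N / df) is an assumed common variant, since the abstract does not state the exact formula used.

```python
import math
from collections import Counter, defaultdict


def build_inverted_file(docs, thesaurus):
    """Map each thesaurus keyword to {doc_id: tf * idf weight}.

    Assumption: idf = log(N / df); the abstract does not give the exact form.
    """
    n_docs = len(docs)
    # Term frequency: non-overlapping substring counts, which suits
    # unsegmented Chinese text matched against a fixed keyword list.
    tf = {
        doc_id: Counter({kw: text.count(kw) for kw in thesaurus if kw in text})
        for doc_id, text in docs.items()
    }
    # Document frequency: number of documents containing each keyword.
    df = Counter(kw for counts in tf.values() for kw in counts)
    inverted = defaultdict(dict)
    for doc_id, counts in tf.items():
        for kw, freq in counts.items():
            inverted[kw][doc_id] = freq * math.log(n_docs / df[kw])
    return dict(inverted)


def rank(inverted, query):
    """Score each document by summing the weights of the query keywords."""
    scores = defaultdict(float)
    for kw in query:
        for doc_id, weight in inverted.get(kw, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data, not taken from the TCMET database.
docs = {
    "d1": "桂枝甘草桂枝",
    "d2": "麻黃甘草",
    "d3": "甘草",
}
thesaurus = ["桂枝", "麻黃", "甘草"]

inverted = build_inverted_file(docs, thesaurus)
ranking = rank(inverted, ["桂枝", "甘草"])
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy setup the keyword that occurs in every document receives an idf of zero, so it contributes nothing to the ranking, while a keyword concentrated in one document dominates the score. That is the sense in which frequency weighting can help separate relevant concepts from the rest.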