月旦知識庫
 
圖書資訊學刊 (Journal of Library and Information Studies)


Title
An Approach to Retrieval of OCR Degraded Text
Parallel title
OCR 雜訊文件之檢索 (Retrieval of Noisy OCR Documents)
Author: 曾元顯 (Yuen-Hsien Tseng)
Abstract
The major problem with retrieving OCR text is the unpredictable distortion of characters caused by recognition errors. Because users cannot anticipate such distortion, the terms they query rarely match the terms stored in the OCR text exactly. Retrieval effectiveness is therefore significantly reduced, especially for low-quality input. To reduce the losses in retrieving such noisy OCR text, a fault-tolerant retrieval strategy based on automatic keyword extraction and fuzzy matching is proposed. In this strategy, terms, whether correct or not, are extracted from the noisy text together with their term frequencies and presented for browsing and selection in response to a user's initial query. Once users understand which terms are actually stored in the noisy text and how frequently each is estimated to occur, they can choose appropriate terms for more effective searching. A text retrieval system based on this strategy has been built, and examples demonstrating its effectiveness are presented. Finally, several OCR issues relevant to further improving retrieval effectiveness are discussed.
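Pages: 153-168
Keywords: Optical character recognition, information retrieval, fault-tolerant retrieval, keyword extraction, fuzzy matching
Journal: 圖書資訊學刊 (Journal of Library and Information Studies)
Issue: No. 13 (December 1998)
Publisher: 國立臺灣大學圖書資訊學系 (Department of Library and Information Science, National Taiwan University)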
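The abstract describes the fault-tolerant strategy only at a high level: extract every term (correct or garbled) with its frequency, then fuzzily match the user's query against those terms so the user can see which variants actually occur. The sketch below is an illustration under our own assumptions, not the author's system. It assumes English, alphanumeric word tokens, treats each distinct token as a term, and uses Levenshtein edit distance with a threshold of one edit as the fuzzy matcher. The function names `extract_terms`, `edit_distance`, and `fuzzy_candidates` are hypothetical.

```python
import re
from collections import Counter


def extract_terms(ocr_text: str) -> Counter:
    """Count every token in the OCR text, whether correctly recognized or not."""
    # Assumption: a term is a run of letters/digits. OCR errors such as '1' for 'l'
    # stay inside the token, so 'retrieva1' is kept as its own (garbled) term.
    tokens = re.findall(r"[a-z0-9]+", ocr_text.lower())
    return Counter(tokens)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters are equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_candidates(query: str, term_freqs: Counter, max_dist: int = 1):
    """Return (term, frequency, distance) for stored terms within max_dist edits."""
    q = query.lower()
    hits = []
    for term, freq in term_freqs.items():
        d = edit_distance(q, term)
        if d <= max_dist:
            hits.append((term, freq, d))
    # Closest matches first, then the most frequent, so the user can browse and pick.
    return sorted(hits, key=lambda h: (h[2], -h[1], h[0]))


if __name__ == "__main__":
    noisy = "retrieval of OCR text: retrieva1 errors, retrievai errors, and retrieval again"
    terms = extract_terms(noisy)
    for term, freq, dist in fuzzy_candidates("retrieval", terms):
        print(term, freq, dist)
    # retrieval 2 0
    # retrieva1 1 1
    # retrievai 1 1
```

Sorting by edit distance and then by frequency reflects the idea of showing users which variants are actually stored and how often, so they can add, for example, `retrieva1` to their query. A threshold of one edit misses OCR confusions that span two characters, such as `rn` read as `m`; raising `max_dist` improves recall but also returns more unrelated candidates.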