White Page Construction from Web Pages for Finding People on the Internet

Chen, Hsin-hsi; Bian, Guo-wei

Title	White Page Construction from Web Pages for Finding People on the Internet
Authors	Chen, Hsin-hsi; Bian, Guo-wei
Abstract	This paper proposes a method to extract proper names and their associated information from web pages for Internet/Intranet users automatically. The information extracted from World Wide Web documents includes proper nouns, E-mail addresses and home page URLs. Natural language processing techniques are employed to identify and classify proper nouns, which are usually unknown words. The information (i.e., home pages' URLs or e-mail addresses) for those proper nouns appearing in the anchor parts can be easily extracted using the associated anchor tags. For those proper nouns in the non-anchor part of a web page, different kinds of clues, such as the spelling method, adjacency principle and HTML tags, are used to relate proper nouns to their corresponding E-mail addresses and/or URLs. Based on the semantics of content and HTML tags, the extracted information is more accurate than the results obtained using traditional search engines. The results can be used to construct white pages for Internet/Intranet users or to build databases for finding people and organizations on the Internet. Such searching services are very useful for human communication and dissemination of information.
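Pages	75-100
Keywords	Proper name identification, Information extraction, White pages, World Wide Web
Journal	Computational Linguistics and Chinese Language Processing (中文計算語言學期刊)
Issue	February 1998 (Vol. 3, No. 1)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
Previous article in this issue	Towards a Representation of Verbal Semantics--An Approach Based on Near-Synonyms
Next article in this issue	Human Judgement as a Basis for Evaluation of Discourse-Connective-Based Full-Text Abstraction in Chinese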
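
The abstract notes that, for proper nouns appearing in the anchor part of a page, the home page URL or e-mail address can be read directly from the associated anchor tag. The Python sketch below illustrates only that step, using the standard library's `html.parser`. It is not the authors' implementation: the names `AnchorExtractor` and `extract_anchor_records` are hypothetical, the anchor text is simply assumed to be a candidate name, and the proper-noun identification and the non-anchor clues (spelling method, adjacency principle) described in the abstract are not modeled.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (anchor text, kind, target) triples from <a href=...> tags.

    Hypothetical helper for illustration; anchor text is treated as a
    candidate name without any proper-noun classification.
    """

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            name = " ".join("".join(self._text).split())
            if name:
                if self._href.lower().startswith("mailto:"):
                    target = self._href[len("mailto:"):]
                    self.records.append((name, "email", target))
                else:
                    self.records.append((name, "url", self._href))
            self._href = None


def extract_anchor_records(html):
    """Return the list of (name, kind, target) triples found in html."""
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    return parser.records
```

A white-page entry could then be assembled by grouping the returned triples by name, once a separate step has decided which anchor texts are in fact person or organization names.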