A Study of Restaurant Information and Food Type Extraction from PTT

鍾智宇; 周建龍; 張嘉惠

Title	A Study of Restaurant Information and Food Type Extraction from PTT (original Chinese title: PTT網站餐廳美食類別擷取之研究)
Authors	鍾智宇, 周建龍, 張嘉惠
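Chinese abstract (English translation)	With the rapid development of information technology and the Internet, techniques for extracting required information from natural language (Information Extraction) have become increasingly important. This study develops a method for the "Food" board of PTT, Taiwan's largest bulletin board system (BBS), that automatically extracts restaurant-related information from articles and determines the restaurant type, making restaurant information faster and more convenient to obtain. The work has three parts. The first part extracts restaurant-related information: articles on the PTT Food board are collected with PTT Crawler and formatted, specific article titles are selected by keyword matching, and the restaurant name, telephone number, address and URL in the article body are extracted with regular expressions. The second part uses article titles as the source for extracting restaurant types (e.g., coffee, shabu-shabu, Taiwanese cuisine): 10,000 titles were randomly selected and the restaurant types implied in them were manually labeled. Finally, supervised and semi-supervised learning experiments were run with WIDM NER TOOL, developed by the WIDM Lab on top of Conditional Random Fields (CRF). The experimental results show that this approach performs well for restaurant type extraction.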
English abstract	In this study, we develop a system that automatically extracts restaurant types from the Food board of PTT, the largest BBS website in Taiwan. The paper is divided into three parts. The first part is pre-processing: we crawl articles from the PTT Food board, select titles by keyword matching, and extract the restaurant name, telephone number, address and URL via regular expressions. The second part is restaurant type labeling from title data; we used WIDM NER TOOL to train a model for restaurant type extraction. The last part presents the experiments: we randomly selected 10,000 titles for manual labeling and testing, used the labeled data for supervised learning, and added unlabeled data for semi-supervised learning. The method achieved good results in restaurant type extraction.
Pages	183-196
Keywords	Machine Learning, Tri-Training, Distant Learning, Named Entity Recognition
Publication	Proceedings of ROCLING
Issue	2017
Publisher	The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
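The abstracts state that specific titles are selected by keyword matching and that the restaurant name, telephone number, address and URL are pulled from the post body with regular expressions, but they do not give the patterns. The following Python sketch is a hypothetical illustration of that step: the `[食記]` title tag, the `餐廳名稱：` and `地址：` field labels, the phone-number pattern and all function names are assumptions about typical PTT Food posts, not the authors' actual rules.

```python
import re

# Assumption: review posts carry this tag at the start of the title.
REVIEW_TAG = "[食記]"

# Assumption: posts follow a "label：value" template (half- or full-width colon).
FIELD_PATTERNS = {
    "name": re.compile(r"餐廳名稱[:：]\s*(.+)"),
    "address": re.compile(r"地址[:：]\s*(.+)"),
}
# Assumed Taiwanese phone formats, e.g. (02)2375-6688, 02-23756688, 0912-345-678.
PHONE_RE = re.compile(r"\(?0\d{1,3}\)?[-\s]?\d{3,4}[-\s]?\d{3,4}")
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def is_review_title(title: str) -> bool:
    """Keyword match on the title; replies ("Re: [食記] ...") are skipped."""
    return title.startswith(REVIEW_TAG)


def extract_restaurant_info(body: str) -> dict:
    """Pull name, address, phone and URLs out of one post body."""
    info = {}
    for key, pattern in FIELD_PATTERNS.items():
        m = pattern.search(body)  # '.' stops at the newline, so one line is captured
        info[key] = m.group(1).strip() if m else None
    phone = PHONE_RE.search(body)
    info["phone"] = phone.group(0) if phone else None
    info["urls"] = URL_RE.findall(body)
    return info
```

Because `PHONE_RE` scans the whole body, a digit run elsewhere in the post (for example inside a URL) could be picked up first; searching only the telephone line would be a stricter variant.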
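The second part of the work trains a CRF-based tagger (WIDM NER TOOL) on manually labeled titles. The tool's input format is not described in this record, so the sketch below only shows a common way to prepare such data, assuming character-level BIO tags with a single hypothetical `TYPE` entity label and a simple context-window feature set; the function names are illustrative.

```python
def label_spans(title: str, type_labels: list[str]) -> list[tuple[int, int]]:
    """Locate each manually labeled restaurant type in the title (first occurrence only)."""
    spans = []
    for label in type_labels:
        if not label:
            continue  # an empty label would yield a zero-length span
        start = title.find(label)
        if start >= 0:
            spans.append((start, start + len(label)))
    return spans


def bio_tags(title: str, spans: list[tuple[int, int]]) -> list[str]:
    """Character-level BIO tags; spans are [start, end) character offsets."""
    tags = ["O"] * len(title)
    for start, end in spans:
        tags[start] = "B-TYPE"
        for i in range(start + 1, end):
            tags[i] = "I-TYPE"
    return tags


def char_features(title: str, i: int, window: int = 1) -> dict[str, str]:
    """Features for character i: the character itself plus neighbours within the window."""
    feats = {"char": title[i]}
    for off in range(-window, window + 1):
        if off != 0:
            j = i + off
            # Positions outside the title are padded with a placeholder symbol.
            feats[f"char[{off:+d}]"] = title[j] if 0 <= j < len(title) else "<PAD>"
    return feats
```

Character-level tagging is assumed because Chinese titles have no whitespace word boundaries; a word-segmented variant would work the same way on tokens.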
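The keywords list Tri-Training, and the abstracts describe adding unlabeled titles for semi-supervised learning, without detailing the procedure. As an assumption, the sketch below shows a simplified form of the tri-training idea of Zhou and Li: three learners start from bootstrap samples, and each is retrained with the unlabeled titles on which the other two agree. It omits the original algorithm's error-rate conditions, and `train` and `predict` are hypothetical caller-supplied stand-ins for a CRF tagger.

```python
import random
from typing import Any, Callable, Sequence

Tags = tuple[str, ...]
Example = tuple[str, Tags]  # (title, per-character tags)


def select_agreed(unlabeled: Sequence[str], preds: Sequence[Sequence[Tags]]) -> list[list[Example]]:
    """For each learner i, pseudo-label the titles on which the other two learners agree."""
    extra: list[list[Example]] = [[], [], []]
    for i in range(3):
        j, k = (m for m in range(3) if m != i)
        for title, pj, pk in zip(unlabeled, preds[j], preds[k]):
            if pj == pk:
                extra[i].append((title, pj))
    return extra


def tri_training(
    labeled: Sequence[Example],
    unlabeled: Sequence[str],
    train: Callable[[list[Example]], Any],
    predict: Callable[[Any, str], Tags],
    max_rounds: int = 5,
    seed: int = 0,
) -> list[Any]:
    rng = random.Random(seed)
    # Bootstrap samples give the three learners different starting points.
    models = [train([rng.choice(labeled) for _ in labeled]) for _ in range(3)]
    for _ in range(max_rounds):
        preds = [[predict(m, t) for t in unlabeled] for m in models]
        extra = select_agreed(unlabeled, preds)
        if not any(extra):
            break  # no learner receives new pseudo-labels
        # Each learner is retrained on the labeled data plus its own pseudo-labels.
        models = [train(list(labeled) + extra[i]) for i in range(3)]
    return models
```

Passing the learner as callables keeps the loop independent of any particular CRF library; `max_rounds` bounds the loop because, without the error-rate checks, the agreement sets rarely become empty.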