Abstract
This paper presents a system that automatically extracts new points of interest (POIs) from the Web. First, we issue special queries (e.g., Taipei+New Open) to find Web pages that might contain addresses of new stores. For pages that contain addresses, we then apply a store name recognition model to extract candidate POIs. Finally, we train a model to select the most likely POI for each address found in the page. In this paper, we focus on POI name recognition and POI relation prediction. For POI recognition, we use store names from yellow pages as seeds to prepare training data via distant supervision. Through entity selection and data processing, we obtain a model with an F1-measure of 0.816, compared with 0.432 for a dictionary-based baseline. For POI relation prediction, we compare three strategies for preparing negative examples; the best model achieves an accuracy of 0.754. We combine two POI recognition models with three classification models to evaluate overall performance. The best combination extracts 49 new POIs per day using a single IP address.
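The distant-supervision step (using yellow-page store names as seeds to label training text automatically) can be illustrated with a minimal sketch. It assumes character-level BIO tagging and longest-first exact matching against the seed list; the function name `distant_label` and the `B-POI`/`I-POI` tag names are hypothetical and are not taken from the system described here.

```python
from typing import Iterable, List


def distant_label(sentence: str, seed_names: Iterable[str]) -> List[str]:
    """Assign character-level BIO tags by exact matching against seed names.

    Hypothetical helper: at each position the longest matching seed name wins,
    and scanning resumes after the match, so overlapping matches are skipped.
    """
    # Sort longer names first so a longer seed beats its own prefix.
    names = sorted({n for n in seed_names if n}, key=len, reverse=True)
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        match = next((n for n in names if sentence.startswith(n, i)), None)
        if match is None:
            i += 1
            continue
        # First character of the store name begins the entity span.
        tags[i] = "B-POI"
        for j in range(i + 1, i + len(match)):
            tags[j] = "I-POI"
        i += len(match)
    return tags


if __name__ == "__main__":
    print(distant_label("鼎泰豐新店開幕", ["鼎泰豐"]))
```

Running the example tags the three characters of the seed name as one entity and the remaining four characters as `O`. Labels produced by plain string matching are noisy (seed names can appear in non-store contexts, and new stores are absent from the seed list), which is why a filtering step before training is generally useful.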