英文摘要 |
In the era of blog, more and more useful information is shared on blogs. Mining text on blogs has become one of important and novel research directions in the filed of web mining. For an automatic blog text mining system, it is necessary to locate the tags which describe the main concepts of blog text effectively and efficiently. This research uses the technique of relative proportion of text and tag in order to find the most possible tag for main blog text. More particularly, we use the concept of DOM (document object model) through the java crawler to analyze the relationship between text and tag. According to our experiments, our automatic blog text extraction mechanism is able to extract the main text of blog effectively and efficiently. |