English Abstract |
In the e-commerce era, many agents roam the Internet to find the best prices, cluster related product information, and so on. Agents have to visit targeted web pages periodically to update their information. If agents visit pages too frequently, they end up reloading information they already hold. On the other hand, if agents visit pages too infrequently, the collected data may be out of date. To minimize out-of-date errors, agents tend to visit a site as soon as possible. However, to minimize network traffic and database update cost, system administrators tend to reduce visits as much as possible. To the best of our knowledge, no research has been directed at finding a scientific approach to resolve this dilemma. In this paper, we propose to visit web pages according to their past update patterns. That is, a page should be visited as soon as it is expected to have changed, and should not be visited at any other time. To discover the update patterns, we propose to use sequential association rules from data mining methodology. Association rules can find patterns that are implicit in the temporal update history. In this paper, each web page is associated with a sequence of binary digits, each denoting whether the page was updated during the corresponding agent fetching slot. We designed an algorithm to mine patterns from this sequence of binary digits. The patterns are composed of large item sequences and their related association rules. Each rule states that, under some precondition, the web page will change in the next time slot. If a precondition matches the current situation, an agent is sent to fetch the page. Besides computing patterns for existing pages, the system also updates its database dynamically to account for newly inserted and deleted pages. |
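
The idea of mining rules from a page's binary update sequence can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a simplified sliding-window stand-in that assumes a rule has the form "the last k slots look like this, so the page changes in the next slot". The function names `mine_update_rules` and `should_fetch`, and the thresholds `max_len`, `min_support`, and `min_confidence`, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def mine_update_rules(history, max_len=3, min_support=2, min_confidence=0.7):
    """Mine rules of the form 'precondition -> page changes in the next slot'.

    history is a list of 0/1 flags, one per fetching slot (1 = page was updated).
    Returns a dict mapping each precondition (a tuple of flags) to its confidence.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    rules = {}
    for k in range(1, max_len + 1):
        seen = Counter()     # occurrences of each length-k window that has a following slot
        changed = Counter()  # occurrences where that following slot is an update
        for i in range(len(history) - k):
            window = tuple(history[i:i + k])
            seen[window] += 1
            if history[i + k] == 1:
                changed[window] += 1
        for window, count in seen.items():
            confidence = changed[window] / count
            if count >= min_support and confidence >= min_confidence:
                rules[window] = confidence
    return rules


def should_fetch(history, rules):
    """Send an agent only if some mined precondition matches the most recent slots."""
    return any(
        len(history) >= len(pre) and tuple(history[-len(pre):]) == pre
        for pre in rules
    )


# Hypothetical page that changes every third slot.
history = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
rules = mine_update_rules(history)
fetch_now = should_fetch(history, rules)
```

With this made-up history, the mined preconditions are "two unchanged slots" and "changed, then two unchanged slots", so the agent is dispatched only when the recent slots match one of them. A real system would re-mine the rules as new slots arrive and as pages are inserted or deleted.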