Basic article information

  • Title: Focused Crawler Research for Business Intelligence Acquisition
  • Authors: Peng Xin ; Qin Qiuli
  • Journal: International Journal of Hybrid Information Technology
  • Print ISSN: 1738-9968
  • Publication year: 2013
  • Volume: 6
  • Issue: 6
  • Publisher: SERSC
  • Abstract: The internet has become an indispensable part of people's lives. For enterprises, the internet holds a mass of valuable information, including not only competitor information but also customers' evaluations of products. This information is an important source of business intelligence. This paper aims to build a focused crawler that filters business intelligence from the vast amount of information on the internet. The crawler takes a certain number of web pages as seeds, extracts the URLs in these pages, and parses the main text of every URL. The crawler then calculates the relevancy between each main text and the crawler's topic based on the VSM (vector space model) and TF-IDF (Term Frequency-Inverse Document Frequency). If a web page is relevant, it is saved; otherwise, it is discarded. Finally, an experiment is conducted to test the performance of the crawler. It can be seen that the recall rate and accuracy of the crawler are very high though the result of this experiment
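
The relevance step described in the abstract (VSM with TF-IDF weights, then keep or discard each page) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the smoothed IDF formula, the use of cosine similarity as the relevancy measure, and the 0.1 threshold are all assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs; the paper's own
    # preprocessing (stop words, stemming, etc.) is not given in the abstract.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_vectors(docs):
    """Turn a list of token lists into sparse TF-IDF vectors (term -> weight)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    # Smoothed IDF (an assumption) so that no term gets a zero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_relevant(topic_text, pages, threshold=0.1):
    """Score each page's main text against the topic; keep those at or above
    the (hypothetical) threshold. `pages` maps URL -> main text."""
    urls = list(pages)
    docs = [tokenize(topic_text)] + [tokenize(pages[u]) for u in urls]
    vectors = tf_idf_vectors(docs)
    topic_vec = vectors[0]
    kept = {}
    for url, vec in zip(urls, vectors[1:]):
        score = cosine(topic_vec, vec)
        if score >= threshold:
            kept[url] = score  # relevant: save
        # otherwise the page is discarded
    return kept
```

Calling `filter_relevant("smartphone product review price competitor", pages)` with a dictionary of URL-to-text pairs returns only the pages whose main text shares enough weighted vocabulary with the topic. In practice the threshold would need tuning against labeled pages, which is presumably what the recall and accuracy experiment mentioned in the abstract measures.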