
Article Information

  • Title: Web Database Sampling Based on Dependency of Keywords
  • Authors: Zhang Rui; Wang Feng; Lin Peiguang
  • Journal: The Open Cybernetics & Systemics Journal
  • Electronic ISSN: 1874-110X
  • Publication year: 2015
  • Volume: 9
  • Issue: 1
  • Pages: 375-383
  • DOI: 10.2174/1874110X01509010375
  • Publisher: Bentham Science Publishers Ltd
  • Abstract:

    The information era has produced a huge number of data sources on the web. This abundance of useful data makes it possible for integration systems to improve the quality of the integrated data. However, how to choose appropriate data sources efficiently, so as to extract data with high coverage and low redundancy, remains an active research topic in this area. Sampling the databases hidden behind websites makes it possible to obtain the characteristics of these web databases and, in turn, to choose appropriate sources when collecting data for integration and query optimization. In this paper, we construct a sampling model that represents the data characteristics of web databases by posing keyword queries on the deep-web query interface. The dependency among text-attribute keywords within a data source is used to construct a dependent-relational probability matrix. This matrix indicates the sample distribution and is used for keyword extension, which fetches more sample data and reveals new characteristics of the actual data. Further, we provide an efficient method to evaluate the similarity between the sample databases and the real web databases. We evaluate the proposed method on a real-world dataset, and the results show that our method samples web data sources with high similarity.
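
The abstract describes the method only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' algorithm. It makes three assumptions:

- Keyword dependency is estimated as the conditional co-occurrence probability P(k_j | k_i) over the records sampled so far.
- Keyword extension picks the unissued keyword with the highest summed dependency on the keywords already queried.
- Sample-versus-database similarity is measured as the cosine of keyword-frequency vectors. This is a swapped-in stand-in, since the abstract does not describe the paper's own measure.

`DATABASE`, `query_interface`, `tokenize` and the other names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy "web database": each string is the text attribute of one record.
DATABASE = [
    "deep web database sampling",
    "keyword query on web interface",
    "sampling hidden database records",
    "query optimization for integration",
    "web data integration systems",
]


def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace, drop tokens of 2 chars or fewer."""
    return {w for w in text.lower().split() if len(w) > 2}


def dependency_matrix(records):
    """Sparse matrix of P(k_j | k_i): share of records containing k_i that also contain k_j.

    Also returns the document frequency of every keyword in `records`.
    """
    doc_freq = Counter()
    co_freq = Counter()
    for text in records:
        words = tokenize(text)
        doc_freq.update(words)
        for a, b in combinations(sorted(words), 2):
            co_freq[(a, b)] += 1
            co_freq[(b, a)] += 1
    matrix = {(a, b): c / doc_freq[a] for (a, b), c in co_freq.items()}
    return matrix, doc_freq


def next_keyword(matrix, issued):
    """Keyword extension (assumed rule): return the unissued keyword whose summed
    dependency on the already-issued keywords is highest, or None if there is none."""
    scores = Counter()
    for (a, b), p in matrix.items():
        if a in issued and b not in issued:
            scores[b] += p
    return scores.most_common(1)[0][0] if scores else None


def query_interface(database, keyword, limit):
    """Hypothetical deep-web search form: returns at most `limit` matching records."""
    return [r for r in database if keyword in tokenize(r)][:limit]


def sample(database, seed, rounds=5, limit=2):
    """Issue keyword queries, growing the sample and re-deriving the next keyword each round."""
    issued, sampled = set(), []
    keyword = seed
    for _ in range(rounds):
        if keyword is None:
            break
        issued.add(keyword)
        for r in query_interface(database, keyword, limit):
            if r not in sampled:
                sampled.append(r)
        matrix, _ = dependency_matrix(sampled)
        keyword = next_keyword(matrix, issued)
    return sampled


def cosine_similarity(freq_a, freq_b):
    """Stand-in similarity: cosine of two keyword-frequency vectors (0.0 if either is empty)."""
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    sampled = sample(DATABASE, seed="web")
    _, sample_freq = dependency_matrix(sampled)
    _, full_freq = dependency_matrix(DATABASE)
    print(len(sampled), "records sampled")
    print("similarity:", round(cosine_similarity(sample_freq, full_freq), 3))
```

The matrix is kept as a dictionary keyed by keyword pairs because keyword co-occurrence is typically sparse. A dense array indexed by vocabulary position would serve equally well for a small vocabulary. The `limit` parameter mimics the result cap that many deep-web search forms impose, which is what makes keyword extension necessary in the first place.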
