Article Information

  • Title: Extracting Web Data Based On Partial Tree Alignment Using Fivatech
  • Authors: D.Pramod Krishna ; T.Swarna Latha ; T.Rajasekhar Reddy
  • Journal: International Journal of Advanced Research In Computer Science and Software Engineering
  • Print ISSN: 2277-6451
  • Online ISSN: 2277-128X
  • Year: 2012
  • Volume: 2
  • Issue: 3
  • Publisher: S.S. Mishra
  • Abstract: This paper studies the problem of extracting structured data from Web pages. The objective of the proposed research is to automatically extract data items (fields) from data records and store the extracted data in a database. We formally define a template and propose a model that describes how values are encoded into pages using a template. For this purpose we propose a new method that performs the task automatically. It consists of two steps: (1) automatically identifying data records in a page, and (2) automatically aligning and extracting data items from those records. We apply partial tree alignment to DOM trees within the FiVaTech framework. Based on these two steps, an unsupervised, page-level data extraction approach is used to deduce the schema and template for each individual Deep Web site.
  • Keywords: Data Record Extraction; Partial Tree Alignment; Wrapper; Web Data Extraction
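
The alignment step (2) in the abstract can be illustrated with a heavily simplified sketch. It is not the authors' implementation and not the FiVaTech algorithm: it assumes each data record has already been identified and flattened into an ordered list of (tag, value) pairs with tags unique within a record, and it aligns flat tag sequences rather than DOM subtrees. The function names `merge_into_seed` and `align_records` and the sample records are hypothetical. The idea it borrows from partial tree alignment is to start from the largest record as a seed and grow it with fields that other records contribute, so that optional fields end up in their own column.

```python
from difflib import SequenceMatcher


def merge_into_seed(seed, tags):
    """Grow the seed tag sequence with tags it does not contain yet.

    Simplification: a real partial tree alignment inserts a node only when
    its position is unambiguous and retries the rest later; here unmatched
    tags are simply placed after the seed tags they conflict with.
    """
    merged = []
    matcher = SequenceMatcher(a=seed, b=tags, autojunk=False)
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op in ("equal", "delete"):
            merged.extend(seed[a0:a1])   # keep what the seed already has
        elif op == "insert":
            merged.extend(tags[b0:b1])   # new field between matched neighbours
        else:  # "replace": keep the seed tags, then append the new ones
            merged.extend(seed[a0:a1])
            merged.extend(tags[b0:b1])
    return merged


def align_records(records):
    """Return (template, rows); rows hold None where a field is missing."""
    # Use the record with the most fields as the initial seed.
    template = [tag for tag, _ in max(records, key=len)]
    for record in records:
        template = merge_into_seed(template, [tag for tag, _ in record])
    rows = []
    for record in records:
        values = dict(record)            # assumes tags are unique per record
        rows.append([values.get(tag) for tag in template])
    return template, rows


if __name__ == "__main__":
    # Hypothetical records, as if extracted from a product listing page.
    records = [
        [("img", "a.png"), ("title", "Book A"), ("price", "$10")],
        [("title", "Book B"), ("price", "$12"), ("rating", "4.5")],
        [("img", "c.png"), ("title", "Book C"), ("old_price", "$20"),
         ("price", "$15"), ("rating", "3.9")],
    ]
    template, rows = align_records(records)
    print(template)
    for row in rows:
        print(row)
```

Each aligned row could then be written to a database table whose columns are the entries of `template`, which matches the storage goal stated in the abstract. Working on real pages would require replacing the flat tag lists with DOM subtrees and a tree matching measure.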