Article Information

  • Title: Unsupervised Web Data Extraction Using Trinary Trees
  • Authors: N. M. Sawant; V. V. Pottigar; P. B. Lamkane
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Publication year: 2016
  • Volume: 7
  • Issue: 4
  • Pages: 2068-2070
  • Publisher: TechScience Publications
  • Abstract: The Internet presents a huge collection of useful information, so information extraction from web documents has become an active research area. Data extraction is the act or process of retrieving data from data sources for further data processing or data migration. The proposed technique works on two or more web documents generated by the same server-side template and learns a regular expression that models the template and can later be used to extract data from similar documents. The technique assumes that the template introduces shared patterns that do not provide any relevant data. The proposed technique will be compared with others in the literature on a large collection of web documents.
  • Keywords: Web Data Extraction; Automatic wrapper generation; Web Crawler; Unsupervised learning
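
The idea in the abstract (compare documents from the same template, treat what they share as template, and treat what differs as data) can be illustrated with a small sketch. This is not the trinary-tree algorithm of the paper. It is a simplified stand-in that aligns two token sequences with Python's difflib and handles only two documents. The names tokenize and learn_pattern, and the sample documents, are hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Split HTML into tag tokens and text tokens, dropping blank pieces."""
    return [tok for tok in re.split(r"(<[^>]+>)", html) if tok.strip()]


def learn_pattern(doc_a, doc_b):
    """Build a regex from two documents assumed to share one template.

    Tokens common to both documents are kept as literals (the template);
    every gap between common runs becomes a lazy capture group (the data).
    """
    a, b = tokenize(doc_a), tokenize(doc_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    pos_a = pos_b = 0
    for block in matcher.get_matching_blocks():
        # A gap in either document before this shared run is a data slot.
        if block.a > pos_a or block.b > pos_b:
            parts.append("(.*?)")
        parts.extend(re.escape(tok) for tok in a[block.a:block.a + block.size])
        pos_a, pos_b = block.a + block.size, block.b + block.size
    # Allow optional whitespace between consecutive tokens.
    return re.compile(r"\s*".join(parts), re.DOTALL)


# Two hypothetical pages produced by the same server-side template.
page_1 = "<html><body><h1>Alice</h1><p>Paris</p></body></html>"
page_2 = "<html><body><h1>Bob</h1><p>Oslo</p></body></html>"
pattern = learn_pattern(page_1, page_2)

# Apply the learned pattern to a third, unseen page from the same template.
page_3 = "<html><body><h1>Carol</h1><p>Lima</p></body></html>"
match = pattern.search(page_3)
print(match.groups() if match else None)
```

A pairwise alignment like this breaks down when pages contain repeated or optional blocks, which a more elaborate technique such as the one in the article would need to handle.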