
Article Information

  • Title: Efficient Content Extraction Using Hybrid Technique
  • Authors: G. Naveen Sundar; Sheba Gaikwad
  • Journal: International Journal of Computer Science & Technology
  • Print ISSN: 2229-4333
  • Online ISSN: 0976-8491
  • Year: 2012
  • Volume: 3
  • Issue: 4
  • Pages: 372-375
  • Language: English
  • Publisher: Ayushmaan Technologies
  • Abstract: Content extraction is the process of identifying the main content of a web page, or equivalently of removing its additional content. The main difficulties in extracting content from web pages are newer page architectures and the diversity of page structures. Many content extraction strategies rely on the DOM tree representation, on feature extraction, or on the tag ratios of an HTML page to estimate which parts of it are useful content. This paper presents a comparative study of various content extraction algorithms.
  • Keywords: Data Mining; Information Extraction; Content Extraction; HTML; Open Source Intelligence; Information Filtering
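The abstract lists tag ratios as one family of extraction strategies. The sketch below illustrates that idea in its simplest form; it is not the authors' hybrid technique. It assumes each HTML source line roughly corresponds to one block of the page, uses a regular expression rather than a real HTML parser, and keeps lines whose text-to-tag ratio exceeds a hypothetical threshold of `factor` times the mean ratio. Published tag-ratio methods also strip scripts and styles, smooth the ratios, and cluster them, which is omitted here. The names `tag_ratios` and `extract_content` are illustrative.

```python
import re
from statistics import mean

# Matches any tag such as <p>, </div> or <a href="...">; a simplification, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Per-line ratio of non-tag characters to tag count (tag count floored at 1)."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        text = TAG_RE.sub("", line).strip()
        ratios.append(len(text) / max(1, tag_count))
    return ratios


def extract_content(html: str, factor: float = 1.0) -> list[str]:
    """Return the text of lines whose ratio exceeds factor * mean ratio.

    The mean-based threshold is a hypothetical choice for illustration.
    """
    lines = html.splitlines()
    ratios = tag_ratios(html)
    if not ratios:
        return []
    threshold = factor * mean(ratios)
    texts = [TAG_RE.sub("", line).strip() for line in lines]
    # Keep only non-empty text from lines dense in text relative to markup.
    return [t for t, r in zip(texts, ratios) if r > threshold and t]


if __name__ == "__main__":
    sample = "\n".join([
        "<html><body>",
        '<div><a href="/">Home</a> | <a href="/about">About</a></div>',
        "<p>Content extraction identifies the main text of a web page.</p>",
        '<div><a href="/x">Ad</a></div>',
        "</body></html>",
    ])
    print(extract_content(sample))
```

On the sample, the navigation and advertisement lines carry many tags per character of text, so only the paragraph line clears the threshold. A per-line ratio is cheap to compute, but it depends on how the page source is broken into lines, which is one reason DOM-based approaches exist.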