Article Information

  • Title: News Content Extraction from Web Content using PCA Classifier
  • Authors: Neha.M ; Ancy Thomas
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Electronic ISSN: 2320-9801
  • Publication year: 2016
  • Volume: 4
  • Issue: 4
  • Page: 8085
  • DOI: 10.15680/IJIRCCE.2016.0404319
  • Publisher: S&S Publications
  • Abstract: Web content extraction is a key technology for enabling an array of applications aimed at understanding the web. This project aims to extract less structured web content, such as news articles, that appears only once in noisy web pages. Our approach classifies text blocks by first removing noise, then segmenting the page into visual and text units, extracting features from those units, and applying a PCA-based feature transformation before classification.
  • Keywords: WebPages; Visual Unit; Text Unit; Extracting Features; PCA
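
The pipeline outlined in the abstract (features per text block, a PCA-based transformation, then classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three block features (word count, mean word length, link density), the use of only the first principal component, and the sign-threshold rule standing in for the classifier. The function names are hypothetical.

```python
import math


def block_features(text, link_chars):
    """Hypothetical features for one text block (not the paper's feature set)."""
    words = text.split()
    n_words = len(words)
    mean_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    link_density = link_chars / len(text) if text else 0.0
    return [float(n_words), mean_len, link_density]


def standardize(rows):
    """Centre each column and scale it to unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero deviation; divide by 1.0 so it stays at zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def first_component(rows, iters=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
           for i in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-symmetric start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # An eigenvector's sign is arbitrary; orient it so that a higher word
    # count pushes the score up (assumption: article text is longer).
    if v[0] < 0:
        v = [-x for x in v]
    return v


def classify_blocks(blocks):
    """Return True for blocks scored as likely article content.

    `blocks` is a list of (text, link_chars) pairs. Thresholding the first
    principal component at zero is a stand-in for a trained classifier.
    """
    rows = standardize([block_features(t, lc) for t, lc in blocks])
    v = first_component(rows)
    scores = [sum(a * b for a, b in zip(r, v)) for r in rows]
    return [s > 0 for s in scores]


if __name__ == "__main__":
    nav = "Home News Sports Login"
    body = ("The city council approved a new budget on Tuesday after months "
            "of debate over funding for schools and public transport")
    share = "Subscribe Share Contact"
    more = ("Officials said the plan will take effect next month and will be "
            "reviewed again at the end of the year")
    sample = [(nav, len(nav)), (share, len(share)), (body, 0), (more, 0)]
    for (text, _), keep in zip(sample, classify_blocks(sample)):
        print("content" if keep else "noise  ", "|", text[:40])
```

In practice the projected features would more likely feed a supervised classifier trained on labelled blocks; the zero threshold is used here only to keep the sketch self-contained.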