Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Year: 2014
Volume: 5
Issue: 4
Pages: 25-27
Language: English
Publisher: Ayushmaan Technologies
Abstract: The objective of FoCUS is to crawl only relevant forum content from the web with minimal overhead. Forum threads contain the information content that forum crawlers target. Although forums differ in layout and style and are powered by different forum software packages, they always have similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. Robust page type classifiers can be trained from as few as 5 annotated forums and applied to a large set of unseen forums.
Keywords: EIT Path; Forum Crawling; ITF Regex; Page Classification; Page Type; URL Pattern Learning; URL Type
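
The idea that specific URL types connect entry pages to thread pages can be illustrated with a minimal sketch. It assumes a single imaginary forum whose URL layout is known in advance; the patterns, the `classify_url` and `should_follow` helpers, and the example URLs are all hypothetical and are not the patterns or method of the paper, which learns such patterns rather than hard-coding them.

```python
import re
from typing import Optional

# Hypothetical URL patterns for one imaginary forum (an assumption for
# illustration). A real crawler would learn these per site.
# Order matters: the most specific pattern is tried first.
URL_TYPE_PATTERNS = {
    "page_flipping": re.compile(r"/(?:forum|thread)/\d+/page/\d+$"),
    "index": re.compile(r"/forum/\d+$"),
    "thread": re.compile(r"/thread/\d+$"),
}


def classify_url(url: str) -> Optional[str]:
    """Return the URL type of `url`, or None if it matches no known pattern."""
    for url_type, pattern in URL_TYPE_PATTERNS.items():
        if pattern.search(url):
            return url_type
    return None


def should_follow(url: str) -> bool:
    """Follow only URLs that lie on the entry -> index -> thread path."""
    return classify_url(url) is not None


if __name__ == "__main__":
    for candidate in (
        "http://forum.example.com/forum/12",
        "http://forum.example.com/thread/345",
        "http://forum.example.com/thread/345/page/2",
        "http://forum.example.com/profile/7",
    ):
        print(candidate, "->", classify_url(candidate))
```

Skipping every URL that matches no pattern, such as the profile page above, is what would keep crawl overhead low in this sketch.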