Article information

  • Title: FEATURES EXTRACTION ALGORITHM FROM SGML FOR CLASSIFICATION
  • Authors: Zailani Abdullah ; Muhammad Suzuri Hitam
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2007
  • Volume: 3
  • Issue: 2
  • Pages: 72-78
  • Publisher: Journal of Theoretical and Applied
  • Abstract: The basic phases of text categorization are preprocessing features, extracting relevant features against the features in a database, and finally categorizing a set of documents into predefined categories. Most research in text categorization focuses on the development of algorithms and computing techniques, while the algorithm for preprocessing features tends to be treated as a "black box" and ignored. It is therefore worthwhile to develop a preprocessing algorithm that beginners can use before going deeper into the field of text categorization. This research proposes an algorithm for preprocessing features that uses Microsoft .NET Framework technology. The actual implementation shows that the algorithm can extract the features of interest from a standard corpus collection and upload them into a relational database.
  • Keywords: Preprocessing ; text categorization ; algorithm ; .net
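
The pipeline outlined in the abstract (read SGML-tagged documents, extract term features, store them in a relational database) can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not reproduce and which is described as using Microsoft .NET; Python and SQLite are substituted here for illustration only. The tag names (`DOC`, `TOPIC`, `BODY`), the stopword list, the table layout, and the function names are all assumptions.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical SGML layout: these tag names are assumptions for illustration,
# not the structure of the corpus used in the paper.
SAMPLE_SGML = """
<DOC ID="1">
<TOPIC>grain</TOPIC>
<BODY>Wheat prices rose as wheat exports grew.</BODY>
</DOC>
<DOC ID="2">
<TOPIC>trade</TOPIC>
<BODY>The trade deficit narrowed in the quarter.</BODY>
</DOC>
"""

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "as", "in", "a", "an", "of", "and"}

DOC_RE = re.compile(r'<DOC ID="(\d+)">(.*?)</DOC>', re.DOTALL)
TOPIC_RE = re.compile(r"<TOPIC>(.*?)</TOPIC>", re.DOTALL)
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)


def extract_documents(sgml_text):
    """Yield (doc_id, topic, term_counts) for each <DOC> element."""
    for doc_id, inner in DOC_RE.findall(sgml_text):
        topic = TOPIC_RE.search(inner)
        body = BODY_RE.search(inner)
        # Lowercase the body and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", body.group(1).lower()) if body else []
        terms = Counter(t for t in tokens if t not in STOPWORDS)
        yield int(doc_id), (topic.group(1).strip() if topic else None), terms


def load_into_database(conn, sgml_text):
    """Store documents and their term frequencies in two relational tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document (id INTEGER PRIMARY KEY, topic TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature (doc_id INTEGER, term TEXT, freq INTEGER)"
    )
    for doc_id, topic, terms in extract_documents(sgml_text):
        conn.execute(
            "INSERT INTO document (id, topic) VALUES (?, ?)", (doc_id, topic)
        )
        conn.executemany(
            "INSERT INTO feature (doc_id, term, freq) VALUES (?, ?, ?)",
            [(doc_id, term, freq) for term, freq in terms.items()],
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_database(conn, SAMPLE_SGML)
    for row in conn.execute(
        "SELECT doc_id, term, freq FROM feature ORDER BY doc_id, term"
    ):
        print(row)
```

Regular expressions are enough for this flat, well-formed sample; real SGML with omitted end tags or entity references would need a proper parser.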