Article Information

  • Title: Incorporating Distributional Features on phrases in Text Categorization
  • Authors: Sravan Yadav Eadala; Dr. M Janaki Meena
  • Journal: International Journal of Advanced Research In Computer Science and Software Engineering
  • Print ISSN: 2277-6451
  • Online ISSN: 2277-128X
  • Year: 2013
  • Volume: 3
  • Issue: 1
  • Publisher: S.S. Mishra
  • Abstract: Text mining is the process of discovering new, previously unknown information from a usually large collection of diverse, unstructured textual resources. Text Categorization is the task of assigning predefined categories to natural language text, and it belongs to the preprocessing stage of the Text Mining process. A feature is a unit, or a weight, used to represent a document. Feature Selection is the technique of selecting the subset of features that best characterizes a document. Features for Text Categorization can be words, phrases, or sentences that occur in the training documents. A bag-of-words representation cannot fully capture the available information, because the selected features may be redundant or irrelevant. Statistical methods can select better features, namely those that depend on a category. Moreover, the positions at which features appear play a vital role in selecting good features. Therefore, distributional features have been incorporated into statistical methods. These features include the compactness of a feature's appearances and the position of its first appearance. In this paper, the performance of statistical methods that incorporate distributional features is evaluated and compared with other feature selection techniques, for both words and phrases.
  • Keywords: Distributional features; Text Categorization; Data Mining; Text Mining and Statistical Methods
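The abstract combines two distributional features with a statistical selection method. The first is the compactness of a feature's appearances and the second is the position of its first appearance. The Python sketch below illustrates the idea and is not the authors' implementation. It rests on several assumptions:

- A document is split into sentence "parts".
- Compactness is measured as the normalized distance between the first and last part containing the term.
- Chi-square is used as the statistical method.
- The `combined_score` weighting is hypothetical, chosen only for illustration.

All function names are hypothetical. Terms may be single words or multi-word phrases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Distribution:
    first_appearance: float  # 0.0 = first seen in the first part, 1.0 = in the last part
    spread: float            # 0.0 = all appearances in one part (compact), 1.0 = first to last part


def split_parts(text: str) -> list[list[str]]:
    """Assumption: a 'part' is a sentence ending in '.', tokenized on whitespace."""
    return [s.lower().split() for s in text.split(".") if s.strip()]


def contains(part: list[str], phrase: list[str]) -> bool:
    """True if the phrase occurs as a contiguous token run (covers words and phrases)."""
    k = len(phrase)
    return any(part[i:i + k] == phrase for i in range(len(part) - k + 1))


def distributional_features(parts: list[list[str]], term: str) -> Optional[Distribution]:
    """Position of first appearance and a simple compactness measure for one term."""
    phrase = term.lower().split()
    hits = [i for i, part in enumerate(parts) if contains(part, phrase)]
    if not hits:
        return None
    denom = max(len(parts) - 1, 1)  # avoid division by zero for one-part documents
    return Distribution(first_appearance=hits[0] / denom,
                        spread=(hits[-1] - hits[0]) / denom)


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a term/category 2x2 contingency table.
    a = in-category docs with the term, b = other docs with the term,
    c = in-category docs without the term, d = other docs without the term."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def combined_score(chi2: float, dist: Distribution) -> float:
    """Hypothetical combination: favour terms that appear early and compactly."""
    return chi2 * (1.0 - 0.5 * dist.first_appearance) * (1.0 - 0.5 * dist.spread)


if __name__ == "__main__":
    doc = "Stock markets fell today. Investors sold shares. The weather was mild. Markets may recover."
    parts = split_parts(doc)
    print(distributional_features(parts, "markets"))        # Distribution(first_appearance=0.0, spread=1.0)
    print(distributional_features(parts, "stock markets"))  # Distribution(first_appearance=0.0, spread=0.0)
```

In this sketch, "markets" scores as widely spread, because it occurs in the first and last sentences. The phrase "stock markets" is fully compact. A real system would compute chi-square from category counts over a training corpus and could use other compactness measures, such as the number of parts containing the term or the variance of its positions.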