Article Information

  • Title: A Machine Learning Algorithm in Automated Text Categorization of Legacy Archives
  • Authors: Dali Wang; Ying Bai; David Hamblin
  • Journal: Computer Science & Information Technology
  • Electronic ISSN: 2231-5403
  • Publication year: 2019
  • Volume: 9
  • Issue: 7
  • Pages: 1-8
  • DOI: 10.5121/csit.2019.90701
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: The goal of this research is to develop an algorithm that automatically retrieves critical information from raw data files in NASA’s airborne measurement data archive. The product has to meet specific metrics in terms of accuracy, robustness, and usability, as the initial decision-tree-based development has shown limited applicability due to its resource-intensive characteristics. We have developed an innovative solution that is much less resource-intensive while offering comparable performance. As with many practical applications, the available data are noisy and correlated, and a wide range of features is associated with the information to be retrieved. The proposed algorithm uses a decision tree to select features and determine their weights. A weighted Naive Bayes classifier is used because the inputs are highly correlated. The development has been successfully deployed at an industrial scale, and the results show that it is well balanced in terms of performance and resource requirements.
  • Keywords: Machine Learning; Classification; Naïve Bayes; Decision Tree
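
The abstract names the two ingredients (tree-derived feature weights and a weighted Naive Bayes classifier) but not the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes categorical features, uses information gain (the split criterion a decision tree typically evaluates) as a stand-in for tree-derived weights, and applies each weight as a multiplier on the feature's log-likelihood. The class name `WeightedNaiveBayes`, the helper functions, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one categorical feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


class WeightedNaiveBayes:
    """Naive Bayes whose per-feature log-likelihoods are scaled by weights.

    Assumption: weights are normalized information gains, used here as a
    proxy for weights read off a trained decision tree.
    """

    def fit(self, rows, labels, alpha=1.0):
        n_features = len(rows[0])
        gains = [information_gain(rows, labels, f) for f in range(n_features)]
        total_gain = sum(gains) or 1.0  # avoid division by zero if no feature helps
        self.weights = [g / total_gain for g in gains]
        self.alpha = alpha  # Laplace smoothing constant
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (feature, class) -> value counts
        self.values = defaultdict(set)  # feature -> distinct values seen
        for row, label in zip(rows, labels):
            for f, v in enumerate(row):
                self.value_counts[(f, label)][v] += 1
                self.values[f].add(v)
        return self

    def predict(self, row):
        def score(c):
            s = math.log(self.class_counts[c] / self.n)  # log prior
            for f, v in enumerate(row):
                num = self.value_counts[(f, c)][v] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.values[f])
                s += self.weights[f] * math.log(num / den)  # weighted log-likelihood
            return s

        return max(self.class_counts, key=score)


# Made-up toy data: feature 0 separates the classes, feature 1 is mostly noise.
rows = [("utc", "a"), ("utc", "b"), ("utc", "a"),
        ("alt", "a"), ("alt", "b"), ("alt", "b")]
labels = ["time", "time", "time", "height", "height", "height"]

model = WeightedNaiveBayes().fit(rows, labels)
print(model.weights)
print(model.predict(("utc", "b")))
```

Down-weighting weakly informative features is one common way to soften the independence assumption when inputs are correlated; whether the paper applies weights as likelihood exponents or in some other form is not stated in the abstract.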