Article Information

  • Title: On the Application of Principal Component Analysis to Classification Problems
  • Authors: Jianwei Zheng; Cyril Rakovski
  • Journal: Data Science Journal
  • Electronic ISSN: 1683-1470
  • Publication year: 2021
  • Volume: 20
  • Issue: 1
  • Pages: 1-6
  • DOI: 10.5334/dsj-2021-026
  • Language: English
  • Publisher: Ubiquity Press
  • Abstract: Principal Component Analysis (PCA) is a commonly used technique that uses the correlation structure of the original variables to reduce the dimensionality of the data. This reduction is achieved by considering only the first few principal components for a subsequent analysis. The usual inclusion criterion is defined by the proportion of the total variance of the principal components exceeding a predetermined threshold. We show that in certain classification problems, even an extremely high inclusion threshold can negatively impact the classification accuracy. The omission of small-variance principal components can severely diminish the performance of the models. We noticed this phenomenon in classification analyses using high-dimensional ECG data, where the most common classification methods lost between 1% and 6% of accuracy even with a 99% inclusion threshold. However, this issue can occur even in low-dimensional data with a simple correlation structure, as our numerical example shows. We conclude that the exclusion of any principal components should be carefully investigated.
  • Keywords: PCA; Dimensionality Reduction; Power; Classification Accuracy
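
The effect described in the abstract can be illustrated with a small synthetic sketch. This is not the authors' numerical example or their ECG analysis. The data-generating process, every parameter value, and all function names below are assumptions chosen for illustration: two strongly correlated features share a large-variance component that is unrelated to the class, while the class signal sits in their small-variance difference. A variance-proportion threshold is then applied, and a simple nearest-centroid classifier on standardized component scores stands in for "a classification method".

```python
import math
import random


def simulate(n, seed):
    """Two correlated features; the class signal lies along the low-variance direction."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shared = rng.gauss(0.0, 3.0)  # large variance, unrelated to the class (assumed value)
        diff = (0.2 if label else -0.2) + rng.gauss(0.0, 0.05)  # small variance, class-dependent
        X.append((shared + diff, shared - diff))
        y.append(label)
    return X, y


def fit_pca_2d(X):
    """Closed-form PCA for two features: mean, eigenvalues (descending), unit eigenvectors."""
    n = len(X)
    m0 = sum(p[0] for p in X) / n
    m1 = sum(p[1] for p in X) / n
    a = sum((p[0] - m0) ** 2 for p in X) / (n - 1)
    c = sum((p[1] - m1) ** 2 for p in X) / (n - 1)
    b = sum((p[0] - m0) * (p[1] - m1) for p in X) / (n - 1)
    half_gap = math.hypot((a - c) / 2.0, b)
    eigvals = [(a + c) / 2.0 + half_gap, (a + c) / 2.0 - half_gap]
    eigvecs = []
    for lam in eigvals:
        # (b, lam - a) solves (S - lam*I) v = 0; this assumes a nonzero covariance b,
        # which holds for the strongly correlated data simulated here.
        norm = math.hypot(b, lam - a)
        eigvecs.append((b / norm, (lam - a) / norm))
    return (m0, m1), eigvals, eigvecs


def n_components_for(eigvals, threshold):
    """Smallest k whose cumulative share of the total variance reaches the threshold."""
    total, running = sum(eigvals), 0.0
    for k, lam in enumerate(eigvals, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigvals)


def scores(X, mean, eigvals, eigvecs, k):
    """Standardized scores on the first k principal components."""
    return [
        tuple(((p[0] - mean[0]) * v[0] + (p[1] - mean[1]) * v[1]) / math.sqrt(lam)
              for lam, v in zip(eigvals[:k], eigvecs[:k]))
        for p in X
    ]


def nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test):
    """Fit one centroid per class on the training scores and report test accuracy."""
    centroids = {}
    for label in (0, 1):
        rows = [z for z, t in zip(Z_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(z):
        return min(centroids,
                   key=lambda lab: sum((zi - ci) ** 2 for zi, ci in zip(z, centroids[lab])))

    return sum(predict(z) == t for z, t in zip(Z_test, y_test)) / len(y_test)


X, y = simulate(n=2000, seed=0)
X_train, y_train, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]
mean, eigvals, eigvecs = fit_pca_2d(X_train)  # PCA is fitted on the training half only

k99 = n_components_for(eigvals, 0.99)
accuracy = {}
for k in (k99, 2):  # thresholded model versus the model that keeps every component
    Z_train = scores(X_train, mean, eigvals, eigvecs, k)
    Z_test = scores(X_test, mean, eigvals, eigvecs, k)
    accuracy[k] = nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test)

print("components kept at 99% threshold:", k99)
print("accuracy by number of components:", accuracy)
```

Under these assumptions the first component accounts for well over 99% of the total variance yet carries essentially no class information, so the thresholded model should perform near chance, while keeping the second, small-variance component should recover almost all of the accuracy. The gap here is deliberately extreme; the 1% to 6% losses reported in the abstract arose on real high-dimensional data.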