
Basic article information

  • Title: Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications
  • Authors: Chia-Hsiu Chen ; Kenichi Tanaka ; Masaaki Kotera
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Publication year: 2020
  • Volume: 12
  • Issue: 1
  • Pages: 1-16
  • DOI: 10.1186/s13321-020-0417-9
  • Publisher: BioMed Central
  • Abstract: Ensemble learning improves machine learning results by combining several models, yielding better predictive performance than a single model. It also benefits and accelerates research in quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) modeling. With the growing number of ensemble learning models such as random forest, the effectiveness of QSAR/QSPR will be limited by the machine's inability to explain its predictions to researchers. In fact, many implementations of ensemble learning models are able to quantify the overall magnitude of each feature. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may lead to different feature selections for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting and gradient boosting) for regression and binary classification modeling tasks. Blending methods were then built by combining the four ensemble learning methods. The blending method led to better performance and a unified interpretation by summarizing individual predictions from the different learning models. The important features in two case studies, which provided valuable information about compound properties, are discussed in detail. QSPR modeling with interpretable machine learning techniques can help chemical design work more efficiently, confirm hypotheses, and establish knowledge for better results.
  • Keywords: QSPR ; Quantitative structure–property ; Fluorescence ; Liquid crystal ; Ensemble learning ; Blending ; Decision tree ; Random forest ; Extremely randomized trees
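
The abstract describes blending four ensemble models so that both the predictions and the feature importances are summarized into one result. The abstract does not state the exact blending scheme, so the following is only a minimal sketch that assumes an unweighted average. The function names, the model outputs, and the three-descriptor layout are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def blend_predictions(preds_by_model):
    """Average per-sample predictions across models (unweighted blend)."""
    # zip(*...) regroups the values so each tuple holds one sample's
    # predictions from every model.
    return [fmean(sample) for sample in zip(*preds_by_model.values())]


def blend_importances(importances_by_model):
    """Normalize each model's importances to sum to 1, then average them."""
    normalized = []
    for name, scores in importances_by_model.items():
        total = sum(scores)
        if total <= 0:
            raise ValueError(f"importances for {name!r} must have a positive sum")
        normalized.append([s / total for s in scores])
    return [fmean(feature) for feature in zip(*normalized)]


# Hypothetical outputs of four already-fitted models on three samples.
preds = {
    "random_forest": [1.0, 2.0, 3.0],
    "extra_trees": [1.2, 1.8, 3.1],
    "adaboost": [0.8, 2.2, 2.9],
    "gradient_boosting": [1.0, 2.0, 3.0],
}

# Hypothetical raw importances for three descriptors, on different scales.
importances = {
    "random_forest": [0.5, 0.3, 0.2],
    "extra_trees": [4.0, 4.0, 2.0],
    "adaboost": [0.6, 0.2, 0.2],
    "gradient_boosting": [10.0, 6.0, 4.0],
}

blended_preds = blend_predictions(preds)
blended_importances = blend_importances(importances)
print(blended_preds)
print(blended_importances)
```

Normalizing each model's importances before averaging keeps a model that reports scores on a larger scale from dominating the blended ranking. A weighted blend, for example one weighted by validation performance, would be an equally plausible choice.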