Basic article information

Title: Semi-Supervised Learning for Quantitative Structure-Activity Modeling
Authors: Jurica Levatić; Sašo Džeroski; Fran Supek et al.
Journal: Informatica
Print ISSN: 1514-8327
Electronic ISSN: 1854-3871
Publication year: 2013
Volume: 37
Issue: 2
Publisher: The Slovene Society Informatika, Ljubljana
Abstract: In this study, we compare the performance of semi-supervised and supervised machine learning methods applied to various problems of modeling Quantitative Structure-Activity Relationships (QSAR) in sets of chemical compounds. Semi-supervised learning utilizes unlabeled data in addition to labeled data with the goal of building better predictive models than can be learned by using labeled data alone. Typically, labeled QSAR datasets contain tens to hundreds of compounds, while unlabeled data are easily accessible via public databases containing thousands of chemical compounds; this makes QSAR modeling an attractive domain for the application of semi-supervised learning. We tested four different semi-supervised learning algorithms on three different datasets and compared them to five commonly used supervised learning algorithms. While adding unlabeled data does help for certain pairings of dataset and method, semi-supervised learning is not clearly superior to supervised learning across the QSAR classification problems addressed by this study.
Keywords: semi-supervised learning; supervised learning; QSAR; drug design; machine learning