Article Information

Title: Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information
Authors: Kim, Sun; Kim, Won; Wei, Chih-Hsuan et al.
Journal: Database
Print ISSN: 1758-0463
Electronic ISSN: 1758-0463
Publication year: 2012
Volume: 2012
DOI: 10.1093/database/bas042
Publisher: Oxford University Press
Abstract: The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is the first and an important step to assist manual curation efficiency. However, the complex nature of named entities and their relationships makes it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein–protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team.