
Article Information

  • Title: PubMed Phrases, an open set of coherent phrases for searching biomedical literature
  • Authors: Sun Kim; Lana Yeganova; Donald C. Comeau
  • Journal: Scientific Data
  • Electronic ISSN: 2052-4463
  • Publication year: 2018
  • Volume: 5
  • DOI: 10.1038/sdata.2018.104
  • Language: English
  • Publisher: Nature Publishing Group
  • Abstract: In biomedicine, key concepts are often expressed by multiple words (e.g., ‘zinc finger protein’). Previous work has shown that treating a sequence of words as a meaningful unit, where applicable, is not only important for human understanding but also beneficial for automatic information seeking. Here we present a collection of PubMed® Phrases that are beneficial for information retrieval and human comprehension. We define these phrases as coherent chunks that are logically connected. To collect the phrase set, we apply the hypergeometric test to detect segments of consecutive terms that are likely to appear together in PubMed. These text segments are then filtered using the BM25 ranking function to ensure that they are beneficial from an information retrieval perspective. Thus, we obtain a set of 705,915 PubMed Phrases. We evaluate the quality of the set by investigating PubMed user click data and manually annotating a sample of 500 randomly selected noun phrases. We also analyze and discuss the usage of these PubMed Phrases in literature search.
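
The abstract mentions a hypergeometric test for finding terms that appear together more often than chance would predict. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes document-level counts for a two-term candidate, and the function name `cooccurrence_pvalue` and all of its parameters are hypothetical. The BM25 filtering step is not shown.

```python
from math import comb


def cooccurrence_pvalue(n_docs, n_first, n_second, n_both):
    """Upper-tail hypergeometric p-value P(X >= n_both).

    Assumed setup (illustrative only): out of n_docs documents, n_first
    contain the first term and n_second contain the second term. If the
    terms were unrelated, the number of documents containing both would
    follow a hypergeometric distribution.
    """
    upper = min(n_first, n_second)
    # Count the ways to reach an overlap of i documents, for every i at
    # least as large as the observed overlap.
    favorable = sum(
        comb(n_first, i) * comb(n_docs - n_first, n_second - i)
        for i in range(max(n_both, 0), upper + 1)
    )
    # Divide once by the total number of ways to place the second term.
    return favorable / comb(n_docs, n_second)


# Toy counts: 10 documents, each term in 3 of them, all 3 shared.
p_chance = cooccurrence_pvalue(10, 3, 3, 3)
# An observed overlap of 0 is always reached, so the tail covers everything.
p_trivial = cooccurrence_pvalue(10, 3, 3, 0)
```

A small p-value suggests the two terms co-occur more often than independence would explain, which is the kind of signal that would mark a candidate segment as coherent. Any threshold applied to it would be a further assumption.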