
Article information

  • Title: Mining academic publications to automatically identify data sources
  • Authors: Athanasios Anastasiou; Karen Tingay
  • Journal: International Journal of Population Data Science
  • Electronic ISSN: 2399-4908
  • Year of publication: 2018
  • Volume: 3
  • Issue: 2
  • Pages: 1-1
  • DOI: 10.23889/ijpds.v3i2.532
  • Publisher: Swansea University
  • Abstract: Background: Discovering suitable datasets is an important part of health research, particularly for projects working with cohort data. With the proliferation of national and international initiatives, however, it is becoming increasingly difficult for research teams to locate the real-world datasets most relevant to their project objectives. Methods: To assist researchers, we developed bibInsight, a data analysis platform that identifies potentially useful data sources and, more generally, enables large-scale research over bibliographical datasets. Data source names were identified from a broad, topic-specific literature search. Context-specific terms such as “annual”, “longitudinal”, and “prospective” were used to train a classifier that identified potential datasets. Results: The classifier identified 1588 of 1961 abstracts as containing cohort-relevant information: a precision of approximately 80%. Further analyses, such as topic analysis, geographical mapping, and collaboration networks, can refine and prioritise the search results to determine the most relevant data source(s) for a research project. Conclusions: A very large amount of information, including descriptions of data sources and their use, remains unexploited in unstructured bibliographical datasets. Here, we used a thematic search to provide a more manageable starting point for locating disease-specific datasets.
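
As a rough illustration of the figures in the abstract, the sketch below recomputes the reported proportion (1588 of 1961) and shows a naive whole-word check for the three context terms the abstract quotes. This is not bibInsight's classifier, whose features and training procedure are not described in this record; the function names `mentions_context_term` and `proportion` and the matching rule are assumptions made purely for illustration.

```python
import re

# Context terms quoted in the abstract. Using them as a simple keyword
# filter is an assumption; the authors trained a classifier instead.
CONTEXT_TERMS = ("annual", "longitudinal", "prospective")


def mentions_context_term(abstract: str) -> bool:
    """Return True if the text contains any context term as a whole word."""
    words = set(re.findall(r"[a-z]+", abstract.lower()))
    return any(term in words for term in CONTEXT_TERMS)


def proportion(flagged: int, total: int) -> float:
    """Share of abstracts flagged, as a fraction between 0 and 1."""
    return flagged / total


# Counts taken directly from the Results sentence of the abstract.
reported = proportion(1588, 1961)
print(f"Flagged share: {reported:.1%}")

# Hypothetical example sentence, not drawn from the study's corpus.
print(mentions_context_term("A prospective cohort study of childhood asthma."))
```

The keyword check only conveys why such terms are useful signals of cohort data; a trained classifier weighs many features jointly rather than applying a fixed rule.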