
Article information

  • Title: Combining deterministic and probabilistic matching to reduce data linkage errors in hospital administrative data
  • Authors: Gareth Hagger-Johnson; Katie Harron; Rob Aldridge
  • Journal: International Journal of Population Data Science
  • Electronic ISSN: 2399-4908
  • Year of publication: 2017
  • Volume: 1
  • Issue: 1
  • Pages: 1-1
  • DOI: 10.23889/ijpds.v1i1.316
  • Publisher: Swansea University
  • Abstract:
    Objectives: Data linkage algorithms are used to link together multiple episodes of care belonging to the same patient. For example, the HESID algorithm is used to generate Hospital Episode Statistics (HES) in England. HESID is a deterministic algorithm, requiring identifiers to agree or disagree at each step. Data linkage errors occur when episodes belonging to two patients are incorrectly linked (a false match) or when episodes belonging to the same patient are not linked (a missed match). These errors typically occur because patient identifiers (e.g. NHS number, postcode) contain errors or have missing data. We previously showed that HESID has a low false match rate (0.2%) but a high missed match rate (4.1%) when applied to paediatric intensive care data. This biased the true readmission rate, particularly for some patient groups, including ethnic minorities. The aim of our study was to evaluate whether an additional step involving probabilistic matching would lower the missed match rate in HES without increasing the false match rate.
    Approach: We simulated three datasets having the same characteristics as HES, for three age groups expected to have different levels of postcode stability (at age 0/1, 5/6 and 18/19). We compared the deterministic algorithm to a probabilistic algorithm, and then to a deterministic algorithm with an additional probabilistic step. In sensitivity analyses, we evaluated the algorithms under different data quality scenarios.
    Results: Deterministic followed by probabilistic matching is the best solution for reducing missed matches, particularly in scenarios where errors in patient identifiers are more common.
    Conclusion: Data linkage algorithms need to be evaluated against good quality reference standard data sets. For hospital data in England, the Personal Demographics Service (PDS) could be used to evaluate our approach, because it contains many of the same patient identifiers used in HES. Reducing data linkage error will improve monitoring of hospital activity in England.
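
The idea of a deterministic step followed by a probabilistic step can be sketched as below. This is only an illustration, not the HESID algorithm or the authors' implementation: the two deterministic rules, the field names, the m- and u-probabilities, and the threshold of 10 are all assumptions chosen for the example. The probabilistic step uses standard Fellegi-Sunter style log2 agreement weights.

```python
import math

# Hypothetical (m, u) probabilities per identifier: m is the chance the field
# agrees for a true match, u the chance it agrees for a non-match.
# These values are illustrative assumptions, not estimates from HES.
M_U = {
    "nhs_number": (0.95, 0.0001),
    "dob": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.85, 0.001),
}


def deterministic_match(a, b):
    """Assumed exact-agreement rules; a missing value never counts as agreement."""
    def agree(field):
        return a[field] is not None and a[field] == b[field]

    # Rule 1 (assumed): NHS number, date of birth and sex all agree.
    if agree("nhs_number") and agree("dob") and agree("sex"):
        return True
    # Rule 2 (assumed): date of birth, sex and postcode all agree.
    return agree("dob") and agree("sex") and agree("postcode")


def match_weight(a, b):
    """Sum of log2 agreement/disagreement weights; missing fields add nothing."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] is None or b[field] is None:
            continue
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(a, b, threshold=10.0):
    """Try the deterministic rules first, then fall back to the weight threshold."""
    if deterministic_match(a, b):
        return "deterministic"
    if match_weight(a, b) >= threshold:
        return "probabilistic"
    return "no link"


# Made-up example episodes (not real patient data).
episode = {"nhs_number": "1234567890", "dob": "2001-05-17", "sex": "F", "postcode": "AB1 2CD"}
dob_typo = {"nhs_number": "1234567890", "dob": "2001-05-11", "sex": "F", "postcode": "AB1 2CD"}
other = {"nhs_number": "0987654321", "dob": "1999-01-02", "sex": "F", "postcode": "ZY9 8XW"}

for candidate in (episode, dob_typo, other):
    print(link(episode, candidate))
```

Under these assumed weights, the pair that differs only by a date-of-birth typo fails both deterministic rules (a missed match if the algorithm stopped there) but is recovered by the probabilistic step, while the unrelated episode stays unlinked. The threshold controls the trade-off: lowering it recovers more missed matches at the cost of more false matches.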