Abstract:

Objective: To assess the use of medical claims records for surveillance and epidemiological inference through a case study that examines how ecological and social determinants and measurement error contribute to spatial heterogeneity in reports of influenza-like illness across the United States.

Introduction: Traditional infectious disease epidemiology is built on a foundation of high-quality, high-accuracy data on disease and behavior. Digital infectious disease epidemiology, on the other hand, repurposes existing digital traces to identify patterns in health-related processes. Medical claims are an emerging digital data source in surveillance; they capture patient-level data across an entire population of healthcare seekers, and they offer medical accuracy through physician diagnoses along with fine spatial and temporal resolution in near real-time. Our work harnesses the large volume and high specificity of diagnosis codes in medical claims to improve our understanding of the mechanisms driving spatial variation in reported influenza activity each year. The mechanisms hypothesized to drive these patterns are varied: environmental factors affecting transmission or virus survival, travel flows between different populations, population age structure, and socioeconomic factors linked to healthcare access and quality of life. Beyond process mechanisms, the nature of surveillance data collection may affect our interpretation of spatial epidemiological patterns [1], particularly since influenza is a non-reportable disease with non-specific presentations ranging from asymptomatic to severe. Considering the ways in which medical claims are generated, biases may arise from healthcare-seeking behavior, insurance coverage, and medical claims database coverage in study populations.

Methods: Using aggregated U.S.
medical claims for influenza-like illness (ILI) from the 2001-2002 through 2008-2009 flu seasons [2], we developed a Bayesian hierarchical modeling framework to estimate the importance of ecological and social determinants and of measurement-related factors in observed county-level variation of influenza disease burden across the United States. Integrated Nested Laplace Approximation (INLA) techniques for Bayesian inference kept our questions computationally tractable despite the high spatial resolution of our data (Figure 1) and the multiplicity of models in our analysis [3]. Linking data from a variety of publicly available sources, we determined the strength, directionality, and consistency of these factors over multiple flu seasons.

Results: We found that measurement-related factors (healthcare-seeking behavior, insurance coverage, and medical claims database coverage) were strong predictors of greater ILI intensity across seasons. Secondarily, poverty and specific humidity were negatively associated with ILI intensity for several seasons. Finally, by incorporating mechanistic and measurement factors into our model, we produced predictions that present an improved map of influenza-like illness in the United States for the flu seasons in our study period.

Conclusions: We present a flexible modeling approach that applies to different medical claims diagnosis codes and disease surveillance data and demonstrates the utility of Bayesian hierarchical models for large-scale ecological analyses. Our results increase our knowledge of the spatial distribution of influenza and the underlying processes that drive these patterns, promote finer spatial targeting for different types of interventions, and enable the interpolation of burden in areas difficult to surveil through traditional public health.
Moreover, they highlight the relative contributions of surveillance data collection and ecological processes to spatial variation in disease, and they underscore the importance of considering measurement biases when using surveillance data for epidemiological inference.