Journal: Proceedings of the National Academy of Sciences
Print ISSN: 0027-8424
Electronic ISSN: 1091-6490
Year: 2022
Volume: 119
Issue: 4
DOI: 10.1073/pnas.2108097119
Language: English
Publisher: The National Academy of Sciences of the United States of America
Abstract: Significance
We revisit the problem of ensuring statistically valid inferences across diverse target populations from a single source of training data. Our approach builds a surprising technical connection between the inference problem and a technique developed for algorithmic fairness, called “multicalibration.” We derive a correspondence between the fairness goal, to protect subpopulations from miscalibrated predictions, and the statistical goal, to ensure unbiased estimates on target populations. We derive a single-source estimator that provides inferences in any downstream target population, whose performance is comparable to the popular target-specific approach of propensity score reweighting. Our approach can extend the benefits of evidence-based decision-making to communities that do not have the resources to collect high-quality data on their own.
The gold-standard approaches for gleaning statistically valid conclusions from data involve random sampling from the population. Collecting properly randomized data, however, can be challenging, so modern statistical methods, including propensity score reweighting, aim to enable valid inferences when random sampling is not feasible. We put forth an approach for making inferences based on available data from a source population that may differ in composition in unknown ways from an eventual target population. Whereas propensity scoring requires a separate estimation procedure for each different target population, we show how to build a single estimator, based on source data alone, that allows for efficient and accurate estimates on any downstream target data. We demonstrate, theoretically and empirically, that our target-independent approach to inference, which we dub “universal adaptability,” is competitive with target-specific approaches that rely on propensity scoring. Our approach builds on a surprising connection between the problem of inferences in unspecified target populations and the multicalibration problem, studied in the burgeoning field of algorithmic fairness. We show how the multicalibration framework can be employed to yield valid inferences from a single source population across a diverse set of target populations.
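The contrast drawn in the abstract, between a target-specific reweighting estimate and a single predictor fitted once on source data, can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes a single discrete covariate, assumes every group found in the target also appears in the source, and uses a per-group mean predictor as a simplified stand-in for a multicalibrated predictor. The function names (`propensity_reweighted_mean`, `fit_group_mean_predictor`, `predictor_based_mean`) and the data are hypothetical.

```python
import numpy as np


def propensity_reweighted_mean(x_source, y_source, x_target):
    """Target-specific estimate: reweight source outcomes by the ratio of
    target to source group frequencies (a discrete propensity-style weight).
    The weights must be re-estimated for every new target population."""
    x_source = np.asarray(x_source)
    y_source = np.asarray(y_source, dtype=float)
    x_target = np.asarray(x_target)
    weights = np.zeros(len(x_source))
    for value in np.unique(x_source):
        p_source = np.mean(x_source == value)  # group share in the source
        p_target = np.mean(x_target == value)  # group share in the target
        weights[x_source == value] = p_target / p_source
    # Self-normalized weighted average of the source outcomes.
    return float(np.sum(weights * y_source) / np.sum(weights))


def fit_group_mean_predictor(x_source, y_source):
    """Fit once on source data alone: predict the mean outcome of each group.
    Toy stand-in for a multicalibrated predictor (an assumption of this sketch)."""
    x_source = np.asarray(x_source)
    y_source = np.asarray(y_source, dtype=float)
    return {
        int(value): float(y_source[x_source == value].mean())
        for value in np.unique(x_source)
    }


def predictor_based_mean(predictor, x_target):
    """Reuse the fitted predictor on any target: average its predictions
    over the target covariates. No refitting on the target is needed."""
    return float(np.mean([predictor[int(value)] for value in x_target]))


# Hypothetical data: group 1 is rare in the source but common in the target.
x_source = [0, 0, 0, 1]
y_source = [1.0, 1.0, 1.0, 5.0]
x_target = [0, 1, 1, 1]

ipw_estimate = propensity_reweighted_mean(x_source, y_source, x_target)
predictor = fit_group_mean_predictor(x_source, y_source)
predictor_estimate = predictor_based_mean(predictor, x_target)
```

In this toy setting both estimates coincide, because the per-group predictor is exactly calibrated on every group that defines the shift. The abstract's claim concerns the much harder case where the shift is unknown and the predictor is multicalibrated over a rich class of subpopulations.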