摘要:Health disparities and solutions are heterogeneous within and among racial and ethnic groups, yet existing administrative databases lack the granularity to reflect important sociocultural distinctions. We measured the efficacy of a natural-language–processing algorithm to identify a specific immigrant group. The algorithm demonstrated accuracy and precision in identifying Somali patients from the electronic medical records at a single institution. This technology holds promise to identify and track immigrants and refugees in the United States in local health care settings. Characterizing and closing the gap of racial and ethnic health disparities is a national priority, 1 (p4) but disparities and solutions are heterogeneous for different groups. 2 For example, a specific health-related assessment and intervention may take very different forms when applied to a Somali American community than to an ancestral African American community. This example reveals an important limitation of health disparities research: existing regional and national databases lack the granularity to reflect this sociocultural heterogeneity. Therefore, assessment of disease prevalence and intervention impact is compromised by the labeling of both communities in our example as African American in existing databases. Adding this texture to administrative databases has been recommended, but implementation is costly and many years away. 3 Natural-language processing (NLP) holds the potential to bypass these limitations. NLP is an informatics discipline that allows computers to process and understand human languages. Application of NLP to the health care arena is an active area of research with escalating opportunity for impact in the context of a national mandate to expand electronic medical record (EMR) infrastructure. A recent demonstration project showed that NLP review of a health care system EMR outperformed administrative databases in documenting postoperative complications. 4 We tested the hypothesis that application of NLP to EMRs can identify a subset racial/ethnic group for the purposes of eventually documenting and tracking health disparities. Persons from Somalia compose the largest African refugee population in the United States, with a particular concentration in Minnesota. Furthermore, data support the existence of health care disparities among this population. 5,6 Therefore, we designed our NLP tool to identify this population.