Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities


Commercial health plans need information on members' race/ethnicity to address disparities, but they often lack it. We incorporate the U.S. Census Bureau’s latest surname list into a previously developed Bayesian method that integrates surname and geocoded residential information to better impute self-reported race/ethnicity. We validate this approach with data from 1,921,133 enrollees of a national health plan. Overall, the new approach's estimates correlated highly with self-reported race/ethnicity (0.76), making the approach 19% more efficient than its predecessor and 41% and 108% more efficient than single-source surname and address methods, respectively (P < 0.05 for all comparisons). The new approach has an overall concordance statistic (area under the receiver operating characteristic, or ROC, curve) of 0.93. The largest improvements were for the groups where prior performance was weakest (Blacks and Asians). The new Census surname list accounts for about three-fourths of the variance explained in the new estimates. Imputing Native American and multiracial identities from surname and residence remains challenging.
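The core of a Bayesian surname-plus-geocode method can be illustrated with Bayes' rule. The sketch below is a minimal illustration, not the authors' implementation. It assumes that surname and residence are conditionally independent given race/ethnicity. It takes P(race | surname), as a surname list might supply, and multiplies it by P(geography | race), the share of each group's population living in the member's area. It then renormalizes the result. The category labels, function names (`p_geo_given_race`, `bisg_posterior`), and all counts and probabilities are hypothetical.

```python
from typing import Dict

# Hypothetical category labels; a real application would use the surname list's categories.
Probs = Dict[str, float]


def p_geo_given_race(local_counts: Dict[str, int], national_counts: Dict[str, int]) -> Probs:
    """Share of each group's national population living in one area: P(g | r)."""
    return {r: local_counts[r] / national_counts[r] for r in national_counts}


def bisg_posterior(p_race_given_surname: Probs, p_geo_race: Probs) -> Probs:
    """Combine surname and geography with Bayes' rule.

    Assumes surname and residence are conditionally independent given race/ethnicity,
    so P(r | s, g) is proportional to P(r | s) * P(g | r).
    """
    unnormalized = {r: p_race_given_surname[r] * p_geo_race[r] for r in p_race_given_surname}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("surname and geography give zero joint evidence for every group")
    return {r: v / total for r, v in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical surname-list probabilities for one surname.
    surname_probs = {"white": 0.5, "black": 0.3, "hispanic": 0.1, "api": 0.1}
    # Hypothetical area and national population counts by group.
    local = {"white": 200, "black": 400, "hispanic": 50, "api": 15}
    national = {"white": 200_000_000, "black": 40_000_000, "hispanic": 50_000_000, "api": 15_000_000}
    posterior = bisg_posterior(surname_probs, p_geo_given_race(local, national))
    for group, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{group}: {p:.3f}")
```

In this made-up example, the area contains a disproportionate share of the Black population. That raises the Black probability from 0.30, based on surname alone, to about 0.81 after geography is combined with surname. Keeping the output as a full probability vector, rather than a single assigned category, makes it possible to use the estimates directly in weighted disparity analyses.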

DOI: 10.1007/s10742-009-0047-1

6 Figures and Tables


Cite this paper

@article{Elliott2009UsingTC,
  title={Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities},
  author={Marc N. Elliott and Peter A. Morrison and Allen M Fremont and Daniel F. McCaffrey and Philip Pantoja and Nicole Lurie},
  journal={Health Services and Outcomes Research Methodology},
  year={2009},
  volume={9},
  pages={69-83}
}