- Published 2003 in DG.O

Statistical models fit to data often require extensive and challenging re-estimation before reaching final form. For example, outliers can adversely affect fits. In other cases involving spatial data, a cluster may exist for which the model is incorrect, which also degrades the fit to the “good” data. In both cases, the estimated residuals must be checked and rechecked until the data are cleaned and an appropriate model is found. In this article, we demonstrate an algorithm that fits models to the largest subset of the data for which the model is appropriate. Specifically, if a hypothesized linear regression model fits ninety percent of the data, our algorithm not only finds an excellent fit, as if only that “good” data were presented, but also highlights the ten percent of “bad” data that is not fit. Our work in digital government has focused on mapping data. Thus we illustrate how models fit to census tract data work, and how the data in the “bad” set can be viewed spatially through ArcView or other tools. This approach greatly simplifies the task of modeling spatial data, and makes use of advanced map visualization tools to understand the nature of the subsets of the data for which the model is not appropriate.
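
The idea of fitting a line to the best ninety percent of the data and flagging the rest can be illustrated with a short sketch. The abstract does not specify the authors' algorithm, so the code below is not their method: it swaps in least trimmed squares with simple concentration steps as a familiar stand-in. The function names `ols_line` and `trimmed_fit`, the parameters `keep_frac`, `n_starts`, and `max_steps`, and the restriction to a single predictor are all assumptions made for illustration.

```python
import random


def ols_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Returns (a, b), or None when every x is identical.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def trimmed_fit(xs, ys, keep_frac=0.9, n_starts=20, max_steps=50, seed=0):
    """Illustrative least trimmed squares fit (a stand-in, not the paper's method).

    Fits the line to the keep_frac share of points with the smallest
    squared residuals. Returns (a, b, flagged), where flagged holds the
    indices left out of the final fit.
    """
    rng = random.Random(seed)
    n = len(xs)
    h = max(2, round(keep_frac * n))  # number of points to keep
    best = None  # (objective, a, b, kept indices)
    for _ in range(n_starts):
        # Start from the line through two randomly chosen points.
        i, j = rng.sample(range(n), 2)
        coef = ols_line([xs[i], xs[j]], [ys[i], ys[j]])
        kept = None
        for _ in range(max_steps):
            if coef is None:
                break
            a, b = coef
            sq = [(ys[k] - a - b * xs[k]) ** 2 for k in range(n)]
            # Concentration step: keep the h best-fit points, then refit.
            new_kept = sorted(sorted(range(n), key=lambda k: sq[k])[:h])
            if new_kept == kept:
                break  # the kept subset has stabilised
            kept = new_kept
            coef = ols_line([xs[k] for k in kept], [ys[k] for k in kept])
        if coef is None or kept is None:
            continue  # degenerate start, try another one
        a, b = coef
        obj = sum((ys[k] - a - b * xs[k]) ** 2 for k in kept)
        if best is None or obj < best[0]:
            best = (obj, a, b, kept)
    if best is None:
        raise ValueError("no usable starting pair found")
    _, a, b, kept = best
    kept_set = set(kept)
    flagged = [k for k in range(n) if k not in kept_set]
    return a, b, flagged
```

Calling `trimmed_fit(xs, ys, keep_frac=0.9)` returns the intercept, the slope, and the indices of the flagged points. In a mapping workflow such as the one the abstract describes, those indices could be joined back to the spatial units (for example, census tracts) and displayed in a GIS tool.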

@inproceedings{Scott2003FindingOI,
title={Finding Outliers in Models of Spatial Data},
author={David W. Scott and J. Blair Christian},
booktitle={DG.O},
year={2003}
}