Subspace Methods for Anomaly Detection in High Dimensional Astronomical Databases


Abstract. Modern astronomical surveys, in particular cross-matched databases from virtual observatories, are very large datasets (hundreds of thousands to millions and even billions of objects) that are high-dimensional (from a dozen variables up to a few hundred) and often contain large numbers of missing values (because sources emit light at different wavelengths and faint sources are not detected in all filter passbands). The objects most interesting to astronomers are typically very rare and very faint, and have one or several features that set them apart from the other sources in the survey. Indeed, common stars and galaxies are fairly well understood, and it is objects right at the detection limits of the different surveys, or objects with peculiar astrophysical properties, that drive much of astrophysical research. Anomaly detection tools are therefore vital for finding such potentially interesting sources. However, the size of the datasets involved, the high dimensionality and, above all, the large numbers of missing values present severe challenges to existing anomaly detection methods. We propose a novel approach which works by computing, for each object, anomaly scores in lower-dimensional subspaces and then combining these scores into a single score for each source. Working in subspaces allows us to work around the curse of dimensionality and to deal very intuitively with missing values: an object is scored only in those subspaces where all of its variables have been observed. As a result, our method allows direct comparisons of sources, even if they have been observed in quite different sets of variables. We will discuss several ways of combining anomaly scores and look at various properties of our approach. The proposed approach is very flexible and can be used with most anomaly score computation methods.

Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session STS004) p.2040
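
The subspace idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made purely for illustration: the subspaces are all pairs of variables, the per-subspace scorer is the mean distance to the k nearest neighbours, scores are made comparable across subspaces by a rank transform, and the combination rule is a mean or maximum over the subspaces in which an object is fully observed. The names `knn_score` and `subspace_anomaly_scores` are hypothetical.

```python
import itertools
import numpy as np


def knn_score(points, k=5):
    """Mean Euclidean distance to the k nearest neighbours (stand-in scorer)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # an object is not its own neighbour
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


def subspace_anomaly_scores(X, subspace_dim=2, k=5, combine="mean"):
    """Score each row of X (NaN = missing) by combining per-subspace scores.

    Assumption: subspaces are all combinations of `subspace_dim` columns.
    An object is scored in a subspace only if it is observed in every
    variable of that subspace; other entries stay NaN and are ignored.
    """
    n, d = X.shape
    observed = ~np.isnan(X)
    subspaces = list(itertools.combinations(range(d), subspace_dim))
    per_subspace = np.full((n, len(subspaces)), np.nan)

    for j, cols in enumerate(subspaces):
        cols = list(cols)
        rows = np.flatnonzero(observed[:, cols].all(axis=1))
        if rows.size <= k:  # too few objects to find k neighbours
            continue
        raw = knn_score(X[np.ix_(rows, cols)], k=k)
        # Rank transform to (0, 1] so scores are comparable across subspaces.
        ranks = raw.argsort().argsort() + 1
        per_subspace[rows, j] = ranks / rows.size

    # Combine only over the subspaces where each object was actually scored.
    combined = np.full(n, np.nan)
    has_any = ~np.isnan(per_subspace).all(axis=1)
    reducer = np.nanmean if combine == "mean" else np.nanmax
    combined[has_any] = reducer(per_subspace[has_any], axis=1)
    return combined


# Toy usage: 200 objects, 4 variables, about 20% missing values, and one
# planted anomaly (object 0) that is never observed in the last variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.2] = np.nan
X[0] = [10.0, 10.0, 10.0, np.nan]
scores = subspace_anomaly_scores(X, subspace_dim=2, k=5, combine="mean")
```

In this toy setting the planted object is scored in only three of the six pairwise subspaces, yet its combined score remains directly comparable with objects observed in other variable sets. The brute-force neighbour search is quadratic in the number of objects, so a survey-scale dataset would need a spatial index or a different scorer.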


Cite this paper

@inproceedings{Henrion2011SubspaceMF, title={Subspace Methods for Anomaly Detection in High Dimensional Astronomical Databases}, author={Marc Henrion and Daniel J. Mortlock and David J and Axel Gandy}, year={2011} }