Annotated Minimum Volume Sets for Nonparametric Anomaly Discovery

Abstract

We consider an anomaly detection problem in which a mixture of typical and anomalous data is observed, and the anomalies in that particular dataset must be identified without recourse to labeled exemplars. Our goal is to produce an annotated ranking of the observations that indicates the relative priority with which each should be examined further as a possible anomaly, while making no assumptions about the distribution of typical data. We propose a framework in which each observation is linked to a corresponding minimum volume set and, implicitly adopting a hypothesis testing perspective, each set is associated with a test. An inherent ordering of these sets yields a natural ranking, while the association of each test with a false discovery rate yields an appropriate annotation. The combination of minimum volume set methods with false discovery rate principles, in the context of data contaminated by anomalies, is new, and estimating the key underlying quantities requires that a number of issues be addressed. We offer some solutions to the relevant estimation problems and illustrate the proposed methodology on synthetic and computer network traffic data.
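To make the idea of an annotated ranking concrete, the following is a minimal sketch and not the estimator developed in the paper. It rests on several assumptions that do not come from the abstract: the data are one-dimensional, anomalies are uniform on a known interval [lo, hi], the anomaly fraction is known, and a Gaussian kernel density estimate stands in for a proper minimum volume set estimator. The names kde, annotated_ranking, and anomaly_fraction are hypothetical. Each observation is linked to the density level set passing through it. Under the assumed mixture, the fraction of typical points in the region outside that set is roughly 1 - anomaly_fraction * (reference mass of the region) / (data mass of the region), and this quantity serves as the false discovery rate annotation.

```python
import math


def kde(x, sample, bandwidth):
    """Gaussian kernel density estimate built from `sample`, evaluated at x."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm


def annotated_ranking(sample, lo, hi, anomaly_fraction, bandwidth=0.5, grid_size=2000):
    """Return (index, value, density, fdr_estimate) rows, most anomalous first.

    Assumptions (hypothetical, for illustration only): 1-D data, anomalies
    uniform on [lo, hi], and a known anomaly fraction.
    """
    n = len(sample)
    dens = [kde(x, sample, bandwidth) for x in sample]

    # Evaluate the density on a uniform grid to approximate the uniform
    # reference measure of any low-density region.
    step = (hi - lo) / grid_size
    grid_dens = [kde(lo + (k + 0.5) * step, sample, bandwidth) for k in range(grid_size)]

    rows = []
    for i, d in enumerate(dens):
        # Flagged region for observation i: everything with estimated density
        # at or below d, i.e. outside the level set through x_i.
        data_mass = sum(1 for dj in dens if dj <= d) / n  # at least 1/n
        ref_mass = sum(1 for g in grid_dens if g <= d) / grid_size
        fdr = 1.0 - anomaly_fraction * ref_mass / data_mass
        rows.append((i, sample[i], d, min(1.0, max(0.0, fdr))))

    # Lower density means a larger level set is needed to contain the point,
    # so sorting by density gives the ranking.
    rows.sort(key=lambda r: r[2])
    return rows
```

Calling annotated_ranking(data, lo, hi, anomaly_fraction) returns the observations ordered from most to least anomalous, each carrying its estimated false discovery rate. In practice the anomaly fraction and the sets themselves must be estimated from contaminated data, which is the harder problem the paper addresses.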

Cite this paper

@article{Scott2007AnnotatedMV,
  title={Annotated Minimum Volume Sets for Nonparametric Anomaly Discovery},
  author={Clayton D. Scott and Eric D. Kolaczyk},
  journal={2007 IEEE/SP 14th Workshop on Statistical Signal Processing},
  year={2007},
  pages={234-238}
}