• Corpus ID: 927435

Robust Random Cut Forest Based Anomaly Detection on Streams

@inproceedings{Guha2016RobustRC,
  title={Robust Random Cut Forest Based Anomaly Detection on Streams},
  author={Sudipto Guha and Nina Mishra and Gourav Roy and Okke Schrijvers},
  booktitle={ICML},
  year={2016}
}
In this paper we focus on the anomaly detection problem for dynamic data streams through the lens of random cut forests. [...] We show how the sketch can be efficiently updated in a dynamic data stream. We demonstrate the viability of the algorithm on publicly available real data.

PIDForest: Anomaly Detection via Partial Identification
TLDR
PIDForest is presented: a random forest based algorithm that finds anomalies based on the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values, and provides a succinct explanation for why a point is labelled anomalous.
Anomaly Detection Forest
TLDR
A new anomaly detection algorithm, the Anomaly Detection Forest, is optimized for the one-class learning problem: it is an ensemble of binary trees, where each tree is trained on a random subset and where the location of empty leaves defines the anomaly score attributed to a data point.
Flow-based anomaly detection
TLDR
The combination of flow models and Bernstein quantile estimator allows OneFlow to find a parametric form of bounding region, which can be useful in various applications including describing shapes from 3D point clouds.
Isolation forests: looking beyond tree depth
TLDR
Experiments here show that using information about the size of the feature space taken and the number of points assigned to it can result in improved results in many situations without any modification to the tree structure, especially in the presence of categorical features.
Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams
TLDR
This work contextualises these methods in a probabilistic framework, called the Mondrian Pólya Forest, for estimating the underlying probability density function generating the data, enabling greater interpretability than prior work.
A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]
TLDR
This paper categorizes existing strategies for detecting anomalies in both scenarios, including the state-of-the-art techniques, presents an interesting application example, i.e., forest fire risk prediction, and concludes with future research directions for researchers and industry.
Improve black-box sequential anomaly detector relevancy with limited user feedback
TLDR
Inspired by the fact that anomalies are of different types, the approach identifies these types and utilizes user feedback to assign relevancy to them, yielding significant improvements in precision and recall over a range of anomaly detectors.
PIDForest: Anomaly Detection and Certification via Partial Identification
TLDR
PIDForest is presented, a random forest based algorithm that finds anomalies based on a definition that captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values.
rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
TLDR
Outlier monitoring can be used to identify malfunctioning industrial equipment, flag quality assurance problems, and alert supervisors to hazardous conditions in industrial and infrastructure control systems.
Online Anomaly Detection Leveraging Stream-Based Clustering and Real-Time Telemetry
TLDR
This work implements an anomaly detection engine that leverages DenStream, an unsupervised clustering technique, and applies it to features collected from a large-scale testbed comprising tens of routers traversed by up to 3 Terabit/s worth of real application traffic; the results show that DenStream achieves detection results on par with RRCF, the best performing algorithm.

References

SHOWING 1-10 OF 35 REFERENCES
Fast Anomaly Detection for Streaming Data
This paper introduces Streaming Half-Space-Trees (HS-Trees), a fast one-class anomaly detector for evolving data streams. It requires only normal data for training and works well when anomalous data
Streaming Anomaly Detection Using Randomized Matrix Sketching
TLDR
A novel (unsupervised) anomaly detection framework which can be used to detect anomalies in a streaming fashion by making only one pass over the data while utilizing limited storage, and theoretically proves that the algorithm compares favorably with an offline approach based on expensive global singular value decomposition (SVD) updates.
Detecting Change in Data Streams
Systematic construction of anomaly detection benchmarks from real data
TLDR
A methodology for transforming existing classification data sets into ground-truthed benchmark data sets for anomaly detection, which produces data sets that vary along three important dimensions: point difficulty, relative frequency of anomalies, and clusteredness.
Isolation-Based Anomaly Detection
TLDR
This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure---fundamentally different from all existing methods.
A Geometric Framework for Unsupervised Anomaly Detection
TLDR
A new geometric framework for unsupervised anomaly detection is presented; algorithms of this kind are designed to process unlabeled data to detect anomalies in sparse regions of the feature space.
Evaluating Real-Time Anomaly Detection Algorithms -- The Numenta Anomaly Benchmark
TLDR
The Numenta Anomaly Benchmark (NAB) is proposed, which attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data.
Clustering Data Streams: Theory and Practice
TLDR
This work describes a streaming algorithm that effectively clusters large data streams and provides empirical evidence of the algorithm's performance on synthetic and real data streams.
An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams
TLDR
This paper uses relative entropy, also called the Kullback-Leibler distance, to measure the difference between two given distributions, which generalizes Kulldorff’s spatial scan statistic, allowing us to quantitatively identify specific regions in space where large changes have occurred.
FindOut: Finding Outliers in Very Large Datasets
TLDR
A novel deviation (or outlier) detection approach, termed FindOut, based on wavelet transform is introduced, which can successfully identify outliers from large datasets.