Corpus ID: 927435

Robust Random Cut Forest Based Anomaly Detection on Streams

@inproceedings{Guha2016RobustRC,
  title={Robust Random Cut Forest Based Anomaly Detection on Streams},
  author={Sudipto Guha and Nina Mishra and Gourav Roy and Okke Schrijvers},
  booktitle={ICML},
  year={2016}
}
In this paper we focus on the anomaly detection problem for dynamic data streams through the lens of random cut forests. [...]
Key Method: We show how the sketch can be efficiently updated in a dynamic data stream. We demonstrate the viability of the algorithm on publicly available real data.
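The summary above does not spell out how the forest is built, so here is a rough Python illustration of one robust random cut tree, built in batch. Each cut picks a dimension with probability proportional to that dimension's range, then draws a cut value uniformly inside the range. Each point is scored with a per-tree approximation of collusive displacement (CoDisp), and the scores are averaged over a small forest. This is a simplified sketch under stated assumptions, not the authors' implementation:
- It omits the streaming insert and delete operations the paper is about.
- It builds every tree on all points rather than on samples.
- The names `build_tree`, `codisp` and `forest_codisp` are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

Point = Sequence[float]


class Node:
    """Random cut tree node; `indices` are the points stored beneath it."""

    def __init__(self, indices: List[int], parent: Optional["Node"] = None):
        self.indices = indices
        self.parent = parent
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build_tree(points: List[Point], rng: random.Random) -> Dict[int, Node]:
    """Build one random cut tree in batch; return the leaf holding each point."""
    leaf_of: Dict[int, Node] = {}
    dims = len(points[0])
    stack = [Node(list(range(len(points))))]
    while stack:
        node = stack.pop()
        idx = node.indices
        lo = [min(points[i][d] for i in idx) for d in range(dims)]
        hi = [max(points[i][d] for i in idx) for d in range(dims)]
        spans = [h - l for l, h in zip(lo, hi)]
        if sum(spans) == 0:  # one point, or exact duplicates: make a leaf
            for i in idx:
                leaf_of[i] = node
            continue
        # Dimension chosen with probability proportional to its span,
        # cut value drawn uniformly within that dimension's range.
        d = rng.choices(range(dims), weights=spans)[0]
        cut = rng.uniform(lo[d], hi[d])
        left = [i for i in idx if points[i][d] <= cut]
        right = [i for i in idx if points[i][d] > cut]
        if not right:  # cut landed exactly on the maximum; redraw
            stack.append(node)
            continue
        node.left, node.right = Node(left, node), Node(right, node)
        stack += [node.left, node.right]
    return leaf_of


def codisp(leaf: Node) -> float:
    """Per-tree CoDisp approximation: max over the leaf's ancestors of
    |sibling subtree| / |subtree that would be removed|."""
    best, node = 0.0, leaf
    while node.parent is not None:
        parent = node.parent
        sibling = parent.right if parent.left is node else parent.left
        best = max(best, len(sibling.indices) / len(node.indices))
        node = parent
    return best


def forest_codisp(points: List[Point], num_trees: int = 50, seed: int = 0) -> List[float]:
    """Average per-tree CoDisp over a small forest (hypothetical helper)."""
    rng = random.Random(seed)
    scores = [0.0] * len(points)
    for _ in range(num_trees):
        leaf_of = build_tree(points, rng)
        for i in range(len(points)):
            scores[i] += codisp(leaf_of[i]) / num_trees
    return scores


# Demo: a Gaussian cloud plus one far-away point (index 200).
data_rng = random.Random(1)
pts = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
pts.append((10.0, 10.0))
scores = forest_codisp(pts)
top = max(range(len(pts)), key=scores.__getitem__)
```

A far-away point tends to be cut off near the root, where its sibling subtree is large, so its displacement ratio, and hence its score, is high.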
PIDForest: Anomaly Detection via Partial Identification
TLDR
PIDForest is presented: a random forest based algorithm that finds anomalies based on the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values, and provides a succinct explanation for why a point is labelled anomalous.
Anomaly Detection Forest
TLDR
A new anomaly detection algorithm, the Anomaly Detection Forest, optimized for the one-class learning problem: an ensemble of binary trees in which each tree is trained on a random subset and the location of empty leaves defines the anomaly score attributed to a data point.
Flow-based anomaly detection
TLDR
The combination of flow models and a Bernstein quantile estimator allows OneFlow to find a parametric form of bounding region, which can be useful in various applications including describing shapes from 3D point clouds.
Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams
TLDR
This work contextualises these methods in a probabilistic framework, which it calls the Mondrian Pólya Forest, for estimating the underlying probability density function that generates the data and for enabling greater interpretability than prior work.
A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]
TLDR
This paper categorizes existing strategies for detecting anomalies in both scenarios, including state-of-the-art techniques, presents an application example (forest fire risk prediction), and concludes with future research directions for researchers and industry.
Improve black-box sequential anomaly detector relevancy with limited user feedback
TLDR
Inspired by the fact that anomalies are of different types, the approach identifies these types, uses user feedback to assign relevancy to each type, and yields significant improvements in precision and recall over a range of anomaly detectors.
PIDForest: Anomaly Detection and Certification via Partial Identification
TLDR
PIDForest is presented, a random forest based algorithm that finds anomalies based on a definition that captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values.
rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
TLDR
Outlier monitoring can be used to identify malfunctioning industrial equipment, flag quality assurance problems, and alert supervisors to hazardous conditions in industrial and infrastructure control systems.
Online Anomaly Detection Leveraging Stream-Based Clustering and Real-Time Telemetry
TLDR
This work implements an anomaly detection engine that leverages DenStream, an unsupervised clustering technique, and applies it to features collected from a large-scale testbed comprising tens of routers traversed by up to 3 Terabit/s of real application traffic. The results show that DenStream achieves detection results on par with RRCF, the best-performing algorithm.
SDOstream: Low-Density Models for Streaming Outlier Detection
TLDR
SDOstream is a distance-based outlier detection algorithm for stream data that uses low-density models, therefore operating in linear time and avoiding the limitations of sliding windows and instance-based methods.

References

Showing 1-10 of 34 references
Fast Anomaly Detection for Streaming Data
This paper introduces Streaming Half-Space-Trees (HS-Trees), a fast one-class anomaly detector for evolving data streams. It requires only normal data for training and works well when anomalous data [...]
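The summary is truncated, so the sketch below reflects a reading of how Half-Space-Trees work rather than the paper's exact algorithm. Each tree repeatedly halves a randomly perturbed work space along randomly chosen dimensions. It then counts how many reference-window points fall in each node (the node's mass). A point is scored by mass times 2^depth at the node where its descent stops, which is either a leaf or the first node whose mass is below a size limit. Low scores indicate anomalies.

Assumptions and omissions in this sketch:
- Features are scaled to [0, 1].
- It skips the swapping of reference and latest windows that a streaming detector needs.
- All names and default parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence


class HSNode:
    """Half-space tree node: a split (internal) or nothing (leaf), plus mass."""

    def __init__(self, depth: int):
        self.depth = depth
        self.mass = 0                    # count from the reference window
        self.dim: Optional[int] = None   # None marks a leaf
        self.split = 0.0
        self.left: Optional["HSNode"] = None
        self.right: Optional["HSNode"] = None


def build(mins: List[float], maxs: List[float], depth: int,
          max_depth: int, rng: random.Random) -> HSNode:
    """Recursively halve the work space along randomly chosen dimensions."""
    node = HSNode(depth)
    if depth == max_depth:
        return node
    q = rng.randrange(len(mins))
    node.dim, node.split = q, (mins[q] + maxs[q]) / 2
    left_maxs, right_mins = list(maxs), list(mins)
    left_maxs[q] = right_mins[q] = node.split
    node.left = build(mins, left_maxs, depth + 1, max_depth, rng)
    node.right = build(right_mins, maxs, depth + 1, max_depth, rng)
    return node


def random_tree(dims: int, max_depth: int, rng: random.Random) -> HSNode:
    """Randomly perturbed work space, assuming every feature lies in [0, 1]."""
    mins, maxs = [], []
    for _ in range(dims):
        s = rng.random()
        ws = 2 * max(s, 1 - s)
        mins.append(s - ws)
        maxs.append(s + ws)
    return build(mins, maxs, 0, max_depth, rng)


def child(node: HSNode, x: Sequence[float]) -> HSNode:
    return node.left if x[node.dim] < node.split else node.right


def add_mass(root: HSNode, x: Sequence[float]) -> None:
    """Record x along its root-to-leaf path (reference-window update)."""
    node = root
    node.mass += 1
    while node.dim is not None:
        node = child(node, x)
        node.mass += 1


def tree_score(root: HSNode, x: Sequence[float], size_limit: int) -> float:
    """Descend until a leaf or a node with too little mass; return mass * 2^depth."""
    node = root
    while node.dim is not None and node.mass >= size_limit:
        node = child(node, x)
    return node.mass * 2 ** node.depth


class HSForest:
    def __init__(self, dims: int, num_trees: int = 25, max_depth: int = 8,
                 size_limit: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self.trees = [random_tree(dims, max_depth, rng) for _ in range(num_trees)]
        self.size_limit = size_limit

    def fit_reference(self, window: List[Sequence[float]]) -> None:
        for x in window:
            for root in self.trees:
                add_mass(root, x)

    def score(self, x: Sequence[float]) -> float:
        """Higher means more 'normal'; low scores flag anomalies."""
        return sum(tree_score(t, x, self.size_limit) for t in self.trees)


# Demo: a reference window clustered near (0.5, 0.5).
data_rng = random.Random(3)
window = [(data_rng.uniform(0.45, 0.55), data_rng.uniform(0.45, 0.55)) for _ in range(200)]
forest = HSForest(dims=2)
forest.fit_reference(window)
normal_score = forest.score((0.5, 0.5))
outlier_score = forest.score((0.05, 0.95))
```

The trees are built without looking at the data, which is what makes the structure cheap to maintain on a stream: only the mass counters change.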
Streaming Anomaly Detection Using Randomized Matrix Sketching
TLDR
A novel (unsupervised) anomaly detection framework that detects anomalies in a streaming fashion, making only one pass over the data and using limited storage; the authors prove theoretically that the algorithm compares favorably with an offline approach based on expensive global singular value decomposition (SVD) updates.
Detecting Change in Data Streams
TLDR
A novel method for the detection and estimation of change that assumes that the points in the stream are independently generated, but otherwise makes no assumptions on the nature of the generating distribution.
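The TLDR describes a nonparametric setting: points are independent, but no form is assumed for their distribution. One simple way to act on that assumption is a two-window detector. A fixed reference window is compared with a sliding window of the latest points, and a change is flagged when a distance between their empirical distributions exceeds a threshold. This sketch illustrates that idea; it is not presented as the paper's method.
- It swaps in the two-sample Kolmogorov-Smirnov statistic as the distance. This statistic assumes nothing about the generating distribution, but it is not necessarily the one the paper proposes.
- The window size, threshold and function names are illustrative assumptions.

```python
import random
from collections import deque
from typing import Deque, Iterable, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def detect_changes(stream: Iterable[float], window: int = 50,
                   threshold: float = 0.5) -> List[int]:
    """Flag positions where the latest window differs from a fixed reference window.

    `window` and `threshold` are illustrative values, not taken from the paper.
    """
    reference: List[float] = []
    current: Deque[float] = deque(maxlen=window)
    changes: List[int] = []
    for t, x in enumerate(stream):
        if len(reference) < window:          # first fill the reference window
            reference.append(x)
            continue
        current.append(x)
        if len(current) == window and ks_statistic(reference, current) > threshold:
            changes.append(t)
            reference = list(current)        # the new regime becomes the reference
            current.clear()
    return changes


# Demo: the mean shifts from 0 to 3 at position 200.
data_rng = random.Random(42)
stream = [data_rng.gauss(0, 1) for _ in range(200)] + [data_rng.gauss(3, 1) for _ in range(200)]
changes = detect_changes(stream)
```

Resetting the reference after each alarm lets the detector follow a sequence of regimes. The threshold trades detection delay against false alarms.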
Systematic construction of anomaly detection benchmarks from real data
TLDR
A methodology for transforming existing classification data sets into ground-truthed benchmark data sets for anomaly detection, which produces data sets that vary along three important dimensions: point difficulty, relative frequency of anomalies, and clusteredness.
Isolation-Based Anomaly Detection
TLDR
This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, making it fundamentally different from all existing methods.
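Isolation Forest scores a point by how quickly random axis-parallel splits isolate it: anomalies tend to have short average path lengths. The sketch below follows the usual formulation:
- The score is s(x, psi) = 2^(-E[h(x)]/c(psi)).
- The normaliser is c(n) = 2H(n-1) - 2(n-1)/n.
- Trees are grown on subsamples of size psi, with a height limit of ceil(log2 psi).

It is a compact illustration, not a reference implementation. The function names and defaults (100 trees, psi = 256) are assumptions chosen for the example.

```python
import math
import random
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points,
    c(n) = 2H(n-1) - 2(n-1)/n with H(i) ~ ln(i) + Euler's constant."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


class ITree:
    def __init__(self, size: int, dim: Optional[int] = None, split: float = 0.0,
                 left: Optional["ITree"] = None, right: Optional["ITree"] = None):
        self.size, self.dim, self.split = size, dim, split
        self.left, self.right = left, right


def grow(points: List[Sequence[float]], depth: int, limit: int, rng: random.Random) -> ITree:
    """Isolation tree: random (non-constant) dimension, random split between min and max."""
    if depth >= limit or len(points) <= 1:
        return ITree(len(points))
    dims = [q for q in range(len(points[0]))
            if min(p[q] for p in points) < max(p[q] for p in points)]
    if not dims:  # all remaining points identical
        return ITree(len(points))
    q = rng.choice(dims)
    lo, hi = min(p[q] for p in points), max(p[q] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[q] < split]
    right = [p for p in points if p[q] >= split]
    return ITree(len(points), q, split,
                 grow(left, depth + 1, limit, rng), grow(right, depth + 1, limit, rng))


def path_length(x: Sequence[float], node: ITree) -> float:
    """Edges traversed to reach an external node, plus c(size) for unsplit points."""
    depth = 0
    while node.dim is not None:
        node = node.left if x[node.dim] < node.split else node.right
        depth += 1
    return depth + c(node.size)


def iforest_scores(data: List[Sequence[float]], num_trees: int = 100,
                   sample_size: int = 256, seed: int = 0) -> List[float]:
    """s(x, psi) = 2 ** (-E[h(x)] / c(psi)); scores near 1 suggest anomalies.
    Assumes len(data) >= 2."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    limit = math.ceil(math.log2(psi))
    trees = [grow(rng.sample(data, psi), 0, limit, rng) for _ in range(num_trees)]
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / num_trees / c(psi))
            for x in data]


# Demo: 300 Gaussian points plus one far-away point (index 300).
data_rng = random.Random(7)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]
data.append((6.0, 6.0))
scores = iforest_scores(data)
```

No distances or densities are computed anywhere: the score depends only on how many random splits a point survives, which is the property the summary emphasises.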
A Geometric Framework for Unsupervised Anomaly Detection
TLDR
A new geometric framework for unsupervised anomaly detection is presented, with algorithms designed to process unlabeled data and detect anomalies in sparse regions of the feature space.
Evaluating Real-Time Anomaly Detection Algorithms -- The Numenta Anomaly Benchmark
TLDR
The Numenta Anomaly Benchmark (NAB) is proposed, which attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data.
Clustering Data Streams: Theory and Practice
TLDR
This work describes a streaming algorithm that effectively clusters large data streams and provides empirical evidence of the algorithm's performance on synthetic and real data streams.
An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams
An important problem in processing large data streams is detecting changes in the underlying distribution that generates the data. The challenge in designing change detection schemes is making them [...]
FindOut: Finding Outliers in Very Large Datasets
TLDR
A novel deviation (or outlier) detection approach, termed FindOut, based on wavelet transform is introduced, which can successfully identify outliers from large datasets.