Corpus ID: 220768553

A Mathematical Assessment of the Isolation Tree Method for Outliers Detection in Big Data

@article{Morales2020AMA,
  title={A Mathematical Assessment of the Isolation Tree Method for Outliers Detection in Big Data},
  author={Fernando A. Morales and Jorge M. Ram{\'i}rez and Edgar A. Ramos},
  journal={arXiv: Methodology},
  year={2020}
}
This paper presents a mathematical analysis of the Isolation Random Forest Method (IRF Method) for anomaly detection. We show that the IRF space can be endowed with a probability induced by the Isolation Tree algorithm (iTree). In this setting, the convergence of the IRF method is proved using the Law of Large Numbers. A couple of counterexamples are presented to show that the original method is inconclusive and that no quality certificate can be given when using it as a means to detect… 
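
As an illustration of the isolation idea and of averaging over many random trees, here is a minimal Python sketch. It is not the paper's construction: it assumes one-dimensional data, split points drawn uniformly between the current minimum and maximum, and hypothetical helper names (`isolation_depth`, `mean_isolation_depth`). The average depth over independent random trees is the kind of sample mean to which a Law of Large Numbers argument applies.

```python
import random
from statistics import mean


def isolation_depth(data, x, rng):
    """Count the random splits needed to isolate x within data (1-D sketch).

    Assumes x is an element of data.
    """
    depth = 0
    current = list(data)
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:
            # Only duplicates remain; they cannot be separated further.
            break
        split = rng.uniform(lo, hi)
        # Keep only the points on the same side of the split as x.
        current = [v for v in current if (v < split) == (x < split)]
        depth += 1
    return depth


def mean_isolation_depth(data, x, n_trees, seed=0):
    """Average the isolation depth of x over n_trees independent random trees."""
    rng = random.Random(seed)
    return mean(isolation_depth(data, x, rng) for _ in range(n_trees))


if __name__ == "__main__":
    sample = [1.0, 1.1, 1.2, 1.3, 1.4, 50.0]
    for n in (10, 100, 1000):
        print(n, mean_isolation_depth(sample, 50.0, n), mean_isolation_depth(sample, 1.2, n))
```

In this toy sample the isolated value 50.0 tends to need fewer splits than a value inside the cluster, and the averages settle down as the number of trees grows.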

References

Isolation-Based Anomaly Detection

TLDR
This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation, without employing any distance or density measure, and is therefore fundamentally different from all existing methods.

Outlier detection by active learning

TLDR
This paper presents a novel approach to outlier detection based on a reduction to classification. The approach is shown to be superior to other methods that use the same reduction with standard classification methods, and competitive with state-of-the-art outlier detection methods in the literature.

Isolation Forest

TLDR
The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low memory requirement.

Mining distance-based outliers in near linear time with randomization and a simple pruning rule

TLDR
This work shows that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used.
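
The nested loop with a pruning rule can be sketched as follows. This is an illustrative reading of the summary above, not the authors' implementation: it assumes the outlier score is the distance to the k-th nearest neighbour, that the dataset has more than k points, and the function name `top_outliers_pruned` is hypothetical.

```python
import heapq
import math
import random


def top_outliers_pruned(points, k, n, seed=0):
    """Return indices of the n points with the largest k-th nearest neighbour distance."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)   # random order is what makes pruning effective
    top = []       # min-heap of (score, index) for the best n outliers so far
    cutoff = 0.0   # score of the weakest outlier currently in `top`
    for i in order:
        nearest = []   # max-heap (negated) of the k smallest distances seen so far
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # The k-th distance can only shrink as more points are scanned,
            # so once it is below the cutoff, point i cannot be a top outlier.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        heapq.heappush(top, (-nearest[0], i))
        if len(top) > n:
            heapq.heappop(top)
        if len(top) == n:
            cutoff = top[0][0]
    return [i for _, i in sorted(top, reverse=True)]
```

Without the early `break`, the loop is the plain quadratic nested loop; the pruning test is what lets most non-outliers be discarded after only a few distance computations.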

Algorithms for Mining Distance-Based Outliers in Large Datasets

TLDR
This paper provides formal and empirical evidence showing the usefulness of DB-outliers and presents two simple algorithms for computing such outliers, both having a complexity of O(k N^2), k being the dimensionality and N being the number of objects in the dataset.

Fast Outlier Detection in High Dimensional Spaces

TLDR
A new definition of distance-based outlier that considers for each point the sum of the distances from its k nearest neighbors, called weight, is proposed, which scales linearly both in the dimensionality and the size of the data set.
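
The weight defined above can be computed directly from its definition. The sketch below is a brute-force illustration only, quadratic in the number of points, and is not the linearly scaling method the summary refers to. The names `knn_weight` and `top_outliers` are hypothetical, and Euclidean distance is an assumption.

```python
import math


def knn_weight(points, i, k):
    """Sum of the distances from points[i] to its k nearest neighbours."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return sum(dists[:k])


def top_outliers(points, k, n):
    """Indices of the n points with the largest weight, largest first."""
    ranked = sorted(range(len(points)), key=lambda i: knn_weight(points, i, k), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(top_outliers(pts, k=2, n=1))
```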

Discovering cluster-based local outliers

Unsupervised Learning With Random Forest Predictors

TLDR
The RF dissimilarity is useful for detecting tumor sample clusters on the basis of tumor marker expressions and can be described with simple thresholding rules in this application.

Pattern Recognition and Machine Learning (Information Science and Statistics)

Graph Theory and Its Applications

Introduction to Graph Models: Graphs and Digraphs; Common Families of Graphs; Graph Modeling Applications; Walks and Distance; Paths, Cycles, and Trees; Vertex and Edge Attributes: More Applications.