Corpus ID: 24842849

Arrays of (locality-sensitive) Count Estimators (ACE): High-Speed Anomaly Detection via Cache Lookups

Chen Luo and Anshumali Shrivastava
Anomaly detection is one of the most frequent and important subroutines deployed in large-scale data processing systems. Although it is a well-studied topic, existing techniques for unsupervised anomaly detection require storing significant amounts of data, which is prohibitive from both a memory and a latency perspective. In the big-data world, existing methods fail to address the new set of memory and latency constraints. In this paper, we propose the ACE (Arrays of (locality-sensitive) Count Estimators) algorithm…

Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

FLASH is a similarity search system for ultra-high dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms; it leverages an LSH-style randomized indexing procedure and combines it with several principled techniques.

Unique Entity Estimation with Application to the Syrian Conflict

This work proposes an efficient (near-linear time) estimation algorithm based on locality sensitive hashing that provides an accurate estimate of the documented, identifiable deaths in the ongoing Syrian conflict and empirically shows its superiority over the state-of-the-art estimators on three real applications.

STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge

In an exhaustive experimental comparison for linear regression models on real-world datasets, it is found that STORM allows accurate regression models to be trained and can estimate a carefully chosen surrogate loss for the least-squares objective.

Scaling-up Split-Merge MCMC with Locality Sensitive Sampling (LSS)

A novel class of split-merge proposals is designed that is significantly more informative than random sampling yet efficient to compute, and is around 6X faster than the state-of-the-art sampling methods on two large real datasets, KDDCUP and PubMed.

LSH-Sampling Breaks the Computational Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation

This paper provides the first demonstration of a scheme, Locality sensitive hashing sampled Stochastic Gradient Descent (LGD), which leads to superior gradient estimation while keeping the sampling cost per iteration similar to that of uniform sampling.

ROSE: Robust Caches for Amazon Product Search

ROSE, a RObuSt cachE, is introduced: a system that is tolerant to misspellings and typos while retaining the look-up cost of traditional caches. It is deployed in the Amazon Search Engine and has produced a significant improvement over the existing solutions across several key business metrics.

Deep Learning and Its Application to LHC Physics

The connections between machine learning and high-energy physics data analysis are explored, followed by an introduction to the core concepts of neural networks, examples of the key results demonstrating the power of deep learning for analysis of LHC data, and discussion of future prospects and concerns.



Benchmarking Algorithms for Detecting Anomalies in Large Datasets

This research benchmarks the following algorithms based on their anomaly detection capabilities and their poly-logarithmic time-space complexity: Isolation Forest, Random Forest, ORCA, Artificial Neural Networks and C4.5.

Fast locality-sensitive hashing

A new and simple method is proposed to speed up the widely used Euclidean realization of LSH by using randomized Hadamard transforms in a non-linear setting, and it is shown that using the new LSH in nearest-neighbor applications can significantly improve their running times.

A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data

A novel random projection-based technique is proposed that estimates the angle-based outlier factor for all data points in time near-linear in the size of the data, together with a theoretical analysis of the quality of approximation that guarantees the reliability of the estimation algorithm.

Toward Supervised Anomaly Detection

It is argued that semi-supervised anomaly detection needs to be grounded in the unsupervised learning paradigm; a novel algorithm that meets this requirement is devised, and it is shown that the optimization problem has a convex equivalent under relatively mild assumptions.

Improved Densification of One Permutation Hashing

A new densification procedure is provided which is provably better than the existing scheme and has the same cost of $O(d + KL)$ for query processing, thereby making it strictly preferable over the existing procedure.

A New Unbiased and Efficient Class of LSH-Based Samplers and Estimators for Partition Function Computation in Log-Linear Models

This paper proposes a new sampling scheme and an unbiased estimator that estimates the partition function accurately in sub-linear time and demonstrates the effectiveness of the proposed approach against other state-of-the-art estimation techniques including IS and the efficient variant of Gumbel-Max sampling.

Simple and Efficient Weighted Minwise Hashing

This work proposes a simple rejection-type sampling scheme based on a carefully designed red-green map, where the number of rejected samples has exactly the same distribution as weighted minwise sampling, and hopes that it will replace existing implementations in practice.

Anomaly detection: A survey

This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.

Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search

The heart of the proposed hash function is a "rotation" scheme that densifies the sparse sketches of one permutation hashing in an unbiased fashion, thereby maintaining the LSH property, which makes the obtained sketches suitable for hash table construction.

Anomaly detection by combining decision trees and parametric densities

The proposed method combines the advantages of classification trees with the benefit of a more accurate representation of the outliers, which leads to more precise decision boundaries and a deterministic classification result.