Time series anomaly discovery with grammar-based compression

@inproceedings{Senin2015TimeSA,
  title={Time series anomaly discovery with grammar-based compression},
  author={Pavel Senin and Jessica Lin and Xing Wang and Tim Oates and S. Gandhi and Arnold P. Boedihardjo and Crystal Chen and Susan Frankenstein},
  booktitle={EDBT},
  year={2015}
}
The problem of anomaly detection in time series has recently received much attention. [...] Our algorithm discretizes continuous time series values into symbolic form, infers a context-free grammar, and exploits its hierarchical structure to effectively and efficiently discover algorithmic irregularities that we relate to anomalies. The approach taken is based on the general principle of Kolmogorov complexity where the randomness in a sequence is a function of its algorithmic incompressibility. Since…
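The pipeline described in the abstract above (discretize, infer a grammar, look for regions that compress poorly) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract excerpt does not name the discretization or the grammar-induction algorithm, so a SAX-style discretization and a greedy most-frequent-pair replacement (in the spirit of Re-Pair) are assumed here. All function names, the 3-letter alphabet, and the parameters `window` and `paa_size` are hypothetical.

```python
import math
from collections import Counter

# Breakpoints splitting N(0, 1) into three roughly equiprobable regions,
# i.e. a 3-letter alphabet (an illustrative choice, not taken from the paper).
BREAKPOINTS = (-0.43, 0.43)
ALPHABET = "abc"


def sax_word(window, paa_size):
    """Z-normalize a window, average it into paa_size segments, map to letters.

    Assumes len(window) is divisible by paa_size.
    """
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    z = [(v - mean) / std if std > 1e-8 else 0.0 for v in window]
    seg = len(z) // paa_size
    letters = []
    for i in range(paa_size):
        avg = sum(z[i * seg:(i + 1) * seg]) / seg
        letters.append(ALPHABET[sum(avg > b for b in BREAKPOINTS)])
    return "".join(letters)


def induce_rules(words):
    """Greedy pair replacement (assumed stand-in for the grammar inducer).

    Returns (first_word_index, last_word_index) for every rule occurrence.
    """
    seq = [(w, i, i) for i, w in enumerate(words)]
    spans = []
    rule_id = 0
    while len(seq) > 1:
        pairs = Counter((seq[i][0], seq[i + 1][0]) for i in range(len(seq) - 1))
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # no repeated pair left, nothing more to compress
            break
        name = f"R{rule_id}"
        rule_id += 1
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i][0], seq[i + 1][0]) == pair:
                out.append((name, seq[i][1], seq[i + 1][2]))
                spans.append((seq[i][1], seq[i + 1][2]))
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return spans


def rule_density(n_points, spans, window):
    """Count, for each time point, how many rule occurrences cover it."""
    density = [0] * n_points
    for first, last in spans:
        # Word i covers points [i, i + window), so a rule over words
        # first..last covers points [first, last + window).
        for t in range(first, min(last + window, n_points)):
            density[t] += 1
    return density


def density_curve(series, window, paa_size):
    """End-to-end sketch: sliding-window discretization, rules, coverage."""
    words = [sax_word(series[i:i + window], paa_size)
             for i in range(len(series) - window + 1)]
    return rule_density(len(series), induce_rules(words), window)
```

Under this sketch, points where the density is lowest are covered by few or no grammar rules. They are the parts of the sequence that compress poorly, which is what the abstract relates to anomalies.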
Ensemble Grammar Induction For Detecting Anomalies in Time Series
TLDR
It is demonstrated that the proposed ensemble approach can outperform existing grammar-induction-based approaches with different criteria for selection of parameter values and achieve performance similar to that of the state-of-the-art distance-based anomaly detection algorithm.
GrammarViz 3.0
TLDR
GrammarViz 3.0 is presented: a software package that provides implementations of the proposed algorithms and a graphical user interface for interactive variable-length time series pattern discovery. It also introduces an experimental procedure for automated discretization parameter selection that builds upon the minimum cardinality maximum cover principle and aids the discovery of recurrent and anomalous time series patterns.
Unsupervised Subsequence Anomaly Detection in Large Sequences
Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches that have been proposed so far in the literature have…
Unsupervised and scalable subsequence anomaly detection in large data series
TLDR
NormA is a novel approach, suitable for domain-agnostic anomaly detection, based on a new data series primitive that makes it possible to detect anomalies based on their (dis)similarity to a model that represents normal behavior.
SAND: Streaming Subsequence Anomaly Detection
TLDR
This work proposes SAND, a novel online method suitable for domain-agnostic anomaly detection that outperforms by a large margin the current state-of-the-art algorithms in terms of accuracy while achieving orders of magnitude speedups.
Exact variable-length anomaly detection algorithm for univariate and multivariate time series
TLDR
A multivariate anomaly detection algorithm is introduced which detects anomalies, identifies the dimensions and locations of the anomalous subsequences, and can successfully detect the correct anomalies without requiring any prior knowledge about the data.
SCHEDA: Lightweight euclidean-like heuristics for anomaly detection in periodic time series
TLDR
SCHEDA (Sampled Causal Heuristics for Euclidean Distance Approximation), a collection of three heuristics designed to approximate the Euclidean anomaly score with a low computational footprint in time series with long-term dependencies, is proposed.
Sequitur-based Inference and Analysis Framework for Malicious System Behavior
TLDR
This work presents a grammar inference system based on Sequitur, a greedy compression algorithm that constructs a context-free grammar (CFG) from string-based input data that enables the identification of relevant patterns in sequential corpora of arbitrary quantity and size.
Anomaly detection and motif discovery in symbolic representations of time series
TLDR
This document proposes a benchmark of anomaly detection algorithms using data from Cloud monitoring software and presents a few algorithms using this representation for anomaly detection and motif discovery, also known as pattern mining, in such data.
Semantic Discord: Finding Unusual Local Patterns for Time Series
TLDR
A new definition of semantic discord is introduced, which incorporates the context information from larger subsequences containing the anomaly candidates, and extensive experiments demonstrate that the method significantly outperforms state-of-the-art methods in locating anomalies.

References

SHOWING 1-10 OF 42 REFERENCES
GrammarViz 2.0: A Tool for Grammar-Based Pattern Discovery in Time Series
TLDR
GrammarViz 2.0 is presented, an interactive tool that implements algorithms for grammar-driven mining and visualization of variable length time series patterns.
Mining motifs in massive time series databases
TLDR
This paper carefully motivates, then introduces, a nontrivial definition of time series motifs, proposes an efficient algorithm to discover them, and demonstrates the utility and efficiency of the approach on several real world datasets.
Finding Motifs in Time Series
TLDR
An efficient motif discovery algorithm for time series would be useful as a tool for summarizing and visualizing massive time series databases and could be used as a subroutine in various other data mining tasks, including the discovery of association rules, clustering and classification.
Assumption-Free Anomaly Detection in Time Series
TLDR
This demonstration will show an online anomaly detection system that does not need to be customized for individual domains, yet performs with exceptionally high precision/recall, based on the recently introduced idea of time series bitmaps.
HOT SAX: efficiently finding the most unusual time series subsequence
TLDR
The utility of discords is demonstrated with objective experiments on domains as diverse as Space Shuttle telemetry monitoring, medicine, surveillance, and industry, and the effectiveness of the discord discovery algorithm is demonstrated with more than one million experiments on 82 different datasets from diverse domains.
TR 09-004 Detecting Anomalies in a Time Series Database
TLDR
This work evaluates a large number of semi-supervised anomaly detection techniques for time series data on a large variety of data sets obtained from a broad spectrum of application domains, and provides useful insights regarding the effectiveness of different techniques based on the experimental evaluation.
Clustering by compression
TLDR
Evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors is reported.
Finding Time Series Discords Based on Haar Transform
TLDR
An algorithm which can dynamically determine the word size for compression of subsequences is proposed, based on some properties of the Haar wavelet transformation.
Linear-time, incremental hierarchy inference for compression
TLDR
It is proved that SEQUITUR operates in time linear in n, the length of the input sequence, despite its ability to build a hierarchy as deep as log(n), and it is shown that a sequence can be compressed incrementally, improving on the non-incremental algorithm that was described by Nevill-Manning et al., and making on-line compression feasible.
Towards parameter-free data mining
TLDR
This work shows that recent results in bioinformatics and computational theory hold great promise for a parameter-free data-mining paradigm, and shows that this approach is competitive or superior to the state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/video datasets.