On the need for time series data mining benchmarks: a survey and empirical demonstration

@article{Keogh2002OnTN,
  title={On the need for time series data mining benchmarks: a survey and empirical demonstration},
  author={Eamonn J. Keogh and Shruti Kasetty},
  journal={Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining},
  year={2002}
}
  • Eamonn J. Keogh, Shruti Kasetty
  • Published 23 July 2002
  • Computer Science
  • Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim: much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been…

An efficient and accurate method for evaluating time series similarity
TLDR
This work proposes a new algorithm, called the Fast Time Series Evaluation (FTSE) method, which can be used to evaluate threshold value techniques, including LCSS and EDR, and extends the ε threshold-based scoring techniques to include arbitrary match rewards and gap penalties.
An empirical Bayes approach to detect anomalies in dynamic multidimensional arrays
  • D. Agarwal
  • Computer Science
    Fifth IEEE International Conference on Data Mining (ICDM'05)
  • 2005
TLDR
An empirical Bayes method is used which works by fitting a two component Gaussian mixture to deviations at current time to suppress deviations that are merely the consequence of sharp changes in the marginal distributions.
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, DMKD 2003, San Diego, California, USA, June 13, 2003
TLDR
This talk will discuss the basic pseudo-random sketching mechanism for building stream synopses and the ongoing work that exploits sketch synopses to build an approximate SQL (multi) query processor.
An empirical evaluation of similarity measures for time series classification
Learning Actions in Complex Software Systems
TLDR
This paper proposes a new method to relate frequent patterns in a given time series to changes recorded in the event's history and calculates confidence and support of frequent patterns that contribute to changes to identify a set of rules for automating changes.
Dictionary-Based Compression for Long Time-Series Similarity
TLDR
A new time-series similarity measure called the Dictionary Compression Score (DCS) is developed, which uses the well-known Kolmogorov Complexity in information theory and the Lempel-Ziv compression framework as a basis to calculate similarity scores.
The Motif Tracking Algorithm
TLDR
The Motif Tracking Algorithm is introduced, a novel immune-inspired pattern identification tool that is able to identify unknown motifs of unspecified length which repeat within time series data.
A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering
TLDR
This work introduces a new technique based on a bit level approximation of the data that allows raw data to be directly compared to the reduced representation, while still guaranteeing lower bounds to Euclidean distance.
Accelerating the discovery of unsupervised-shapelets
TLDR
This work exploits and extends a recently introduced concept in time series data mining called shapelets and introduces two novel optimization procedures to significantly speed up the unsupervised-shapelet discovery process and allow it to be cast as an anytime algorithm.
Supporting exact indexing of arbitrarily rotated shapes and periodic time series under Euclidean and warping distance measures
TLDR
The indexing technique can be used to index star light curves, an important type of astronomical data, without modification and with all the most popular distance measures including Euclidean distance, dynamic time warping and Longest Common Subsequence.

References

SHOWING 1-10 OF 63 REFERENCES
Supporting fast search in time series for movement patterns in multiple scales
TLDR
A pre-computation and indexing method to facilitate fast evaluation of pattern queries in user-specified scales and some experiments performed on a real-life data set are reported to show the efficiency and the scalability of the algorithms.
Machine learning as an experimental science
TLDR
Machine learning is a scientific discipline and, like the fields of AI and computer science, has both theoretical and empirical aspects, making it more akin to physics and chemistry than astronomy or sociology.
Search for Patterns in Compressed Time Series
TLDR
Experiments show the effectiveness of this technique for indexing of stock prices, weather data and electroencephalograms and for compression of time series and retrieval of series similar to a given pattern.
Fast Time Sequence Indexing for Arbitrary Lp Norms
TLDR
This paper presents a novel and fast indexing scheme for time sequences, when the distance function is any of arbitrary Lp norms including the popular Euclidean distance (L2 norm), and achieves significant speedups over the state of the art.
Supporting content-based searches on time series via approximation
  • Changzhou Wang, X. Wang
  • Computer Science
    Proceedings. 12th International Conference on Scientific and Statistical Database Management
  • 2000
TLDR
The paper introduces two specific approximation methods, one is wavelet based and the other line-fitting based, and shows that both approximation methods significantly reduce the query processing time without introducing intolerable errors.
Mining for similarities in aligned time series using wavelets
TLDR
This work proposes using a wavelet transformation of a time series to produce a natural set of features which describe properties of the sequence, both at various locations and at varying time granularities, and demonstrates how the features allow a flexible analysis of different aspects of the similarity.
MALM: a framework for mining sequence database at multiple abstraction levels
Efficient Pruning Methods for Separate-and-Conquer Rule Learning Systems
TLDR
This paper presents a solution in the form of new pruning techniques that dramatically improve the runtime of rule induction methods with no loss in accuracy: formal analysis shows an improvement in asymptotic time complexity, and experiments show an order-of-magnitude speedup.
Efficient Similarity Search In Sequence Databases
TLDR
An indexing method for time sequences that uses R*-trees to index the sequences and efficiently answer similarity queries, with experimental results showing that the method is superior to search based on sequential scanning.
A Probabilistic Approach to Fast Pattern Matching in Time Series Databases
TLDR
The proposed approach provides a natural framework to support user-customizable "query by content" on time series data, taking prior domain information into account in a principled manner.