On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration

@article{Keogh2004OnTN,
  title={On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration},
  author={Eamonn J. Keogh and Shruti Kasetty},
  journal={Data Mining and Knowledge Discovery},
  year={2004},
  volume={7},
  pages={349--371}
}
In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim: much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of “improvement” that would have been… 
Querying and mining of time series data: experimental comparison of representations and distance measures
TLDR
An extensive set of time series experiments is conducted, re-implementing 8 different representation methods and 9 similarity measures and their variants and testing their effectiveness on 38 time series data sets from a wide variety of application domains, to provide a unified validation of some of the existing achievements.
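To make "representation plus distance measure" concrete, here is a minimal sketch that z-normalizes series, reduces them with Piecewise Aggregate Approximation (PAA), and computes a PAA distance that lower-bounds the Euclidean distance. PAA is used only as a familiar example of a representation; the function names are hypothetical, and the sketch assumes the series length is divisible by the number of segments.

```python
import math
from typing import List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize: zero mean, unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n  # constant series: map to all zeros
    return [(v - mean) / std for v in x]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between equal-length series."""
    assert len(a) == len(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def paa(x: Sequence[float], m: int) -> List[float]:
    """Mean of each of m equal-length frames (assumes len(x) % m == 0)."""
    n = len(x)
    if n % m != 0:
        raise ValueError("this sketch assumes len(x) is divisible by m")
    w = n // m
    return [sum(x[i * w:(i + 1) * w]) / w for i in range(m)]


def paa_lower_bound(pa: Sequence[float], pb: Sequence[float], n: int) -> float:
    """sqrt(n/m) times the distance between PAA vectors; never exceeds the
    Euclidean distance between the original length-n series."""
    m = len(pa)
    return math.sqrt(n / m) * euclidean(pa, pb)
```

Because the PAA distance never exceeds the true distance, an index built on the reduced vectors can discard candidates without false dismissals.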
Time-series data mining
TLDR
A survey of the techniques applied for time-series data mining, namely representation techniques, distance measures, and indexing methods, is provided.
Experimental comparison of representation methods and distance measures for time series data
TLDR
An extensive experimental study, re-implementing eight different time series representations and nine similarity measures and their variants and testing their effectiveness on 38 time series data sets from a wide variety of application domains, gives an overview of these different techniques and presents comparative experimental findings regarding their effectiveness.
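Dynamic Time Warping (DTW) is one of the elastic measures typically included in comparisons like this one. A minimal sketch, assuming a squared point-wise cost and a Sakoe-Chiba band given as a number of cells; both conventions vary between implementations, and `dtw` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def dtw(a: Sequence[float], b: Sequence[float], window: Optional[int] = None) -> float:
    """DTW distance with an optional Sakoe-Chiba band of `window` cells."""
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)  # no constraint
    window = max(window, abs(n - m))  # band must be wide enough to reach (n, m)
    inf = float("inf")
    # D[i][j] = cheapest cumulative cost aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])
```

With `window=0` the band collapses to the diagonal and the result equals the Euclidean distance, which is a quick sanity check on any DTW implementation.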
Using derivatives in time series classification
TLDR
A new distance function based on a derivative is proposed, which considers the general shape of a time series rather than point-to-point function comparison, and is used in classification with the nearest neighbor rule.
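One way to combine a point-to-point distance with a shape-sensitive one is a weighted sum of the distance between the raw series and the distance between their first differences, used inside a 1-nearest-neighbor classifier. In this sketch the base distance is Euclidean, the derivative is approximated by first differences, and the weights are set by a hypothetical angle parameter `alpha`; the paper's exact base distance, derivative estimate, and weighting may differ.

```python
import math
from typing import List, Sequence, Tuple


def diff(x: Sequence[float]) -> List[float]:
    """First differences as a simple discrete derivative."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def derivative_distance(a, b, alpha: float = math.pi / 4) -> float:
    """cos(alpha) * raw distance + sin(alpha) * derivative distance.
    alpha = 0 uses only raw values; alpha = pi/2 uses only shape."""
    return (math.cos(alpha) * euclidean(a, b)
            + math.sin(alpha) * euclidean(diff(a), diff(b)))


def nn_classify(query, train: List[Tuple[List[float], str]], alpha: float = math.pi / 4) -> str:
    """1-nearest-neighbor rule under derivative_distance."""
    best_series, best_label = min(
        train, key=lambda item: derivative_distance(query, item[0], alpha))
    return best_label
```

With `alpha` near pi/2, two series that differ only by a vertical offset are treated as (almost) identical, which is the "general shape" behavior the summary describes.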
Chapter 1: Pattern Recognition in Time Series
TLDR
This chapter discusses state-of-the-art techniques for time series pattern recognition (the process of mapping an input representation for an entity or relationship to an output category) in order to address the challenges of handling time series databases.
The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances
TLDR
This work implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 datasets, indicating that only nine of these algorithms are significantly more accurate than both benchmarks.
(Not) Finding Rules in Time Series: A Surprising Result with Implications for Previous and Future Research
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in
Recent Advances in Mining Time Series Data
TLDR
This talk will summarize the latest advances in mining time series data, including new representations of time series data, and discuss the migration from static problems to online problems.
Current Trends in Time Series Representation
TLDR
An overview of recent approaches and trends in time series data mining representations is presented; the main interest appears to have gradually shifted toward manipulating streaming data and/or multivariate time series.
A review on time series data mining
...
...

References

Showing 1-10 of 82 references
Mining the stock market (extended abstract): which measure is best?
TLDR
The approach is to cluster the stocks according to various measures and compare the results to the “ground-truth” clustering based on the Standard & Poor's 500 Index; the comparison reveals several interesting facts about the similarity measures used for stock-market data.
Mining the Stock Market: Which Measure Is Best?
TLDR
The approach is to cluster the stocks according to various measures and compare the results to the “ground-truth” clustering based on the Standard & Poor's 500 Index; the comparison reveals several interesting facts about the similarity measures used for stock-market data.
Mining for similarities in aligned time series using wavelets
TLDR
This work proposes using a wavelet transformation of a time series to produce a natural set of features which describe properties of the sequence, both at various locations and at varying time granularities, and demonstrates how the features allow a flexible analysis of different aspects of the similarity.
Identifying Representative Trends in Massive Time Series Data Sets Using Sketches
TLDR
This paper formalizes the problem of identifying various 'representative' trends in time series data, using a dimensionality reduction technique that replaces each interval with a 'sketch', a low-dimensional vector.
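The summary does not say how the sketches are built. The code below assumes a Johnson-Lindenstrauss-style random Gaussian projection, which maps each length-n interval to a k-dimensional vector whose Euclidean distances approximate those of the original intervals. `make_projection`, `sketch`, `sketch_distance`, and `interval_sketches` are hypothetical names, and the paper's actual construction may differ.

```python
import math
import random
from typing import List, Sequence


def make_projection(n: int, k: int, seed: int = 0) -> List[List[float]]:
    """k random Gaussian directions in R^n; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]


def sketch(x: Sequence[float], proj: List[List[float]]) -> List[float]:
    """Project x onto each direction, scaled by 1/sqrt(k) so that squared
    norms (and hence squared distances) are preserved in expectation."""
    scale = 1.0 / math.sqrt(len(proj))
    return [scale * sum(r * v for r, v in zip(row, x)) for row in proj]


def sketch_distance(sa: Sequence[float], sb: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(sa, sb)))


def interval_sketches(series: Sequence[float], w: int, proj) -> List[List[float]]:
    """Replace every length-w interval of the series by its sketch."""
    return [sketch(series[i:i + w], proj) for i in range(len(series) - w + 1)]
```

Every interval must be projected with the same matrix; otherwise their sketches are not comparable.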
Variable length queries for time series data
TLDR
A new indexing technique that works well for variable-length queries is proposed: it stores index structures at different resolutions for a given dataset by exploiting the power of wavelets.
Event detection from time series data
TLDR
An iterative algorithm is proposed that fits a model to a time segment and uses a likelihood criterion to determine whether the segment should be partitioned further, i.e., whether it contains a new changepoint.
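The fit-and-split loop described above resembles binary segmentation. This sketch assumes a piecewise-constant mean model with Gaussian noise of fixed variance, under which the log-likelihood gain of a split is proportional to the reduction in squared error; the `penalty` threshold and `min_len` are hypothetical tuning parameters, not values from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def sse(x: Sequence[float], lo: int, hi: int) -> float:
    """Squared error of x[lo:hi] around its own mean (constant-mean model fit)."""
    seg = x[lo:hi]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def best_split(x, lo: int, hi: int, min_len: int) -> Tuple[Optional[int], float]:
    """Split point in (lo, hi) that most reduces squared error, and that reduction."""
    total = sse(x, lo, hi)
    best_t, best_gain = None, 0.0
    for t in range(lo + min_len, hi - min_len + 1):
        gain = total - sse(x, lo, t) - sse(x, t, hi)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def find_changepoints(x: Sequence[float], penalty: float, min_len: int = 2) -> List[int]:
    """Split segments recursively while the fit improves by more than `penalty`."""
    changepoints = []
    stack = [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2 * min_len:
            continue  # too short to split into two valid segments
        t, gain = best_split(x, lo, hi, min_len)
        if t is not None and gain > penalty:
            changepoints.append(t)
            stack.append((lo, t))
            stack.append((t, hi))
    return sorted(changepoints)
```

Raising `penalty` makes the procedure more conservative about declaring a new changepoint.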
AIM: Approximate Intelligent Matching for Time Series Data
TLDR
This paper introduces a new problem, the approximate partial matching of a query sequence in a time series database and investigates an intelligent subsequence similarity matching of time series queries based on efficient graph traversal.
Fast Subsequence Matching in Time-Series Databases
We present an efficient indexing method to locate one-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance.
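The filter-and-refine idea behind this kind of index can be shown without the index structure itself. Each sliding window is mapped to its first few orthonormal DFT coefficients. Windows whose feature-space distance already exceeds the tolerance are discarded, and the survivors are verified with the exact Euclidean distance. By Parseval's theorem the truncated-DFT distance never exceeds the true distance, so the filter causes no false dismissals. This sketch replaces the paper's multidimensional index with a linear scan; the function names and the default `k=3` are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft_features(x: Sequence[float], k: int) -> List[complex]:
    """First k coefficients of the orthonormal (1/sqrt(n)) DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(k)]


def feature_dist(fa: Sequence[complex], fb: Sequence[complex]) -> float:
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(fa, fb)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def subsequence_search(series: Sequence[float], query: Sequence[float],
                       eps: float, k: int = 3) -> List[int]:
    """Start offsets of windows within Euclidean distance eps of the query."""
    w = len(query)
    fq = dft_features(query, k)
    matches = []
    for start in range(len(series) - w + 1):
        window = series[start:start + w]
        # Filter: feature distance lower-bounds the true distance.
        if feature_dist(dft_features(window, k), fq) > eps:
            continue
        # Refine: confirm with the exact distance.
        if euclidean(window, query) <= eps:
            matches.append(start)
    return matches
```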
Locally adaptive dimensionality reduction for indexing large time series databases
TLDR
This article introduces a new dimensionality reduction technique, Adaptive Piecewise Constant Approximation (APCA), shows how APCA can be indexed using a multidimensional index structure, and proposes two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching.
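An APCA representation stores a series as M constant segments of varying length, each given by its mean value and right endpoint. The sketch below builds such a representation with a simple bottom-up merge that always joins the adjacent pair adding the least squared error. This is a plainly swapped-in construction, not the paper's wavelet-based algorithm, and the paper's two distance measures are not reproduced; `apca` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def apca(x: Sequence[float], m: int) -> List[Tuple[float, int]]:
    """Return m (mean, right_endpoint) segments approximating x."""
    # Each segment is tracked as [sum, count, sum_of_squares].
    segs = [[v, 1, v * v] for v in x]

    def seg_sse(s) -> float:
        return s[2] - s[0] ** 2 / s[1]  # squared error around the segment mean

    while len(segs) > m:
        best_i, best_cost = 0, float("inf")
        for i in range(len(segs) - 1):
            a, b = segs[i], segs[i + 1]
            merged = [a[0] + b[0], a[1] + b[1], a[2] + b[2]]
            cost = seg_sse(merged) - seg_sse(a) - seg_sse(b)
            if cost < best_cost:
                best_i, best_cost = i, cost
        a, b = segs[best_i], segs[best_i + 1]
        segs[best_i:best_i + 2] = [[a[0] + b[0], a[1] + b[1], a[2] + b[2]]]

    out, right = [], -1
    for total, count, _ in segs:
        right += count  # 0-based index of the segment's last point
        out.append((total / count, right))
    return out
```

Flat regions collapse into long segments while busy regions keep short ones, which is the "adaptive" property that gives APCA its fidelity.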
The Haar Wavelet Transform in the Time Series Similarity Paradigm
TLDR
This work presents a simple and powerful technique for rapidly evaluating the similarity between time series in large databases, based on the orthonormal decomposition of the time series into the Haar basis, and demonstrates that this approach can provide estimates of the local slope of the time series in the sequence of multi-resolution steps.
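A minimal sketch of an orthonormal Haar decomposition, assuming the series length is a power of two; `haar` is a hypothetical name, and the paper's own similarity measure is not reproduced. Because the transform is orthonormal, Euclidean distances between coefficient vectors equal those between the original series, so truncating to the leading coarse coefficients gives a cheap approximation.

```python
import math
from typing import List, Sequence


def haar(x: Sequence[float]) -> List[float]:
    """Orthonormal Haar coefficients: [overall approximation, coarsest details, ..., finest details]."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    s = 1 / math.sqrt(2)
    approx = list(x)
    details: List[float] = []
    while len(approx) > 1:
        half = len(approx) // 2
        avg = [(approx[2 * i] + approx[2 * i + 1]) * s for i in range(half)]
        dif = [(approx[2 * i] - approx[2 * i + 1]) * s for i in range(half)]
        details = dif + details  # coarser levels end up in front
        approx = avg
    return approx + details
```

Each detail coefficient is a scaled difference between two adjacent blocks, so its sign and size reflect whether the series is locally falling or rising at that resolution; how the paper turns this into slope estimates is not shown here.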
...
...