Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping

@article{Rakthanmanon2012SearchingAM,
  title={Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping},
  author={Thanawin Rakthanmanon and Bilson J. L. Campana and Abdullah Al Mueen and Gustavo E. A. P. A. Batista and M. Brandon Westover and Qiang Zhu and Jesin Zakaria and Eamonn J. Keogh},
  journal={KDD : proceedings. International Conference on Knowledge Discovery \& Data Mining},
  year={2012},
  volume={2012},
  pages={262--270}
}
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work… 

Similarity search in multiple high speed time series streams under dynamic time warping
  • Bui Cong Giao, D. T. Anh
  • Computer Science
    2015 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS)
  • 2015
TLDR
An efficient method is introduced that performs similarity search for numerous time-series queries over multiple streaming time series under Dynamic Time Warping and obtains the same accuracy as similarity search in static time series.
Data Mining a Trillion Time Series Subsequences Under Dynamic Time Warping
TLDR
This work shows that by using a combination of four novel ideas, in large datasets the authors can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms.
A Fast Method for Motif Discovery in Large Time Series Database under Dynamic Time Warping
TLDR
This work proposes a fast method for time series motif discovery which uses Dynamic Time Warping distance, a better measure than Euclidean distance, and shows that this method performs very efficiently on large time series datasets while achieving high accuracy.
Parallelization of searching and mining time series data using Dynamic Time Warping
  • A. Shabib, A. Narang, D. Sitaram
  • Computer Science
    2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
  • 2015
TLDR
This paper considers two methods of parallelizing the UCR Dynamic Time Warping algorithm: a multi-core implementation and a cluster implementation using Spark. It shows how to compute distributed lower bounds efficiently in Spark and achieve nearly linear speedup with DTW in a Spark computation as well.
Speed Up Similarity Search of Time Series Under Dynamic Time Warping
TLDR
The proposed two-stage similarity search framework for time series includes an improved lower bounding distance, which discards many dissimilar series to yield a set of candidate sequences, and explores an early abandoning strategy to avoid the full calculation of DTW.
Indexing and classifying gigabytes of time series under time warping
TLDR
TSS is developed, a novel algorithm for Time Series Indexing which combines a hierarchy of K-means clustering with DTW-based lower-bounding that makes it possible to classify time series orders of magnitude faster than the state of the art.
Scalable Algorithm for Subsequence Similarity Search in Very Large Time Series Data on Cluster of Phi KNL
TLDR
This paper proposes a novel parallel algorithm for subsequence similarity search in very large time series data on a computing cluster with nodes based on Intel Xeon Phi Knights Landing (KNL) many-core processors and shows that it is highly scalable.
Discovering sub-patterns from time series using a normalized cross-match algorithm
TLDR
A normalized-CrossMatch approach is proposed that extends CM to enforce normalization while maintaining the same performance capabilities for time series data stream mining.
On-line and dynamic time warping for time series data mining
  • Hailin Li
  • Computer Science
    Int. J. Mach. Learn. Cybern.
  • 2015
TLDR
The results of numerical experiments demonstrate that the proposed approach, compared with DTW, measures the similarity of time series quickly and validly, which improves the performance of the algorithm when applied to time series data mining.
Speeding up dynamic time warping distance for sparse time series data
TLDR
A new time warping similarity measure (AWarp) for sparse time series is introduced; it works on the run-length encoded representation of sparse time series, is exact for binary-valued time series, and closely approximates the original DTW distance for any-valued series.

References

SHOWING 1-10 OF 51 REFERENCES
A disk-aware algorithm for time series motif discovery
TLDR
This work leverages previous work on pivot-based indexing to introduce a disk-aware algorithm to find time series motifs exactly in multi-gigabyte databases which contain on the order of tens of millions of time series.
Online discovery and maintenance of time series motifs
TLDR
This paper develops the first online motif discovery algorithm which monitors and maintains motifs exactly in real time over the most recent history of a stream and allows useful extensions of the algorithm to deal with arbitrary data rates and discovering multidimensional motifs.
iSAX: indexing and mining terabyte sized time series
TLDR
This work shows how a novel multi-resolution symbolic representation can be used to index datasets which are several orders of magnitude larger than anything else considered in the literature, allowing for the exact mining of truly massive real world datasets.
Time series shapelets: a new primitive for data mining
TLDR
A new time series primitive, time series shapelets, is introduced, which can be interpretable, more accurate and significantly faster than state-of-the-art classifiers.
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
TLDR
The most exhaustive set of time series experiments ever attempted, re-implementing the contribution of more than two dozen papers, and testing them on 50 real world, highly diverse datasets support the claim that there is a need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.
Scaling and time warping in time series querying
TLDR
This work introduces the first technique which can handle both DTW and US simultaneously, and involves search pruning by means of a lower bounding technique and multi-dimensional indexing to speed up the search.
Efficient Processing of Warping Time Series Join of Motion Capture Data
TLDR
A two-step filter-and-refine algorithm to support efficient l-e-join of time series, called Warping Time Series Join (WTSJ), and a block-based time series summarization method, based on which the block-wise e-matching matrix is first computed.
Using Multiple Indexes for Efficient Subsequence Matching in Time-Series Databases
TLDR
This paper quantitatively examines the performance degradation caused by the window size effect, and formally proves the optimality as well as the effectiveness of the algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings.
FTW: fast similarity search under the time warping distance
TLDR
Experiments on real and synthetic sequence data sets reveal that FTW is significantly faster than the best existing method, by up to 222 times, and efficiently prunes a significant portion of the search cost.
Embedding-based subsequence matching in time-series databases
TLDR
Good trade-offs between retrieval accuracy and retrieval efficiency are obtained for both methods, and the results are competitive with respect to current state-of-the-art methods.