A symbolic representation of time series, with implications for streaming algorithms

@inproceedings{Lin2003ASR,
  title={A symbolic representation of time series, with implications for streaming algorithms},
  author={Jessica Lin and Eamonn J. Keogh and Stefano Lonardi and Bill Yuan-chi Chiu},
  booktitle={DMKD '03},
  year={2003}
}
The parallel explosions of interest in streaming data and in data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox is that the vast majority of work on streaming data explicitly assumes that the data are discrete, whereas the vast majority of time series data are real-valued. Many researchers have also considered transforming real-valued time series into…
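For illustration, here is a minimal Python sketch of the symbolic (SAX) pipeline as it is usually described: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA) to w segment means, map each mean to one of a symbols using breakpoints that cut N(0, 1) into equiprobable regions, and compare words with MINDIST, which lower-bounds the Euclidean distance between the normalized originals. The function names (`znormalize`, `paa`, `gaussian_breakpoints`, `sax`, `mindist`) are hypothetical, not from any released library. The sketch assumes the series length is divisible by w, an alphabet of at most 26 letters, and that near-constant series map to all zeros.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def znormalize(series, eps=1e-8):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma < eps:  # assumption: a near-constant series becomes all zeros
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: mean of each of w equal-length frames."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) % w == 0")
    step = n // w
    return [mean(series[i * step:(i + 1) * step]) for i in range(w)]


def gaussian_breakpoints(a):
    """a - 1 cut points splitting N(0, 1) into a equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / a) for i in range(1, a)]


def sax(series, w, a):
    """Map a real-valued series to a word of w letters drawn from 'a', 'b', ..."""
    beta = gaussian_breakpoints(a)
    word = []
    for value in paa(znormalize(series), w):
        # 0-based symbol index = number of breakpoints at or below the PAA value
        idx = sum(1 for b in beta if value >= b)
        word.append(chr(ord("a") + idx))
    return "".join(word)


def mindist(word_q, word_c, n, a):
    """Lower bound on the Euclidean distance between the z-normalized originals of length n."""
    beta = gaussian_breakpoints(a)

    def cell(r, c):
        r, c = ord(r) - ord("a"), ord(c) - ord("a")
        if abs(r - c) <= 1:  # adjacent or equal symbols contribute nothing
            return 0.0
        return beta[max(r, c) - 1] - beta[min(r, c)]

    w = len(word_q)
    return sqrt(n / w) * sqrt(sum(cell(q, c) ** 2 for q, c in zip(word_q, word_c)))
```

For example, `sax(list(range(1, 9)), 4, 4)` yields an increasing word such as `"abcd"`, and the MINDIST between that word and the word of the reversed series never exceeds the true Euclidean distance between the two normalized series. Zeroing the distance between adjacent symbols is what keeps the bound safe.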
Experiencing SAX: a novel symbolic representation of time series
TLDR
The utility of the new symbolic representation of time series is demonstrated. The representation allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower-bound the corresponding distance measures defined on the original series.
Current Trends in Time Series Representation
Time series data generation has exploded in almost every domain, such as business, industry, medicine, science and entertainment. Consequently, there is an increasing need for analysing…
Visualizing and Discovering Non-Trivial Patterns in Large Time Series Databases
TLDR
VizTree is a time series pattern discovery and visualization system based on augmenting suffix trees that provides novel interactive solutions to many pattern discovery problems, including the discovery of frequently occurring patterns (motif discovery), surprising patterns (anomaly detection), and query by content.
Mining Time Series Data
TLDR
This chapter gives a high-level survey of time series Data Mining tasks, with an emphasis on time series representations.
Towards Optimal Symbolization for Time Series Comparisons
TLDR
This work presents a novel quantizer based upon optimization of comparison fidelity and a computationally tractable algorithm for its implementation on big datasets, providing the potential of increased real world performance across a wide range of existing data mining algorithms and applications.
Feature-Based Dividing Symbolic Time Series Representation for Streaming Data Processing
TLDR
A symbolic representation method for streaming time series based on VTP-dividing with a sliding window is proposed, together with a similarity measurement algorithm for this representation that lower-bounds the Euclidean distance on the original data.
Data representation for time series data mining: time domain approaches
In most time series data mining, alternative forms of data representation or data preprocessing are required because of the unique characteristics of time series, such as high dimension, the number of…
Towards a Faster Symbolic Aggregate Approximation Method
TLDR
This paper presents a new method that improves the performance of SAX by adding to it another exclusion condition that increases the exclusion power, and conducts experiments which show that the new method is faster than SAX.
Genetic Algorithms-Based Symbolic Aggregate Approximation
TLDR
It is shown that SAX's assumption of Gaussianity oversimplifies the problem and can result in very large errors in time series mining tasks, and an alternative scheme based on genetic algorithms (GASAX) is presented to find the breakpoints.
The Parallel and Distributed Future of Data Series Mining
  • Themis Palpanas
  • Computer Science
  • 2017 International Conference on High Performance Computing & Simulation (HPCS)
  • 2017
TLDR
This work describes past efforts in designing techniques for indexing and mining truly massive collections of data series, based on indexing techniques for fast similarity search, an operation that lies at the core of many mining algorithms, and discusses novel techniques that adaptively create data series indexes.

References (showing 1-10 of 54)
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
TLDR
The most exhaustive set of time series experiments ever attempted, re-implementing the contributions of more than two dozen papers and testing them on 50 real-world, highly diverse datasets, supports the claim that there is a need for a set of time series benchmarks and for more careful empirical evaluation in the data mining community.
Finding Motifs in Time Series
TLDR
An efficient motif discovery algorithm for time series would be useful as a tool for summarizing and visualizing massive time series databases and could be used as a subroutine in various other data mining tasks, including the discovery of association rules, clustering and classification.
Finding surprising patterns in a time series database in linear time and space
TLDR
A novel technique is introduced that defines a pattern as surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data.
Finding recurrent sources in sequences
TLDR
This work defines the (k,h)-segmentation problem, shows that it is NP-hard in the general case, gives approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalizes the results to higher dimensions.
Estimating Rarity and Similarity over Data Stream Windows
In the windowed data stream model, we observe items coming in over time. At any time $t$, we consider the window of the last $N$ observations $a_{t-(N-1)}, a_{t-(N-2)}, \ldots, a_t$, each $a_i \in \{1, \ldots, u\}$;…
Fast Subsequence Matching in Time-Series Databases
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance.
TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data
TLDR
A novel wavelet-based tree structure, termed TSA-tree, is introduced, which improves the efficiency of multi-level trend and surprise queries on time sequence data. Two alternative techniques are also proposed to reduce the size of the OTSA-tree even further while maintaining an acceptable query precision.
Monotony of surprise and large-scale quest for unusual words.
TLDR
An extensive analysis of monotonicities of exceptionally frequent or rare words in bio-sequences for a broader variety of scores supports the construction of data structures and algorithms capable of performing global detection of unusual substrings in time and space linear in the subject sequences, under various probabilistic models.
Discovering similar multidimensional trajectories
TLDR
This work formalizes non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences.
Efficient time series matching by wavelets
  • K. Chan, A. Fu
  • Computer Science
  • Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)
  • 1999
TLDR
This paper proposes using the Haar wavelet transform for time series indexing, shows through experiments that it can outperform the DFT, and proposes a two-phase method for efficient n-nearest-neighbor queries in time series databases.
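The Haar-based indexing in the entry above relies on a simple property, sketched below under stated assumptions. This is not Chan and Fu's code. It assumes the orthonormal (1/√2-scaled) Haar transform and a power-of-two series length, and it omits the index structure and the two-phase nearest-neighbor search. Because this transform preserves Euclidean distance, comparing only the first k coefficients can never overestimate the true distance, so filtering candidates on a truncated coefficient vector causes no false dismissals. The helper names `haar`, `euclidean` and `truncated_distance` are hypothetical.

```python
from math import sqrt


def haar(series):
    """Orthonormal Haar transform: [overall average, coarsest detail, ..., finest details]."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in series]
    details = []
    while len(coeffs) > 1:
        pairs = range(0, len(coeffs), 2)
        avgs = [(coeffs[i] + coeffs[i + 1]) / sqrt(2) for i in pairs]
        diffs = [(coeffs[i] - coeffs[i + 1]) / sqrt(2) for i in pairs]
        details = diffs + details  # prepend, so coarser levels end up first
        coeffs = avgs
    return coeffs + details


def euclidean(x, y):
    """Plain Euclidean distance between two equal-length sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def truncated_distance(x, y, k):
    """Distance on the first k Haar coefficients: a lower bound on euclidean(x, y)."""
    return euclidean(haar(x)[:k], haar(y)[:k])
```

Keeping the coarse coefficients first means that a small k retains the overall shape of each series, which tends to make the lower bound tighter than keeping an arbitrary subset of coefficients.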