# A symbolic representation of time series, with implications for streaming algorithms

@inproceedings{Lin2003ASR, title={A symbolic representation of time series, with implications for streaming algorithms}, author={Jessica Lin and Eamonn J. Keogh and Stefano Lonardi and Bill Yuan-chi Chiu}, booktitle={DMKD '03}, year={2003} }

The parallel explosions of interest in streaming data and in data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox is that the vast majority of work on streaming data explicitly assumes that the data are discrete, whereas the vast majority of time series data are real-valued. Many researchers have also considered transforming real-valued time series into…


#### 1,788 Citations

Experiencing SAX: a novel symbolic representation of time series

- Computer Science
- Data Mining and Knowledge Discovery
- 2007

This work demonstrates the utility of a new symbolic representation of time series. The representation allows dimensionality and numerosity reduction, and it allows distance measures to be defined on the symbolic representation that lower-bound the corresponding distance measures defined on the original series.
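The representation summarized above (SAX) can be sketched in four steps. First, z-normalize the series. Second, reduce it with Piecewise Aggregate Approximation (PAA). Third, map each PAA value to a symbol using breakpoints that cut the standard normal distribution into equiprobable regions. Fourth, compare words with a MINDIST function that lower-bounds the Euclidean distance between the normalized series. This is a minimal sketch, not the authors' code. The function names, the 0-based integer symbols, and the requirement that the series length be divisible by the word length are assumptions made for brevity.

```python
import math
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: average each of w equal frames.
    Assumption of this sketch: len(series) is divisible by w."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch requires len(series) % w == 0")
    step = n // w
    return [mean(series[i * step:(i + 1) * step]) for i in range(w)]


def gaussian_breakpoints(a):
    """a - 1 cut points splitting N(0, 1) into a equiprobable regions."""
    return [NormalDist().inv_cdf(k / a) for k in range(1, a)]


def sax_word(series, w, a):
    """Map a series to w symbols (integers 0..a-1) from an alphabet of size a."""
    cuts = gaussian_breakpoints(a)
    # Symbol index = number of breakpoints strictly below the PAA value.
    return [sum(1 for c in cuts if value > c) for value in paa(znormalize(series), w)]


def mindist(word_a, word_b, n, a):
    """Distance between two SAX words that lower-bounds the Euclidean
    distance between the z-normalized original series of length n."""
    cuts = gaussian_breakpoints(a)

    def cell(r, c):
        # Identical or adjacent symbols contribute nothing.
        if abs(r - c) <= 1:
            return 0.0
        return cuts[max(r, c) - 1] - cuts[min(r, c)]

    w = len(word_a)
    total = sum(cell(r, c) ** 2 for r, c in zip(word_a, word_b))
    return math.sqrt(n / w) * math.sqrt(total)
```

For example, `sax_word(list(range(1, 9)), 4, 4)` turns a rising ramp into the word `[0, 1, 2, 3]`. Two symbols that are equal or adjacent contribute zero to MINDIST, which keeps the measure a lower bound.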

Current Trends in Time Series Representation

- 2007

Time series data generation has exploded in almost every domain, such as business, industry, medicine, science, and entertainment. Consequently, there is an increasing need for analysing…

Visualizing and Discovering Non-Trivial Patterns in Large Time Series Databases

- Computer Science
- Inf. Vis.
- 2005

VizTree is a time series pattern discovery and visualization system based on augmenting suffix trees that provides novel interactive solutions to many pattern discovery problems, including the discovery of frequently occurring patterns (motif discovery), surprising patterns (anomaly detection), and query by content.

Mining Time Series Data

- Mathematics, Computer Science
- Data Mining and Knowledge Discovery Handbook
- 2010

This chapter gives a high-level survey of time series Data Mining tasks, with an emphasis on time series representations.

Towards Optimal Symbolization for Time Series Comparisons

- Computer Science
- 2013 IEEE 13th International Conference on Data Mining Workshops
- 2013

This work presents a novel quantizer based upon optimization of comparison fidelity and a computationally tractable algorithm for its implementation on big datasets, providing the potential of increased real world performance across a wide range of existing data mining algorithms and applications.

Feature-Based Dividing Symbolic Time Series Representation for Streaming Data Processing

- Computer Science
- 2018 9th International Conference on Information Technology in Medicine and Education (ITME)
- 2018

This work proposes a symbolic representation method for streaming time series based on VTP-dividing with a sliding window, together with a similarity measure for the proposed representation that lower-bounds the Euclidean distance on the original data.

Data representation for time series data mining: time domain approaches

- Computer Science
- 2017

In most time series data mining, alternative forms of data representation or data preprocessing are required because of the unique characteristics of time series, such as high dimension the number of…

Towards a Faster Symbolic Aggregate Approximation Method

- Computer Science
- ICSOFT
- 2010

This paper presents a new method that improves the performance of SAX by adding to it another exclusion condition that increases the exclusion power, and conducts experiments which show that the new method is faster than SAX.

Genetic Algorithms-Based Symbolic Aggregate Approximation

- Computer Science
- DaWaK
- 2012

It is shown that the assumption of Gaussianity used to place the SAX breakpoints oversimplifies the problem and can result in very large errors in time series mining tasks, and an alternative scheme based on genetic algorithms (GASAX) is presented to find the breakpoints.

The Parallel and Distributed Future of Data Series Mining

- Computer Science
- 2017 International Conference on High Performance Computing & Simulation (HPCS)
- 2017

This work describes past efforts in designing techniques for indexing and mining truly massive collections of data series, based on indexing techniques for fast similarity search, an operation that lies at the core of many mining algorithms, and discusses novel techniques that adaptively create data series indexes.

#### References

Showing 10 of 54 references.

On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration

- Computer Science
- KDD '02
- 2002

The most exhaustive set of time series experiments ever attempted, which re-implements the contributions of more than two dozen papers and tests them on 50 real-world, highly diverse datasets, supports the claim that the data mining community needs a set of time series benchmarks and more careful empirical evaluation.

Finding Motifs in Time Series

- Computer Science
- KDD 2002
- 2002

An efficient motif discovery algorithm for time series would be useful as a tool for summarizing and visualizing massive time series databases and could be used as a subroutine in various other data mining tasks, including the discovery of association rules, clustering and classification.

Finding surprising patterns in a time series database in linear time and space

- Mathematics, Computer Science
- KDD
- 2002

A novel technique is introduced that defines a pattern as surprising if its frequency of occurrence differs substantially from that expected by chance, given some previously seen data.

Finding recurrent sources in sequences

- Mathematics, Computer Science
- RECOMB '03
- 2003

This work defines the (k,h)-segmentation problem, shows that it is NP-hard in the general case, gives approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalizes the results to higher dimensions.

Estimating Rarity and Similarity over Data Stream Windows

- Computer Science
- ESA
- 2002

In the windowed data stream model, we observe items coming in over time. At any time $t$, we consider the window of the last $N$ observations $a_{t-(N-1)}, a_{t-(N-2)}, \ldots, a_t$, each $a_i \in \{1, \ldots, u\}$;…
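The windowed model above can be made concrete with a naive exact baseline that stores the whole window of the last N items and keeps per-item counts. This is a sketch under stated assumptions. The class name is hypothetical. The notion of rarity used here, the fraction of distinct items in the window that occur exactly once, is assumed for illustration. The cited work aims to estimate such statistics in far less space than the window itself, and that technique is not reproduced here.

```python
from collections import Counter, deque


class WindowedStream:
    """Exact bookkeeping over the window of the last n observations.

    Hypothetical naive baseline: it stores every item in the window,
    unlike small-space streaming estimators."""

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.counts = Counter()

    def observe(self, item):
        """Append a new observation and expire the oldest one if needed."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.n:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def rarity(self):
        """Assumed definition: fraction of distinct items in the window
        that occur exactly once."""
        if not self.counts:
            return 0.0
        singletons = sum(1 for c in self.counts.values() if c == 1)
        return singletons / len(self.counts)
```

For instance, after observing `1, 2, 2, 3, 4, 4` with `n = 4`, the window holds `2, 3, 4, 4`, and two of its three distinct items occur exactly once.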

Fast Subsequence Matching in Time-Series Databases

- 1994

We present an efficient indexing method to locate one-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance.…
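A common description of this kind of subsequence matching goes as follows. Each sliding window is mapped to a few low-frequency Discrete Fourier Transform (DFT) coefficients and the search runs in that small feature space. With an orthonormal DFT, Parseval's theorem makes the distance on truncated coefficients a lower bound of the true Euclidean distance, so filtering on features causes no false dismissals. The sketch below assumes this setup. It replaces the spatial index of the original work with a linear scan, and its function names are hypothetical.

```python
import cmath
import math


def dft_features(window, k):
    """First k coefficients of the orthonormal DFT (scaled by 1/sqrt(n)),
    so distances on all n coefficients equal Euclidean distances."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(window))
        / math.sqrt(n)
        for f in range(k)
    ]


def euclidean(a, b):
    """Euclidean distance; abs() also handles complex coefficients."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def subsequence_matches(series, query, eps, k=2):
    """Start offsets of windows within eps of the query.

    Phase 1 filters on k DFT coefficients (a lower bound, so no true
    match is dismissed); phase 2 verifies candidates on the raw data."""
    m = len(query)
    q_feat = dft_features(query, k)
    matches = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        if euclidean(dft_features(window, k), q_feat) <= eps:
            if euclidean(window, query) <= eps:
                matches.append(start)
    return matches
```

Keeping only a few coefficients trades filter precision for a smaller feature space. More candidates may survive phase 1, but phase 2 discards the false alarms.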

TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data

- Computer Science
- Proceedings. 12th International Conference on Scientific and Statistical Database Management
- 2000

A novel wavelet-based tree structure, termed the TSA-tree, is introduced that improves the efficiency of multi-level trend and surprise queries on time sequence data. Two alternative techniques are also proposed to reduce the size of the OTSA-tree even further while maintaining acceptable query precision.

Monotony of surprise and large-scale quest for unusual words.

- Computer Science
- 2003

An extensive analysis of monotonicities of exceptionally frequent or rare words in bio-sequences for a broader variety of scores supports the construction of data structures and algorithms capable of performing global detection of unusual substrings in time and space linear in the subject sequences, under various probabilistic models.

Discovering similar multidimensional trajectories

- Computer Science
- Proceedings 18th International Conference on Data Engineering
- 2002

This work formalizes non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences.
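An LCSS-based similarity for 2-D trajectories can be sketched with dynamic programming. Two points match when both coordinates differ by less than a spatial threshold `eps` and their time indices differ by at most `delta`. Unmatched points such as outliers are simply skipped, which is where the robustness to noise comes from. The parameter names and the normalization by the shorter trajectory length follow the usual presentation of LCSS for trajectories, but treat them as assumptions of this sketch rather than the paper's exact definitions.

```python
def lcss_length(a, b, eps, delta):
    """LCSS length of two 2-D trajectories (lists of (x, y) points).
    Points match if both coordinates differ by less than eps and their
    1-based time indices differ by at most delta."""
    n, m = len(a), len(b)
    # dp[i][j] = LCSS of the first i points of a and the first j points of b.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            if abs(ax - bx) < eps and abs(ay - by) < eps and abs(i - j) <= delta:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a, b, eps, delta):
    """Similarity in [0, 1]: matched points over the shorter length."""
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))
```

With `a = [(0, 0), (1, 1), (2, 2), (3, 3)]` and `b` equal to `a` except that its third point is the outlier `(10, 10)`, three of four points still match (`eps=0.5`, `delta=1`). The outlier lowers the similarity to 0.75 instead of dominating the distance, as it would under the Euclidean distance.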

Efficient time series matching by wavelets

- Computer Science
- Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)
- 1999

This paper proposes using the Haar wavelet transform for time series indexing, shows through experiments that the Haar transform can outperform the DFT, and proposes a two-phase method for efficient n-nearest-neighbor queries in time series databases.
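The reason a truncated Haar representation can be used for indexing is that the orthonormal Haar transform preserves Euclidean distance. Keeping only the first few (coarsest) coefficients therefore yields a distance that never exceeds the true one. The sketch below illustrates that property only, assuming series lengths that are powers of two. The function names are hypothetical, and the paper's two-phase nearest-neighbor procedure is not reproduced.

```python
import math


def haar_transform(series):
    """Orthonormal Haar wavelet transform for a power-of-two length.
    Output order: [overall average term, coarsest detail, ..., finest details]."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = []
    current = [float(x) for x in series]
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        # Scaling by 1/sqrt(2) keeps each averaging/differencing step orthonormal.
        averages = [(current[i] + current[i + 1]) / math.sqrt(2) for i in pairs]
        details = [(current[i] - current[i + 1]) / math.sqrt(2) for i in pairs]
        coeffs = details + coeffs  # finer details end up after coarser ones
        current = averages
    return current + coeffs


def truncated_distance(a, b, k):
    """Euclidean distance on the first k Haar coefficients; by
    orthonormality it lower-bounds the distance between a and b."""
    ha, hb = haar_transform(a), haar_transform(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ha[:k], hb[:k])))
```

With all coefficients kept, `truncated_distance` equals the ordinary Euclidean distance. Dropping fine-detail coefficients only removes non-negative terms from the sum.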