A symbolic representation of time series, with implications for streaming algorithms

Jessica Lin, Eamonn J. Keogh, Stefano Lonardi, Bill Yuan-chi Chiu
Workshop on Research Issues on Data Mining and Knowledge Discovery

The parallel explosions of interest in streaming data, and data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox is the fact that the vast majority of work on streaming data explicitly assumes that the data is discrete, whereas the vast majority of time series data is real valued. Many researchers have also considered transforming real valued time series into…

Experiencing SAX: a novel symbolic representation of time series

The utility of the new symbolic representation of time series is demonstrated: it allows dimensionality/numerosity reduction, and it allows distance measures to be defined on the symbolic representation that lower bound the corresponding distance measures defined on the original series.
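
To make the two properties in the summary above concrete, here is a minimal Python sketch of a SAX-style pipeline: z-normalize, reduce with piecewise aggregate means, map each mean to a letter using equiprobable Gaussian breakpoints, and compare two words with a distance that lower bounds the Euclidean distance between the normalized series. The function names (`znormalize`, `paa`, `breakpoints`, `sax_word`, `mindist`) are hypothetical, the series length is assumed to be divisible by the number of segments, and the population standard deviation is an assumed choice. It is an illustration under those assumptions, not the authors' implementation.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width frame."""
    if len(series) % segments != 0:
        raise ValueError("sketch assumes len(series) is divisible by segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    normal = NormalDist()
    return [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size):
    """Turn a real-valued series into a short word over 'a', 'b', 'c', ..."""
    cuts = breakpoints(alphabet_size)
    reduced = paa(znormalize(series), segments)
    return "".join(chr(ord("a") + bisect.bisect_right(cuts, v)) for v in reduced)


def mindist(word_a, word_b, original_length, alphabet_size):
    """Distance between two words that lower bounds the Euclidean distance
    between the z-normalized series they were built from."""
    if len(word_a) != len(word_b):
        raise ValueError("words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for sym_a, sym_b in zip(word_a, word_b):
        lo, hi = sorted((ord(sym_a) - ord("a"), ord(sym_b) - ord("a")))
        if hi - lo > 1:  # identical or adjacent symbols contribute nothing
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(original_length / len(word_a)) * math.sqrt(total)


rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = list(reversed(rising))
word_up = sax_word(rising, segments=4, alphabet_size=4)
word_down = sax_word(falling, segments=4, alphabet_size=4)
print(word_up, word_down, mindist(word_up, word_down, len(rising), 4))
```

Here eight real values shrink to a four-letter word, and the word-level distance can be used to prune candidates before the original series are ever compared.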

Current Trends in Time Series Representation

An overview of recent approaches and trends in time series data mining representations is presented; the main interest appears to have gradually shifted toward manipulating streaming data and/or multivariate time series.

Transitional SAX Representation for Knowledge Discovery for Time Series

This paper proposes a new symbolic representation method called transitional symbolic aggregate approximation that incorporates transitional information into symbolic aggregate approximations and is able to preserve meaningful information, including dynamic trend transitions in segmented time series, while still reducing dimensionality.

Visualizing and Discovering Non-Trivial Patterns in Large Time Series Databases

VizTree is a time series pattern discovery and visualization system based on augmenting suffix trees that provides novel interactive solutions to many pattern discovery problems, including the discovery of frequently occurring patterns (motif discovery), surprising patterns (anomaly detection), and query by content.

A Novel Time Series Representation Approach for Dimensionality Reduction

It has been shown that using the ASAR representation, the data mining process is accelerated the most, and the experimental results have shown that ASAR achieved the highest reduction in the dimensions.

Towards Optimal Symbolization for Time Series Comparisons

This work presents a novel quantizer based upon optimization of comparison fidelity and a computationally tractable algorithm for its implementation on big datasets, providing the potential of increased real world performance across a wide range of existing data mining algorithms and applications.

Feature-Based Dividing Symbolic Time Series Representation for Streaming Data Processing

A symbolic representation method for streaming time series, based on VTP-diving with a sliding window, is proposed, together with a similarity measurement algorithm for the representation that lower bounds the Euclidean distance on the original data.

Data representation for time series data mining: time domain approaches

In most time series data mining, alternate forms of data representation or data preprocessing are required because of the unique characteristics of time series, such as high dimensionality, the presence of random noise, and nonlinear relationships among the data elements.

Towards a Faster Symbolic Aggregate Approximation Method

This paper presents a new method that improves the performance of SAX by adding to it another exclusion condition that increases the exclusion power, and conducts experiments which show that the new method is faster than SAX.

The Parallel and Distributed Future of Data Series Mining

  • Themis Palpanas
  • Computer Science
    2017 International Conference on High Performance Computing & Simulation (HPCS)
  • 2017
This work describes past efforts in designing techniques for indexing and mining truly massive collections of data series, based on indexing techniques for fast similarity search, an operation that lies at the core of many mining algorithms, and discusses novel techniques that adaptively create data series indexes.



On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration

The most exhaustive set of time series experiments ever attempted, re-implementing the contributions of more than two dozen papers and testing them on 50 real-world, highly diverse datasets, supports the claim that there is a need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.

Finding Motifs in Time Series

An efficient motif discovery algorithm for time series would be useful as a tool for summarizing and visualizing massive time series databases and could be used as a subroutine in various other data mining tasks, including the discovery of association rules, clustering and classification.

Locally adaptive dimensionality reduction for indexing large time series databases

This work introduces a new dimensionality reduction technique, APCA, shows how it can be indexed using a multidimensional index structure, and proposes two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching.

Finding surprising patterns in a time series database in linear time and space

A novel technique is introduced that defines a pattern as surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data.
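
The definition above can be illustrated with a deliberately simple Python sketch: count every length-`k` substring in a previously seen reference string, scale those counts to the length of the new string to get an expected frequency, and report the difference from the observed frequency. The names `substring_counts` and `surprise_scores` are hypothetical, and plain count scaling is a stand-in assumption for the expectation model; it makes no attempt to reproduce the paper's linear time and space technique.

```python
from collections import Counter


def substring_counts(text, k):
    """Count every length-k substring of text (overlapping windows)."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def surprise_scores(reference, observed, k):
    """Observed minus expected count for each length-k pattern.

    The expected count is the reference count scaled by the ratio of window
    counts. This is an assumed, simplified expectation model.
    """
    ref_counts = substring_counts(reference, k)
    obs_counts = substring_counts(observed, k)
    ref_windows = max(len(reference) - k + 1, 1)
    obs_windows = max(len(observed) - k + 1, 0)
    scale = obs_windows / ref_windows
    patterns = set(ref_counts) | set(obs_counts)
    return {p: obs_counts[p] - ref_counts[p] * scale for p in patterns}


scores = surprise_scores(reference="abababab", observed="abababcc", k=2)
# Patterns with the largest absolute score deviate most from expectation.
for pattern, score in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(pattern, score)
```

A positive score marks a pattern seen more often than the reference suggests, a negative score one seen less often; a threshold on the absolute value would turn the scores into a surprise test.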

Finding recurrent sources in sequences

This work defines the (k,h)-segmentation problem and shows that it is NP-hard in the general case, gives approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalizes the results to higher dimensions.

Estimating Rarity and Similarity over Data Stream Windows

In the windowed data stream model, we observe items coming in over time. At any time $t$, we consider the window of the last $N$ observations $a_{t-(N-1)}, a_{t-(N-2)}, \ldots, a_t$, each $a_i \in \{1, \ldots, u\}$;

Fast Subsequence Matching in Time-Series Databases

We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance.

TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data

A novel wavelet-based tree structure, termed TSA-tree, is introduced, which improves the efficiency of multi-level trend and surprise queries on time sequence data; two alternative techniques are also proposed to reduce the size of the OTSA-tree even further while maintaining acceptable query precision.

Monotony of surprise and large-scale quest for unusual words

An extensive analysis of monotonicities of exceptionally frequent or rare words in bio-sequences supports the construction of data structures and algorithms capable of performing global detection of unusual substrings in time and space linear in the subject sequences, under various probabilistic models.

Discovering similar multidimensional trajectories

This work formalizes non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences.
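
As a rough illustration of an LCSS-based similarity, the Python sketch below computes the longest common subsequence of two point sequences by dynamic programming. Two points count as a match when every coordinate differs by at most `eps` and their positions differ by at most `delta`. The names `lcss_length` and `lcss_similarity`, the inclusive thresholds, and normalizing by the shorter sequence length are assumptions made for this sketch, not details taken from the paper.

```python
def lcss_length(traj_a, traj_b, eps, delta):
    """Length of the longest common subsequence of two point sequences.

    Points match when all coordinates are within eps of each other and their
    indices are at most delta apart (both thresholds are assumed inclusive).
    """
    n, m = len(traj_a), len(traj_b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_space = all(
                abs(p - q) <= eps for p, q in zip(traj_a[i - 1], traj_b[j - 1])
            )
            if close_in_space and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(traj_a, traj_b, eps, delta):
    """LCSS length normalized by the shorter sequence, giving a value in [0, 1]."""
    shorter = min(len(traj_a), len(traj_b))
    if shorter == 0:
        return 0.0
    return lcss_length(traj_a, traj_b, eps, delta) / shorter


clean = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
noisy = [(0, 0), (1, 1), (50, 50), (3, 3), (4, 4)]  # one outlier point
print(lcss_similarity(clean, noisy, eps=0.5, delta=1))
```

The single outlier in `noisy` is simply left unmatched rather than inflating a distance, which is the robustness to noise that the summary refers to.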