Scalable, Variable-Length Similarity Search in Data Series: The ULISSE Approach

@article{Linardi2018ScalableVS,
  title={Scalable, Variable-Length Similarity Search in Data Series: The ULISSE Approach},
  author={Michele Linardi and Themis Palpanas},
  journal={Proc. VLDB Endow.},
  year={2018},
  volume={11},
  pages={2236--2248}
}
Data series similarity search is an important operation and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length. Our… 
Scalable data series subsequence matching with ULISSE
TLDR
This work proposes ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range), and introduces a novel representation technique, which effectively and succinctly summarizes multiple sequences of different length.
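The summarization idea in this TLDR can be illustrated with a simplified Python sketch. It is not the paper's algorithm. It assumes raw (non-z-normalized) values and skips any discretization and the index structure itself. The names `build_envelope`, `envelope_lower_bound`, `gamma` (number of extra starting positions covered) and `seg_len` (PAA segment length) are hypothetical. For every subsequence that starts in a small window of positions and has a length in [lmin, lmax], the sketch records the minimum and maximum of each PAA coefficient. A query whose length falls in that range can then be compared against this single envelope to obtain a lower bound on its Euclidean distance to all of those subsequences.

```python
import math
from typing import List, Sequence, Tuple


def paa_prefix(values: Sequence[float], seg_len: int, n_segs: int) -> List[float]:
    """Means of the first n_segs consecutive segments of seg_len points each."""
    return [sum(values[k * seg_len:(k + 1) * seg_len]) / seg_len for k in range(n_segs)]


def build_envelope(series: Sequence[float], start: int, gamma: int,
                   lmin: int, lmax: int, seg_len: int) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: per-segment min/max PAA coefficients over every
    subsequence series[p:p + length] with start <= p <= start + gamma and
    lmin <= length <= lmax (no z-normalization)."""
    n_segs = lmax // seg_len
    lower = [math.inf] * n_segs
    upper = [-math.inf] * n_segs
    for p in range(start, start + gamma + 1):
        for length in range(lmin, lmax + 1):
            if p + length > len(series):
                break  # longer subsequences starting at p do not fit either
            coeffs = paa_prefix(series[p:p + length], seg_len, length // seg_len)
            for k, c in enumerate(coeffs):
                lower[k] = min(lower[k], c)
                upper[k] = max(upper[k], c)
    return lower, upper


def envelope_lower_bound(query: Sequence[float], lower: Sequence[float],
                         upper: Sequence[float], seg_len: int) -> float:
    """Lower bound on the Euclidean distance from query to any subsequence of the
    same length (within [lmin, lmax]) summarized by the envelope.
    Only the query's complete segments are used."""
    n_segs = len(query) // seg_len
    if n_segs > len(lower):
        raise ValueError("query is longer than the envelope's maximum length")
    q = paa_prefix(query, seg_len, n_segs)
    total = 0.0
    for k in range(n_segs):
        # Distance from q[k] to the interval [lower[k], upper[k]] (0 if inside).
        gap = max(lower[k] - q[k], q[k] - upper[k], 0.0)
        total += gap * gap
    return math.sqrt(seg_len * total)
```

Because one envelope stands in for many (start, length) pairs, a search can discard the whole group whenever the bound already exceeds the best distance found so far. Without z-normalization, coefficient k depends only on the start position, so the inner loop over lengths only controls how many coefficients exist; it is kept here to mirror the definition.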
ParIS+: Data Series Indexing on Multi-Core Architectures
TLDR
This work proposes ParIS and ParIS+, the first disk-based data series indices carefully designed to inherently take advantage of multi-core architectures, in order to accelerate similarity search processing times.
Truly Scalable Data Series Similarity Search
TLDR
The results of two comprehensive experimental evaluations of data series methods form the foundation for a novel index that efficiently supports both exact and approximate data series similarity search, as well as progressive query answering with bound guarantees.
Effective and Efficient Variable-Length Data Series Analytics
TLDR
These are the first solutions that inherently support scalable, variable-length similarity search in data series, applied to sequence/subsequence matching, motif discovery, and discord discovery; they are up to orders of magnitude faster than the alternatives.
Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search
TLDR
This work proposes a taxonomy of similarity search techniques that reconciles the terminology used in the data series and high-dimensional vector domains, describes modifications that enable data series indexing techniques to answer approximate similarity queries with quality guarantees, and conducts a thorough experimental evaluation.
Data Series Indexing Gone Parallel
  • Botao Peng
  • 2020 IEEE 36th International Conference on Data Engineering (ICDE)
TLDR
This work presents the first data series indexing solutions, for both on-disk and in-memory data, that are designed to inherently take advantage of multi-core architectures, in order to accelerate similarity search processing times.
MESSI: In-Memory Data Series Indexing
TLDR
MESSI is the first to answer exact similarity search queries on 100GB datasets in about 50 ms (30–75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
Fast data series indexing for in-memory data
TLDR
MESSI is the first to answer exact similarity search queries on 100GB datasets in 50 ms (30–75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
SING: Sequence Indexing Using GPUs
TLDR
SING is an in-memory index that uses CPU+GPU co-processing (as well as SIMD, multi-core, and multi-socket architectures) to accelerate similarity search. It achieves exact similarity search query times as low as 32 ms on 100GB datasets, enabling interactive data exploration on very large data series collections.
Evolution of a Data Series Index
TLDR
This work describes techniques for indexing and efficient similarity search in truly massive collections of data series, focusing on the iSAX family of data series indexes, and presents their design characteristics.
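iSAX-family indexes build on SAX words. As a minimal sketch (the helper names `znormalize`, `sax_word` and `demote` are hypothetical), the code below z-normalizes a series, takes its PAA, and maps each coefficient to one of 2^bits symbols using breakpoints that split the standard normal distribution into equiprobable regions. Because the breakpoints for cardinality 2^b are a subset of those for 2^(b+1), dropping low-order bits lowers a word's cardinality, which is the variable-cardinality property that iSAX builds on.

```python
import bisect
from statistics import NormalDist, fmean, pstdev
from typing import List, Sequence


def znormalize(series: Sequence[float]) -> List[float]:
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    mu, sd = fmean(series), pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def sax_word(series: Sequence[float], n_segments: int, bits: int) -> List[int]:
    """SAX symbols in [0, 2**bits): PAA of the z-normalized series, discretized
    with breakpoints that cut N(0, 1) into 2**bits equiprobable regions."""
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("len(series) must be a positive multiple of n_segments")
    card = 2 ** bits
    breakpoints = [NormalDist().inv_cdf(i / card) for i in range(1, card)]
    z = znormalize(series)
    width = len(z) // n_segments
    coeffs = [fmean(z[i * width:(i + 1) * width]) for i in range(n_segments)]
    # Symbol = number of breakpoints strictly below the coefficient.
    return [bisect.bisect_left(breakpoints, c) for c in coeffs]


def demote(symbols: Sequence[int], from_bits: int, to_bits: int) -> List[int]:
    """Lower the cardinality of a word by dropping low-order bits of every symbol."""
    return [s >> (from_bits - to_bits) for s in symbols]
```

For example, `sax_word(list(range(16)), 4, 3)` yields `[0, 2, 5, 7]`, and demoting it to 2 bits gives `[0, 1, 2, 3]`, the same word `sax_word` produces directly at cardinality 4.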

References

Showing 10 of 43 references
ULISSE: ULtra Compact Index for Variable-Length Similarity Search in Data Series
TLDR
This work proposes ULISSE, the first data series index structure designed for answering similarity search queries of variable length, and introduces a novel representation technique, which effectively and succinctly summarizes multiple sequences of different length.
Variable length queries for time series data
TLDR
A new indexing technique that works well for variable-length queries: it stores index structures at different resolutions for a given dataset by exploiting wavelets.
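The multi-resolution idea can be illustrated with the orthonormal Haar wavelet transform. This generic Python sketch (function names hypothetical) does not reproduce the paper's index. It only shows how repeated pairwise averaging and differencing yields approximations of lengths n, n/2, n/4, and so on down to 1, and why distances on a coarse level can be used for pruning: the transform preserves Euclidean distances, so discarding detail coefficients can only shrink them.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: pairwise sums and differences over sqrt(2)."""
    r = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / r for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / r for i in range(0, len(values), 2)]
    return approx, detail


def haar_decompose(series: Sequence[float]) -> Tuple[List[List[float]], List[List[float]]]:
    """Approximations from finest (the series itself) to coarsest (one value),
    plus the detail coefficients produced at each level.
    Assumes the length is a power of two."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(v) for v in series]
    levels, details = [current], []
    while len(current) > 1:
        current, d = haar_step(current)
        levels.append(current)
        details.append(d)
    return levels, details


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Each entry of `levels` summarizes the series at half the previous resolution; a search could compare query and data at a coarse level first and only descend to finer levels for candidates whose coarse distance is still small enough.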
Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes
TLDR
Coconut is an inverted, sortable data series summarization that organizes data series along a z-order curve, keeping similar series close to each other in the sorted order. Because the summarization is sortable, Coconut can use sorting-based bulk-loading techniques to quickly build a contiguous index using large sequential disk I/Os.
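A z-order key for a SAX word can be formed by interleaving bits: first the most significant bit of every segment, then the next bit of every segment, and so on. The sketch below assumes non-negative integer symbols with a fixed number of bits per segment; `zorder_key` and `sort_words` are hypothetical names, and Coconut's actual encoding may differ in detail. Words whose leading bits agree in every segment share a key prefix, so sorting by the key places them next to each other.

```python
from typing import List, Sequence


def zorder_key(symbols: Sequence[int], bits: int) -> int:
    """Interleave the bits of a SAX word, most significant bits of all segments first."""
    key = 0
    for b in range(bits - 1, -1, -1):  # from the most to the least significant bit
        for s in symbols:
            key = (key << 1) | ((s >> b) & 1)
    return key


def sort_words(words: Sequence[Sequence[int]], bits: int) -> List[Sequence[int]]:
    """Order SAX words along the z-order curve, e.g. as a step before bulk loading."""
    return sorted(words, key=lambda w: zorder_key(w, bits))
```

For example, `zorder_key([0b10, 0b01], 2)` is `0b1001` (9), and sorting `[[3, 3], [0, 1], [2, 1], [1, 0]]` gives `[[0, 1], [1, 0], [2, 1], [3, 3]]`: the two words whose top bits are 0 in both segments end up adjacent.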
The TS-tree: efficient time series search and retrieval
TLDR
This work proposes the TS-tree (time series tree), an index structure for efficient time series retrieval and similarity search. By exploiting inherent properties of time series quantization and dimensionality reduction, it outperforms existing approaches such as the R*-tree and the quantized A-tree.
A compact multi-resolution index for variable length queries in time series databases
TLDR
This work proposes CMRI, which uses the adaptive piecewise constant approximation (APCA) representation for dimensionality reduction and occupies much less space without requiring compression; it is found to be an efficient and scalable indexing technique for large time series databases.
Query Workloads for Data Series Indexes
TLDR
This work shows that random query workloads are inherently unsuitable for evaluating data series indexes, argues that query workloads need to be generated carefully, and proposes a method for generating workloads with the desired properties.
Fast Subsequence Matching in Time-Series Databases
We present an efficient indexing method to locate one-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance.
ADS: the adaptive data series index
TLDR
This paper presents the first adaptive indexing mechanism specifically tailored to indexing and querying very large data series collections, and evaluates it with approximate and exact query algorithms on both synthetic and real datasets.
DPiSAX: Massively Distributed Partitioned iSAX
TLDR
This work proposes a parallel indexing solution that gracefully scales to billions of time series, and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index.
Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases
TLDR
This work introduces a new dimensionality reduction technique, which it calls Piecewise Aggregate Approximation (PAA), compares it theoretically and empirically to other techniques, and demonstrates its superiority.
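The PAA summary can be sketched in a few lines of Python. This is a minimal illustration that assumes the series length is an exact multiple of the number of segments; the function names `paa`, `paa_lower_bound` and `euclidean` are hypothetical. The lower-bounding distance shown is the standard one for PAA: sqrt(n/w) times the Euclidean distance between the two coefficient vectors, where n is the series length and w the number of segments.

```python
import math
from typing import List, Sequence


def paa(series: Sequence[float], n_segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of n_segments equal-width frames."""
    n = len(series)
    # Simplifying assumption: the length is an exact multiple of the segment count.
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("len(series) must be a positive multiple of n_segments")
    width = n // n_segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(n_segments)]


def paa_lower_bound(a: Sequence[float], b: Sequence[float], n_segments: int) -> float:
    """Distance between the PAA summaries of two equal-length series, scaled by
    sqrt(n / w) so that it never exceeds their true Euclidean distance."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    pa, pb = paa(a, n_segments), paa(b, n_segments)
    scale = len(a) / n_segments
    return math.sqrt(scale * sum((x - y) ** 2 for x, y in zip(pa, pb)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, `paa([1, 2, 3, 4, 5, 6, 7, 8], 4)` returns `[1.5, 3.5, 5.5, 7.5]`. Because the bound never exceeds the true distance, an index can discard candidates using the short summaries alone without missing true matches.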