We propose an indexing technique for fast retrieval of similar subsequences using time warping distances. A time warping distance is a more suitable similarity measure than the Euclidean distance in many applications, where sequences may be of different lengths or different sampling rates. Our indexing technique uses a disk-based suffix(More)
This paper proposes an indexing technique for fast retrieval of similar subsequences using the time warping distance. The time warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may be of different lengths and/or different sampling rates. The proposed indexing technique employs a(More)
Exact match queries, wildcard match queries, and k-mismatch queries are widely used in various molecular biology applications including the searching of ESTs (Expressed Sequence Tags) and DNA transcription factors. In this paper, we suggest an efficient indexing and processing mechanism for such queries. Our indexing method places a sliding window at every(More)
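The three query types named above can be illustrated with a plain sequential scan. This is a minimal sketch of the query semantics only, not the indexing method the paper proposes; the function name `k_mismatch_positions` and the `wildcard` parameter are assumptions introduced for illustration.

```python
def k_mismatch_positions(text, pattern, k, wildcard=None):
    """Return start positions where pattern matches text with at most k mismatches.

    k = 0 gives an exact match query; a wildcard symbol in the pattern
    matches any character (hypothetical convention for this sketch).
    """
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for offset in range(m):
            p = pattern[offset]
            if p != wildcard and p != text[start + offset]:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches, abandon this window
        else:
            # Inner loop finished without exceeding k mismatches.
            hits.append(start)
    return hits
```

A scan like this costs time proportional to the text length times the pattern length for every query, which is the kind of cost an index is meant to avoid.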
This paper deals with the problem of shape-based retrieval in time-series databases. The shape-based retrieval is defined as the operation that searches for the (sub)sequences whose shapes are similar to that of a given query sequence. In this paper, we propose an effective and efficient approach for shape-based retrieval of subsequences. We first(More)
Several indexing techniques have been proposed to process similarity queries in sequence databases. Most of them focus on finding similar sequences of the same length using the Euclidean distance metric. However, in some applications where the elements of sequences may be sampled at different rates, the time warping distance is a more suitable similarity(More)
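To make the contrast with the Euclidean distance concrete, the sketch below computes a time warping distance with the textbook dynamic-programming recurrence. It assumes the absolute difference as the per-element cost and the function name `time_warping_distance`; the exact definition used in these papers may differ, and no indexing is involved here.

```python
def time_warping_distance(a, b):
    """Textbook dynamic-programming time warping distance.

    Assumption for this sketch: per-element cost is |x - y|.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # dp[i][j] = cost of the best alignment of a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # An element may be aligned with several elements of the other
            # sequence, which is what absorbs differences in sampling rate.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]
```

For example, `[1, 2, 3]` and `[1, 1, 2, 2, 3, 3]` have different lengths, so the Euclidean distance is not directly defined for them, yet their time warping distance under this sketch is 0 because each element of the shorter sequence aligns with two elements of the longer one.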
In this paper, we propose an accurate and efficient method for approximate subsequence search in large DNA databases. The proposed method basically adopts a binary trie as its primary structure and stores all the window subsequences extracted from a DNA sequence. For approximate subsequence search, it traverses the binary trie in a breadth-first fashion and(More)
As next-generation sequencing technology has made rapid and cost-effective sequencing available, the importance of computational approaches in finding and analyzing copy number variations (CNVs) has been amplified. Furthermore, most genome projects need to accurately analyze sequences with fairly low-coverage read data. It is urgently needed to develop a method(More)
This paper addresses the problem of timestamped event sequence matching, a new type of similar sequence matching that retrieves the occurrences of interesting patterns from timestamped sequence databases. The sequential-scan-based method, the trie-based method, and the method based on the iso-depth index are well-known approaches to this problem. In this(More)