Learn More
We propose a novel index structure, A-tree (Approximation tree), for similarity search of high-dimensional data. The basic idea of the A-tree is the introduction of Virtual Bounding Rectangles (VBRs), which contain and approximate MBRs and data objects. VBRs can be represented rather compactly, and thus affect the tree configuration both quantitatively and(More)
Time-series data naturally arise in countless domains, such as meteorology, astrophysics, geology, multimedia, and economics. Similarity search is very popular, and DTW (Dynamic Time Warping) is one of the two prevailing distance measures. Although DTW incurs a heavy computation cost, it provides scaling along the time axis. In this paper, we propose FTW(More)
The recent explosion in the adoption of search engines and new media such as blogs and Twitter have facilitated faster propagation of news and rumors. How quickly does a piece of news spread over these media? How does its popularity diminish over time? Does the rising and falling pattern follow a simple universal law? In this paper, we propose SpikeM, a(More)
The goal is to monitor multiple numerical streams, and determine which pairs are correlated with lags, as well as the value of each such lag. Lag correlations (and anti-correlations) are frequent, and very interesting in practice: For example, a decrease in interest rates typically precedes an increase in house sales by a few months; higher amounts of(More)
Given a collection of complex, time-stamped events, how do we find patterns and anomalies? Events could be meetings with one or more persons and one or more agenda items at zero or more locations (e.g., teleconferences), or they could be publications with authors, keywords, publishers, etc. In such settings, we want to find time stamps that look similar to(More)
Data stream processing has recently attracted an increasing amount of interest. The goal of this paper is to monitor numerical streams, and to find subsequences that are similar to a given query sequence, under the DTW (Dynamic Time Warping) distance. Applications include word spotting, sensor pattern matching, and monitoring of bio-medical signals (e.g.,(More)
We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve with multiple timescales. For example, some words may be used consistently over one hundred years, while other words emerge and disappear over periods of a few days. Thus, in the proposed model, current topic-specific(More)
Given huge collections of time-evolving events such as web-click logs, which consist of multiple attributes (e.g., URL, userID, times- tamp), how do we find patterns and trends? How do we go about capturing daily patterns and forecasting future events? We need two properties: (a) effectiveness, that is, the patterns should help us understand the data,(More)
Given a large collection of co-evolving multiple time-series, which contains an unknown number of patterns of different durations, how can we efficiently and effectively find typical patterns and the points of variation? How can we statistically summarize all the sequences, and achieve a meaningful segmentation? In this paper we present AutoPlait, a fully(More)