Learn More
We investigate techniques for analysis and retrieval of object trajectories in a two or three dimensional space. Such kind of data usually contain a great amount of noise, that makes all previously used metrics fail. Therefore, here we formalize non-metric similarity functions based on the Longest Common Subsequence (LCSS), which are very robust to noise(More)
We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts') where the elements of the time series are the number of times that a query is issued on a day. All of the methods we describe use sequences of(More)
For the discovery of similar patterns in 1D time-series, it is very typical to perform a normalization of the data (for example a transformation so that the data follow a zero mean and unit standard deviation). Such transformations can reveal latent patterns and are very commonly used in datamining applications. However, when dealing with multidimensional(More)
Although most time-series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for a single index structure that can support multiple distance measures. Our specific area of interest is the efficient retrieval and analysis of trajectory similarities. Trajectory datasets are very(More)
The matching of two-dimensional shapes is an important problem with applications in domains as diverse as biometrics, industry, medicine and anthropology. The distance measure used must be invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the(More)
The past decade has seen a wealth of research on time series representations, because the manipulation, storage, and indexing of large volumes of raw time series data is impractical. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the(More)
The ever-increasing number of intrusions in public and commercial networks has created the need for high-speed archival solutions that continuously store streaming network data to enable forensic analysis and auditing. However, " turning back the clock " for post-attack analyses is not a trivial task. The first major challenge is that the solution has to(More)
We present data representations, distance measures and organizational structures for fast and efficient retrieval of similar shapes in image databases. Using the Hough Transform we extract shape signatures that correspond to important features of an image. The new shape descriptor is robust against line discontinuities and takes into consideration not only(More)
We present a novel anytime version of partitional clustering algorithm, such as k-Means and EM, for time series. The algorithm works by leveraging off the multi-resolution property of wavelets. The dilemma of choosing the initial centers is mitigated by initializing the centers at each approximation level, using the final centers returned by the coarser(More)