Sanjay Chawla

Record linkage is an important data integration task that has many practical uses for matching, merging and duplicate removal in large and diverse databases. However, the quadratic cost of the brute-force approach necessitates the design of appropriate indexing or blocking techniques. We design and evaluate an efficient and highly scalable blocking ...
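To make the scalability point concrete, here is a minimal sketch of generic key-based blocking. It is not the technique from the abstract above, whose details are cut off. The sample records, the function name `candidate_pairs`, and the two-letter prefix key are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key):
    """Group records by a blocking key and pair records only within a block."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    pairs = set()
    for members in blocks.values():
        # Only records sharing a key are ever compared with each other.
        pairs.update(combinations(members, 2))
    return pairs


# Hypothetical records; the blocking key is the lowercased two-letter prefix.
records = ["Smith, John", "Smith, Jon", "Smyth, John", "Chawla, S.", "Chawla, Sanjay"]
blocked = candidate_pairs(records, key=lambda r: r[:2].lower())

# Brute force compares every pair: n * (n - 1) / 2 comparisons.
brute_force = len(records) * (len(records) - 1) // 2
print(len(blocked), "candidate pairs instead of", brute_force)
```

The saving depends entirely on the key: a key that is too coarse approaches the brute-force cost, while one that is too fine can separate true matches into different blocks.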
The detection of outliers in spatio-temporal traffic data is an important research problem in the data mining and knowledge discovery community. However, to the best of our knowledge, the discovery of relationships, especially causal interactions, among detected traffic outliers has not been investigated before. In this paper we propose algorithms which ...
We propose a measure, the spatial local outlier measure (SLOM), which captures the local behaviour of a datum in its spatial neighborhood. With the help of SLOM, we are able to discern local spatial outliers which are usually missed by global techniques like "three standard deviations away from the mean". Furthermore, the measure takes into account the local ...
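The contrast between global and local detection can be illustrated with a small sketch. This is not SLOM itself, whose definition is not given in the text above. The local score used here (absolute difference from the mean of the immediate neighbours), the function names, and the data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def global_outliers(values, k=3.0):
    """Indices more than k standard deviations away from the global mean."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


def local_scores(values):
    """Assumed local score: |value - mean of its immediate neighbours|."""
    scores = []
    for i, v in enumerate(values):
        neighbours = values[max(0, i - 1):i] + values[i + 1:i + 2]
        scores.append(abs(v - mean(neighbours)))
    return scores


# Hypothetical readings along a line: a steady gradient with one local spike (80).
values = [10, 20, 30, 40, 80, 60, 70, 80, 90]

flagged_globally = global_outliers(values)
scores = local_scores(values)
most_local = scores.index(max(scores))
print(flagged_globally, most_local)
```

On this data the spike sits well inside three standard deviations of the global mean, so the global rule flags nothing, while the neighbour-based score is largest at the spike.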
We present a new space-efficient approach, SparseDTW, to compute the Dynamic Time Warping (DTW) distance between two time series that always yields the optimal result. This is in contrast to other known approaches which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the existence of ...
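For reference, the sketch below is the textbook dynamic-programming DTW, which fills the full cost matrix. It is not SparseDTW, whose details are cut off above. The absolute difference is assumed as the pointwise cost, and the function name is illustrative.

```python
def dtw_distance(a, b):
    """Classic DTW distance using the full (len(a)+1) x (len(b)+1) cost matrix."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # assumed pointwise cost
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] also matched b[j-2] side handled via D[i][j-1]
                                 D[i][j - 1],      # step along b only
                                 D[i - 1][j - 1])  # step along both series
    return D[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```

The matrix takes space proportional to the product of the two series lengths, which is the cost that space-efficient variants aim to reduce.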
Of all the data mining techniques, outlier detection seems closest to the definition of “discovering nuggets of information” in large databases. When an outlier is detected, and determined to be genuine, it can provide insights, which can radically change our understanding of the underlying process. The purpose of the research underlying this thesis was to ...
As mobile devices proliferate and networks become more location-aware, the corresponding growth in spatio-temporal data will demand analysis techniques to mine patterns that take into account the semantics of such data. Association Rule Mining (ARM) has been one of the more extensively studied data mining techniques, but it considers discrete transactional ...
We propose a novel two-step mining and optimization framework for inferring the root cause of anomalies that appear in road traffic data. We model road traffic as a time-dependent flow on a network formed by partitioning a city into regions bounded by major roads. In the first step we identify link anomalies based on their deviation from their historical ...
Modeling spatial context (e.g., autocorrelation) is a key challenge in classification problems that arise in geospatial domains. Markov random fields (MRFs) are a popular model for incorporating spatial context into image segmentation and land-use classification problems. The spatial autoregression (SAR) model, which is an extension of the classical ...