Learn More
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each(More)
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role to play in spatial data mining. To this end, we develop a new clustering method called CLAHANS which is based on randomized search. We also develop two spatial(More)
Existing studies on time series are based on two categories of distance functions. The first category consists of the Lp-norms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shifting but are non-metric. The first contribution of this(More)
This paper deals with nding outliers (exceptions) in large, multidimensional datasets. The iden-tiication of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce , credit card fraud, and even the analysis of performance statistics of professional athletes. Existing methods that we have seen for nding outliers(More)
This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. Existing methods that we have seen for finding(More)
—Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper has three main contributions. First, we propose a new clustering method called CLARANS, whose aim is to identify spatial structures that may be present in the data. Experimental results indicate that,(More)
Software developers are often faced with modification tasks that involve source which is spread across a code base. Some dependencies between source code, such as those between source code written in different languages, are difficult to determine using existing static and dynamic analyses. To augment existing analyses and to help developers identify(More)
Of all scientiic investigations into reasoning with uncertainty and chance, probability theory is perhaps the best understood paradigm. Nevertheless, all studies conducted thus far into the semantics of quantitative logic programming(cf.) have restricted themselves to non-probabilistic semantical characterizations. In this paper, we take a few steps towards(More)
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suuers from the following serious shortcomings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In eeect, this model functions as a black-box, admitting little user interaction in(More)
In this paper, we attempt to approximate and index a <i>d-</i> dimensional (<i>d</i> &#8805; 1) spatio-temporal trajectory with a low order continuous polynomial. There are many possible ways to choose the polynomial, including (continuous)Fourier transforms, splines, non-linear regressino, etc. Some of these possiblities have indeed been studied beofre. We(More)