LOF: identifying density-based local outliers
TLDR
This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.
Algorithms for Mining Distance-Based Outliers in Large Datasets
TLDR
This paper provides formal and empirical evidence showing the usefulness of DB-outliers and presents two simple algorithms for computing such outliers, both having a complexity of O(kN^2), k being the dimensionality and N being the number of objects in the dataset.
Efficient and Effective Clustering Methods for Spatial Data Mining
TLDR
The analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.
CLARANS: A Method for Clustering Objects for Spatial Data Mining
  • R. Ng, Jiawei Han
  • Computer Science
    IEEE Trans. Knowl. Data Eng.
  • 1 September 2002
TLDR
A new clustering method is proposed, called CLARANS, whose aim is to identify spatial structures that may be present in the data, and two spatial data mining algorithms that aim to discover relationships between spatial and nonspatial attributes are developed.
Exploratory mining and pruning optimizations of constrained associations rules
TLDR
This paper proposes an architecture that opens up the black box and supports constraint-based, human-centered exploratory mining of associations, and it introduces and analyzes two properties of constraints that are critical to pruning: anti-monotonicity and succinctness.
Distance-based outliers: algorithms and applications
TLDR
This paper shows that outlier detection can be done efficiently for large datasets, and for k-dimensional datasets with large values of k, and that it is a meaningful and important knowledge discovery task.
Predicting source code changes by mining change history
TLDR
An approach that applies data mining techniques to determine change patterns can be used to recommend potentially relevant source code to a developer performing a modification task; applied to the Eclipse and Mozilla open source projects, it can reveal valuable dependencies.
Indexing spatio-temporal trajectories with Chebyshev polynomials
TLDR
Chebyshev polynomials are explored as a basis for approximating and indexing d-dimensional trajectories. The key analytic result is the Lower Bounding Lemma, which shows that the Euclidean distance between two d-dimensional trajectories is lower bounded by the weighted Euclidean distance between the two vectors of Chebyshev coefficients.