• Publications
Anomaly detection: A survey
TLDR
This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
A Comparison of Document Clustering Techniques
TLDR
This paper compares the two main approaches to document clustering, agglomerative hierarchical clustering and K-means, and indicates that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches that were tested, for a variety of cluster evaluation metrics.
Top 10 algorithms in data mining
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN,
Chameleon: Hierarchical Clustering Using Dynamic Modeling
TLDR
Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters, which is important for dealing with highly variable clusters.
Selecting the right interestingness measure for association patterns
TLDR
An overview of various measures proposed in the statistics, machine learning and data mining literature is presented, and it is shown that each measure has different properties that make it useful for some application domains but not for others.
Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data
TLDR
A novel clustering technique is presented that addresses problems with varying densities and high dimensionality, while the use of core points handles problems with shape and size; a number of optimizations that allow the algorithm to handle large data sets are also discussed.
Feature bagging for outlier detection
TLDR
A novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is proposed, which combines results from multiple outlier detection algorithms that are applied using different sets of features.
Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs
TLDR
A key innovative feature of this parallel formulation is that it utilizes graph coloring to effectively parallelize both the coarsening and the refinement during the uncoarsening phase, making it possible to perform dynamic graph partitioning in adaptive computations without compromising quality.
Similarity Measures for Categorical Data: A Comparative Evaluation
TLDR
This paper studies the performance of a variety of similarity measures in the context of a specific data mining task, outlier detection, and shows that while no one measure dominates the others for all types of problems, some measures consistently achieve high performance.
The Top Ten Algorithms in Data Mining
Identifying some of the most influential algorithms that are widely used in the data mining community, this volume provides a description of each algorithm, discusses the impact of the algorithms,