Top 10 algorithms in data mining

Authors: Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus F. M. Ng, B. Liu, Philip S. Yu, Zhi-Hua Zhou, Michael S. Steinbach, David J. Hand, Dan Steinberg
Journal: Knowledge and Information Systems
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining…
A Survey On Data Mining Algorithm
This paper presents the 8 data mining algorithms most used in the research field: C4.5, k-Means, SVM, EM, PageRank, Apriori, kNN and CART. With each algorithm, a basic explanation
Classification Techniques in Data Mining : A Review
This survey paper provides a comprehensive review of numerous data mining classification techniques for innovative database applications.
Survey on Classification Algorithms for Data Mining:(Comparison and Evaluation)
Three classification algorithms (K-Nearest Neighbor classifier, decision tree, and Bayesian network) are compared, and the strength and accuracy of each algorithm are demonstrated in terms of performance efficiency and time complexity.
Comparison of Classification Algorithms using WEKA on Various Datasets
Using WEKA, an open source data mining tool that includes implementations of data mining algorithms, the ADTree, Bayes Network, Decision Table, J48, Logistic, Naive Bayes, NBTree, PART, RBFNetwork and SMO algorithms are compared.
An Empirical Comparison of Data Mining Classification Methods
A comparative study of the performance of C4.5, Naive Bayes, SVM and KNN Classification Algorithms is performed.
A Comparative Analysis of Association Rule Mining Algorithms in Data Mining: A Study
This paper presents an extensive study of various association rule mining (ARM) algorithms and compares them on merits, demerits, data support, and speed.
A Taxonomy of Data Mining Problems
The author describes the progress made in developing data mining techniques and then classifies them according to a taxonomy of data mining problems, to help practitioners choose appropriate data mining techniques for solving business problems.
Review of Association Rule Mining Using Apriori Algorithm
In this work, an efficient mining-based algorithm for rule generation is presented, and precision, recall, and F-measure values are improved by using the Apriori algorithm.
Performance Analysis of Various Data Mining Algorithms: A Review
Classifiers are compared by accuracy, showing that ruleset classifiers achieve higher accuracy when implemented in WEKA. These algorithms are useful for increasing sales and performance in industries such as banking, insurance, and medicine, and for detecting fraud and intrusion for the benefit of society.
Data mining Algorithm’s Variant Analysis
This paper examines three major data mining tasks (association, classification, and clustering) using the WEKA tool.


10 Challenging Problems in Data Mining Research
This short article summarizes the 10 most challenging problems identified from the 14 responses the authors received to a survey of some of the most active researchers in data mining and machine learning.
Fast Algorithms for Mining Association Rules
Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
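As a rough illustration of the level-wise approach to frequent-itemset mining associated with this paper (it is commonly credited with introducing Apriori), the following is a minimal sketch and not the authors' implementation. The function name `apriori`, the in-memory list of transactions, and the absolute `min_support` count are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Illustrative sketch: `min_support` is an absolute transaction count.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate; keep frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = count(items)
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent
```

The prune step relies on the fact that every subset of a frequent itemset must itself be frequent, so most candidates can be discarded before the data is scanned again.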
Multiple labels associative classification
The problem of producing rules with multiple labels is investigated, and a multi-class, multi-label associative classification approach (MMAC) is proposed that is an accurate and effective classification technique, highly competitive and scalable when compared with other traditional and associative classification approaches.
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
A Weight Adjusted k-Nearest Neighbor (WAKNN) classification algorithm that learns feature weights using a greedy hill climbing technique is presented, along with two performance optimizations of WAKNN that improve computational performance by a few orders of magnitude without compromising classification quality.
Mining frequent patterns without candidate generation
This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
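To make the prefix-tree idea concrete, here is a minimal sketch of building such a tree from a small in-memory transaction list. It covers only tree construction (no header table and no FP-growth mining step), and the names `FPNode`, `build_fp_tree`, and `count_nodes` are hypothetical, chosen for this example rather than taken from the paper.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and child links."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a prefix tree of frequent items from a list of transactions."""
    # Pass 1: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode()
    # Pass 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (ties broken by item name).
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of non-root nodes, a rough measure of how much is shared."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```

Because items are inserted in a fixed support-descending order, transactions that share frequent items tend to share a path from the root, which is where the compression comes from.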
Tree-based partitioning of date for association rule mining
This paper describes a partitioning approach which organises the data into tree structures that can be processed independently and presents experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.
Non-redundant data clustering
An extension of the information bottleneck framework, called coordinated conditional information bottleneck, is presented, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints.
Finding centroid clusterings with entropy-based criteria
A series of entropy-based distance functions for comparing clusterings makes it possible to directly select the local centroid from the candidate set, and two combining methods for the global centroid are presented.
BIRCH: an efficient data clustering method for very large databases
A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.
Adding the temporal dimension to search - a case study in publication search
  • Philip S. Yu, Xin Li, B. Liu
  • Computer Science
    The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)
  • 2005
This paper studies the temporal dimension of search in the context of research publication, and a number of methods are proposed to deal with the problem based on analyzing the behavior history and the source of each publication.