Top 10 algorithms in data mining

Authors: Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus F. M. Ng, B. Liu, Philip S. Yu, Zhi-Hua Zhou, Michael S. Steinbach, David J. Hand, Dan Steinberg
Journal: Knowledge and Information Systems
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining…
Classification Techniques in Data Mining : A Review
This survey provides a comprehensive review of data mining classification techniques for innovative database applications.
Survey on Classification Algorithms for Data Mining:(Comparison and Evaluation)
Three classification algorithms (k-nearest neighbor, decision tree, and Bayesian network) are compared, and the strength and accuracy of each are demonstrated in terms of performance efficiency and required time complexity.
An Empirical Comparison of Data Mining Classification Methods
A comparative study of the performance of C4.5, Naive Bayes, SVM and KNN Classification Algorithms is performed.
A Comparative Analysis of Association Rule Mining Algorithms in Data Mining: A Study
This paper presents an extensive study of various association rule mining (ARM) algorithms and compares them on their merits, demerits, data support, and speed.
A Taxonomy of Data Mining Problems
The author describes the progress made in developing data mining techniques and then classifies them according to a taxonomy of data mining problems, to help practitioners choose appropriate data mining techniques for business problems.
Review of Association Rule Mining Using Apriori Algorithm
In this work, an efficient mining-based algorithm for rule generation is presented, and the use of the Apriori algorithm improves the precision, recall, and F-measure values.
Performance Analysis of Various Data Mining Algorithms: A Review
The classifiers are compared by accuracy, showing that the ruleset classifier has higher accuracy when implemented in Weka. These algorithms are useful for increasing sales and performance in industries such as banking, insurance, and medicine, and for detecting fraud and intrusion to the benefit of society.
Data Mining Algorithm's Variant Analysis
This paper examines three major data mining tasks (association, classification, and clustering) using the WEKA tool.
Study of the performance of the K* Algorithm in International Databases
In an experimental study, the K* algorithm is compared with five classification algorithms from the top ten data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM): C4.5, SVM, kNN, Naive Bayes, and CART.
A Survey on Decision Tree Algorithm for Classification
The survey focuses on the various decision tree algorithms, their characteristics and challenges, and their advantages and disadvantages.


10 Challenging Problems in Data Mining Research
This short article serves to summarize the 10 most challenging problems of the 14 responses the authors have received from this survey, by consulting some of the most active researchers in data mining and machine learning.
Fast Algorithms for Mining Association Rules
Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Multiple labels associative classification
The problem of producing rules with multiple labels is investigated, and a multi-class, multi-label associative classification approach (MMAC) is proposed that is an accurate and effective classification technique, highly competitive and scalable when compared with other traditional and associative classification approaches.
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
A Weight Adjusted k-Nearest Neighbor (WAKNN) classification algorithm that learns feature weights using a greedy hill-climbing technique is presented, along with two performance optimizations of WAKNN that improve computational performance by a few orders of magnitude without compromising classification quality.
Mining frequent patterns without candidate generation
This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
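To make the prefix-tree idea in the summary above concrete, here is a minimal sketch of FP-tree construction only; the mining step (FP-growth) and the header table of node links are omitted. The names `FPNode`, `build_fp_tree`, and `count_nodes` are illustrative assumptions and do not come from the paper, and ties in support are broken alphabetically as an arbitrary choice.

```python
from collections import Counter


class FPNode:
    """One prefix-tree node: an item plus the number of transactions sharing this prefix."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a compressed prefix tree over the frequent items (illustrative sketch)."""
    # First scan: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode()
    # Second scan: insert each transaction with its frequent items sorted by
    # descending support, so that common prefixes share nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, frequent


def count_nodes(node):
    """Number of nodes below `node`, a rough measure of how compact the tree is."""
    return sum(1 + count_nodes(child) for child in node.children.values())


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

In this toy input the infrequent item `d` is dropped, and the first three transactions share the node for `a`, which is where the compression comes from.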
Tree-based partitioning of data for association rule mining
This paper describes a partitioning approach which organises the data into tree structures that can be processed independently and presents experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.
BIRCH: an efficient data clustering method for very large databases
A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.
Fast and exact out-of-core and distributed k-means clustering
This paper presents a new algorithm, called fast and exact k-means clustering (FEKM), which typically requires only one or a small number of passes on the entire dataset and provably produces the same cluster centres as reported by the original k-Means algorithm.
PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth
  • J. Pei, Jiawei Han, M. Hsu
  • Computer Science
    Proceedings 17th International Conference on Data Engineering
  • 2001
This work proposes a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining, and shows that PrefixSpan outperforms both the Apriori-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.
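The prefix-projection idea named in the summary above can be sketched as follows. This is a simplified illustration, assuming each sequence element is a single item rather than an itemset, and it copies projected suffixes instead of using the optimizations described in the paper. The function name `prefixspan` and its parameters are assumptions made for this example.

```python
from collections import Counter


def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns.

    Simplified sketch: every sequence is a list of single items.
    """
    results = []

    def mine(prefix, projected):
        # Count each item once per projected sequence.
        counts = Counter()
        for seq in projected:
            counts.update(set(seq))
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))
            # Project: keep only the suffix after the first occurrence of `item`.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine([], sequences)
    return results


patterns = prefixspan([["a", "b", "c"], ["a", "c"], ["b", "c"]], min_support=2)
```

Each recursive call only looks at the suffixes that follow the current prefix, so no candidate patterns are generated that do not occur in the data.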
CanTree: a canonical-order tree for incremental frequent-pattern mining
A novel tree structure, called CanTree (canonical-order tree), is proposed that captures the content of the transaction database and orders tree nodes according to some canonical order, so that the tree can be easily maintained when database transactions are inserted, deleted, and/or modified.
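A minimal sketch of why a fixed canonical order helps with incremental updates, assuming lexicographic item order as the canonical order. Because node order does not depend on item frequencies, inserting or deleting a transaction only touches one root-to-leaf path. The names `CanNode`, `insert`, and `delete` are illustrative and not taken from the paper, and `delete` assumes the transaction was inserted earlier.

```python
class CanNode:
    """Tree node holding an item and the number of transactions passing through it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, transaction):
    """Add one transaction along the path given by the canonical (sorted) item order."""
    node = root
    for item in sorted(set(transaction)):
        node = node.children.setdefault(item, CanNode(item))
        node.count += 1


def delete(root, transaction):
    """Remove a previously inserted transaction, pruning nodes whose count reaches zero."""
    node = root
    for item in sorted(set(transaction)):
        child = node.children[item]
        child.count -= 1
        if child.count == 0:
            # No other transaction shares this prefix, so the subtree can go.
            del node.children[item]
            return
        node = child


root = CanNode()
insert(root, ["b", "a"])
insert(root, ["a", "c"])
count_after_inserts = root.children["a"].count
delete(root, ["a", "b"])
remaining_children = sorted(root.children["a"].children)
```

A modification can be treated as a delete followed by an insert, with no re-sorting of existing paths.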