This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide ...
This paper proposes a novel approach named AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph data set. A graph transaction is represented by an adjacency matrix, and the frequent patterns appearing in the matrices are mined through an extended algorithm of basket analysis. Its performance has ...
1. Background and motivation. Digital technologies and advances in computing, together with booming internet use, have led to massive data collection (corporate data, data warehouses, the web, to name a few) and an information (or misinformation) explosion. Szalay and Gray described this phenomenon as "drowning in data" (Szalay and Gray, 1999). They reported that ...
The need for mining structured data has increased in the past few years. Graphs are among the best-studied data structures in computer science and discrete mathematics. It is therefore no surprise that graph-based data mining has become quite popular in the last few years. This article introduces the theoretical basis of graph-based data mining and ...
Basket analysis, a standard method for data mining, derives frequent itemsets from a database. However, its mining ability is limited to transaction data consisting of items. In reality, there are many applications where data are described in a more structural way, e.g. chemical compounds and Web browsing history. There are a few approaches that can ...
The rapid advance of computer technologies in data processing, collection, and storage has provided unparalleled opportunities to expand capabilities in production, services, communications, and research. However, immense quantities of high-dimensional data pose renewed challenges to state-of-the-art data mining techniques. Feature selection is an ...
We address the problem of estimating the parameters of a continuous time delay independent cascade (CTIC) model, a more realistic model for information diffusion in a complex social network, from observed information diffusion data. For this purpose we rigorously formulate the likelihood of the observed data and propose an iterative method to obtain ...
Feature selection is the problem of choosing a subset of relevant features. Researchers have been searching for optimal feature selection methods. 'Branch and Bound' and Focus are two representatives. In general, only exhaustive search can bring about the optimal subset. However, under certain conditions, exhaustive search can be avoided without sacrificing ...