Mining frequent patterns without candidate generation
TLDR
This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Data Mining: Concepts and Techniques
TLDR
This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
TLDR
A novel frequent-pattern tree (FP-tree) structure is proposed, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and an efficient FP-tree-based mining method, FP-growth, is developed for mining the complete set of frequent patterns by pattern fragment growth.
gSpan: graph-based substructure pattern mining
  • Xifeng Yan, Jiawei Han
  • Mathematics, Computer Science
  • IEEE International Conference on Data Mining…
  • 9 December 2002
TLDR
A novel algorithm called gSpan (graph-based substructure pattern mining) is proposed, which discovers frequent substructures without candidate generation by building a new lexicographic order among graphs and mapping each graph to a unique minimum DFS code as its canonical label.
A Framework for Clustering Evolving Data Streams
TLDR
A fundamentally different philosophy for data stream clustering is discussed which is guided by application-centered requirements and uses the concepts of a pyramidal time frame in conjunction with a microclustering approach.
CMAR: accurate and efficient classification based on multiple class-association rules
TLDR
The authors propose a new associative classification method, CMAR, i.e., Classification based on Multiple Association Rules, which extends an efficient frequent pattern mining method, FP-growth, constructs a class distribution-associated FP-tree, and mines large databases efficiently.
Data Mining: Concepts and Techniques, 3rd edition
TLDR
There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks
TLDR
Under the meta path framework, a novel similarity measure called PathSim is defined that is able to find peer objects in the network (e.g., find authors in the similar field and with similar reputation), which turns out to be more meaningful in many scenarios compared with random-walk based similarity measures.
Semi-supervised Discriminant Analysis
  • Deng Cai, X. He, Jiawei Han
  • Mathematics, Computer Science
  • IEEE 11th International Conference on Computer…
  • 26 December 2007
TLDR
This paper proposes a novel method, called Semi-supervised Discriminant Analysis (SDA), which makes use of both labeled and unlabeled samples to learn a discriminant function which is as smooth as possible on the data manifold.
Mining concept-drifting data streams using ensemble classifiers
TLDR
This paper proposes a general framework for mining concept-drifting data streams using weighted ensemble classifiers, and shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.