Mining frequent patterns without candidate generation
This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Data Mining: Concepts and Techniques
This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
- Jiawei Han, J. Pei, Yiwen Yin, Runying Mao
- Computer Science
- Sixth IEEE International Conference on Data…
A novel frequent-pattern tree (FP-tree) structure is proposed, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and an efficient FP-tree-based mining method, FP-growth, is developed for mining the complete set of frequent patterns by pattern fragment growth.
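The prefix-tree idea behind the FP-tree can be illustrated with a minimal sketch. It covers only tree construction, not the FP-growth mining step, and it omits the header table and node links. The names `FPNode`, `build_fp_tree`, and `count_nodes`, and the alphabetical tie-breaking rule, are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, a count, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a compressed prefix tree from the frequent items of each transaction."""
    # First pass: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second pass: insert each transaction's frequent items in a fixed order
    # (descending support, ties broken by item name; the tie-break is an assumption).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of item nodes below the given node (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```

Transactions that share a prefix of frequent items share a path in the tree, which is where the compression comes from: in a small database such as `[["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"]]`, every transaction passes through the same node for `b`.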
gSpan: graph-based substructure pattern mining
A novel algorithm called gSpan (graph-based substructure pattern mining) is proposed, which discovers frequent substructures without candidate generation by building a new lexicographic order among graphs and mapping each graph to a unique minimum DFS code as its canonical label.
A Framework for Clustering Evolving Data Streams
Data Mining: Concepts and Techniques, 3rd edition
There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], and Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
CMAR: accurate and efficient classification based on multiple class-association rules
- Wenmin Li, Jiawei Han, J. Pei
- Computer Science
- Proceedings IEEE International Conference on…
- 29 November 2001
The authors propose a new associative classification method, CMAR, i.e., Classification based on Multiple Association Rules, which extends an efficient frequent pattern mining method, FP-growth, constructs a class distribution-associated FP-tree, and mines large databases efficiently.
PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks
- Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu
- Computer Science
- Proceedings of the VLDB Endowment
- 1 August 2011
Under the meta path framework, a novel similarity measure called PathSim is defined that is able to find peer objects in the network (e.g., find authors in a similar field and with a similar reputation), which turns out to be more meaningful in many scenarios than random-walk-based similarity measures.
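As a rough illustration of the measure, PathSim for a symmetric meta path scores two objects x and y as twice the number of path instances between them, divided by the sum of the path instances from each object back to itself. The sketch below assumes a single author-venue count matrix and the symmetric meta path author-venue-author; the function names and the toy data are hypothetical, and the brute-force ranking stands in for the paper's top-k search method.

```python
def commuting_counts(adjacency):
    """Path-instance counts for the symmetric meta path A-B-A, i.e. W times W transposed.

    adjacency[i][j] is the number of links from object i (type A) to object j (type B).
    """
    n = len(adjacency)
    return [[sum(a * b for a, b in zip(adjacency[i], adjacency[j]))
             for j in range(n)]
            for i in range(n)]


def pathsim(m, x, y):
    """PathSim score: 2 * M[x][y] / (M[x][x] + M[y][y]); 0.0 if both objects have no paths."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def top_k(m, x, k):
    """Brute-force top-k peers of x by PathSim (ties broken by index)."""
    scores = [(pathsim(m, x, y), y) for y in range(len(m)) if y != x]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(y, s) for s, y in scores[:k]]
```

The normalization by self-path counts is what favors peers: an object with far more links than x contributes a large M[y][y] to the denominator, so it scores lower than an object whose link profile has a similar shape and scale.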
Semi-supervised Discriminant Analysis
- Deng Cai, Xiaofei He, Jiawei Han
- Computer Science
- IEEE International Conference on Computer Vision
- 26 December 2007
This paper proposes a novel method, called Semi-supervised Discriminant Analysis (SDA), which makes use of both labeled and unlabeled samples to learn a discriminant function that is as smooth as possible on the data manifold.
Mining concept-drifting data streams using ensemble classifiers
- Haixun Wang, W. Fan, Philip S. Yu, Jiawei Han
- Computer Science
- Knowledge Discovery and Data Mining
- 24 August 2003
This paper proposes a general framework for mining concept-drifting data streams using weighted ensemble classifiers, and shows that the proposed methods have a substantial advantage over single-classifier approaches in prediction accuracy and that the ensemble framework is effective for a variety of classification models.