From Data Mining to Knowledge Discovery in Databases
This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases relate to each other and to neighboring fields such as machine learning, statistics, and databases.
Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning
This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals.
From Data Mining to Knowledge Discovery: An Overview
The KDD process for extracting useful knowledge from volumes of data
A new generation of computational techniques and tools is required to support the extraction of useful knowledge from the rapidly growing volumes of data, the subject of the emerging field of knowledge discovery in databases (KDD) and data mining.
Advances in Knowledge Discovery and Data Mining
This talk discusses how to mine evolving data streams while preserving privacy. Because the characteristics of a data stream can change over time, the evolving patterns need to be captured.
Refining Initial Points for K-Means Clustering
A procedure for computing a refined starting condition from a given initial one, based on an efficient technique for estimating the modes of a distribution. The refined starting condition allows the iterative algorithm to converge to a “better” local minimum.
Knowledge Discovery and Data Mining: Towards a Unifying Framework
The KDD process and basic data mining algorithms are defined, links between data mining, knowledge discovery, and other related fields are described, and the challenges facing practitioners in the field are analyzed.
Scaling Clustering Algorithms to Large Databases
A scalable clustering framework, applicable to a wide class of iterative clustering methods, that requires at most one scan of the database. It is instantiated and numerically justified with the popular K-Means clustering algorithm.
Hierarchical Clustering Algorithms for Document Datasets
The experimental evaluation shows that, contrary to the common belief, partitional algorithms always lead to better solutions than agglomerative algorithms, making them ideal for clustering large document collections because of not only their relatively low computational requirements but also their higher clustering quality.