Principles of Data Mining

@article{Hand2007PrinciplesOD,
  title={Principles of Data Mining},
  author={David J. Hand and Heikki Mannila and Padhraic Smyth},
  journal={Drug Safety},
  year={2007},
  volume={30},
  pages={621--622}
}
Data mining is the discovery of interesting, unexpected or valuable structures in large datasets. As such, it has two rather different aspects. One of these concerns large-scale, ‘global’ structures, and the aim is to model the shapes, or features of the shapes, of distributions. The other concerns small-scale, ‘local’ structures, and the aim is to detect these anomalies and decide whether they are real or chance occurrences. In the context of signal detection in the pharmaceutical sector, most…
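The abstract contrasts modelling global structure with detecting local anomalies and then asking whether they are real or chance. The following is a minimal sketch of that distinction, assuming a one-dimensional numeric sample, a Gaussian reference model and a z-score threshold of 3 (all illustrative assumptions, not taken from the paper). The function names are hypothetical.

```python
import statistics

def global_summary(xs):
    """Global structure: summarise the whole distribution by its mean and standard deviation."""
    return statistics.mean(xs), statistics.stdev(xs)

def local_anomalies(xs, threshold=3.0):
    """Local structure: indices of points whose z-score under the global fit exceeds `threshold`."""
    mu, sigma = global_summary(xs)
    return [i for i, x in enumerate(xs) if abs(x - mu) / sigma > threshold]

def expected_chance_flags(n, threshold=3.0):
    """Flags expected from chance alone if all n points were Gaussian: n * P(|Z| > threshold)."""
    tail = 2 * (1 - statistics.NormalDist().cdf(threshold))
    return n * tail
```

With 10,000 Gaussian points, roughly 27 points are expected to exceed |z| > 3 by chance alone, so a flagged point is only evidence of real local structure if it is far more extreme than that baseline (or if flags occur more often than the baseline predicts).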
A survey of temporal data mining
TLDR
An overview of techniques of temporal data mining is presented, concentrating mainly on algorithms for pattern discovery in sequential data streams, and some recent results regarding the statistical analysis of pattern discovery methods are described.
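Pattern discovery in event sequences is often framed as finding frequent serial episodes (ordered event types). The sketch below assumes one common frequency notion, the count of non-overlapped occurrences, and counts them with a single greedy left-to-right automaton; the function names and the exhaustive two-node search are illustrative assumptions, not an algorithm taken from the survey.

```python
from itertools import permutations

def count_serial_episode(events, episode):
    """Greedily count non-overlapped occurrences of a serial episode (an ordered tuple of event types)."""
    if not episode:
        return 0
    count, pos = 0, 0
    for e in events:
        if e == episode[pos]:      # next expected event type seen: advance the automaton
            pos += 1
            if pos == len(episode):
                count += 1         # episode completed: count it and start looking for a new one
                pos = 0
    return count

def frequent_two_node_episodes(events, min_count):
    """Exhaustively test every ordered pair of distinct event types against a minimum count."""
    result = {}
    for a, b in permutations(sorted(set(events)), 2):
        c = count_serial_episode(events, (a, b))
        if c >= min_count:
            result[(a, b)] = c
    return result
```

For example, in the sequence `"ABCABXC"` the episode `A → B → C` occurs twice without overlap, even though an unrelated `X` interrupts the second occurrence.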
Theoretical Considerations for Data Mining
TLDR
The modern Knowledge Discovery in Databases (KDD) process combines the mathematics used to discover interesting patterns in data with the entire process of extracting data and using resulting models to apply to other data sets to leverage the information for some purpose.
Analytical Classification and Evaluation of Various Approaches in Temporal Data Mining
TLDR
The aim of the present study is to introduce, collect and evaluate various algorithms to create a global view of temporal data mining analyses; the authors believe that the suggested collection can be considerably beneficial in selecting the appropriate algorithm.
Survey of Data Mining and Applications (Review from 1996 to Now)
TLDR
The science of extracting useful information from large data sets or databases is called data mining, and it covers areas of statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas.
An overview of the use of neural networks for data mining tasks
TLDR
This paper highlights, from a data mining perspective, the implementation of neural networks (NN) using supervised and unsupervised learning for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their use in bioinformatics and financial data analysis tasks.
A survey of interestingness measures for knowledge discovery
  • K. McGarry
  • Computer Science
  • The Knowledge Engineering Review
  • 2005
TLDR
A review of the available literature on the various measures devised for evaluating and ranking the patterns discovered by the data mining process is presented, together with their strengths and weaknesses with respect to the level of user integration within the discovery process.
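As a concrete instance of the objective measures such surveys review, here is a minimal sketch computing support, confidence and lift for an association rule X → Y over a list of transactions. Representing transactions as Python sets and the function name `rule_measures` are assumptions for illustration; the survey itself covers many more measures, including subjective ones.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent; transactions are sets of items."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)          # transactions containing X
    n_c = sum(1 for t in transactions if c <= t)          # transactions containing Y
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # transactions containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0         # >1: positive association, <1: negative
    return {"support": support, "confidence": confidence, "lift": lift}
```

For the four baskets `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}`, the rule bread → milk has support 0.5 and confidence 2/3, but lift 8/9 < 1: a rule can look confident while the items are in fact slightly negatively associated, which is one reason a single measure rarely suffices.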
Data Mining Tools for Exploring Big Data
TLDR
This course introduces data mining through a combination of lectures and examples; it begins with regression, then covers logistic regression, neural networks, and classification and regression trees, with a bit of cluster analysis.
Data Mining Applications: Promise and Challenges
TLDR
Application of data mining is presented as an “experiment” carried out using data mining techniques, which results in useful knowledge and insights pertaining to the application domain.
What Is Data Mining and How Does It Work?
TLDR
In this chapter, data mining is positioned with respect to other data analysis techniques, and the most important classes of techniques developed in the area are introduced: pattern mining, classification, and clustering and outlier detection.
Predictive Data mining and discovering hidden values of Data warehouse
TLDR
The paper describes how data mining tools predict future trends and behaviour, which enables proactive, knowledge-driven decisions.

References

Showing 1-10 of 425 references
Mining Very Large Databases
TLDR
A broad range of algorithms that scale to very large data sets is described, addressing three classical data mining problems: market basket analysis, clustering, and classification.
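Market basket analysis usually starts from frequent itemsets. The following is a minimal, in-memory sketch of the classic level-wise (Apriori-style) approach; it is not the scalable implementation described in the referenced work, and `min_support` is assumed here to be an absolute transaction count.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining; returns {frozenset(itemset): support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    k = 1
    while current:
        # One pass over the data counts every candidate of the current size.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune any candidate with an infrequent (k-1)-subset.
        candidates = set()
        for x, y in combinations(list(level), 2):
            u = x | y
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                candidates.add(u)
        current = list(candidates)
    return frequent
```

The pruning step relies on the downward-closure property (every subset of a frequent itemset is frequent), which is what keeps the number of candidates, and therefore data passes, manageable.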
Mining the most interesting rules
TLDR
It is argued that by returning a broader set of rules than previous algorithms, these techniques allow for improved insight into the data and support more user-interaction in the optimized rule-mining process.
Data mining: data analysis on a grand scale?
  • Padhraic Smyth
  • Computer Science, Medicine
  • Statistical methods in medical research
  • 2000
TLDR
A brief review of the origins of data mining is provided, and some of the primary themes in current data mining research are discussed, including scalable algorithms for massive data sets, discovering novel patterns in data, and analysis of text, web, and related multimedia data sets.
Mathematical Programming in Data Mining
TLDR
A novel approach is proposed that deliberately tolerates a small error in the training process in order to avoid overfitting data that may contain errors; it is used to discover very useful survival curves for breast cancer patients from a medical database.
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
  • S. Salzberg
  • Computer Science
  • Data Mining and Knowledge Discovery
  • 2004
TLDR
Several phenomena that can, if ignored, invalidate an experimental comparison are described; these phenomena and the conclusions that follow apply not only to classification but to computational experiments in almost any aspect of data mining.
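Two pitfalls commonly raised in classifier comparisons are using tests whose assumptions do not hold and ignoring multiple comparisons. The sketch below shows one simple option under stated assumptions: an exact two-sided binomial (sign) test on the test examples where two classifiers disagree, plus a Bonferroni-adjusted significance level. The function names are hypothetical, and this is an illustration in the spirit of the paper's concerns, not its exact recommended procedure.

```python
from math import comb

def sign_test_p(wins_a, wins_b):
    """Exact two-sided sign test: wins_a / wins_b count examples only one classifier got right."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n   # P(X <= k) under Binomial(n, 0.5)
    return min(1.0, 2 * tail)

def bonferroni_alpha(alpha, num_comparisons):
    """Per-comparison significance level that keeps the family-wise error rate at most alpha."""
    return alpha / num_comparisons
```

If classifier A wins 8 of the 10 disagreements, the p-value is 0.109, which is not significant at 0.05 even before correcting; with 10 comparisons the Bonferroni threshold drops to 0.005, making the bar stricter still.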
What Makes Patterns Interesting in Knowledge Discovery Systems
TLDR
The focus of the paper is on studying subjective measures of interestingness, which are classified into actionable and unexpected, and the relationship between them is examined.
A New SQL-like Operator for Mining Association Rules
TLDR
This work proposes a unifying model that enables a uniform description of the problem of discovering association rules, and provides an SQL-like operator, named MINE RULE, which is capable of expressing all the problems presented so far in the literature concerning the mining of association rules.
Mining quantitative association rules in large relational tables
TLDR
This work deals with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary, and introduces measures of partial completeness which quantify the information lost due to partitioning.
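The partition-then-combine idea can be sketched as follows, under simplifying assumptions that are not from the paper: the attribute is fine-partitioned into equi-depth bins, and adjacent bins are then greedily merged left to right until each interval reaches a minimum support fraction. The referenced method considers combinations of adjacent intervals more generally and also computes partial completeness, which this sketch omits; the function names are hypothetical.

```python
def equi_depth_partition(values, num_bins):
    """Fine-partition sorted values into contiguous bins of roughly equal count: (lo, hi, count)."""
    xs = sorted(values)
    n = len(xs)
    bins = []
    for b in range(num_bins):
        chunk = xs[b * n // num_bins:(b + 1) * n // num_bins]
        if chunk:
            bins.append((chunk[0], chunk[-1], len(chunk)))
    return bins

def merge_adjacent(bins, n_total, min_support):
    """Greedily combine adjacent bins until each interval's support fraction reaches min_support."""
    merged = []
    lo, hi, count = None, None, 0
    for b_lo, b_hi, b_count in bins:
        if lo is None:
            lo = b_lo                     # start a new interval
        hi, count = b_hi, count + b_count
        if count / n_total >= min_support:
            merged.append((lo, hi, count))
            lo, count = None, 0
    if lo is not None:                    # leftover low-support tail: fold into the previous interval
        if merged:
            p_lo, _, p_count = merged.pop()
            merged.append((p_lo, hi, p_count + count))
        else:
            merged.append((lo, hi, count))
    return merged
```

With the values 1 to 20 split into 10 bins of two values each and a minimum support of 0.25, the bins combine into the intervals [1, 6], [7, 12] and [13, 20].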
BIRCH: an efficient data clustering method for very large databases
TLDR
A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented and shown to be especially suitable for very large databases.
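BIRCH summarises each subcluster by a clustering feature CF = (N, LS, SS): the point count, the linear sum of the points, and the sum of their squared norms. CFs are additive, so subclusters can be merged without revisiting the raw data. The following minimal sketch shows only the CF arithmetic and a threshold test on the radius; the CF-tree, branching factor and rebuilding steps are omitted, and the names `CF` and `try_absorb` are hypothetical (BIRCH can also use the diameter as the threshold criterion).

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class CF:
    n: int                   # number of points
    ls: Tuple[float, ...]    # linear sum of the points
    ss: float                # sum of squared norms of the points

    @staticmethod
    def of_point(x: Sequence[float]) -> "CF":
        return CF(1, tuple(float(v) for v in x), float(sum(v * v for v in x)))

    def __add__(self, other: "CF") -> "CF":
        # Additivity: the CF of two merged subclusters is the component-wise sum.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self) -> Tuple[float, ...]:
        return tuple(v / self.n for v in self.ls)

    def radius(self) -> float:
        # R = sqrt(SS/N - ||LS/N||^2), the RMS distance of the points from the centroid.
        c = self.centroid()
        return math.sqrt(max(0.0, self.ss / self.n - sum(v * v for v in c)))

def try_absorb(cf: CF, x: Sequence[float], threshold: float):
    """Absorb point x into the subcluster only if the merged radius stays within the threshold."""
    merged = cf + CF.of_point(x)
    return merged if merged.radius() <= threshold else None
```

For the points (0, 0) and (2, 0), the combined CF has centroid (1, 0) and radius 1; with a threshold of 1, the point (1, 0) is absorbed, while (10, 0) is rejected and would start a new subcluster.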
A Probabilistic Approach to Fast Pattern Matching in Time Series Databases
TLDR
The proposed approach provides a natural framework to support user-customizable "query by content" on time series data, taking prior domain information into account in a principled manner.