Leakage in data mining: formulation, detection, and avoidance

Shachar Kaufman, Saharon Rosset, Claudia Perlich
Deemed "one of the top ten data mining mistakes", leakage is essentially the introduction of information about the data mining target that should not legitimately be available to mine from. In addition to our own industry experience with real-life projects, controversies around several recent major public data mining competitions, such as the INFORMS 2010 Data Mining Challenge and the IJCNN 2011 Social Network Challenge, are evidence that this issue is as relevant today as it has ever…
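
To make the definition of leakage concrete, here is a minimal sketch that is not taken from the paper. It assumes a synthetic dataset with one weakly predictive legitimate feature and one hypothetical "leaky" feature that is recorded only after the outcome is known, so it simply encodes the target. The helper names (`make_rows`, `fit_stump`, `accuracy`) and the single-threshold classifier are illustrative choices, not the authors' method.

```python
import random


def make_rows(n, seed):
    """Build synthetic rows of (legit_feature, leaky_feature, target)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        target = rng.randint(0, 1)
        # Legitimate feature: known before the outcome, only weakly related to it.
        legit = target * 0.5 + rng.gauss(0.0, 1.0)
        # Leaky feature (assumption for illustration): written after the
        # outcome, so it is just a copy of the target.
        leaky = float(target)
        rows.append((legit, leaky, target))
    return rows


def accuracy(rows, col, thr):
    """Accuracy of the rule 'predict 1 when rows[col] >= thr'."""
    return sum((r[col] >= thr) == bool(r[2]) for r in rows) / len(rows)


def fit_stump(rows, col):
    """Pick the threshold on one column that maximizes training accuracy."""
    best_thr, best_acc = None, -1.0
    for thr in sorted(r[col] for r in rows):
        acc = accuracy(rows, col, thr)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr


train = make_rows(500, seed=0)
test = make_rows(500, seed=1)

# Held-out accuracy using only the legitimate feature versus the leaky one.
acc_legit = accuracy(test, 0, fit_stump(train, 0))
acc_leaky = accuracy(test, 1, fit_stump(train, 1))
print(f"legitimate feature only: {acc_legit:.2f}")
print(f"leaky feature:           {acc_leaky:.2f}")
```

In this sketch the leaky feature yields perfect held-out accuracy, yet that number says nothing about performance at prediction time, when the feature would not yet exist. Holding out a test set does not help here, because the leak is present in both the training and the test rows.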
