A Reality Check for Data Snooping

@article{White2000ARC,
  title={A Reality Check for Data Snooping},
  author={Halbert L. White},
  journal={Econometrica},
  year={2000},
  volume={68},
  pages={1097--1126}
}
  • H. White
  • Published 1 September 2000
  • Computer Science
  • Econometrica
Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results. This problem is practically unavoidable in the analysis of time-series data, as typically only a single history measuring a given phenomenon of interest is available for analysis. It is… 
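The abstract is cut off before it describes the test itself. The sketch below assumes the formulation usually attributed to White (2000): each of K candidate models has a series of performance measures relative to a benchmark, the statistic is the largest scaled mean out-performance, and its null distribution is approximated by resampling the whole panel with the stationary bootstrap of Politis and Romano and recentring each model's mean. The function names, the mean block length of 10 and the 500 bootstrap replications are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def stationary_bootstrap_indices(n, mean_block, rng):
    """One resampled index path of length n (hypothetical helper).

    A new block starts at a uniform random position with probability
    1/mean_block at each step, so block lengths are geometric with mean
    mean_block; indices wrap around the end of the series.
    """
    p = 1.0 / mean_block
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:
            idx[t] = rng.integers(n)          # start a fresh block
        else:
            idx[t] = (idx[t - 1] + 1) % n     # extend the current block, wrapping
    return idx


def reality_check_pvalue(f, n_boot=500, mean_block=10.0, seed=0):
    """Bootstrap p-value for H0: no model beats the benchmark (sketch).

    f has shape (n, K); f[t, k] is model k's performance relative to the
    benchmark at time t, with positive values meaning model k did better.
    """
    f = np.asarray(f, dtype=float)
    n = f.shape[0]
    rng = np.random.default_rng(seed)
    f_bar = f.mean(axis=0)                    # mean relative performance per model
    v_bar = np.sqrt(n) * f_bar.max()          # statistic for the best model
    v_star = np.empty(n_boot)
    for b in range(n_boot):
        # resample whole rows so cross-model dependence is kept
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        f_bar_star = f[idx].mean(axis=0)
        # recentre at the sample means so the bootstrap mimics the null
        v_star[b] = np.sqrt(n) * (f_bar_star - f_bar).max()
    return float(np.mean(v_star > v_bar))
```

A p-value near zero says the best model's edge is unlikely to be an artefact of searching across all K models; testing only the winning model on its own would ignore that search.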

Formalized Data Snooping Based on Generalized Error Rates
TLDR
Reviews a number of recent proposals from the statistical literature, discusses how these procedures apply to the general problem of model selection, and considers how to decide which hypotheses to reject.
Rule Profitability, Data Snooping, and Reality Check: Evidence from the Foreign Exchange Market
We report evidence on the profitability and statistical significance among 2,127 technical trading rules. The best rules are found to be significantly profitable based on standard tests. We then
A Simple Adjustment for Bandwidth Snooping
Kernel-based estimators are often evaluated at multiple bandwidths as a form of sensitivity analysis. However, if in the reported results, a researcher selects the bandwidth based on this analysis,
A Reality Check for Credit Default Models
We propose a model selection methodology for credit default modeling in the presence of a large number of variables and candidate models. Accurate credit default models are critical to financial
The White Reality Check and Some of Its Recent Extensions
TLDR
This chapter discusses several recent extensions of the Reality Check approach, and introduces a novel approach in which forecast combinations are evaluated via the examination of the quantiles of the expected loss distribution.
Robust trading rule selection and forecasting accuracy
TLDR
The proposed approach curbs data-snooping bias by constructing a framework for trading-rule selection that uses a priori robustness strategies, with robustness gauged by time-series bootstrap and multi-objective criteria.
Asset Allocation Strategies, Data Snooping, and the 1/N Rule
Using a series of advanced tests from White’s (2000) “Reality Check” to correct for data-snooping bias, we assess the out-of-sample performance of various portfolio strategies relative to the naive
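To see why rules found "significantly profitable based on standard tests" can mislead when thousands of rules are searched, the following simulation assumes 2,127 hypothetical rules whose returns are i.i.d. standard normal, so none has any real skill, and reports the largest t-statistic and how many rules pass a one-sided 5% test. The i.i.d. normal returns, the sample length of 1,000 and the function name are assumptions for illustration only; this is not the foreign-exchange study's data or method.

```python
import numpy as np


def best_t_stat_under_null(n_obs=1000, n_rules=2127, seed=0):
    """Largest t-statistic, and count of one-sided 5% 'rejections', among
    hypothetical rules whose true mean return is exactly zero."""
    rng = np.random.default_rng(seed)
    # i.i.d. N(0, 1) returns: by construction no rule has any skill
    returns = rng.standard_normal((n_obs, n_rules))
    se = returns.std(axis=0, ddof=1) / np.sqrt(n_obs)
    t = returns.mean(axis=0) / se
    return float(t.max()), int(np.sum(t > 1.645))


best_t, n_signif = best_t_stat_under_null()
print(f"best t = {best_t:.2f}; {n_signif} of 2127 rules pass a one-sided 5% test")
```

Since about 5% of 2,127 rules are expected to clear the 1.645 cutoff by luck alone, roughly a hundred "winners" appear even though every rule is worthless; this is the gap a Reality Check style adjustment is meant to close.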

References

Showing 1-10 of 54 references
Model uncertainty, data mining and statistical inference
TLDR
The effects of model uncertainty, such as too narrow prediction intervals, and the non-trivial biases in parameter estimates which can follow data-based modelling are reviewed.
Statistical significance tests.
  • D. Cox
  • Economics
  • British Journal of Clinical Pharmacology
  • 1982
TLDR
The view taken in this paper is that such tests play an important but nevertheless strictly limited role in the critical analysis of data and should be separated from critical discussion of the underlying concepts.
Data Mining: Statistics and More?
Data mining is a new discipline lying at the interface of statistics, database technology, pattern recognition, machine learning, and other areas. It is concerned with the secondary analysis
Large Sample Confidence Regions Based on Subsamples under Minimal Assumptions
In this article, the construction of confidence regions by approximating the sampling distribution of some statistic is studied. The true sampling distribution is estimated by an appropriate
Discussion contribution on ‘Data mining reconsidered: encompassing and the general‐to‐specific approach to specification search’ by Hoover and Perez
This is an interesting paper, which explores the effectiveness of elaborate search procedures for recovering the models which generated data sets. The aim of the paper is to answer the question
The Bonferroni and the Scheffé multiple comparison procedures
where $b = (X'X)^{-1}X'y$ is the least squares estimator of $\beta$. It is easily shown that $z$ is $N(\delta, \sigma^2 V)$, where $V = R(X'X)^{-1}R'$. The usual unbiased estimator of $\sigma^2$ is $s^2 = (y - Xb)'(y - Xb)/(T - k)$. In
The stationary bootstrap
This article introduces a resampling procedure called the stationary bootstrap as a means of calculating standard errors of estimators and constructing confidence regions for parameters
Can Stock Market Forecasters Forecast?
This paper presents results of analyses of the forecasting efforts of 45 professional agencies which have attempted, either to select specific common stocks which should prove superior in investment
Let's Take the Con Out of Econometrics
Econometricians would like to project the image of agricultural experimenters who divide a farm into a set of smaller plots of land and who select randomly the level of fertilizer to be used on each
The Effect of Model Selection on Confidence Regions and Prediction Regions
Pötscher (1991, Econometric Theory 7, 163–181) has recently considered the question of how the use of a model selection procedure affects the asymptotic distribution of parameter estimators and
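The stationary bootstrap entry describes a resampling procedure for standard errors of time-series statistics. A vectorised sketch follows, assuming the standard construction (blocks start at uniform random positions, block lengths are geometric with mean `mean_block`, and indices wrap around the end of the series); the function name and default values are hypothetical.

```python
import numpy as np


def stationary_bootstrap_se(x, stat=np.mean, mean_block=10.0, n_boot=1000, seed=0):
    """Standard error of stat(x) by the stationary bootstrap (a sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rng = np.random.default_rng(seed)
    pos = np.arange(n)
    # new_block[b, t] is True when resample b starts a fresh block at time t
    new_block = rng.random((n_boot, n)) < 1.0 / mean_block
    new_block[:, 0] = True
    # a uniform random start point for every potential block
    starts = rng.integers(0, n, size=(n_boot, n))
    # time index at which the currently active block began
    last = np.maximum.accumulate(np.where(new_block, pos, 0), axis=1)
    # step forward from that block's start point, wrapping at the series end
    idx = (np.take_along_axis(starts, last, axis=1) + (pos - last)) % n
    reps = np.array([stat(x[row]) for row in idx])
    return float(reps.std(ddof=1))
```

Because blocks keep neighbouring observations together, the resampled series retain short-range dependence that an i.i.d. bootstrap would destroy; a larger `mean_block` preserves longer-range dependence at the cost of more variable estimates.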