Variable selection in neural network regression models with dependent data: a subsampling approach
Neural networks are flexible and powerful data analysis tools for handling complex data patterns. This flexibility stems from their nonlinear nature which, however, can lead to problems of optimization and model adequacy. Results of neural network estimation and prediction are therefore subject to some variability due to sensitivity to initial conditions, to convergence to local minima and sometimes, more dramatically, to sampling variability. The approach considered here is based on a set of statistical tools used to assess the reliability of the results and to explore the adequacy of a given neural network model. The use of statistical tools makes it possible to obtain objective measures of the confidence we may have in a specific result and to apply hypothesis tests to these measures (see, inter alia, White and Racine, 2001). This mainly involves measures for selecting the topology of the network and for evaluating and improving its predictive accuracy. In all these cases the key issues are to quantify the relevance measure, to estimate its sampling variability and to test a specific hypothesis. Unfortunately, the complexity of the neural network model and of the algorithms used to estimate the parameters makes the use of analytical tools, even if possible in principle, infeasible in practice. In this paper we focus on a methodology based extensively on the subsampling technique, which gives consistent results under quite general and weak assumptions. Moreover, it performs well in non-standard set-ups and, from a theoretical point of view, it is easy to analyze (see, inter alia, La Rocca and Perna, 2003; Fukuchi, 1999). Besides these theoretical arguments, it also has substantial computational advantages over alternative resampling techniques such as the bootstrap, since the original measure of performance is evaluated only a limited number of times, usually far fewer than an equivalent bootstrap approach would require.
Applications to simulated and real data will be discussed.
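
To make the idea of subsampling with dependent data concrete, the following is a minimal sketch in Python and not the procedure developed in the paper. It evaluates a statistic on every block of consecutive observations, so that the dependence structure within each block is preserved, and uses the resulting values to approximate a p-value for the hypothesis that a nonnegative relevance measure is zero. Everything named here is an assumption made for illustration: the functions `block_subsample` and `subsampling_pvalue`, the squared correlation used as a stand-in for a neural network relevance measure, the scaling of the statistic by the sample size, and the block length of 20.

```python
import numpy as np


def block_subsample(data, statistic, block_length):
    """Evaluate `statistic` on every block of `block_length` consecutive rows."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    if not 1 <= block_length <= n:
        raise ValueError("block_length must lie between 1 and the sample size")
    # Consecutive (overlapping) blocks keep the time ordering, and hence the
    # dependence structure, intact inside each subsample.
    return np.array(
        [statistic(data[start:start + block_length])
         for start in range(n - block_length + 1)]
    )


def subsampling_pvalue(data, statistic, block_length):
    """Subsampling p-value for H0: the population value of the measure is zero.

    Assumption (illustrative): the measure is nonnegative and n * statistic
    has a non-degenerate limiting distribution under H0.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    full_sample_stat = n * statistic(data)
    block_stats = block_length * block_subsample(data, statistic, block_length)
    # Share of scaled subsample statistics at least as large as the
    # scaled full-sample statistic.
    return float(np.mean(block_stats >= full_sample_stat))


def squared_correlation(block):
    """Stand-in relevance measure: squared correlation of column 0 with column 1."""
    r = np.corrcoef(block[:, 0], block[:, 1])[0, 1]
    return r ** 2


def simulate_ar1(n, phi, rng):
    """Simulate an AR(1) series as a simple example of dependent data."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


rng = np.random.default_rng(0)
n, b = 200, 20  # sample size and (hypothetical) block length
x_relevant = simulate_ar1(n, 0.5, rng)
x_irrelevant = simulate_ar1(n, 0.5, rng)
y = 2.0 * x_relevant + rng.normal(size=n)

p_relevant = subsampling_pvalue(
    np.column_stack([x_relevant, y]), squared_correlation, b)
p_irrelevant = subsampling_pvalue(
    np.column_stack([x_irrelevant, y]), squared_correlation, b)
print(f"p-value, relevant input:   {p_relevant:.3f}")
print(f"p-value, irrelevant input: {p_irrelevant:.3f}")
```

In this sketch the statistic is evaluated once on the full sample and once on each of the n - b + 1 blocks. The block length is a tuning parameter whose choice the sketch does not address.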