Corpus ID: 28030076

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

@article{Feng2017SparseInputNN,
  title={Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification},
  author={Jean Feng and Noah Simon},
  journal={arXiv: Methodology},
  year={2017}
}
Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural networks can approximate complex multivariate functions, they generally require a large number of training observations to obtain reasonable fits, unless one can learn the appropriate network structure. In this manuscript, we show that neural networks can be applied successfully to high-dimensional… 
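The abstract is truncated above. For orientation, the sketch below shows one way such a sparse-input network can be fit, under the assumption that sparsity comes from a sparse group lasso penalty on the first-layer weights, with one group per input feature so that an irrelevant feature can be dropped entirely. The names (`SparseInputNet`, `fit`) and the plain Adam-based training loop are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SparseInputNet(nn.Module):
    """One-hidden-layer network; sparsity is imposed on the first-layer weights."""
    def __init__(self, n_features, n_hidden=10):
        super().__init__()
        self.first = nn.Linear(n_features, n_hidden)
        self.out = nn.Linear(n_hidden, 1)

    def forward(self, x):
        return self.out(torch.relu(self.first(x)))

def sparse_group_lasso_penalty(W, lam, alpha=0.5):
    """Sparse group lasso on the first-layer weight matrix W (n_hidden x n_features).
    Each column (all weights attached to one input feature) forms a group, so the
    group term can remove a feature entirely while the l1 term sparsifies within groups."""
    group_norms = torch.linalg.norm(W, dim=0)    # l2 norm of each feature's weights
    return lam * ((1 - alpha) * group_norms.sum() + alpha * W.abs().sum())

def fit(X, y, lam=0.05, alpha=0.5, epochs=200, lr=1e-2):
    """Illustrative training loop: (sub)gradient descent on the penalized loss."""
    model = SparseInputNet(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X).squeeze(-1), y)
        loss = loss + sparse_group_lasso_penalty(model.first.weight, lam, alpha)
        loss.backward()
        opt.step()
    return model
```

A proximal-gradient or blockwise coordinate-descent solver in place of Adam would drive whole groups exactly to zero, which is what makes the feature selection explicit.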

Citations

Statistical Aspects of High-Dimensional Sparse Artificial Neural Network Models
TLDR
This paper investigates the theoretical properties of sparse group lasso regularized neural networks, shows that under mild conditions the classification risk converges to the risk of the optimal Bayes classifier (universal consistency), and proposes a variation on the regularization term.
Ensembled sparse-input hierarchical networks for high-dimensional datasets
  • Jean Feng, N. Simon
  • Computer Science
    Statistical Analysis and Data Mining: The ASA Data Science Journal
  • 2022
TLDR
On a collection of real-world datasets with different sizes, EASIER-net selected network architectures in a data-adaptive manner and achieved higher prediction accuracy than off-the-shelf methods on average.
On the Classification Consistency of High-Dimensional Sparse Neural Network
  • Kaixu Yang, T. Maiti
  • Computer Science
    2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
  • 2019
TLDR
This paper considers a single-layer ANN classification model that is suitable for small training samples and shows that, under mild conditions, the classification risk converges to the optimal Bayes classifier risk (universal consistency) under sparse group lasso regularization.
LassoNet: Neural Networks with Feature Sparsity
TLDR
This work introduces LassoNet, a neural network framework with global feature selection that uses a modified objective function with constraints, integrating feature selection directly with parameter learning and delivering an entire regularization path of solutions spanning a range of feature sparsity.
LassoNet: A Neural Network with Feature Sparsity
TLDR
LassoNet is introduced, a neural network framework with global feature selection that uses a modified objective function with constraints, integrating feature selection directly with parameter learning and delivering an entire regularization path of solutions spanning a range of feature sparsity.
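Both LassoNet entries describe the same mechanism: a linear skip connection controls which features the rest of the network is allowed to use. The sketch below is a simplified, assumed rendering of that idea; the clamp stands in for the paper's hierarchical proximal operator, and the class name `LassoNetSketch` is hypothetical.

```python
import torch
import torch.nn as nn

class LassoNetSketch(nn.Module):
    """Simplified LassoNet-style model: a linear skip connection (theta) plus a small
    network; feature j can feed the network only if its skip weight theta_j is nonzero."""
    def __init__(self, n_features, n_hidden=16):
        super().__init__()
        self.theta = nn.Linear(n_features, 1)    # linear skip connection
        self.hidden = nn.Linear(n_features, n_hidden)
        self.out = nn.Linear(n_hidden, 1)

    def forward(self, x):
        return self.theta(x) + self.out(torch.relu(self.hidden(x)))

    @torch.no_grad()
    def enforce_hierarchy(self, M=10.0):
        """Naive stand-in for the hierarchical proximal step: clamp each first-layer
        column so that |W[i, j]| <= M * |theta_j| for every hidden unit i.
        (Requires a PyTorch version where clamp_ accepts tensor bounds.)"""
        bound = M * self.theta.weight.abs().squeeze(0)      # shape: (n_features,)
        self.hidden.weight.clamp_(min=-bound, max=bound)    # broadcasts over rows
```

In the method as described, an ℓ1 penalty on the skip-connection weights and the constraint are handled jointly along a path of decreasing penalty strength, which is what yields the regularization path over feature-sparsity levels mentioned in the TLDRs; calling `enforce_hierarchy` after each optimizer step only illustrates the constraint's effect.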
Searching for Minimal Optimal Neural Networks
Consistent Feature Selection for Analytic Deep Neural Networks
TLDR
It is proved that for a wide class of networks, including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with the Group Lasso as the base estimator is selection-consistent.
Locally Sparse Networks for Interpretable Predictions
TLDR
This work proposes a framework for training locally sparse neural networks where the local sparsity is learned via a sample-specific gating mechanism that identifies the subset of most relevant features for each measurement.
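As a rough illustration of the sample-specific gating described here (a deterministic simplification; the paper itself uses stochastic gates, and `LocallySparseNet` is a hypothetical name):

```python
import torch
import torch.nn as nn

class LocallySparseNet(nn.Module):
    """Sketch of a sample-specific gating mechanism: a small gating network maps each
    input x to per-feature gates in [0, 1], and the predictor sees only the gated
    input x * gates, so the selected feature subset can differ from sample to sample."""
    def __init__(self, n_features, n_hidden=32):
        super().__init__()
        self.gate_net = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU(),
                                      nn.Linear(n_hidden, n_features))
        self.predictor = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU(),
                                       nn.Linear(n_hidden, 1))

    def forward(self, x):
        gates = torch.sigmoid(self.gate_net(x))   # per-sample, per-feature gates
        return self.predictor(x * gates), gates

# A sparsity term such as lam * gates.mean(), added to the task loss, pushes most
# gates toward zero so that each prediction relies on a small local feature set.
```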
A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks
TLDR
A warm-start, sparsity-inducing algorithm is proposed to solve the high-dimensional, non-convex and non-differentiable optimization problem of recovering sparse associations; the resulting LASSO ANN exhibits a phase transition in the probability of retrieving the needles, which is not observed with other ANN learners.
Let the Data Choose its Features: Differentiable Unsupervised Feature Selection
TLDR
A differentiable loss function is proposed that combines the graph Laplacian with a gating mechanism based on a continuous approximation of Bernoulli random variables, tailored to the task of clustering.
...

References

SHOWING 1-10 OF 63 REFERENCES
Breaking the Curse of Dimensionality with Convex Neural Networks
  • F. Bach
  • Computer Science
    J. Mach. Learn. Res.
  • 2017
TLDR
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network
TLDR
Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.
Group sparse regularization for deep neural networks
ℓ1-penalization for mixture regression models
We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than the sample size. We propose an ℓ1-penalized maximum likelihood estimator…
Neural Network Learning - Theoretical Foundations
TLDR
The authors explain the role of scale-sensitive versions of the Vapnik-Chervonenkis dimension in large margin classification, and in real prediction, and discuss the computational complexity of neural network learning.
Universal approximation bounds for superpositions of a sigmoidal function
  • A. Barron
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1993
TLDR
The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings; in contrast, for linear combinations of n fixed basis functions, the integrated squared approximation error cannot be made smaller than order (1/n)^(2/d) uniformly over functions satisfying the same smoothness assumption.
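Paraphrasing the two rates this reference contrasts (from memory, with constants suppressed, so the exact form should be treated as an assumption):

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Upper bound: for f with finite spectral norm C_f, a single-hidden-layer
% sigmoidal network f_n with n units achieves
\[
  \int_{B_r} \bigl(f(x) - f_n(x)\bigr)^2 \,\mu(dx) \;=\; O\!\left(\frac{C_f^2}{n}\right),
  \qquad C_f = \int_{\mathbb{R}^d} \lVert\omega\rVert \,\bigl|\hat f(\omega)\bigr|\,d\omega .
\]
% Lower bound: linear combinations of any n fixed basis functions h_1,...,h_n
% cannot do better than order (1/n)^{2/d} uniformly over the same smoothness class:
\[
  \sup_{f}\;\inf_{g \in \operatorname{span}\{h_1,\dots,h_n\}}
  \int_{B_r} \bigl(f(x) - g(x)\bigr)^2 \,\mu(dx) \;\ge\; c\,(1/n)^{2/d} .
\]
\end{document}
```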
Learning the Number of Neurons in Deep Networks
TLDR
This paper proposes to make use of a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron, and shows that this approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.
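A minimal sketch of the per-neuron group penalty this entry describes, assuming each group consists of one neuron's incoming weights and bias (the helper name `neuron_group_lasso` is mine):

```python
import torch
import torch.nn as nn

def neuron_group_lasso(layer: nn.Linear, lam: float) -> torch.Tensor:
    """Group lasso where each group is one neuron's incoming weight vector plus its
    bias (assumes the layer has a bias). Driving a whole group to zero lets that
    neuron be pruned, which is how such a regularizer can learn the number of neurons."""
    groups = torch.cat([layer.weight, layer.bias.unsqueeze(1)], dim=1)  # (n_neurons, fan_in + 1)
    return lam * torch.linalg.norm(groups, dim=1).sum()

# Usage sketch: add the penalty for every hidden layer to the task loss, e.g.
#   loss = task_loss + sum(neuron_group_lasso(layer, 1e-3) for layer in hidden_layers)
```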
Understanding deep learning requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
A Sparse-Group Lasso
TLDR
A regularized model for linear regression with ℓ1 and ℓ2 penalties is introduced and it is shown that it has the desired effect of group-wise and within-group sparsity.
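For reference, the penalized criterion described here has, as I recall it, the following form, where λ is the overall penalty strength, α mixes the ℓ1 and group-ℓ2 terms, and p_l is the size of group l:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sparse-group lasso criterion: the group-l2 term yields group-wise sparsity,
% the l1 term yields sparsity within the selected groups.
\[
  \min_{\beta}\;\frac{1}{2n}\,\bigl\lVert y - X\beta \bigr\rVert_2^2
  \;+\; (1-\alpha)\,\lambda \sum_{l=1}^{L} \sqrt{p_l}\,\bigl\lVert \beta^{(l)} \bigr\rVert_2
  \;+\; \alpha\,\lambda\,\lVert \beta \rVert_1 .
\]
\end{document}
```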
Approximation and estimation bounds for artificial neural networks
  • A. Barron
  • Computer Science
    Machine Learning
  • 2004
TLDR
The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.
...