Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification
@article{Feng2017SparseInputNN, title={Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification}, author={Jean Feng and Noah Simon}, journal={arXiv: Methodology}, year={2017} }
Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural networks can approximate complex multivariate functions, they generally require a large number of training observations to obtain reasonable fits, unless one can learn the appropriate network structure. In this manuscript, we show that neural networks can be applied successfully to high-dimensional…
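The abstract is truncated on the page, but judging from the title, the citing work, and the Sparse-Group Lasso reference listed further below, the core idea is to penalize the first-layer (input) weights so that entire input features can be dropped. The following is a minimal NumPy sketch of such a penalty, not the authors' implementation; the function name, the row-wise grouping, and the lasso/group-lasso mixing parameter `alpha` are illustrative assumptions.

```python
import numpy as np

def sparse_group_lasso_penalty(W1, lam, alpha=0.5):
    """Sparse-group-lasso penalty on a first-layer weight matrix W1 (p inputs x h hidden units).

    Each row W1[j] groups all outgoing weights of input feature j, so driving a whole
    row to zero removes that feature from the network. `lam` (overall strength) and
    `alpha` (lasso vs. group-lasso mix) are illustrative tuning parameters.
    """
    group_norms = np.sqrt((W1 ** 2).sum(axis=1))        # ||W1[j]||_2 per input feature
    return lam * (alpha * np.abs(W1).sum()              # elementwise lasso part
                  + (1 - alpha) * group_norms.sum())    # group part (feature-level sparsity)

# toy usage: 100 input features, 10 hidden units
rng = np.random.default_rng(0)
W1 = rng.normal(size=(100, 10))
print(sparse_group_lasso_penalty(W1, lam=0.1))
```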
55 Citations
Statistical Aspects of High-Dimensional Sparse Artificial Neural Network Models
- Computer Science · Mach. Learn. Knowl. Extr.
- 2020
This paper investigates the theoretical properties of the sparse group lasso regularized neural network and shows that under mild conditions, the classification risk converges to the optimal Bayes classifier’s risk (universal consistency), and proposes a variation on the regularization term.
Ensembled sparse-input hierarchical networks for high-dimensional datasets
- Computer Science · Statistical Analysis and Data Mining: The ASA Data Science Journal
- 2022
On a collection of real-world datasets with different sizes, EASIER-net selected network architectures in a data-adaptive manner and achieved higher prediction accuracy than off-the-shelf methods on average.
On the Classification Consistency of High-Dimensional Sparse Neural Network
- Computer Science · 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
- 2019
This paper considers a single-layer ANN classification model suited to small training samples and shows that, under mild conditions, the classification risk converges to the optimal Bayes classifier's risk (universal consistency) under sparse group lasso regularization.
LassoNet: Neural Networks with Feature Sparsity
- Computer Science · AISTATS
- 2021
This work introduces LassoNet, a neural network framework for global feature selection that uses a modified objective function with constraints, integrating feature selection directly with parameter learning and delivering an entire regularization path of solutions with a range of feature sparsity levels.
LassoNet: A Neural Network with Feature Sparsity
- Computer Science · J. Mach. Learn. Res.
- 2021
LassoNet is introduced, a neural network framework for global feature selection that uses a modified objective function with constraints, integrating feature selection directly with parameter learning and delivering an entire regularization path of solutions with a range of feature sparsity levels.
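Both LassoNet entries describe the same construction: an ℓ1-penalized linear skip connection θ plus a hierarchy constraint that caps the first-layer weights of feature j by a multiple of |θ_j|, so zeroing θ_j removes that feature from the whole network. The sketch below is a rough illustration of that criterion; `lassonet_style_check`, the budget constant `M`, and the numerical tolerance are assumptions for illustration, not the published code.

```python
import numpy as np

def lassonet_style_check(loss_value, theta, W1, lam, M):
    """Illustrative LassoNet-style criterion: data loss + l1 penalty on the skip
    connection theta (p,), subject to a hierarchy constraint tying the nonlinear
    first-layer weights W1 (p x h) to theta: max_k |W1[j, k]| <= M * |theta[j]|."""
    objective = loss_value + lam * np.abs(theta).sum()
    feasible = bool(np.all(np.abs(W1).max(axis=1) <= M * np.abs(theta) + 1e-12))
    return objective, feasible

rng = np.random.default_rng(0)
theta = rng.normal(size=20)
W1 = rng.normal(size=(20, 5)) * 0.1
print(lassonet_style_check(1.0, theta, W1, lam=0.05, M=10.0))
```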
Searching for Minimal Optimal Neural Networks
- Computer Science · Statistics & Probability Letters
- 2022
Consistent Feature Selection for Analytic Deep Neural Networks
- Computer Science · NeurIPS
- 2020
It is proved that, for a wide class of networks including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with the Group Lasso as the base estimator is selection-consistent.
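For orientation, an adaptive group lasso of the kind this summary refers to reweights each group's penalty by an initial group-lasso estimate; a hedged sketch of such a criterion is below, with the exact weighting exponent and the grouping (here, the first-layer weights attached to each input feature) left as assumptions.

```latex
% Group Lasso base estimate \hat{w}^{\mathrm{GL}}, then adaptive reweighting;
% groups g index the first-layer weights attached to each input feature.
\min_{w}\;\frac{1}{n}\sum_{i=1}^{n}\ell\bigl(y_i, f_w(x_i)\bigr)
  \;+\;\lambda\sum_{g}\frac{\|w^{(g)}\|_2}{\|\hat{w}^{(g)}_{\mathrm{GL}}\|_2^{\gamma}},
  \qquad \gamma > 0.
```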
Locally Sparse Networks for Interpretable Predictions
- Computer Science · arXiv
- 2021
This work proposes a framework for training locally sparse neural networks where the local sparsity is learned via a sample-specific gating mechanism that identifies the subset of most relevant features for each measurement.
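A minimal sketch of a sample-specific gating mechanism in the spirit of this summary: a small gating map turns each sample into per-feature gates in (0, 1) that multiply the input before prediction. The affine gating map, sigmoid link, and names are assumptions for illustration; the paper's construction (and its sparsity-inducing relaxation) may differ.

```python
import numpy as np

def sample_specific_gates(X, Wg, bg):
    """Per-sample feature gates: sigmoid of an affine map of the sample itself,
    so different samples can keep different feature subsets."""
    logits = X @ Wg + bg                      # (n, p) gate logits
    gates = 1.0 / (1.0 + np.exp(-logits))     # in (0, 1); sparsity would come from a penalty
    return gates * X                          # gated inputs fed to the predictive network

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wg = rng.normal(size=(8, 8)) * 0.1
bg = np.zeros(8)
print(sample_specific_gates(X, Wg, bg).shape)  # (5, 8)
```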
A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks
- Computer Science
- 2022
A warm-start, sparsity-inducing algorithm is proposed to solve the high-dimensional, non-convex, and non-differentiable optimization problem, and the resulting LASSO artificial neural network exhibits a phase transition in the probability of retrieving the needles, which is not observed with other ANN learners.
Let the Data Choose its Features: Differentiable Unsupervised Feature Selection
- Computer Science · arXiv
- 2020
A differentiable loss function is proposed that combines the graph Laplacian with a gating mechanism based on a continuous approximation of Bernoulli random variables, tailored to the task of clustering.
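The loss this summary describes couples feature gates with graph structure; as a rough illustration (not the paper's exact objective), a Laplacian-smoothness term on gated features can be written as trace(Zᵀ L Z) with Z the gated data. In the sketch below the gates are a fixed vector; in the paper they would come from the continuous Bernoulli relaxation.

```python
import numpy as np

def laplacian_smoothness(X, gates, W):
    """Smoothness of gated features over a similarity graph W (n x n):
    trace(Z^T L Z) with Z = X * gates and L = D - W. Smaller values mean the kept
    features vary smoothly over the graph, which is what a differentiable
    unsupervised feature-selection loss of this kind would reward."""
    Z = X * gates                       # broadcast per-feature gates
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    return np.trace(Z.T @ L @ Z)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
W = np.abs(rng.normal(size=(6, 6)))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
gates = np.array([1.0, 1.0, 0.0, 0.0])  # keep the first two features only
print(laplacian_smoothness(X, gates, W))
```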
References
Showing 1–10 of 63 references
Breaking the Curse of Dimensionality with Convex Neural Networks
- Computer Science · J. Mach. Learn. Res.
- 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions, such as rectified linear units, and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network
- Computer Science · IEEE Trans. Inf. Theory
- 1998
Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights.
Group sparse regularization for deep neural networks
- Computer Science · Neurocomputing
- 2017
ℓ1-penalization for mixture regression models
- Mathematics, Computer Science
- 2010
We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than sample size. We propose an ℓ1-penalized maximum…
Neural Network Learning - Theoretical Foundations
- Computer Science
- 1999
The authors explain the role of scale-sensitive versions of the Vapnik–Chervonenkis dimension in large-margin classification and in real-valued prediction, and discuss the computational complexity of neural network learning.
Universal approximation bounds for superpositions of a sigmoidal function
- Computer Science · IEEE Trans. Inf. Theory
- 1993
The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, whereas for linear combinations of n fixed basis functions the integrated squared approximation error cannot be made smaller than order 1/n^{2/d} uniformly over functions satisfying the same smoothness assumption.
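For reference, the two rates being contrasted are commonly stated as below; this is a paraphrase from memory of Barron-type results, so treat the constants and exact conditions as indicative only.

```latex
% Dimension-free rate for a single hidden layer of n sigmoidal units
% (constants indicative only):
\int_{B}\bigl(f(x)-f_n(x)\bigr)^2\,\mu(dx)\;\lesssim\;\frac{C_f^{2}}{n},
\qquad C_f=\int_{\mathbb{R}^d}\|\omega\|\,|\tilde f(\omega)|\,d\omega,
% versus the lower bound for linear combinations of n fixed basis functions:
\sup_{f:\,C_f\le C}\;\int_{B}\bigl(f(x)-f_n(x)\bigr)^2\,\mu(dx)\;\gtrsim\;\Bigl(\tfrac{1}{n}\Bigr)^{2/d}.
```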
Learning the Number of Neurons in Deep Networks
- Computer Science · NIPS
- 2016
This paper proposes to make use of a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron, and shows that this approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.
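A brief sketch of a group-sparsity regularizer with one group per neuron, so that shrinking a group to zero prunes that neuron and thereby learns the layer width; the column-wise layout and the inclusion of the bias in each group are illustrative assumptions.

```python
import numpy as np

def per_neuron_group_penalty(W, b, lam):
    """Group-lasso penalty with one group per hidden neuron: column k of W
    (d_in x d_out) together with bias b[k]. Driving a group to zero removes
    neuron k entirely, which is how this kind of regularizer learns the
    number of neurons."""
    group_norms = np.sqrt((W ** 2).sum(axis=0) + b ** 2)
    return lam * group_norms.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))
b = rng.normal(size=16)
print(per_neuron_group_penalty(W, b, lam=0.01))
```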
Understanding deep learning requires rethinking generalization
- Computer Science · ICLR
- 2017
These experiments establish that state-of-the-art convolutional networks for image classification, trained with stochastic gradient methods, easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
A Sparse-Group Lasso
- Computer Science
- 2013
A regularized model for linear regression with ℓ1 and ℓ2 penalties is introduced, and it is shown that it has the desired effect of group-wise and within-group sparsity.
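The penalty referred to here is usually written as the following mix of a group-level ℓ2 term and an element-level ℓ1 term; the √p_g group weighting is the common convention and is an assumption about the exact formulation.

```latex
\min_{\beta}\;\frac{1}{2n}\,\|y - X\beta\|_2^2
  \;+\;(1-\alpha)\,\lambda\sum_{g=1}^{G}\sqrt{p_g}\,\|\beta^{(g)}\|_2
  \;+\;\alpha\,\lambda\,\|\beta\|_1,
% \beta^{(g)}: coefficients in group g, p_g: group size.
% The group term zeroes out whole groups; the l1 term gives within-group sparsity.
```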
Approximation and estimation bounds for artificial neural networks
- Computer Science · Machine Learning
- 2004
The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.