Corpus ID: 237940183

Searching for Minimal Optimal Neural Networks

  • Lam Si Tung Ho, Vu C. Dinh
  • Published 27 September 2021
  • Computer Science, Mathematics
  • ArXiv
Large neural network models have high predictive power but may suffer from overfitting if the training set is not large enough. Therefore, it is desirable to select an appropriate size for neural networks. The destructive approach, which starts with a large architecture and then reduces the size using a Lasso-type penalty, has been used extensively for this task. Despite its popularity, there is no theoretical guarantee for this technique. Based on the notion of minimal neural networks, we… 

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification
Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural…
Consistent Feature Selection for Analytic Deep Neural Networks
It is proved that for a wide class of networks, including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with the Group Lasso as the base estimator is selection-consistent.
Asymptotic Properties of Neural Network Sieve Estimators.
Neural networks are among the most popular methods in machine learning and artificial intelligence today. Due to the universal approximation theorem (Hornik et al., 1989), a neural…
Group sparse regularization for deep neural networks
The group Lasso penalty is extended, originally proposed in the linear regression literature, to impose group-level sparsity on the networks connections, where each group is defined as the set of outgoing weights from a unit.
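The grouping described above can be sketched in a few lines. The following is a minimal illustration (not the paper's implementation), assuming the convention that row i of a layer's weight matrix holds the outgoing weights of unit i, so each row forms one group:

```python
import numpy as np

def group_lasso_penalty(W, lam=0.01):
    """Group Lasso penalty where each group is the set of outgoing
    weights from one unit (one row of W).

    W   : weight matrix of shape (n_units, n_out); row i holds the
          outgoing weights of unit i.
    lam : regularization strength (illustrative value, not from the paper).
    """
    # Sum of the Euclidean norms of the rows: this drives whole rows
    # (i.e., whole units) to zero, rather than individual weights as
    # the plain Lasso would.
    return lam * sum(np.linalg.norm(W[i]) for i in range(W.shape[0]))

# A unit whose outgoing weights are already all zero contributes nothing:
W = np.array([[0.0, 0.0, 0.0],
              [3.0, 4.0, 0.0]])
print(group_lasso_penalty(W, lam=1.0))  # → 5.0 (norm of the second row only)
```

Because the penalty is non-differentiable only at exact group-wise zeros, optimizing a loss plus this term tends to zero out entire units, which is what yields structured (unit-level) sparsity rather than scattered zero weights.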
Auto-Sizing Neural Networks: With Applications to n-gram Language Models
A method for automatically adjusting network size by pruning out hidden units through ℓ1,1 and ℓ2,1 regularization is introduced and applied to language modeling, demonstrating its ability to correctly choose the number of hidden units while maintaining perplexity.
On Model Selection Consistency of Lasso
  • P. Zhao, Bin Yu
  • Mathematics, Computer Science
    J. Mach. Learn. Res.
  • 2006
It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.
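As a hedged sketch of the condition referenced above (notation assumed from the usual statement, with C = (1/n)XᵀX the sample covariance partitioned over the truly relevant predictors, indexed (1), and the irrelevant ones, indexed (2)), the strong form requires, elementwise,

```latex
% Strong Irrepresentable Condition (sketch, standard notation assumed):
%   C_{11}: covariance among the truly relevant predictors,
%   C_{21}: cross-covariance between irrelevant and relevant predictors,
%   \beta_{(1)}: true coefficients of the relevant predictors.
\left| C_{21}\, C_{11}^{-1}\, \operatorname{sign}\!\bigl(\beta_{(1)}\bigr) \right|
  \le \mathbf{1} - \eta
\quad \text{componentwise, for some } \eta > 0,
```

intuitively demanding that the irrelevant predictors not be too correlated with (i.e., not "representable" by) the relevant ones; the weak version replaces the bound 1 − η with a strict componentwise inequality against 1.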
Learning the Number of Neurons in Deep Networks
This paper proposes to make use of a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron, and shows that this approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.
The Lasso [28] is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p is potentially much larger than the number of…
Consistent estimation of the architecture of multilayer perceptrons
It is proved that a suitable information criterion leads to consistent estimation of the true number of hidden units in regression models involving multilayer perceptrons (MLP) with one hidden layer and Gaussian noise.
The Adaptive Lasso and Its Oracle Properties
The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a…