Ensembled sparse-input hierarchical networks for high-dimensional datasets

Jean Feng and Noah Simon
Neural networks have seen limited use in prediction for high-dimensional data with small sample sizes, because they tend to overfit and require tuning many more hyperparameters than existing off-the-shelf machine learning methods. With small modifications to the network architecture and training procedure, we show that dense neural networks can be a practical data analysis tool in these settings. The proposed method, Ensemble by Averaging Sparse-Input Hierarchical networks (EASIER-net… 
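EASIER-net combines predictions by averaging over an ensemble of independently fitted sparse-input networks. A minimal sketch of that averaging step (the function name and interface are illustrative, not taken from the paper):

```python
def ensemble_average(predictions):
    """Average the outputs of K models evaluated on the same n inputs.

    predictions: list of K lists, each holding one model's n predicted
    values (e.g. probabilities or regression outputs).
    Returns the pointwise ensemble mean, one value per input.
    """
    n_models = len(predictions)
    n_points = len(predictions[0])
    return [sum(p[i] for p in predictions) / n_models for i in range(n_points)]
```

For classification, averaging is typically done on predicted class probabilities rather than on hard labels.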

References


Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification
This manuscript proposes fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which results in a neural net that only uses a small subset of the original features, and characterizes the statistical convergence of the penalized empirical risk minimizer to the optimal neural network.
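A sparse group lasso penalty on first-layer weights can be optimized with proximal gradient steps; the proximal operator first shrinks individual weights and can then zero a feature's entire outgoing weight row, which is what removes input features. A hedged sketch (function names and the parameterization are illustrative):

```python
import math

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t * |x|."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def prox_sparse_group_lasso(row, lam, alpha):
    """Proximal operator of lam * (alpha * ||row||_1 + (1 - alpha) * ||row||_2)
    applied to one input feature's outgoing weight row.

    First shrinks each weight (the l1 part), then shrinks the whole row
    toward zero (the group l2 part); if the shrunk row is small enough,
    the entire input feature is dropped.
    """
    shrunk = [soft_threshold(w, alpha * lam) for w in row]
    norm = math.sqrt(sum(w * w for w in shrunk))
    if norm <= (1.0 - alpha) * lam:
        return [0.0] * len(row)
    scale = 1.0 - (1.0 - alpha) * lam / norm
    return [scale * w for w in shrunk]
```

With a moderate penalty, a row of uniformly small weights collapses to zero while a row with large weights is only shrunk, so feature selection falls out of the optimization.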
Modern Neural Networks Generalize on Small Data Sets
In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated, which leads
Combined Group and Exclusive Sparsity for Deep Neural Networks
This work proposes an exclusive sparsity regularization based on the (1, 2)-norm, which promotes competition for features between different weights, thus enforcing them to fit to disjoint sets of features, and combines the exclusive sparsity with the group sparsity to promote both sharing and competition for features in training of a deep neural network.
A Sparse-Group Lasso
A regularized model for linear regression with ℓ1 and ℓ2 penalties is introduced, and it is shown that it has the desired effect of group-wise and within-group sparsity.
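Concretely, the sparse-group lasso adds a penalty of the form lam * (alpha * ||b||_1 + (1 - alpha) * sum_g ||b_g||_2) to the loss. A small sketch of evaluating that penalty (this version omits the optional sqrt-of-group-size weights some formulations include):

```python
import math

def sparse_group_lasso_penalty(groups, lam, alpha):
    """Evaluate lam * (alpha * ||b||_1 + (1 - alpha) * sum_g ||b_g||_2)
    for coefficients b partitioned into groups (a list of lists).

    alpha = 1 recovers the lasso; alpha = 0 recovers the group lasso.
    """
    l1 = sum(abs(w) for g in groups for w in g)
    group_l2 = sum(math.sqrt(sum(w * w for w in g)) for g in groups)
    return lam * (alpha * l1 + (1.0 - alpha) * group_l2)
```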
Nonparametric variable importance using an augmented neural network with multi-task learning
It is shown how a single augmented neural network with multi-task learning simultaneously estimates the importance of many feature subsets, improving on previous procedures for estimating importance.
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.
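Deep ensembles estimate predictive uncertainty from the spread of predictions across independently trained networks. The sketch below summarizes that spread with a per-input mean and variance; note the paper itself additionally trains each regression network to output its own variance, whereas this sketch only uses between-model disagreement (interface is illustrative):

```python
def ensemble_mean_and_variance(predictions):
    """Per-input ensemble mean and (population) variance across K models.

    predictions: list of K lists, each a model's predictions on the
    same n inputs. The variance is a simple disagreement-based
    uncertainty proxy: larger spread means a less confident ensemble.
    """
    k = len(predictions)
    n = len(predictions[0])
    means = [sum(p[i] for p in predictions) / k for i in range(n)]
    variances = [
        sum((p[i] - means[i]) ** 2 for p in predictions) / k for i in range(n)
    ]
    return means, variances
```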
Selective prediction-set models with coverage guarantees
This work shows how to calculate point estimates and confidence intervals for the true coverage of any selective prediction-set model, as well as a uniform mixture of K set models obtained from K-fold sample-splitting, which outperforms existing approaches on both in-sample and out-of-sample age groups.
Exploiting sparseness in deep neural networks for large vocabulary speech recognition
The goal of enforcing sparseness is formulated as soft regularization and convex constraint optimization problems, solutions under the stochastic gradient ascent setting are proposed, and novel data structures are proposed to exploit the random sparseness patterns to reduce model size and computation time.
Learning Structured Sparsity in Deep Neural Networks
The results show that for CIFAR-10, regularization on layer depth can reduce 20 layers of a Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.
Regularization and variable selection via the elastic net
It is shown that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like the LARS algorithm does for the lasso.
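The elastic net penalty has a closed-form solution in the univariate (orthonormal) case, which is also the inner update of coordinate-descent solvers: soft-threshold by the lasso part, then shrink by the ridge part. A sketch, assuming the common glmnet-style parameterization with mixing weight alpha:

```python
import math

def elastic_net_univariate(z, lam, alpha):
    """argmin_b 0.5 * (z - b)**2 + lam * (alpha * |b| + 0.5 * (1 - alpha) * b**2).

    Soft-threshold z by lam * alpha (the lasso part), then divide by
    1 + lam * (1 - alpha) (the ridge part's shrinkage factor).
    """
    s = math.copysign(max(abs(z) - lam * alpha, 0.0), z)
    return s / (1.0 + lam * (1.0 - alpha))
```

Setting alpha = 1 recovers the plain lasso soft-thresholding update; alpha = 0 recovers pure ridge shrinkage.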