Ensembled sparse-input hierarchical networks for high-dimensional datasets

  title={Ensembled sparse-input hierarchical networks for high-dimensional datasets},
  author={Jean Feng and Noah Simon},
Neural networks have seen limited use in prediction for high-dimensional data with small sample sizes, because they tend to overfit and require tuning many more hyperparameters than existing off-the-shelf machine learning methods. With small modifications to the network architecture and training procedure, we show that dense neural networks can be a practical data analysis tool in these settings. The proposed method, Ensemble by Averaging Sparse-Input Hierarchical networks (EASIER-net… 

Figures and Tables from this paper



Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

This manuscript proposes fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which results in a neural net that only uses a small subset of the original features, and characterize the statistical convergence of the penalized empirical risk minimizer to the optimal neural network.

Modern Neural Networks Generalize on Small Data Sets

In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated which leads

Learning Sparse Neural Networks through L0 Regularization

A practical method for L_0 norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero, which allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way.

A Sparse-Group Lasso

A regularized model for linear regression with ℓ1 andℓ2 penalties is introduced and it is shown that it has the desired effect of group-wise and within group sparsity.

Nonparametric variable importance using an augmented neural network with multi-task learning

It is shown how a single augmented neural network with multi-task learning simultaneously estimates the importance of many feature subsets, improving on previous procedures for estimating importance.

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

This work proposes an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates.

Understanding deep learning requires rethinking generalization

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

Selective prediction-set models with coverage guarantees

This work shows how to calculate point estimates and confidence intervals for the true coverage of any selective prediction-set model, as well as a uniform mixture of K set models obtained from K-fold sample-splitting, which outperforms existing approaches on both in-sample and out-of-sample age groups.

Exploiting sparseness in deep neural networks for large vocabulary speech recognition

The goal of enforcing sparseness as soft regularization and convex constraint optimization problems is formulated, solutions under the stochastic gradient ascent setting are proposed, and novel data structures are proposed to exploit the randomSparseness patterns to reduce model size and computation time.