Corpus ID: 8307266

Learning with Pseudo-Ensembles

@article{Bachman2014LearningWP,
  title={Learning with Pseudo-Ensembles},
  author={Philip Bachman and Ouais Alsharif and Doina Precup},
  journal={ArXiv},
  year={2014},
  volume={abs/1412.4864}
}
We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child models spawned from a parent model by perturbing it according to some noise process. […] We present a novel regularizer based on making the behavior of a pseudo-ensemble robust with respect to the noise process generating it. In the fully-supervised setting, our regularizer matches the performance of dropout. But, unlike dropout, our regularizer naturally extends to the semi-supervised setting, where it produces…
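As a rough illustration of the pseudo-ensemble idea, the sketch below spawns two dropout-perturbed child networks from a shared parent and penalizes their disagreement on unlabeled data alongside an ordinary supervised loss. This is a generic consistency-style sketch, not the paper's exact regularizer; the architecture, noise level, and squared-error penalty on softmax outputs are illustrative assumptions.

# Rough sketch of a pseudo-ensemble-style consistency regularizer (not the
# paper's exact objective). Two stochastic forward passes through the same
# parent network give two "child" models; we penalize their disagreement on
# unlabeled inputs and add an ordinary supervised loss on labeled inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyMLP(nn.Module):
    """Parent model; each stochastic forward pass acts as one child model."""
    def __init__(self, d_in=784, d_hid=512, n_cls=10, p_drop=0.5):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hid)
        self.fc2 = nn.Linear(d_hid, n_cls)
        self.p_drop = p_drop

    def forward(self, x):
        h = F.relu(self.fc1(x))
        h = F.dropout(h, p=self.p_drop, training=True)  # the noise process
        return self.fc2(h)

def pseudo_ensemble_loss(model, x_lab, y_lab, x_unlab, lam=1.0):
    # Supervised term: cross-entropy on one noisy child.
    sup = F.cross_entropy(model(x_lab), y_lab)
    # Consistency term: two children (independent dropout masks) should agree
    # on unlabeled data; here a squared distance between their softmax outputs.
    p1 = F.softmax(model(x_unlab), dim=1)
    p2 = F.softmax(model(x_unlab), dim=1)
    cons = ((p1 - p2) ** 2).sum(dim=1).mean()
    return sup + lam * cons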

Citations

Learning Non-deterministic Representations with Energy-based Ensembles
TLDR
Inspired by the stochasticity of the synaptic connections in the brain, this work introduces Energy-based Stochastic Ensembles, which can learn non-deterministic representations, i.e., mappings from the feature space to a family of distributions in the latent space.
Temporal Ensembling for Semi-Supervised Learning
TLDR
Self-ensembling is introduced, where it is shown that this ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
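A minimal sketch of the self-ensembling target described above, assuming a per-example buffer of accumulated predictions; the decay rate, bias correction, and loss weighting are illustrative, and the original method additionally ramps the unsupervised weight up from zero over the first epochs.

# Sketch of a temporal-ensembling update, assuming ema_preds is a
# (num_examples, num_classes) buffer of accumulated softmax predictions and
# idx gives the buffer rows for the current batch.
import torch
import torch.nn.functional as F

def temporal_ensembling_step(model, x, idx, ema_preds, epoch,
                             alpha=0.6, w_unsup=1.0):
    logits = model(x)  # noisy current prediction
    # Bias-corrected ensemble prediction from previous epochs serves as target.
    target = (ema_preds[idx] / (1.0 - alpha ** (epoch + 1))).detach()
    unsup = F.mse_loss(F.softmax(logits, dim=1), target)
    with torch.no_grad():  # update the running ensemble prediction
        ema_preds[idx] = alpha * ema_preds[idx] + (1 - alpha) * F.softmax(logits, dim=1)
    return w_unsup * unsup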
Prune and Tune Ensembles: Low-Cost Ensemble Learning With Sparse Independent Subnetworks
TLDR
This work introduces a fast, low-cost method for creating ensembles of neural networks without needing to train multiple models from scratch, by first training a single parent network and dramatically pruning the parameters of each child to create an ensemble of members with unique and diverse topologies.
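A rough sketch of the parent-to-children step, assuming PyTorch's pruning utilities and an illustrative pruning fraction; the brief fine-tuning ("tuning") of each child is omitted.

# Sketch of spawning ensemble members by pruning copies of a trained parent.
import copy
import torch
import torch.nn.utils.prune as prune

def spawn_pruned_children(parent, n_children=4, amount=0.5):
    children = []
    for _ in range(n_children):
        child = copy.deepcopy(parent)
        for module in child.modules():
            if isinstance(module, torch.nn.Linear):
                # Randomly zero out `amount` of this layer's weights.
                prune.random_unstructured(module, name="weight", amount=amount)
        children.append(child)  # each child would then be briefly fine-tuned
    return children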
Consistency Regularization for Variational Auto-Encoders
TLDR
A regularization method to enforce consistency in variational auto-encoders is proposed; it can even outperform the triplet loss, an advanced and popular contrastive-learning-based method for representation learning.
Deep Ensembles for Low-Data Transfer Learning
TLDR
This work shows that the nature of pre-training itself is a performant source of diversity, and proposes a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset and achieves state-of-the-art performance at a much lower inference budget.
Conformal Credal Self-Supervised Learning
TLDR
The construction of credal sets of labels is supported by a rigorous theoretical foundation, leading to better calibrated and less error-prone supervision for unlabeled data, and makes use of conformal prediction, an approach that comes with guarantees on the validity of set-valued predictions.
VFunc: a Deep Generative Model for Functions
TLDR
A deep generative model for functions that provides a joint distribution p(f, z) over functions f and latent variables z, which lets us efficiently sample from the marginal p(f) and maximize a variational lower bound on the entropy H(f).
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
TLDR
This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
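A condensed sketch of the BYOL-style objective, assuming illustrative module names (online, predictor, target) and omitting the symmetrized loss over swapped views used in the original method.

# Sketch of a BYOL-style prediction loss and target-network update.
import torch
import torch.nn.functional as F

def byol_loss(online, predictor, target, view1, view2):
    p = predictor(online(view1))          # online branch plus predictor head
    with torch.no_grad():
        z = target(view2)                 # target branch gives the regression target
    p, z = F.normalize(p, dim=1), F.normalize(z, dim=1)
    return (2 - 2 * (p * z).sum(dim=1)).mean()   # mean squared L2 distance

@torch.no_grad()
def ema_update(target, online, tau=0.99):
    # Target weights trail the online weights as an exponential moving average.
    for t, o in zip(target.parameters(), online.parameters()):
        t.mul_(tau).add_((1 - tau) * o)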
Credal Self-Supervised Learning
TLDR
The key idea is to let the learner itself iteratively generate "pseudo-supervision" for unlabeled instances based on its current hypothesis; to learn from weakly labeled data of that kind, the authors leverage methods that have recently been proposed in the realm of so-called superset learning.
…

References

SHOWING 1-10 OF 27 REFERENCES
Deep Generative Stochastic Networks Trainable by Backprop
TLDR
Theorems that generalize recent work on the probabilistic interpretation of denoising autoencoders are provided, yielding along the way an interesting justification for dependency networks and generalized pseudolikelihood.
Learning Ordered Representations with Nested Dropout
TLDR
Nested dropout, a procedure for stochastically removing coherent nested sets of hidden units in a neural network, is introduced and it is rigorously shown that the application of nested dropout enforces identifiability of the units, which leads to an exact equivalence with PCA.
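A small sketch of the nested-dropout mask, assuming one geometric cutoff per example with an illustrative rate rho; zeroing every unit past the sampled cutoff is what imposes the ordering on the representation.

# Sketch of a per-example nested-dropout mask over a (batch, n) activation h.
import torch

def nested_dropout(h, rho=0.05, training=True):
    if not training:
        return h
    batch, n = h.shape
    probs = torch.full((batch,), rho, device=h.device)
    b = torch.distributions.Geometric(probs).sample() + 1   # cutoff index per example
    b = b.clamp(max=n).unsqueeze(1)                          # (batch, 1)
    idx = torch.arange(n, device=h.device).unsqueeze(0)      # (1, n)
    return h * (idx < b).float()                             # zero units past the cutoff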
An empirical analysis of dropout in piecewise linear networks
TLDR
This work empirically investigates several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function, and investigates an alternative criterion based on a biased estimator of the maximum likelihood ensemble gradient.
Understanding Dropout
TLDR
A general formalism for studying dropout on either units or connections, with arbitrary probability values, is introduced and used to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks.
Learning with Marginalized Corrupted Features
TLDR
This work proposes to corrupt training examples with noise from known distributions within the exponential family and presents a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution.
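For a linear model with squared loss and unbiased blankout (dropout) corruption, the expectation over the corrupting distribution has a closed form, so MCF-style training needs no sampled corrupted copies; the sketch below shows that special case with an assumed corruption rate q.

# Sketch of marginalized corrupted features for a linear model with squared loss
# and unbiased blankout corruption at rate q:
#   E[(y - w·x~)^2] = (y - w·x)^2 + sum_d w_d^2 x_d^2 q / (1 - q)
import torch

def expected_squared_loss(w, X, y, q=0.5):
    resid = y - X @ w                                # loss on the clean data
    var_term = (X ** 2) @ (w ** 2) * q / (1.0 - q)   # variance added by the corruption
    return (resid ** 2 + var_term).mean()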
Dropout Training as Adaptive Regularization
TLDR
By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
Extracting and composing robust features with denoising autoencoders
TLDR
This work introduces and motivates a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern.
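A compact sketch of the denoising-autoencoder training principle: corrupt the input with masking noise and train the network to reconstruct the clean input. The architecture, corruption rate, and binary cross-entropy reconstruction loss (inputs assumed in [0, 1]) are illustrative.

# Sketch of a one-hidden-layer denoising autoencoder with masking corruption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAE(nn.Module):
    def __init__(self, d_in=784, d_hid=256, corrupt_p=0.3):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hid)
        self.dec = nn.Linear(d_hid, d_in)
        self.corrupt_p = corrupt_p

    def forward(self, x):
        mask = (torch.rand_like(x) > self.corrupt_p).float()  # masking corruption
        h = torch.sigmoid(self.enc(x * mask))
        return torch.sigmoid(self.dec(h))

def dae_loss(model, x):
    return F.binary_cross_entropy(model(x), x)  # target is the uncorrupted input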
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Large-Scale Feature Learning With Spike-and-Slab Sparse Coding
TLDR
This work introduces a new feature learning and extraction procedure based on a factor model the authors call spike-and-slab sparse coding (S3C), and presents a novel inference procedure appropriate for use with GPUs, which makes it possible to dramatically increase both the training set size and the number of latent factors that S3C may be trained with.
The Manifold Tangent Classifier
TLDR
A representation learning algorithm that can be stacked to yield a deep architecture is presented, and it is shown how it builds a topological atlas of charts, each chart being characterized by the principal singular vectors of the Jacobian of a representation mapping.
…