Corpus ID: 7576447

On Herding in Deep Networks

L. V. D. Maaten
Maximum likelihood learning in Markov Random Fields (MRFs) with multiple layers of hidden units is typically performed using contrastive divergence or one of its variants. After learning, samples from the model are generally used to estimate expectations under the model distribution. Recently, Welling proposed a new approach to working with MRFs with a single layer of hidden units. The approach, called herding, tries to combine the two stages, learning and sampling, into a single stage. Herding…
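The abstract's description of herding (learning and sampling merged into one procedure) can be made concrete with a minimal Python/NumPy sketch of Welling's update in its simplest, fully visible form, with no hidden units. It assumes binary variables, a hypothetical feature map of singleton and pairwise terms, a made-up four-example data set, and brute-force enumeration of all states for the maximization step. None of this is the deep-network method of this paper. Each step picks s_t = argmax_x <w, φ(x)> and then sets w ← w + φ̄ - φ(s_t), where φ̄ are the data moments.

```python
import itertools

import numpy as np


def features(x):
    """Hypothetical feature map for a fully visible binary MRF:
    singleton terms x_i followed by pairwise products x_i * x_j (i < j)."""
    x = np.asarray(x, dtype=float)
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return np.concatenate([x, np.array(pairs, dtype=float)])


def herd(data, num_steps):
    """Herding sketch: turn the data moments into pseudo-samples.

    All 2**n states are enumerated for the argmax, so this is only
    feasible for a handful of binary variables.
    """
    data = np.asarray(data)
    n_vars = data.shape[1]
    states = np.array(list(itertools.product([0, 1], repeat=n_vars)))
    phi_states = np.array([features(s) for s in states])    # (2**n, d)
    phi_bar = np.mean([features(x) for x in data], axis=0)  # data moments

    w = np.zeros_like(phi_bar)  # w_0 = 0
    samples = []
    for _ in range(num_steps):
        # s_t = argmax_x <w_{t-1}, phi(x)>  (ties go to the first state)
        idx = int(np.argmax(phi_states @ w))
        samples.append(states[idx])
        # w_t = w_{t-1} + phi_bar - phi(s_t)
        w = w + phi_bar - phi_states[idx]
    return np.array(samples), phi_bar, w


# Toy usage with four made-up binary observations of three variables.
data = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]])
T = 10_000
samples, phi_bar, w = herd(data, T)
phi_hat = np.mean([features(s) for s in samples], axis=0)
max_err = float(np.max(np.abs(phi_hat - phi_bar)))
```

Because the weight update telescopes, φ̄ minus the average of φ(s_t) over T steps equals (w_T - w_0)/T exactly. The moment mismatch `max_err` therefore shrinks as `T` grows whenever the weights stay bounded. No random numbers are drawn: the "sampling" comes entirely from the deterministic weight dynamics.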



Exploring Strategies for Training Deep Neural Networks
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input.
Using fast weights to improve persistent contrastive divergence
It is shown that the weight updates force the Markov chain to mix fast, and, using this insight, an even faster-mixing chain is developed that uses an auxiliary set of "fast weights" to implement a temporary overlay on the energy landscape.
A Fast Learning Algorithm for Deep Belief Nets
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Deep Boltzmann Machines
A new learning algorithm is presented for Boltzmann machines that contain many layers of hidden variables; it is made more efficient by a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass.
Herding Dynamic Weights for Partially Observed Random Field Models
An algorithm is introduced that generates complex dynamics for the parameters and for the (visible and hidden) state vectors, and it is shown that under certain conditions averages computed over trajectories of the proposed dynamical system converge to averages computed over the data.
Modeling image patches with a directed hierarchy of Markov random fields
An efficient learning procedure for multilayer generative models that combine the best aspects of Markov random fields and deep, directed belief nets is described and it is shown that this type of model is good at capturing the statistics of patches of natural images.
Herding dynamical weights to learn
M. Welling, ICML '09, 2009.
A new "herding" algorithm is proposed that directly converts observed moments into a sequence of pseudo-samples. The pseudo-samples respect the moment constraints and may be used to estimate…
Learning Deep Architectures for AI
The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those that use unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks to construct deeper models such as Deep Belief Networks.
Scaling learning algorithms towards AI
It is argued that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
Training Products of Experts by Minimizing Contrastive Divergence
A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. A PoE is hard to train by maximum likelihood because it is hard even to approximate the derivatives of the renormalization term in the combination rule; instead, it can be trained by minimizing a different objective, contrastive divergence.
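Both the abstract and several entries above rely on contrastive divergence. As a rough illustration, here is a minimal NumPy sketch of a CD-1 update for a binary restricted Boltzmann machine (a single hidden layer), assuming the energy E(v, h) = -vᵀWh - bᵀv - cᵀh. The function names `cd1_update` and `reconstruction_error`, the toy two-pattern data, the learning rate, and the layer sizes are hypothetical choices. The reconstruction uses visible probabilities instead of binary samples, which is a common variant rather than the only form of CD-1.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def cd1_update(W, b, c, v0, lr, rng):
    """One CD-1 parameter update for a binary RBM.

    W: (n_visible, n_hidden) weights; b: visible biases; c: hidden biases.
    v0: (batch, n_visible) batch of binary training vectors.
    """
    # Positive phase: hidden probabilities and binary samples given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step back to the visibles (probabilities, a common variant),
    # then hidden probabilities for the reconstruction.
    v1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    # CD-1 approximates the log-likelihood gradient by
    # <v h^T>_data - <v h^T>_reconstruction.
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b = b + lr * (v0 - v1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c


def reconstruction_error(W, b, c, v):
    """Mean squared error of a deterministic (mean-field) reconstruction."""
    pv = sigmoid(sigmoid(v @ W + c) @ W.T + b)
    return float(np.mean((v - pv) ** 2))


# Toy usage: two complementary binary patterns.
rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
W = 0.01 * rng.standard_normal((6, 4))
b, c = np.zeros(6), np.zeros(4)

before = reconstruction_error(W, b, c, patterns)
for _ in range(1000):
    batch = patterns[rng.integers(0, 2, size=20)]
    W, b, c = cd1_update(W, b, c, batch, lr=0.1, rng=rng)
after = reconstruction_error(W, b, c, patterns)
```

Each update restarts the one-step chain at the data. Persistent contrastive divergence and its fast-weights variant, listed above, instead keep the negative-phase chain running across updates.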