Experiments with Stochastic Gradient Descent: Condensations of the Real line


It is well known that training Restricted Boltzmann Machines (RBMs) can be difficult in practice. For stochastic gradient methods, several tricks have been used to obtain faster convergence. These include gradient averaging (known as momentum), averaging of the parameters w, and various schedules for decreasing the “learning rate” parameter. In this article, we explore the use of continuous bijective transformations of the parameter space (“condensations”), which effectively makes each parameter’s learning rate a function of its current location on the real line. We report on experiments applying condensations to Hinton & Salakhutdinov’s (2006) Contrastive Divergence procedure on the MNIST dataset, and show a statistically significant improvement relative to constant and inverse-log schedules of the learning rate.
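The link between a reparametrization and a location-dependent learning rate can be illustrated with a small sketch. It assumes a hypothetical bijection w = g(u) = sinh(u) and a toy quadratic loss; neither is taken from the paper, which does not name its condensations in this abstract. A gradient step of size lr taken in u changes w, to first order, by -lr * g'(u)^2 * dL/dw, so the effective learning rate in w is lr * g'(u)^2, which depends on where w currently sits.

```python
import math

# Hypothetical bijection of the real line (an assumption, not the paper's choice).
def g(u):
    return math.sinh(u)

def g_prime(u):
    return math.cosh(u)

def g_inv(w):
    return math.asinh(w)

# Toy loss L(w) = 0.5 * (w - target)^2, used only for illustration.
def grad_loss(w, target=2.0):
    return w - target

def step_transformed(w, lr):
    """One gradient step taken in the transformed coordinate u = g_inv(w)."""
    u = g_inv(w)
    # Chain rule: dL/du = g'(u) * dL/dw.
    u_new = u - lr * g_prime(u) * grad_loss(w)
    return g(u_new)

def effective_learning_rate(w, lr):
    """First-order learning rate in w implied by a step of size lr in u."""
    return lr * g_prime(g_inv(w)) ** 2

def descend(w0, lr, steps):
    """Repeat the transformed step a fixed number of times."""
    w = w0
    for _ in range(steps):
        w = step_transformed(w, lr)
    return w
```

With this particular g, the effective rate works out to lr * (1 + w^2), so parameters far from the origin move faster than those near it. A different bijection would give a different profile, and which profile helps RBM training is the empirical question the article addresses.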


Cite this paper

@inproceedings{Lacerda2009ExperimentsWS, title={Experiments with Stochastic Gradient Descent: Condensations of the Real line}, author={Gustavo Lacerda}, year={2009} }