Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
It is shown that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.
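A minimal simulation sketch of this phenomenon (illustrative only, not the paper's code; the dimensions, singular values, and hyperparameters below are arbitrary choices): a two-layer linear network is trained by full-batch gradient descent from small random weights on a task whose input-output correlations have well-separated singular values. The printed loss typically shows plateaus punctuated by rapid drops, roughly one per singular mode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target linear map whose singular values are well separated.
d_in, d_hidden, d_out = 8, 8, 8
U, _ = np.linalg.qr(rng.standard_normal((d_out, d_out)))
V, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))
S = np.diag([4.0, 2.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0])
W_target = U @ S @ V.T

X = rng.standard_normal((d_in, 1000))   # roughly whitened inputs
Y = W_target @ X                        # noiseless targets

# Small random initialization: the regime in which stage-like transitions appear.
W1 = 1e-3 * rng.standard_normal((d_hidden, d_in))
W2 = 1e-3 * rng.standard_normal((d_out, d_hidden))
lr = 0.01 / X.shape[1]                  # learning rate on the summed squared error

for step in range(4001):
    E = W2 @ W1 @ X - Y                 # residuals
    g2 = E @ (W1 @ X).T                 # gradient with respect to W2
    g1 = W2.T @ E @ X.T                 # gradient with respect to W1
    W2 -= lr * g2
    W1 -= lr * g1
    if step % 200 == 0:
        loss = 0.5 * np.mean(np.sum(E ** 2, axis=0))
        print(f"step {step:5d}  loss {loss:8.4f}")
```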
On the information bottleneck theory of deep learning
This work studies the information bottleneck (IB) theory of deep learning, and finds that there is no evident causal connection between compression and generalization: networks that do not compress are still capable of generalization, and vice versa.
High-dimensional dynamics of generalization error in neural networks
Measuring Invariances in Deep Networks
- Ian J. Goodfellow, Quoc V. Le, Andrew M. Saxe, Honglak Lee, A. Ng
- Computer Science · NIPS
- 7 December 2009
A number of empirical tests are proposed that directly measure the degree to which learned features are invariant to different input transformations; stacked autoencoders trained on natural images learn modestly more invariant features with depth, while convolutional deep belief networks learn substantially more invariant features in each layer.
On Random Weights and Unsupervised Feature Learning
- Andrew M. Saxe, Pang Wei Koh, Zhenghao Chen, M. Bhand, B. Suresh, A. Ng
- Computer Science · ICML
- 28 June 2011
Certain convolutional pooling architectures are shown to be inherently frequency selective and translation invariant even with random weights, and the viability of extremely fast architecture search is demonstrated by using random weights to evaluate candidate architectures, thereby sidestepping the time-consuming learning process.
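A rough, self-contained illustration of the second point (the toy task, filter counts, and ridge readout below are my choices, not the paper's setup): each candidate architecture is scored by fixing random filters, extracting max-pooled features, and fitting only a cheap linear readout on top.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_conv_pool_features(images, n_filters, k):
    """Valid convolution with random k x k filters, then global max pooling."""
    filters = rng.standard_normal((n_filters, k, k))
    feats = []
    for img in images:
        windows = np.lib.stride_tricks.sliding_window_view(img, (k, k))
        responses = np.einsum('rcij,fij->frc', windows, filters)
        feats.append(responses.max(axis=(1, 2)))   # translation-invariant pooling
    return np.array(feats)

def score_architecture(k, images, y):
    """Score a candidate filter size by held-out accuracy of a ridge-regression readout."""
    F = random_conv_pool_features(images, n_filters=16, k=k)
    n = len(y)
    tr, te = slice(0, n // 2), slice(n // 2, n)
    A, b = F[tr], 2.0 * y[tr] - 1.0                # +/-1 targets
    w = np.linalg.solve(A.T @ A + 1e-2 * np.eye(A.shape[1]), A.T @ b)
    pred = (F[te] @ w > 0).astype(int)
    return (pred == y[te]).mean()

# Toy task: detect whether an image contains a bright horizontal bar.
n, h, w = 200, 16, 16
images = 0.1 * rng.standard_normal((n, h, w))
y = rng.integers(0, 2, n)
for i in np.flatnonzero(y):
    images[i, rng.integers(2, h - 2), :] += 1.0

for k in (3, 5, 7):
    print(f"filter size {k}: held-out accuracy {score_architecture(k, images, y):.2f}")
```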
A deep learning framework for neuroscience
- Blake A. Richards, T. Lillicrap, Konrad Paul Kording
- Computer Science · Nature Neuroscience
- 2 October 2019
It is argued that a deep network is best understood in terms of components used to design it—objective functions, architecture and learning rules—rather than unit-by-unit computation.
Acquisition of decision making criteria: reward rate ultimately beats accuracy
It is found that an accuracy bias dominates early performance, but diminishes greatly with practice, and the residual discrepancy between optimal and observed performance can be explained by an adaptive response to uncertainty in time estimation.
A mathematical theory of semantic development in deep neural networks
- Andrew M. Saxe, James L. McClelland, S. Ganguli
- Computer Science · Proceedings of the National Academy of Sciences
- 23 October 2018
Notably, this simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup
- Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, F. Krzakala, L. Zdeborová
- Computer Science · NeurIPS
- 18 June 2019
This work shows how the dynamics of stochastic gradient descent are captured by a set of ordinary differential equations and proves that this description is asymptotically exact in the limit of large input dimension.
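A minimal simulation of the setup (illustrative only; the dimensions, learning rate, and activation below are my choices, with tanh standing in for the sigmoidal activation of the classic analysis): a two-layer "soft committee machine" student is trained by one-pass SGD on fresh Gaussian inputs labelled by a fixed teacher of the same form. In the high-dimensional limit studied in the paper, curves like the one printed here are described by a small set of ordinary differential equations for the order parameters (overlaps between student and teacher weight vectors); the exact timing of the plateau and of specialization depends on the seed and hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, M = 200, 2, 2               # input dimension, student and teacher hidden units
lr = 1.0                          # learning rate (scaled by 1/d in the update below)

def phi(z):
    return np.tanh(z)             # stand-in sigmoidal activation

teacher = rng.standard_normal((M, d)) / np.sqrt(d)
student = 0.5 * rng.standard_normal((K, d)) / np.sqrt(d)

def output(W, x):
    return phi(W @ x).sum()       # soft committee machine: unit second-layer weights

# Held-out set for estimating the generalization error.
X_test = rng.standard_normal((2000, d))
y_test = np.array([output(teacher, x) for x in X_test])

for step in range(1, 200001):
    x = rng.standard_normal(d)    # fresh i.i.d. sample each step: online SGD
    err = output(student, x) - output(teacher, x)
    # gradient of 0.5 * err^2 with respect to the student's first-layer weights
    grad = err * (1.0 - np.tanh(student @ x) ** 2)[:, None] * x[None, :]
    student -= (lr / d) * grad
    if step % 25000 == 0:
        preds = np.array([output(student, xt) for xt in X_test])
        eg = 0.5 * np.mean((preds - y_test) ** 2)
        print(f"step {step:7d}  generalization error {eg:.5f}")
```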
Active Long Term Memory Networks
The Active Long Term Memory Networks (A-LTM), a model of sequential multi-task deep learning that is able to maintain previously learned associations between sensory input and behavioral output while acquiring new knowledge, is introduced.