• Publications
  • Influence
Continual Learning Through Synaptic Intelligence
This study introduces intelligent synapses that bring some of this biological complexity into artificial neural networks, and shows that it dramatically reduces forgetting while maintaining computational efficiency. Expand
Deep Knowledge Tracing
The utility of using Recurrent Neural Networks to model student learning and the learned model can be used for intelligent curriculum design and allows straightforward interpretation and discovery of structure in student tasks are explored. Expand
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
It is shown that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. Expand
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods, and applies this algorithm to deep or recurrent neural network training, and provides numerical evidence for its superior optimization performance. Expand
Exponential expressivity in deep neural networks through transient chaos
The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions. Expand
Deep Information Propagation
The presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth for random networks, and a mean field theory for backpropagation is developed that shows that the ordered and chaotic phases correspond to regions of vanishing and exploding gradient respectively. Expand
On the Expressive Power of Deep Neural Networks
We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.Expand
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
This work uses powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network's input-output Jacobian, and reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning. Expand
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. Expand
Memory traces in dynamical systems
The Fisher Memory Curve is introduced as a measure of the signal-to-noise ratio (SNR) embedded in the dynamical state relative to the input SNR and it is illustrated the generality of the theory by showing that memory in fluid systems can be sustained by transient nonnormal amplification due to convective instability or the onset of turbulence. Expand