Adam: A Method for Stochastic Optimization
- Diederik P. Kingma, Jimmy Ba
- Computer Science · International Conference on Learning Representations
- 22 December 2014
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
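A minimal NumPy sketch of the Adam update rule described above; the hyperparameter defaults follow the paper, but the function name and the toy usage are illustrative only.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given gradient `grad` at step t."""
    m = beta1 * m + (1 - beta1) * grad          # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: a single step on a three-dimensional parameter vector.
theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
theta, m, v = adam_step(theta, np.array([0.1, -0.2, 0.3]), m, v, t=1)
```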
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Kelvin Xu, Jimmy Ba, Yoshua Bengio
- Computer Science · International Conference on Machine Learning
- 10 February 2015
An attention-based model that automatically learns to describe the content of images is introduced; it can be trained deterministically using standard backpropagation techniques or stochastically by maximizing a variational lower bound.
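A rough NumPy sketch of the soft (deterministic) attention step this summary refers to; the annotation vectors and alignment scores below are random placeholders rather than outputs of a trained encoder and attention network.

```python
import numpy as np

rng = np.random.default_rng(0)
annotations = rng.normal(size=(196, 512))   # one feature vector per image location
scores = rng.normal(size=196)               # alignment scores from the attention network

alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                        # attention weights over locations sum to one
context = alpha @ annotations               # expected annotation vector fed to the decoder
```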
Layer Normalization
- Jimmy Ba, J. Kiros, Geoffrey E. Hinton
- Computer Science · ArXiv
- 21 July 2016
Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called…
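A minimal sketch of the layer-normalization transform: each sample is normalized across its own features, so the operation does not depend on the batch size. Shapes and the per-feature gain and bias below are illustrative.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize each sample across its features, independent of the batch size."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gain * (x - mean) / np.sqrt(var + eps) + bias

x = np.random.randn(4, 64)                              # a small batch of activations
y = layer_norm(x, gain=np.ones(64), bias=np.zeros(64))
```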
Dream to Control: Learning Behaviors by Latent Imagination
- Danijar Hafner, T. Lillicrap, Jimmy Ba, Mohammad Norouzi
- Computer Science · International Conference on Learning Representations
- 3 December 2019
Dreamer is presented, a reinforcement learning agent that solves long-horizon tasks purely by latent imagination and efficiently learns behaviors by backpropagating analytic gradients of learned state values through trajectories imagined in the compact state space of a learned world model.
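A toy PyTorch sketch of learning behaviors by backpropagating through imagined latent trajectories; the modules below (a GRU cell as the transition model, linear actor and value heads) are stand-ins for illustration, not Dreamer's actual architecture or training procedure.

```python
import torch

# Stand-in modules for the learned world model, actor and value networks.
dynamics = torch.nn.GRUCell(input_size=4, hidden_size=32)   # latent transition model
actor = torch.nn.Linear(32, 4)                               # maps latent state to action
value = torch.nn.Linear(32, 1)                               # predicts state value

def imagined_return(state, horizon=15):
    """Roll the policy forward purely in latent space and sum the predicted values.

    Every step is differentiable, so gradients of this return flow back
    through the imagined trajectory into the actor's parameters.
    """
    total = 0.0
    for _ in range(horizon):
        action = torch.tanh(actor(state))
        state = dynamics(action, state)        # imagined next latent state
        total = total + value(state).mean()
    return total

start = torch.zeros(8, 32)                     # batch of imagined start states
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
loss = -imagined_return(start)                 # maximize the imagined return
actor_opt.zero_grad()
loss.backward()
actor_opt.step()
```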
Do Deep Nets Really Need to be Deep?
- Jimmy Ba, R. Caruana
- Computer Science · NIPS
- 20 December 2013
This paper empirically demonstrates that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models.
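A short sketch of the mimic-training objective used in this line of work: the shallow student regresses the deep teacher's logits (pre-softmax outputs) with an L2 loss rather than fitting the original hard labels. The arrays below are placeholders for real network outputs.

```python
import numpy as np

teacher_logits = np.random.randn(128, 10)     # pre-softmax outputs of the deep net
student_logits = np.random.randn(128, 10)     # pre-softmax outputs of the shallow mimic net
mimic_loss = np.mean((student_logits - teacher_logits) ** 2)   # regress the logits, not the labels
```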
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
- Yuhuai Wu, Elman Mansimov, R. Grosse, Shun Liao, Jimmy Ba
- Computer Science · NIPS
- 1 August 2017
This work proposes to apply trust-region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature, yielding the first scalable trust-region natural gradient method for actor-critic methods.
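A rough NumPy sketch of a Kronecker-factored natural-gradient step for a single fully connected layer, assuming the standard K-FAC factors; the trust-region step-size control is omitted, and the sizes and damping value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(256, 64))        # layer inputs (batch x in_dim)
d = rng.normal(size=(256, 32))        # back-propagated output gradients (batch x out_dim)
grad_W = a.T @ d / len(a)             # ordinary gradient of the weight matrix (in x out)

damping = 1e-2
A = a.T @ a / len(a) + damping * np.eye(64)   # input second-moment factor
S = d.T @ d / len(d) + damping * np.eye(32)   # output-gradient second-moment factor

# Natural gradient under the Kronecker approximation of the Fisher, F ≈ A ⊗ S.
nat_grad_W = np.linalg.solve(A, grad_W) @ np.linalg.inv(S)
```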
Lookahead Optimizer: k steps forward, 1 step back
- Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, Jimmy Ba
- Computer Science · Neural Information Processing Systems
- 19 July 2019
Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost, and can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings.
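A minimal sketch of the Lookahead loop, where `inner_step` stands in for one update of any inner optimizer such as SGD or Adam; the toy usage at the end is illustrative only.

```python
import numpy as np

def lookahead(slow, inner_step, k=5, alpha=0.5, outer_iters=100):
    """Run the inner optimizer for k fast steps, then pull the slow weights toward them."""
    for _ in range(outer_iters):
        fast = slow.copy()
        for _ in range(k):                      # k steps forward
            fast = inner_step(fast)
        slow = slow + alpha * (fast - slow)     # 1 step back
    return slow

# Toy usage: an inner update that pulls every weight toward 1.
slow = np.zeros(10)
print(lookahead(slow, lambda w: w - 0.1 * (w - 1.0))[:3])
```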
Mastering Atari with Discrete World Models
- Danijar Hafner, T. Lillicrap, Mohammad Norouzi, Jimmy Ba
- Computer Science · International Conference on Learning Representations
- 5 October 2020
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model, and exceeds the final performance of the top single-GPU agents IQN and Rainbow.
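A small PyTorch sketch of the straight-through categorical latent that discrete world models of this kind rely on: the forward pass uses a one-hot sample, while gradients flow through the categorical probabilities. The decoder and shapes are assumptions for illustration, not DreamerV2's code.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 32, requires_grad=True)      # parameters of one categorical latent
probs = F.softmax(logits, dim=-1)
sample = F.one_hot(torch.multinomial(probs, 1).squeeze(-1), probs.shape[-1]).float()
latent = sample + probs - probs.detach()              # forward: one-hot sample; backward: gradient of probs

decoder = torch.nn.Linear(32, 16)                     # placeholder downstream model
loss = decoder(latent).pow(2).mean()
loss.backward()                                       # gradients reach `logits` through the straight-through path
```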
Multiple Object Recognition with Visual Attention
- Jimmy Ba, Volodymyr Mnih, K. Kavukcuoglu
- Computer Science · International Conference on Learning Representations
- 24 December 2014
The model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image and it is shown that the model learns to both localize and recognize multiple objects despite being given only class labels during training.
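A minimal PyTorch sketch of a REINFORCE-style update for a glimpse-location policy that receives only a terminal classification reward; every tensor below is a placeholder for what the recurrent model would actually produce.

```python
import torch

loc_mean = torch.zeros(8, 2, requires_grad=True)         # predicted glimpse centres
dist = torch.distributions.Normal(loc_mean, 0.1)
loc = dist.sample()                                       # sampled locations to attend to
reward = torch.randint(0, 2, (8,)).float()                # 1 if the final label was correct
loss = -(dist.log_prob(loc).sum(-1) * reward).mean()      # REINFORCE objective for the location policy
loss.backward()                                           # gradient flows into the location network
```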
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
- Emilio Parisotto, Jimmy Ba, R. Salakhutdinov
- Computer Science · International Conference on Learning Representations
- 19 November 2015
This work defines a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously and then generalize its knowledge to new domains, using Atari games as the testing environment to demonstrate these methods.
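A rough PyTorch sketch of an Actor-Mimic-style policy-regression objective: the multitask student is trained with cross-entropy toward the expert's softened policy over actions. Shapes and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

expert_logits = torch.randn(64, 6)                        # expert network outputs for a batch of states
student_logits = torch.randn(64, 6, requires_grad=True)   # multitask student outputs for the same states
tau = 1.0                                                  # softening temperature
expert_policy = F.softmax(expert_logits / tau, dim=-1)
loss = -(expert_policy * F.log_softmax(student_logits, dim=-1)).sum(-1).mean()
loss.backward()
```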