Publications
Improved Techniques for Training GANs
TLDR: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. (See the sketch below.)
Citations: 4,310 · Highly influential: 635
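Among the training procedures this paper proposes is feature matching: the generator is trained to match the mean activations of an intermediate discriminator layer on real versus generated data, rather than directly maximizing the discriminator's output. A minimal PyTorch sketch; the function name and tensor shapes are our placeholders, not from the paper:

    import torch

    def feature_matching_loss(real_feats: torch.Tensor, fake_feats: torch.Tensor) -> torch.Tensor:
        # Penalize the squared distance between mean discriminator-layer
        # activations on real and generated batches.
        return (real_feats.mean(dim=0) - fake_feats.mean(dim=0)).pow(2).sum()

    # Toy usage: random activations standing in for discriminator features.
    loss = feature_matching_loss(torch.randn(64, 128), torch.randn(64, 128))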
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
TLDR: This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. (See the sketch below.)
Citations: 2,442 · Highly influential: 309
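For a categorical latent code c, InfoGAN's variational lower bound on the mutual information I(c; G(z, c)) reduces to a cross-entropy between the sampled code and an auxiliary recognition head Q(c|x). A minimal sketch, with names and shapes assumed by us:

    import torch
    import torch.nn.functional as F

    def mutual_info_lower_bound_loss(q_logits: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        # L_I(G, Q) = E[log Q(c|x)] + H(c); H(c) is constant for a fixed prior,
        # so minimizing this cross-entropy maximizes the bound.
        return F.cross_entropy(q_logits, code)

    # Toy usage: a 10-way code and recognition-head logits for a batch of 64.
    loss = mutual_info_lower_bound_loss(torch.randn(64, 10), torch.randint(0, 10, (64,)))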
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
TLDR: We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. (See the sketch below.)
Citations: 780 · Highly influential: 143
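The core estimator perturbs parameters with Gaussian noise and weights each perturbation by the return it earns. A minimal NumPy sketch; the mean-return baseline is our simplification (the paper additionally uses antithetic sampling and rank-based fitness shaping):

    import numpy as np

    def es_step(theta, f, sigma=0.1, alpha=0.05, n=100, rng=None):
        # grad E_eps[f(theta + sigma * eps)] ~= (1 / (n * sigma)) * sum_i R_i * eps_i
        rng = np.random.default_rng() if rng is None else rng
        eps = rng.standard_normal((n, theta.size))
        returns = np.array([f(theta + sigma * e) for e in eps])
        grad = eps.T @ (returns - returns.mean()) / (n * sigma)
        return theta + alpha * grad  # ascend the estimated return

    # Toy usage: maximize f(x) = -||x||^2; theta drifts toward the origin.
    theta = np.ones(5)
    for _ in range(200):
        theta = es_step(theta, lambda x: -np.sum(x ** 2))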
Benchmarking Deep Reinforcement Learning for Continuous Control
TLDR: We present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up and tasks with very high state and action dimensionality.
Citations: 999 · Highly influential: 106
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
TLDR: We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. (See the sketch below.)
Citations: 436 · Highly influential: 80
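The discretized logistic treats each pixel value as the integral of a logistic density over one intensity bin. A single-component NumPy sketch for pixels rescaled to [-1, 1] over 256 levels; the paper uses a mixture of these components and handles the edge bins at ±1, both omitted here:

    import numpy as np

    def discretized_logistic_logprob(x, mu, log_scale):
        # log P(x) = log[ sigma((x - mu + 1/255) / s) - sigma((x - mu - 1/255) / s) ]
        # where sigma is the logistic CDF and 1/255 is the bin half-width.
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        s = np.exp(log_scale)
        cdf_plus = sigmoid((x - mu + 1.0 / 255.0) / s)
        cdf_minus = sigmoid((x - mu - 1.0 / 255.0) / s)
        return np.log(np.maximum(cdf_plus - cdf_minus, 1e-12))

    # Toy usage: log-likelihood of a mid-gray pixel under mu = 0, scale = 0.1.
    lp = discretized_logistic_logprob(0.0, 0.0, np.log(0.1))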
Variational Lossy Autoencoder
TLDR: In this paper, we present a simple but principled method to learn global representations by combining Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN. (See the bound below.)
Citations: 410 · Highly influential: 76
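The starting point is the standard VAE evidence lower bound; the paper's observation is that when the decoder p(x|z) is an autoregressive model with a deliberately limited receptive field, local detail is modeled autoregressively while global structure is forced into the latent code z. In standard VAE notation (ours, not quoted from the paper):

    \log p(x) \;\ge\; \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q(z \mid x) \,\Vert\, p(z)\big)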
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
TLDR: We propose RL$^2$, in which a "fast" reinforcement learning algorithm is represented as a recurrent neural network and learned from data: the RNN's weights are trained slowly by a general-purpose RL algorithm, while its activations adapt quickly within each new task. (See the sketch below.)
Citations: 452 · Highly influential: 61
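The interaction protocol is what makes the recurrence act as a learning algorithm: the hidden state persists across episodes of the same sampled MDP, so adaptation happens in activations while the weights stay fixed within a trial. A structural sketch; sample_mdp, policy_step, and the env API are hypothetical placeholders of ours:

    def run_trial(sample_mdp, policy_step, h0, episodes_per_trial=2):
        # The hidden state h is reset only at trial boundaries, never between
        # episodes, so experience from episode 1 can inform episode 2.
        env, h, total_return = sample_mdp(), h0, 0.0
        for _ in range(episodes_per_trial):
            obs, reward, done, prev_action = env.reset(), 0.0, False, None
            while not done:
                # The policy conditions on (observation, previous action,
                # reward, termination flag), as in the paper.
                action, h = policy_step(obs, prev_action, reward, done, h)
                obs, reward, done = env.step(action)
                prev_action = action
                total_return += reward
        return total_return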
A Simple Neural Attentive Meta-Learner
TLDR: We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. (See the sketch below.)
Citations: 572 · Highly influential: 59
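The attention half of the architecture is an ordinary scaled dot-product attention with a causal mask, so each timestep can pinpoint information anywhere in its past but not its future. A minimal PyTorch sketch of that ingredient alone (the temporal-convolution blocks and SNAIL's exact layer layout are omitted; shapes are our assumption):

    import torch
    import torch.nn.functional as F

    def causal_soft_attention(q, k, v):
        # Position t may attend only to positions <= t.
        T, d = q.shape[-2], k.shape[-1]
        scores = q @ k.transpose(-2, -1) / d ** 0.5
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        return F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1) @ v

    # Toy usage: a length-16 trajectory embedded in 32 dimensions.
    x = torch.randn(16, 32)
    out = causal_soft_attention(x, x, x)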
VIME: Variational Information Maximizing Exploration
TLDR: This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. (See the equation below.)
Citations: 423 · Highly influential: 53
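Concretely, VIME adds an intrinsic bonus equal to the information gained about the parameters θ of a learned dynamics model, i.e. the KL divergence between the agent's belief after and before observing a transition (ξ_t denotes the history up to time t, and η trades off exploration against the external reward):

    r'(s_t, a_t, s_{t+1}) \;=\; r(s_t, a_t) \;+\; \eta\, D_{\mathrm{KL}}\big[\, p(\theta \mid \xi_t, a_t, s_{t+1}) \,\Vert\, p(\theta \mid \xi_t) \,\big]

In the paper this posterior is intractable and is approximated variationally with a Bayesian neural network.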
Equivalence Between Policy Gradients and Soft Q-Learning
TLDR: We show a precise equivalence between $Q$-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning: "soft" (entropy-regularized) Q-learning is exactly equivalent to a policy gradient method. (See the equations below.)
Citations: 184 · Highly influential: 39
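The equivalence rests on the Boltzmann form of the optimal policy in the entropy-regularized setting, where τ is the entropy temperature:

    \pi(a \mid s) \;=\; \exp\!\big((Q(s, a) - V(s)) / \tau\big),
    \qquad
    V(s) \;=\; \tau \log \sum_{a'} \exp\!\big(Q(s, a') / \tau\big)

Under this parameterization, the paper shows that the soft Q-learning update and the entropy-regularized policy gradient compute the same gradient (up to the choice of baseline).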
...