Publications
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
TLDR
Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
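A minimal sketch of InfoGAN's core idea (not the authors' code): an auxiliary head Q predicts the latent code c from the generated sample, and the resulting cross-entropy, a variational lower bound on the mutual information I(c; G(z, c)), is added to the generator loss. Network sizes and the weight lambda_info are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, code_dim, data_dim = 62, 10, 784  # assumed sizes

G = nn.Sequential(nn.Linear(latent_dim + code_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
Q = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(),
                  nn.Linear(128, code_dim))  # logits over the categorical code

z = torch.randn(32, latent_dim)
c = F.one_hot(torch.randint(0, code_dim, (32,)), code_dim).float()
x_fake = G(torch.cat([z, c], dim=1))

# Variational lower bound on I(c; G(z, c)): cross-entropy of Q's prediction.
lambda_info = 1.0  # assumed weight
info_loss = F.cross_entropy(Q(x_fake), c.argmax(dim=1))
g_loss = lambda_info * info_loss  # added to the usual GAN generator loss
```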
Benchmarking Deep Reinforcement Learning for Continuous Control
TLDR
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
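As a sketch of how such a suite is typically consumed, the snippet below averages episode returns under an assumed Gym-style interface; make_env and policy are hypothetical placeholders, not part of the benchmark's actual API.

```python
import numpy as np

def evaluate(make_env, policy, episodes=10, horizon=500):
    """Average undiscounted return of `policy` over several episodes."""
    returns = []
    for _ in range(episodes):
        env, total = make_env(), 0.0
        obs = env.reset()
        for _ in range(horizon):
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
            if done:
                break
        returns.append(total)
    return np.mean(returns), np.std(returns)
```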
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
TLDR
This paper proposes to represent a "fast" reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data: the fast algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.
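A minimal sketch of this setup, with assumed sizes: the policy is a GRU whose per-step input concatenates the observation with the previous action, reward, and termination flag, and whose hidden state is carried across episode boundaries within a trial so the weights can encode a "fast" RL algorithm.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64  # assumed sizes

class RL2Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim + 2, hidden, batch_first=True)
        self.pi = nn.Linear(hidden, act_dim)  # action logits

    def forward(self, obs, prev_act, prev_rew, done, h):
        # Input: observation + previous action + previous reward + done flag.
        x = torch.cat([obs, prev_act, prev_rew, done], dim=-1).unsqueeze(1)
        out, h = self.gru(x, h)  # h persists across episodes within a trial
        return self.pi(out.squeeze(1)), h
```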
Variational Lossy Autoencoder
TLDR
This paper presents a simple but principled method to learn global representations by combining the Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE, and PixelRNN/CNN, which greatly improves the generative modeling performance of VAEs.
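A minimal sketch of the combination (an assumed small architecture, not the paper's exact model): the decoder is an autoregressive RNN over pixels conditioned on the latent z, so z is freed to carry only global structure. Binary data; sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, Z, H = 784, 32, 128
enc = nn.Linear(D, 2 * Z)                 # -> (mu, logvar)
dec = nn.GRU(1 + Z, H, batch_first=True)  # input: previous pixel + z
out = nn.Linear(H, 1)

def elbo(x):  # x: (B, D) in {0, 1}
    mu, logvar = enc(x).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    # Teacher-forced autoregressive likelihood: shift pixels right by one.
    prev = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
    inp = torch.cat([prev.unsqueeze(-1),
                     z.unsqueeze(1).expand(-1, D, -1)], dim=-1)
    logits = out(dec(inp)[0]).squeeze(-1)
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction="none").sum(1)
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(1)
    return -(rec + kl).mean()
```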
VIME: Variational Information Maximizing Exploration
TLDR
VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
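A minimal sketch of the exploration bonus under a fully factorized Gaussian posterior over dynamics-model weights: the intrinsic reward is the KL divergence between the posterior after and before updating on a new transition. The Bayesian update itself is elided here, and eta is an assumed scale.

```python
import torch

def gaussian_kl(mu_new, logvar_new, mu_old, logvar_old):
    """KL( N(mu_new, var_new) || N(mu_old, var_old) ), summed over weights."""
    var_new, var_old = logvar_new.exp(), logvar_old.exp()
    return 0.5 * (logvar_old - logvar_new
                  + (var_new + (mu_new - mu_old) ** 2) / var_old - 1).sum()

# r_total = r_env + eta * gaussian_kl(...)  with eta an assumed bonus weight.
```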
Adversarial Attacks on Neural Network Policies
TLDR
This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.
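A minimal FGSM-style sketch of one such crafting technique: the observation is perturbed in the direction that lowers the probability the policy assigns to its own chosen action. policy_logits is an assumed differentiable model and epsilon is illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm_on_observation(policy_logits, obs, epsilon=0.01):
    obs = obs.clone().requires_grad_(True)
    logits = policy_logits(obs)
    action = logits.argmax(dim=-1)          # the policy's preferred action
    loss = F.cross_entropy(logits, action)  # push its probability down
    loss.backward()
    # Small L-infinity perturbation in the gradient's sign direction.
    return (obs + epsilon * obs.grad.sign()).detach()
```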
Motion planning with sequential convex optimization and convex collision checking
TLDR
A sequential convex optimization procedure, which penalizes collisions with a hinge loss and increases the penalty coefficients in an outer loop as necessary, and an efficient formulation of the no-collisions constraint that directly considers continuous-time safety are presented.
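A minimal sketch of the outer penalty loop: collisions enter the objective through a hinge loss max(d_safe - sd, 0), and the penalty coefficient mu is increased until the trajectory clears the safety margin. The 2D waypoint parametrization, signed_distance, and the derivative-free inner solver used here are assumptions; the paper's inner loop instead solves a convex subproblem.

```python
import numpy as np
from scipy.optimize import minimize

def plan(theta0, signed_distance, d_safe=0.05, mu0=10.0, mu_growth=10.0):
    theta, mu = theta0.copy(), mu0
    for _ in range(8):  # outer loop: raise the collision penalty
        def cost(theta):
            # Smoothness of the waypoint path plus hinge-loss collision penalty.
            smooth = np.sum(np.diff(theta.reshape(-1, 2), axis=0) ** 2)
            hinge = np.maximum(d_safe - signed_distance(theta), 0.0)
            return smooth + mu * np.sum(hinge)
        theta = minimize(cost, theta, method="Nelder-Mead").x
        if np.all(signed_distance(theta) >= d_safe):
            break  # collision-free with margin: done
        mu *= mu_growth
    return theta
```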
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
TLDR
A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and it is found that simple hash functions can achieve surprisingly good results on many challenging tasks.
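A minimal SimHash-style sketch in the spirit of the paper: states are hashed with random projections, visit counts are kept per hash bucket, and the exploration bonus is beta / sqrt(n(hash(s))). Dimensions and beta are assumed.

```python
import numpy as np
from collections import defaultdict

class SimHashCounter:
    def __init__(self, state_dim, k=32, beta=0.01, seed=0):
        self.A = np.random.default_rng(seed).normal(size=(k, state_dim))
        self.counts = defaultdict(int)
        self.beta = beta

    def bonus(self, state):
        key = tuple(np.sign(self.A @ state) > 0)  # k-bit SimHash of the state
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])
```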
Evaluating Protein Transfer Learning with TAPE
TLDR
It is found that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases and suggesting a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences.
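A minimal sketch of the kind of self-supervised pretraining evaluated: masked-token prediction over amino-acid sequences with a small transformer. Vocabulary size, masking rate, and model dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model = 25, 64  # ~20 amino acids plus special tokens (assumed)
emb = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2)
head = nn.Linear(d_model, vocab)

seq = torch.randint(1, vocab, (8, 100))  # a batch of sequence token ids
mask = torch.rand_like(seq, dtype=torch.float) < 0.15
inp = seq.masked_fill(mask, 0)           # 0 = assumed [MASK] token id

logits = head(encoder(emb(inp)))
loss = F.cross_entropy(logits[mask], seq[mask])  # predict masked tokens only
```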
Model-Ensemble Trust-Region Policy Optimization
TLDR
This paper analyzes the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and shows that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
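A minimal sketch of the model-ensemble remedy: imagined rollouts draw a random ensemble member at each step, and policy optimization continues only while the policy improves on a majority of the learned models. The networks and the stopping rule's details are assumed.

```python
import random
import torch
import torch.nn as nn

def make_model(obs_dim=8, act_dim=2):
    # A small dynamics model predicting the next observation (assumed sizes).
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, obs_dim))

ensemble = [make_model() for _ in range(5)]

def imagined_step(obs, act):
    model = random.choice(ensemble)  # per-step model sampling
    return model(torch.cat([obs, act], dim=-1))

def keep_training(returns_new, returns_old):
    """Continue only while the policy improves on most ensemble members."""
    improved = sum(n > o for n, o in zip(returns_new, returns_old))
    return improved > len(ensemble) // 2
```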
...