Corpus ID: 221095075

One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL

@article{Kumar2020OneSI,
  title={One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL},
  author={Saurabh Kumar and Aviral Kumar and Sergey Levine and Chelsea Finn},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.14484}
}
While reinforcement learning algorithms can learn effective policies for complex tasks, these policies are often brittle to even minor task variations, especially when variations are not explicitly provided during training. One natural approach to this problem is to train agents with manually specified variation in the training task or environment. However, this may be infeasible in practical situations, either because making perturbations is not possible, or because it is unclear how to choose…
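
The abstract stops short of the method itself, but the title points to a structured maximum-entropy objective: learn several policies that are all near-optimal on the single training MDP yet mutually distinct, so that a few trials in a perturbed test environment are enough to pick whichever one still works. As a rough, non-authoritative sketch of what such an objective can look like (the notation, the gating construction, and the DIAYN-style diversity reward below are our paraphrase, not necessarily the paper's exact formulation), a latent-conditioned policy can be trained with a diversity bonus that is switched on only once a trajectory's return is close to the task optimum:

```latex
% Hedged sketch of a gated-diversity objective (our notation, not the paper's verbatim form).
% \pi_\theta(a \mid s, z): latent-conditioned policy, with z drawn from a fixed prior p(z) over K latents.
% R(\tau_z): task return of a trajectory generated under latent z; R^{*}: (estimated) optimal return
% on the training MDP; \varepsilon: slack; r_{\mathrm{div}}(s_t; z): an unsupervised diversity reward,
% e.g. a DIAYN-style \log q_\phi(z \mid s_t) - \log p(z).
\max_{\theta} \; \sum_{z} \mathbb{E}_{\tau_z \sim \pi_\theta(\cdot \mid \cdot, z)}
\Big[ R(\tau_z) \;+\; \alpha \, \mathbb{1}\big[ R(\tau_z) \ge R^{*} - \varepsilon \big]
\sum_{t} r_{\mathrm{div}}(s_t; z) \Big]
```

Under this reading, few-shot extrapolation amounts to rolling out each latent z for a handful of episodes in the perturbed environment and keeping the best-performing one.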


Learning a subspace of policies for online adaptation in Reinforcement Learning
TLDR
This article considers the simplest yet hard-to-tackle generalization setting, in which the test environment is unknown at train time, forcing the agent to adapt to the system's new dynamics, and proposes an approach in which a subspace of policies is learned within the parameter space.
Trajectory Diversity for Zero-Shot Coordination
TLDR
This work introduces Trajectory Diversity (TrajeDi), a differentiable objective for generating diverse reinforcement learning policies, derives TrajeDi as a generalization of the Jensen-Shannon divergence between policies, and motivates it experimentally in two simple settings.
Discovering Diverse Nearly Optimal Policies with Successor Features
TLDR
Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features while remaining near-optimal with respect to the extrinsic reward of the MDP, is proposed.
Deep Reinforcement Learning amidst Continual Structured Non-Stationarity
TLDR
This work leverages latent variable models to learn a representation of the environment from current and past experiences, performs off-policy RL with this representation, and empirically finds that this approach substantially outperforms approaches that do not reason about environment shift.
Learning more skills through optimistic exploration
TLDR
It is demonstrated empirically that DISDAIN improves skill learning both in a tabular grid world (Four Rooms) and in the 57 games of the Atari Suite (from pixels), and researchers are encouraged to treat pessimism with DISDAIN.
A Simple Approach to Continual Learning by Transferring Skill Parameters
TLDR
It is shown how to continually acquire robotic manipulation skills without forgetting, using far fewer samples than needed to train them from scratch, given an appropriate curriculum.
Dynamics-Aware Quality-Diversity for Efficient Learning of Skill Repertoires
TLDR
Dynamics-Aware Quality-Diversity (DA-QD), a framework for improving the sample efficiency of QD algorithms through the use of dynamics models, is proposed, and it is shown how it can be used for the continual acquisition of new skill repertoires.
Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning
  • Shenao Zhang, Li Shen, Lei Han, Li Shen
  • Computer Science
  • ArXiv
  • 2021
TLDR
This work proposes Meta Representations for Agents (MRA), which explicitly models game-common and game-specific strategic knowledge, and proves that, as an approximation to a constrained mutual-information maximization objective, the learned policies can reach a Nash equilibrium in every evaluation MG under a Lipschitz-game assumption on a sufficiently large latent space.
Motion Planning by Learning the Solution Manifold in Trajectory Optimization
TLDR
The approach can be interpreted as training a deep generative model of collision-free trajectories for motion planning, and the experimental results indicate that the trained model represents an infinite set of homotopic solutions for motion planning problems.
Unpacking the Expressed Consequences of AI Research in Broader Impact Statements
TLDR
A qualitative thematic analysis of a sample of statements written for the NeurIPS 2020 conference identifies themes related to how consequences are expressed, areas of impacts expressed, and researchers' recommendations for mitigating negative consequences in the future.

References

SHOWING 1-10 OF 48 REFERENCES
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
TLDR
This work proposes Selective Noise Injection (SNI), which maintains the regularizing effect of the injected noise while mitigating its adverse effects on gradient quality, and demonstrates that the Information Bottleneck is a particularly well-suited regularization technique for RL, as it is effective in the low-data regime encountered early in training RL agents.
Generalization and Regularization in DQN
TLDR
Despite regularization being largely underutilized in deep RL, it is shown that it can, in fact, help DQN learn more general features, which can then be reused and fine-tuned on similar tasks, considerably improving the sample efficiency of DQN.
Meta-Reinforcement Learning of Structured Exploration Strategies
TLDR
This work introduces a novel gradient-based fast adaptation algorithm, model-agnostic exploration with structured noise (MAESN), to learn exploration strategies from prior experience that are informed by prior knowledge and are more effective than random action-space noise.
Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning
TLDR
This work uses meta-learning to train a dynamics model prior such that, when combined with recent data, this prior can be rapidly adapted to the local context, and demonstrates the importance of incorporating online adaptation into autonomous agents that operate in the real world.
Diversity is All You Need: Learning Skills without a Reward Function
TLDR
This work proposes DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function, which learns skills by maximizing an information-theoretic objective using a maximum-entropy policy.
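The information-theoretic objective mentioned in this summary is typically implemented as an intrinsic reward computed from a learned skill discriminator. The following is a minimal PyTorch sketch of that DIAYN-style reward, log q_phi(z|s) − log p(z); the network shape, dimensions, and names are our own assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal DIAYN-style intrinsic-reward sketch (illustrative only, not the authors' implementation).
# A discriminator q_phi(z | s) is trained to recover which skill z produced a state s; the
# skill-conditioned policy is then rewarded for visiting states from which z is easy to infer.

NUM_SKILLS = 8   # assumed number of discrete skills
STATE_DIM = 4    # assumed toy state dimensionality

discriminator = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, NUM_SKILLS),  # logits over skills; the prior p(z) is uniform
)
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)

def intrinsic_reward(states: torch.Tensor, skills: torch.Tensor) -> torch.Tensor:
    """r = log q_phi(z|s) - log p(z), computed per state in a batch."""
    with torch.no_grad():
        log_q = F.log_softmax(discriminator(states), dim=-1)
        log_q_z = log_q.gather(-1, skills.unsqueeze(-1)).squeeze(-1)
    log_p_z = -torch.log(torch.tensor(float(NUM_SKILLS)))  # uniform prior over skills
    return log_q_z - log_p_z

def discriminator_update(states: torch.Tensor, skills: torch.Tensor) -> float:
    """Train q_phi to predict the skill that generated each state (cross-entropy)."""
    loss = F.cross_entropy(discriminator(states), skills)
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()
    return loss.item()

# Usage with a dummy batch of states and skill labels:
states = torch.randn(32, STATE_DIM)
skills = torch.randint(0, NUM_SKILLS, (32,))
discriminator_update(states, skills)
print(intrinsic_reward(states, skills).shape)  # torch.Size([32])
```

In the full algorithm this reward replaces the task reward inside a maximum-entropy RL learner such as SAC, which is what makes the discovered skills both diverse and stochastic.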
Worst Cases Policy Gradients
TLDR
This work proposes an actor-critic framework that models the uncertainty of the future and simultaneously learns a policy based on that uncertainty model, optimizing policies for varying levels of conditional Value-at-Risk.
Dynamics-Aware Unsupervised Discovery of Skills
TLDR
This work proposes an unsupervised learning algorithm, Dynamics-Aware Discovery of Skills (DADS), which simultaneously discovers predictable behaviors and learns their dynamics, and demonstrates that zero-shot planning in the learned latent space significantly outperforms standard MBRL and model-free goal-conditioned RL and substantially improves over prior hierarchical RL methods for unsupervised skill discovery.
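The skill-dynamics idea summarized here rewards a skill for making the next state more predictable under its own dynamics model than under skills drawn from the prior. In our notation (a paraphrase, not the paper's exact expression), the intrinsic reward has roughly the form:

```latex
% DADS-style intrinsic-reward sketch (our symbols): q_\phi(s' \mid s, z) is the learned
% skill-conditioned dynamics model, and the denominator marginalizes over L skills
% sampled from the prior p(z).
r(s, z, s') \;\approx\; \log \frac{q_\phi(s' \mid s, z)}
{\tfrac{1}{L} \sum_{i=1}^{L} q_\phi(s' \mid s, z_i)}, \qquad z_i \sim p(z)
```

The same model q_phi then serves as the planner's forward model, so the zero-shot planning mentioned above is model-predictive control over skills rather than over raw actions.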
Quantifying Generalization in Reinforcement Learning
TLDR
It is shown that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization.
Action Robust Reinforcement Learning and Applications in Continuous Control
TLDR
This work formalizes two new criteria of robustness to action uncertainty, suggests algorithms in the tabular case, generalizes the approach to deep reinforcement learning (DRL), and provides extensive experiments in various MuJoCo domains.
Reinforcement Learning with Perturbed Rewards
TLDR
This work develops a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed, and shows that trained policies based on the estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines.