Corpus ID: 226254184

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

@article{Gangwani2020HarnessingDR,
  title={Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity},
  author={Tanmay Gangwani and Jian Peng and Yuanshuo Zhou},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.02614}
}
Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task-returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on $f$-divergence between the stationary…
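With a kernel such as $k(i, j) = \exp(-D_f(d_{\pi_i}, d_{\pi_j})/T)$, where $D_f$ is an $f$-divergence between stationary distributions, the population update takes a Stein-variational form. Below is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the exponential kernel is one plausible choice, and the divergence matrix D together with its parameter gradients dD_dtheta is assumed to be supplied by an external distribution-ratio estimator; all names are illustrative.

```python
import numpy as np

def svgd_qd_update(return_grads, D, dD_dtheta, temperature=1.0, lr=1e-2):
    """One Stein-variational-style step for a population of n policies.

    return_grads: (n, p) gradient of each member's expected return.
    D:            (n, n) estimated f-divergences between the members'
                  stationary distributions (assumed given by a ratio estimator).
    dD_dtheta:    (n, n, p) gradient of D[i, j] w.r.t. member i's parameters.
    """
    n, _ = return_grads.shape
    K = np.exp(-D / temperature)             # kernel built from the divergence
    # Driving term: kernel-smoothed return gradients pull members uphill.
    phi = (K.T @ return_grads) / n
    # Repulsive term: grad_i K[i, j] = -(K[i, j] / T) * grad_i D[i, j];
    # it pushes members toward mutually distinct stationary distributions.
    phi += (-(K / temperature)[..., None] * dD_dtheta).sum(axis=0) / n
    return lr * phi                          # (n, p) deltas, one row per member
```

The kernel choice is what makes the diversity behavioral rather than parametric: two networks with very different weights but near-identical stationary distributions still sit close under this kernel and repel strongly.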

Citations

Discovering Diverse Nearly Optimal Policies with Successor Features
TLDR
Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features while remaining near-optimal with respect to the extrinsic reward of the MDP, is proposed.
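A sketch of the diversity signal this suggests, assuming each policy's successor features have already been estimated; the function name and nearest-neighbor form are illustrative, and the paper's constrained near-optimality formulation is not reproduced here:

```python
import numpy as np

def sf_diversity_bonus(psi, i):
    """Illustrative diversity signal for policy i: distance from its
    successor-feature vector to the nearest other member's, so policies
    are spread out in successor-feature space rather than parameter space.
    psi: (n, d) matrix of estimated successor features, one row per policy."""
    dists = np.linalg.norm(psi - psi[i], axis=1)
    dists[i] = np.inf                      # ignore the self-distance
    return dists.min()
```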
Modelling Behavioural Diversity for Learning in Open-Ended Games
TLDR
By incorporating a diversity metric into best-response dynamics, this work develops diverse fictitious play and a diverse policy-space response oracle for solving normal-form and open-ended games, and proves the uniqueness of the diverse best response and the convergence of the algorithms on two-player games.
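One way such a diversity metric can be scored, assuming (as a simplification) a determinant-based measure over empirical payoff vectors; the helper below is illustrative, not the paper's exact construction:

```python
import numpy as np

def diversity_gain(payoffs, candidate):
    """How much behavioural 'volume' a prospective best response adds:
    the log-det increase of the Gram matrix of payoff vectors.
    payoffs: (n, m) payoff vectors of the current population;
    candidate: (m,) payoff vector of the prospective new policy."""
    def logdet_gram(M):
        # small ridge keeps the Gram matrix positive definite
        return np.linalg.slogdet(M @ M.T + 1e-6 * np.eye(len(M)))[1]
    return logdet_gram(np.vstack([payoffs, candidate])) - logdet_gram(payoffs)
```

A best response that merely copies an existing member adds an almost linearly dependent row, so its gain is minimal; the bonus steers the oracle toward genuinely new behaviour.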
Policy gradient assisted MAP-Elites
TLDR
PGA-MAP-Elites is presented, a novel algorithm that enables MAP-Elites to efficiently evolve large neural network controllers by introducing a gradient-based variation operator inspired by Deep Reinforcement Learning.
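The archive mechanics being extended look roughly like this; a minimal sketch, with the policy-gradient variation operator reduced to a comment (names and the grid discretization are illustrative):

```python
import numpy as np

def map_elites_insert(archive, theta, fitness_fn, descriptor_fn, bins=10):
    """Minimal MAP-Elites step: keep the fittest solution in each
    behaviour-descriptor cell. PGA-MAP-Elites additionally produces part of
    each batch of offspring with a policy-gradient step (instead of only
    random mutation) before inserting them with this same rule."""
    fitness = fitness_fn(theta)
    desc = descriptor_fn(theta)                        # assumed in [0, 1]^k
    cell = tuple(np.minimum((desc * bins).astype(int), bins - 1))
    if cell not in archive or archive[cell][0] < fitness:
        archive[cell] = (fitness, theta)
    return archive
```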

References

SHOWING 1-10 OF 36 REFERENCES
Learning Self-Imitating Diverse Policies
TLDR
A self-imitation learning algorithm is proposed that exploits and explores well in sparse and episodic reward settings; it can be reduced to a policy-gradient algorithm with shaped rewards learned from experience replays, and is extended to diverse ensembles via Stein variational policy gradient with a Jensen-Shannon divergence kernel.
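A sketch of the shaped-reward idea, assuming a GAIL-style discriminator disc trained to recognise state-action pairs from the agent's own high-return trajectories; the function and weighting are illustrative:

```python
import torch

def self_imitation_reward(disc, s, a, env_reward, lam=0.5):
    """Illustrative shaped reward: environment reward plus a bonus that is
    high where the discriminator believes (s, a) resembles the agent's own
    past high-return behaviour. disc is assumed to output logits."""
    with torch.no_grad():
        bonus = torch.nn.functional.logsigmoid(disc(s, a)).squeeze(-1)
    return env_reward + lam * bonus
```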
Collaborative Evolutionary Reinforcement Learning
TLDR
Collaborative Evolutionary Reinforcement Learning (CERL) is introduced, a scalable framework comprising a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space; it significantly outperforms its composite learners while remaining more sample-efficient overall.
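A toy version of the resource-allocation idea, assuming a UCB-style score decides how many rollouts each learner in the portfolio receives; the softmax allocation is an illustrative simplification:

```python
import numpy as np

def allocate_rollouts(mean_returns, counts, total_rollouts, c=1.0):
    """Give more rollouts to learners with high returns or little data.
    mean_returns, counts: (n,) running statistics per learner."""
    t = counts.sum() + 1
    scores = mean_returns + c * np.sqrt(np.log(t) / np.maximum(counts, 1))
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return np.random.multinomial(total_rollouts, probs)
```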
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
TLDR
This work proposes an algorithm, DualDICE, that is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset and improves accuracy compared to existing techniques.
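The objective has a compact form; a minimal PyTorch sketch with the squared-loss instantiation $f(x) = x^2/2$ (so the convex conjugate is $f^*(y) = y^2/2$), where nu is a network over state-action batches and the policy-sampled actions are assumed provided:

```python
import torch

def dualdice_loss(nu, s, a, s_next, a_next_pi, s0, a0_pi, gamma=0.99):
    """min_nu E_dD[f*(nu - B nu)] - (1 - gamma) E_{d0, pi}[nu], with
    f*(y) = y^2/2. At the optimum, the Bellman residual (nu - B nu)(s, a)
    recovers the stationary distribution correction d_pi / d_D."""
    residual = nu(s, a) - gamma * nu(s_next, a_next_pi)   # (nu - B nu)(s, a)
    return (0.5 * residual ** 2).mean() - (1 - gamma) * nu(s0, a0_pi).mean()
```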
Imitation Learning via Off-Policy Distribution Matching
TLDR
This work shows how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective and calls the resulting algorithm ValueDICE, finding that it can achieve state-of-the-art sample efficiency and performance.
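A sketch of the resulting off-policy objective in PyTorch; nu plays the role of a value-like network, expert transitions come from the demonstration buffer, and policy-sampled actions are assumed provided (the minimax wiring with the policy is omitted):

```python
import math
import torch

def valuedice_objective(nu, exp_s, exp_a, exp_s_next, pi_a_next, s0, pi_a0,
                        gamma=0.99):
    """log E_expert[exp(nu - B nu)] - (1 - gamma) E_{s0, pi}[nu].
    nu is assumed to return a 1-D tensor per batch; in the paper nu
    minimises this quantity while the policy maximises it."""
    residual = nu(exp_s, exp_a) - gamma * nu(exp_s_next, pi_a_next)
    log_mean_exp = torch.logsumexp(residual, dim=0) - math.log(len(residual))
    return log_mean_exp - (1 - gamma) * nu(s0, pi_a0).mean()
```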
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
TLDR
By simply adding a distance measure to the loss function, the proposed methodology significantly enhances an agent's exploratory behaviors, preventing the policy from being trapped in local optima; an adaptive scaling method for stabilizing the learning process is also proposed.
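A toy version of "adding a distance measure to the loss", assuming discrete action logits and a buffer of the agent's own recent policies; the fixed alpha replaces the paper's adaptive scaling for brevity:

```python
import torch

def diversity_regularized_loss(rl_loss, pi_logits, prior_logits_list, alpha=0.1):
    """Subtract the mean KL distance between the current policy and its own
    recent policies, rewarding behaviour the agent has not tried lately."""
    log_p = torch.log_softmax(pi_logits, dim=-1)
    dist = 0.0
    for prior_logits in prior_logits_list:
        log_q = torch.log_softmax(prior_logits, dim=-1)
        # KL(pi || prior), averaged over the batch of states
        dist = dist + (log_p.exp() * (log_p - log_q)).sum(-1).mean()
    return rl_loss - alpha * dist / len(prior_logits_list)
```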
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
TLDR
This paper shows that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search and quality diversity algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability.
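The novelty signal at the heart of these hybrids is simple to state; a sketch assuming behaviours are characterised as fixed-length vectors kept in an archive:

```python
import numpy as np

def novelty(b, archive, k=10):
    """Novelty of behaviour characterisation b: mean distance to its k
    nearest neighbours among past behaviours. NS-ES-style hybrids mix this
    signal with task reward when forming the ES update direction.
    archive: (m, d) array of past behaviours; b: (d,) array."""
    dists = np.linalg.norm(archive - b, axis=1)
    k = min(k, len(dists))
    return np.sort(dists)[:k].mean()
```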
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
TLDR
A new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy, based on an extension of the doubly robust estimator and a new way to mix between model-based estimates and importance-sampling-based estimates.
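The doubly robust building block it extends can be written as a one-line backward recursion; a sketch for a single trajectory, assuming per-step importance ratios and model-based value estimates are given:

```python
import numpy as np

def doubly_robust_value(rewards, rhos, q_hats, v_hats, gamma=1.0):
    """Per-trajectory doubly robust estimate:
    V_DR(t) = v_hat[t] + rho[t] * (r[t] + gamma * V_DR(t+1) - q_hat[t]),
    where rho[t] = pi_e(a_t|s_t) / pi_b(a_t|s_t) and q_hat / v_hat come
    from an approximate model of the evaluation policy's values."""
    v_dr = 0.0
    for t in reversed(range(len(rewards))):
        v_dr = v_hats[t] + rhos[t] * (rewards[t] + gamma * v_dr - q_hats[t])
    return v_dr
```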
Stein Variational Policy Gradient
TLDR
A novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies is proposed.
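For reference, the SVPG update for member $\theta_i$ with temperature $\alpha$ and kernel $k$ has the form (prior term omitted for brevity):

$$\phi(\theta_i) = \frac{1}{n} \sum_{j=1}^{n} \Big[ \nabla_{\theta_j} \frac{J(\theta_j)}{\alpha}\, k(\theta_j, \theta_i) + \nabla_{\theta_j} k(\theta_j, \theta_i) \Big],$$

where the first, kernel-smoothed term drives returns and the second, repulsive term enforces diversity; the sketch after the abstract above instantiates the same structure with a divergence-based kernel.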
Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
TLDR
This paper introduces MERL (Multiagent Evolutionary RL), a hybrid algorithm that does not require an explicit alignment between local and global objectives, and uses fast, policy-gradient based learning for each agent by utilizing their dense local rewards.
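A toy generation of this hybrid, with teams as parameter tensors; the split into a gradient step on dense local rewards plus evolutionary selection on the sparse global return follows the description above, and everything else is illustrative:

```python
import numpy as np

def merl_generation(teams, local_grads, global_returns, lr=0.01, sigma=0.02):
    """teams: (n_teams, n_agents, p) parameters. Each agent first takes a
    policy-gradient step on its dense local reward (local_grads, same
    shape); teams are then ranked by sparse global return and the bottom
    half is replaced by noisy copies of the top half (evolution)."""
    teams = teams + lr * local_grads
    teams = teams[np.argsort(global_returns)[::-1]]    # best teams first
    half = len(teams) // 2
    teams[len(teams) - half:] = teams[:half] + sigma * np.random.randn(
        half, *teams.shape[1:])
    return teams
```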
High-Dimensional Continuous Control Using Generalized Advantage Estimation
TLDR
This work addresses the large number of samples typically required and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
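The estimator itself is a short backward recursion over temporal-difference residuals; a self-contained sketch:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation: A_t = sum_l (gamma*lam)^l delta_{t+l}
    with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    values has length len(rewards) + 1 (bootstrap value appended)."""
    adv = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv
```

The lam parameter trades bias for variance: lam = 0 reduces to one-step TD advantages, lam = 1 to Monte-Carlo returns minus the baseline.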
...