# Alternating Optimisation and Quadrature for Robust Reinforcement Learning

```bibtex
@article{Paul2016AlternatingOA,
  title   = {Alternating Optimisation and Quadrature for Robust Reinforcement Learning},
  author  = {Supratik Paul and Kamil Ciosek and Michael A. Osborne and Shimon Whiteson},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1605.07496}
}
```

Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but are controllable in a simulator. This paper considers the problem of finding an optimal policy while taking into account the impact…
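To make the Bayesian optimisation setting concrete, here is a minimal, hypothetical sketch of the general technique the abstract builds on: a Gaussian-process surrogate over a 1-D policy parameter with an upper-confidence-bound (UCB) acquisition rule. This is not the paper's actual algorithm; the toy objective `f` merely stands in for an expensive simulator rollout, and all names and settings here are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard GP regression: posterior mean and pointwise std at x_query.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    Kss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def bayes_opt(f, n_iters=15, beta=2.0):
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 201)      # candidate policy parameters
    x = rng.uniform(0.0, 1.0, size=3)      # a few initial random evaluations
    y = np.array([f(xi) for xi in x])
    for _ in range(n_iters):
        mean, std = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(mean + beta * std)]  # UCB acquisition
        x = np.append(x, x_next)
        y = np.append(y, f(x_next))
    return x[np.argmax(y)], y.max()

f = lambda x: -(x - 0.7) ** 2              # toy "return", maximised at x = 0.7
best_x, best_y = bayes_opt(f)
```

The UCB rule trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is why Bayesian optimisation is sample-efficient enough for settings where each evaluation is a costly simulator or robot run.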

#### 5 Citations

OFFER: Off-Environment Reinforcement Learning

- Computer Science
- AAAI
- 2017

It is proved that OFFER converges to a locally optimal policy and it is shown experimentally that it learns better and faster than a policy gradient baseline.

Off-Environment RL with Rare Events

- 2016

Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not…

Expected Policy Gradients

- Computer Science, Mathematics
- AAAI
- 2018

A new general policy gradient theorem is established, of which the stochastic and deterministic policy gradient theorems are special cases, and it is proved that EPG reduces the variance of the gradient estimates without requiring deterministic policies and, for the Gaussian case, with no computational overhead.
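The general policy gradient theorem this summary refers to can be sketched in standard notation (this formulation is a common textbook statement, not quoted from the cited paper):

$$
\nabla_\theta J(\theta) \;=\; \int_{\mathcal{S}} \rho^{\pi}(s) \int_{\mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi}(s, a)\, \mathrm{d}a\, \mathrm{d}s ,
$$

where $\rho^{\pi}$ is the discounted state distribution. The stochastic policy gradient theorem follows via the likelihood-ratio identity $\nabla_\theta \pi_\theta = \pi_\theta \nabla_\theta \log \pi_\theta$, and the deterministic case arises in the limit where $\pi_\theta(\cdot \mid s)$ collapses to a point mass.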

Agent-Agnostic Human-in-the-Loop Reinforcement Learning

- Computer Science
- ArXiv
- 2017

This work explores protocol programs, an agent-agnostic schema for human-in-the-loop reinforcement learning, to incorporate the beneficial properties of a human teacher into reinforcement learning without making strong assumptions about the inner workings of the agent.

On the Sampling Problem for Kernel Quadrature

- Computer Science, Mathematics
- ICML
- 2017

It is argued that the practical choice of sampling distribution is an important open problem and a novel automatic approach based on adaptive tempering and sequential Monte Carlo is considered, demonstrating a dramatic reduction in integration error.

#### References

Showing 1–10 of 43 references

Reinforcement learning in the presence of rare events

- Computer Science
- ICML '08
- 2008

This work introduces algorithms for policy evaluation, using both tabular and function approximation representations of the value function, and proves that in both cases, the reinforcement learning algorithms converge.

A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

- Computer Science, Mathematics
- ArXiv
- 2010

A tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions using the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function.

Active Policy Learning for Robot Planning and Exploration under Uncertainty

- Computer Science
- Robotics: Science and Systems
- 2007

A simulation-based active policy learning algorithm for finite-horizon, partially observed sequential decision processes, tested in the domain of robot navigation and exploration under uncertainty, which effectively trades off between exploration and exploitation.

Active Learning of Model Evidence Using Bayesian Quadrature

- Computer Science, Mathematics
- NIPS
- 2012

This work proposes a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model.

Bayesian optimization for learning gaits under uncertainty

- Computer Science
- Annals of Mathematics and Artificial Intelligence
- 2015

Bayesian optimization, a model-based approach to black-box optimization under uncertainty, is evaluated on both simulated problems and real robots, demonstrating that Bayesian optimization is particularly suited for robotic applications, where it is crucial to find a good set of gait parameters in a small number of experiments.

A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot

- Computer Science
- Auton. Robots
- 2009

A Bayesian optimization method that dynamically trades off exploration and exploitation for optimal sensing with a mobile robot and is applicable to other closely related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

- Computer Science
- ICML
- 2011

PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.

Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

- Computer Science, Medicine
- Evol. Intell.
- 2011

This work presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning, and proposes and evaluates several methods for three increasingly challenging variations of the task.

Bayesian Monte Carlo

- Computer Science, Mathematics
- NIPS
- 2002

It is found that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high-dimensional problems or problems with massive multimodality BMC may be less adequate.

Bayes–Hermite quadrature

- Mathematics
- 1991

Bayesian quadrature treats the problem of numerical integration as one of statistical inference. A prior Gaussian process distribution is assumed for the integrand, observations arise from…
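The inference described in this entry can be summarised in standard Bayesian quadrature notation (a generic sketch, not quoted from the cited paper). With a zero-mean GP prior $f \sim \mathcal{GP}(0, k)$, evaluations $y_i = f(x_i)$, and target integral $Z = \int f(x)\, p(x)\, \mathrm{d}x$, the posterior mean of $Z$ is

$$
\mathbb{E}[Z \mid y] \;=\; z^{\top} K^{-1} y,
\qquad
z_i = \int k(x, x_i)\, p(x)\, \mathrm{d}x,
\qquad
K_{ij} = k(x_i, x_j).
$$

For a squared-exponential kernel with Gaussian weight $p$ (the Bayes–Hermite setting), the kernel-mean entries $z_i$ are available in closed form, which is what makes the method practical.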