Corpus ID: 753892

Alternating Optimisation and Quadrature for Robust Reinforcement Learning

@article{Paul2016AlternatingOA,
  title={Alternating Optimisation and Quadrature for Robust Reinforcement Learning},
  author={Supratik Paul and Kamil Ciosek and Michael A. Osborne and Shimon Whiteson},
  journal={ArXiv},
  year={2016},
  volume={abs/1605.07496}
}
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables - state features that are randomly determined by the environment in a physical setting but are controllable in a simulator. This paper considers the problem of finding an optimal policy while taking into account the impact…
OFFER: Off-Environment Reinforcement Learning
TLDR
It is proved that OFFER converges to a locally optimal policy and it is shown experimentally that it learns better and faster than a policy gradient baseline.
Off-Environment RL with Rare Events
Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not…
Expected Policy Gradients
TLDR
A new general policy gradient theorem is established, of which the stochastic and deterministic policy gradient theorems are special cases, and it is proved that EPG reduces the variance of the gradient estimates without requiring deterministic policies and, for the Gaussian case, with no computational overhead.
Agent-Agnostic Human-in-the-Loop Reinforcement Learning
TLDR
This work explores protocol programs, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning, to incorporate the beneficial properties of a human teacher into Reinforcement learning without making strong assumptions about the inner workings of the agent.
On the Sampling Problem for Kernel Quadrature
TLDR
It is argued that the practical choice of sampling distribution is an important open problem and a novel automatic approach based on adaptive tempering and sequential Monte Carlo is considered, demonstrating a dramatic reduction in integration error.

References

SHOWING 1-10 OF 43 REFERENCES
Reinforcement learning in the presence of rare events
TLDR
This work introduces algorithms for policy evaluation, using both tabular and function approximation representations of the value function, and proves that in both cases, the reinforcement learning algorithms converge.
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
TLDR
A tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions using the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function.
Active Policy Learning for Robot Planning and Exploration under Uncertainty
TLDR
A simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes, tested in the domain of robot navigation and exploration under uncertainty, which effectively trades off exploration against exploitation.
Active Learning of Model Evidence Using Bayesian Quadrature
TLDR
This work proposes a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model.
Bayesian optimization for learning gaits under uncertainty
TLDR
Bayesian optimization, a model-based approach to black-box optimization under uncertainty, is evaluated on both simulated problems and real robots, demonstrating that Bayesian optimization is particularly suited for robotic applications, where it is crucial to find a good set of gait parameters in a small number of experiments.
A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot
TLDR
A Bayesian optimization method that dynamically trades off exploration and exploitation for optimal sensing with a mobile robot and is applicable to other closely related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
TLDR
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
Neuroevolutionary reinforcement learning for generalized control of simulated helicopters
TLDR
An extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning, and proposes and evaluates several methods for three increasingly challenging variations of the task.
Bayesian Monte Carlo
TLDR
It is found that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high-dimensional problems or problems with massive multimodality, BMC may be less adequate.
Bayes–Hermite quadrature
Bayesian quadrature treats the problem of numerical integration as one of statistical inference. A prior Gaussian process distribution is assumed for the integrand, observations arise from…