Efficient Baseline-Free Sampling in Parameter Exploring Policy Gradients: Super Symmetric PGPE

@inproceedings{Sehnke2013EfficientBS,
  title={Efficient Baseline-Free Sampling in Parameter Exploring Policy Gradients: Super Symmetric PGPE},
  author={Frank Sehnke},
  booktitle={ICANN},
  year={2013}
}
  • Frank Sehnke
  • Published in ICANN, 10 September 2013
Policy Gradient methods that explore directly in parameter space are among the most effective and robust direct policy search methods and have drawn a lot of attention lately. The basic method from this field, Policy Gradients with Parameter-based Exploration (PGPE), uses two samples that are symmetric around the current hypothesis to avoid the misleading gradient estimates that the usual baseline approach yields on problems with asymmetrically distributed rewards. The exploration parameters, however, are still updated by a baseline approach… 
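To make the symmetric-sampling idea concrete, here is a minimal sketch of a single PGPE update from one symmetric sample pair. It assumes a per-parameter Gaussian search distribution and a user-supplied episodic reward function; the function name `evaluate`, the learning rates, and the baseline handling are illustrative assumptions, not code from the paper.

```python
import numpy as np

def pgpe_symmetric_step(mu, sigma, evaluate, alpha_mu=0.2, alpha_sigma=0.1, baseline=0.0):
    """One symmetric-sampling PGPE update (sketch).

    mu, sigma : per-parameter mean and standard deviation of the search distribution.
    evaluate  : callable mapping a parameter vector to an episodic reward (assumed).
    baseline  : running average of past rewards, used only for the sigma update.
    """
    eps = np.random.normal(0.0, sigma)        # one perturbation per parameter
    r_plus = evaluate(mu + eps)               # reward of the positive sample
    r_minus = evaluate(mu - eps)              # reward of the mirrored sample

    # Mean update: the symmetric pair cancels the baseline, so none is needed here.
    mu = mu + alpha_mu * 0.5 * (r_plus - r_minus) * eps

    # Standard-deviation update: still relies on a reward baseline, which is the
    # asymmetry-prone part that Super Symmetric PGPE aims to make baseline-free.
    s_grad = (eps ** 2 - sigma ** 2) / sigma
    sigma = sigma + alpha_sigma * (0.5 * (r_plus + r_minus) - baseline) * s_grad

    return mu, sigma
```

The mirrored pair removes the baseline from the mean update; the exploration (sigma) update in this sketch still depends on a baseline, which is exactly the limitation the Super Symmetric variant addresses.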
3 Citations
Baseline-Free Sampling in Parameter Exploring Policy Gradients: Super Symmetric PGPE
TLDR: This paper shows how the exploration parameters can be sampled quasi-symmetrically even though they are limited rather than free parameters, and gives a transformation approximation that yields quasi-symmetric samples with respect to the exploration parameters without changing the overall sampling distribution.

References

SHOWING 1-10 OF 21 REFERENCES
Baseline-Free Sampling in Parameter Exploring Policy Gradients: Super Symmetric PGPE
TLDR: This paper shows how the exploration parameters can be sampled quasi-symmetrically even though they are limited rather than free parameters, and gives a transformation approximation that yields quasi-symmetric samples with respect to the exploration parameters without changing the overall sampling distribution.
Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
TLDR: This letter combines policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates, with an importance sampling technique, giving a highly effective policy gradient method.
Multimodal Parameter-exploring Policy Gradients
TLDR: The basic PGPE algorithm is extended to use multimodal mixture distributions for each parameter while remaining efficient, and results demonstrate the advantages of this modification, with faster convergence to better optima.
Parameter-exploring policy gradients
Analysis and Improvement of Policy Gradient Estimation
Exploring parameter space in reinforcement learning
TLDR: Describes how a novel exploration method, State-Dependent Exploration, can modify existing algorithms to mimic exploration in parameter space, and reviews two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration.
Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks
TLDR: An efficient algorithm for estimating the natural policy gradient using parameter-based exploration; this algorithm samples directly in the parameter space using the inverse of the exact Fisher information matrix.
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
TLDR: This paper considers variance reduction methods that were developed for Monte Carlo estimates of integrals, and gives bounds for the estimation error of the gradient estimates for both baseline and actor-critic algorithms, in terms of the sample size and mixing properties of the controlled system.
Parameter exploring policy gradients and their implications
TLDR: The PGPE algorithm developed in this thesis, a new type of policy gradient algorithm, allows model-free learning in complex, continuous, partially observable, and high-dimensional environments, and was the most successful method at cracking non-differentiable physical cryptography systems.
Path Integral Policy Improvement with Covariance Matrix Adaptation
TLDR: PI2 is considered as a member of the wider family of methods that share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function, and is compared to other members of the same family, Cross-Entropy Methods and CMA-ES, at the conceptual level and in terms of performance.