Hyperparameter Auto-Tuning in Self-Supervised Robotic Learning

J. Huang, J. Rojas, M. Zimmer, H. Wu, Y. Guan, and P. Weng, "Hyperparameter Auto-Tuning in Self-Supervised Robotic Learning," IEEE Robotics and Automation Letters.
Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Fixing them incorrectly may negatively impact optimization performance, leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies, whilst redundant learning wastes time and resources. These effects are further exacerbated when a single policy is used for multi-task learning…


Self-Tuning Deep Reinforcement Learning
The Self-Tuning Actor-Critic (STAC) agent is presented, which uses metagradients to tune the hyperparameters of the usual loss function of the IMPALA actor-critic agent via differentiable cross-validation, whilst the agent interacts with and learns from the environment.
A Self-Tuning Actor-Critic Algorithm
This paper presents the Self-Tuning Actor-Critic (STAC) algorithm, which self-tunes all the differentiable hyperparameters of an actor-critic loss function, discovers auxiliary tasks, and improves off-policy learning using a novel leaky V-trace operator.
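The metagradient idea underlying STAC can be illustrated on a toy problem: take one gradient step on an inner (training) loss with learning rate eta, then differentiate an outer (validation) loss through that step to update eta itself. A minimal sketch, not the STAC implementation; the two quadratic losses and all constants are invented for illustration:

```python
# Toy metagradient tuning of a learning rate (illustrative only).
# Inner loss L(theta) = 0.5*(theta - 2)^2, outer loss J(theta) = 0.5*(theta - 3)^2.

def grad_L(theta):  # dL/dtheta
    return theta - 2.0

def grad_J(theta):  # dJ/dtheta
    return theta - 3.0

theta, eta = 0.0, 0.1
meta_lr = 0.05  # step size for the hyperparameter itself

for _ in range(200):
    g = grad_L(theta)
    theta_new = theta - eta * g            # one inner SGD step
    # Differentiate J(theta_new) w.r.t. eta through the update:
    # dJ/deta = grad_J(theta_new) * d(theta_new)/deta = -grad_J(theta_new) * g
    meta_grad = -grad_J(theta_new) * g
    eta = max(1e-4, eta - meta_lr * meta_grad)  # metagradient step on eta
    theta = theta_new
# theta converges to the inner optimum while eta adapts upward from 0.1
```

STAC applies the same principle with automatic differentiation inside a full actor-critic loss rather than these hand-derived scalar gradients.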
Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
This paper proposes a formal exploration objective for goal-reaching policies that maximizes state coverage and presents an algorithm called Skew-Fit, which enables a real-world robot to learn to open a door, entirely from scratch, from pixels, and without any manually-designed reward function.
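The core sampling trick in Skew-Fit can be sketched in a few lines: reweight replay-buffer states by their estimated density raised to a negative power, so rare states are proposed as goals more often and the goal distribution drifts toward uniform state coverage. A minimal sketch assuming a toy discrete state space; the densities and alpha value below are invented for illustration:

```python
import numpy as np

# Skewed goal sampling in the spirit of Skew-Fit.
rng = np.random.default_rng(0)
states = np.arange(5)                               # stand-in replay-buffer states
density = np.array([0.5, 0.25, 0.15, 0.07, 0.03])   # estimated p(s) under the model

alpha = -0.5                      # alpha in [-1, 0); closer to -1 skews harder
weights = density ** alpha        # rare states get larger weights
probs = weights / weights.sum()   # normalized skewed sampling distribution

goals = rng.choice(states, size=1000, p=probs)
# `probs` is closer to uniform than `density`, so rarely visited states
# are proposed as goals more often.
```

In the real algorithm the density is estimated with a generative model over image observations rather than tabulated as here.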
Deep Reinforcement Learning that Matters
Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated and guidelines to make future results in deep RL more reproducible are suggested.
Automatic Goal Generation for Reinforcement Learning Agents
This work uses a generator network to propose tasks for the agent to try to achieve, specified as goal states, and shows that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment.
Visual Reinforcement Learning with Imagined Goals
An algorithm is proposed that acquires general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies, efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
InfoVAE: Balancing Learning and Inference in Variational Autoencoders
It is shown that the proposed InfoVAE model can significantly improve the quality of the variational posterior and can make effective use of the latent features regardless of the flexibility of the decoding distribution.
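One concrete instantiation of InfoVAE matches the aggregate posterior q(z) to the prior p(z) = N(0, I) with a maximum mean discrepancy (MMD) term instead of the per-sample KL. A minimal sketch of an RBF-kernel MMD estimate; the bandwidth, sample sizes, and synthetic latents are illustrative choices, not values from the paper:

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    # Pairwise k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd(x, y, bandwidth=1.0):
    # Biased MMD^2 estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
z_prior = rng.standard_normal((256, 2))           # samples from p(z)
z_shifted = rng.standard_normal((256, 2)) + 2.0   # mismatched q(z): large MMD
z_matched = rng.standard_normal((256, 2))         # matched q(z): MMD near 0
```

Adding this term to the reconstruction loss penalizes a latent code whose aggregate distribution drifts away from the prior, without collapsing per-sample information the way a heavily weighted KL can.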
Universal Value Function Approximators
An efficient technique is developed for supervised learning of universal value function approximators (UVFAs), V(s, g; θ), that generalise not just over states s but also over goals g, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
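The structural idea of a UVFA is that a single parameterised function takes both the state and the goal as input, so the same weights can be queried for goals never seen during training. A minimal sketch with a tiny randomly initialised MLP; the dimensions and architecture are purely illustrative:

```python
import numpy as np

# A universal value function approximator V(s, g; theta): one network over
# the concatenated [s; g], usable for arbitrary goals.
rng = np.random.default_rng(0)
state_dim, goal_dim, hidden = 4, 4, 32

# theta: weights of a two-layer MLP (illustrative initialisation)
W1 = rng.standard_normal((state_dim + goal_dim, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, 1)) * 0.1

def value(s, g):
    x = np.concatenate([s, g])      # condition on the goal, not just the state
    h = np.tanh(x @ W1 + b1)
    return float(h @ W2)

s = rng.standard_normal(state_dim)
g_seen = rng.standard_normal(goal_dim)
g_new = rng.standard_normal(goal_dim)   # stands in for a previously unseen goal
v_seen, v_new = value(s, g_seen), value(s, g_new)
```

Because the goal is an ordinary input rather than an index into a table of value functions, generalisation across goals falls out of the function approximator itself.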
Contextual Imagined Goals for Self-Supervised Robotic Learning
A conditional goal-setting model is proposed that enables self-supervised goal-conditioned off-policy learning with raw image observations in the real world, enabling a robot to manipulate a variety of objects and generalize to new objects that were not seen during training.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial intelligence…
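The constrained framework in beta-VAE amounts to weighting the KL term of the ELBO by a coefficient beta > 1, pressuring the latent channels toward the factorised prior. A minimal sketch of the objective for a diagonal-Gaussian encoder; the (mu, logvar) values and beta below are invented for illustration:

```python
import numpy as np

# beta-VAE objective: L = E[recon loss] + beta * KL(q(z|x) || p(z)),
# with p(z) = N(0, I). beta = 1 recovers the standard VAE.

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL between N(mu, diag(exp(logvar))) and N(0, I), per sample:
    # 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def beta_vae_loss(recon_loss, mu, logvar, beta=4.0):
    return recon_loss + beta * kl_to_standard_normal(mu, logvar).mean()

mu = np.array([[0.5, -0.3], [0.1, 0.2]])        # illustrative encoder means
logvar = np.array([[-0.2, 0.1], [0.0, -0.1]])   # illustrative encoder log-variances
loss = beta_vae_loss(recon_loss=1.25, mu=mu, logvar=logvar, beta=4.0)
```

Raising beta trades reconstruction fidelity for a latent code that stays closer to the isotropic prior, which is the mechanism the paper links to disentanglement.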