# Symmetry Learning for Function Approximation in Reinforcement Learning

@article{Mahajan2017SymmetryLF, title={Symmetry Learning for Function Approximation in Reinforcement Learning}, author={Anuj Mahajan and Theja Tulabandhula}, journal={ArXiv}, year={2017}, volume={abs/1706.02999} }

In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL). This problem deserves ever-increasing attention given recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for…

## 17 Citations

Exploiting Abstract Symmetries in Reinforcement Learning for Complex Environments

- Computer Science · ICRA
- 2022

This work presents EASE (Exploitation of Abstract Symmetry of Environments), a novel concept that exploits abstract spatial symmetry in complex environments to extend the skills that naïvely trained agents acquire in local abstractions of the environment.

Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning

- Computer Science · IEEE Robotics and Automation Letters
- 2020

This work demonstrates that invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL, and presents a general framework called Invariant Transform Experience Replay.

Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation

- Computer Science · ArXiv
- 2019

This work proposes two novel data augmentation techniques for DRL that exploit reflectional symmetries and lax goal definitions in order to reuse observed data more efficiently.

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

- Computer Science · ICML
- 2021

This work proposes a novel tensorised formulation of the Bellman equation, which gives rise to Tesseract, a method that views the Q-function as a tensor whose modes correspond to the action spaces of different agents.

Invariant Transform Experience Replay

- Computer Science · ArXiv
- 2019

This work argues that invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL, and presents two techniques: Kaleidoscope Experience Replay, which exploits reflectional symmetries, and Goal-augmented Experience Replay, which takes advantage of lax goal definitions.

Reinforcement Learning in Factored Action Spaces using Tensor Decompositions

- Computer Science · ArXiv
- 2021

This work uses the cooperative multi-agent reinforcement learning scenario as an exemplary setting, in which the action space is naturally factored across agents and learning becomes intractable without resorting to approximation on the underlying hypothesis space of candidate solutions.

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

- Computer Science · NeurIPS
- 2020

This paper introduces MDP homomorphic networks for deep reinforcement learning, together with an easy method for constructing equivariant network layers numerically, so that the system designer need not solve the equivariance constraints by hand, as is typically done.

MAVEN: Multi-Agent Variational Exploration

- Computer Science · NeurIPS
- 2019

This work proposes MAVEN, a novel approach that hybridises value- and policy-based methods by introducing a latent space for hierarchical control. The latent space allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks.

Quarto as a Reinforcement Learning problem

- Psychology
- 2019

Reinforcement Learning has proven itself through a recent history of superhuman level performances in various tasks. Board games are of particular interest because they can be simulated completely,…

VIREL: A Variational Inference Framework for Reinforcement Learning

- Computer Science · NeurIPS
- 2019

This work proposes VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP. It shows that actor-critic algorithms can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation equivalent to an M-step.

## References

Showing 1-10 of 24 references.

Universal Value Function Approximators

- Computer Science · ICML
- 2015

This work develops an efficient technique for supervised learning of universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g, and demonstrates that a UVFA can successfully generalise to previously unseen goals.

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

- Computer Science · ICML
- 1999

This work investigates conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy, to shed light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.

Human-level control through deep reinforcement learning

- Computer Science · Nature
- 2015

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Improving reinforcement learning by using sequence trees

- Computer Science · Machine Learning
- 2010

This paper proposes a novel approach to discover options in the form of stochastic conditionally terminating sequences; it shows how such sequences can be integrated into the reinforcement learning…

Learning Options in Reinforcement Learning

- Computer Science · SARA
- 2002

This paper empirically explores a simple approach to creating options, based on the intuition that states frequently visited on system trajectories could prove to be useful subgoals, and proposes a greedy algorithm for identifying subgoals based on state visitation counts.

Representation Learning: A Review and New Perspectives

- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2013

Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

Macro Actions in Reinforcement Learning

- Computer Science
- 2017

This work compares multiple algorithms that exercise the macro-actions heuristic on the task of learning a defense agent in the Half Field Offense (HFO) problem, and proposes and compares a few other simple techniques that align with the heuristic and improve over DI-SARSA.

Learning Macro-Actions in Reinforcement Learning

- Computer Science · NIPS
- 1998

This work presents a method for automatically constructing macro-actions from scratch out of primitive actions during the reinforcement learning process, reinforcing the tendency to perform action b after action a if such a pattern of actions has been rewarded.

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

- Computer Science · Artif. Intell.
- 1999

Guided Policy Search

- Computer Science · ICML
- 2013

This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. It shows how differential dynamic programming can be used to generate suitable guiding samples, and describes a regularized, importance-sampled policy optimization that incorporates these samples into the policy search.