Designing Deception in Adversarial Reinforcement Learning
@inproceedings{Choudhury2011DesigningDI,
  title={Designing Deception in Adversarial Reinforcement Learning},
  author={Sanjiban Choudhury and Alok Kanti Deb and Jayanta Mukherjee},
  year={2011}
}
In an adversarial scenario, deceptions are powerful tools capable of earning time-delayed rewards, which an agent can use to circumvent the opponent's counterattack. This paper illustrates deception as a complementary policy to direct objective satisfaction. A framework for deceptions is defined to determine the number and nature of these actions. A minimal set of such actions ensures fast learning while remaining robust enough to confront any strong opponent. To satisfy the…
One Citation
Deceptive robot motion: synthesis, analysis and experiments
- Psychology, Auton. Robots
- 2015
An analysis of deceptive motion is presented, starting with how humans would deceive, moving to a mathematical model that enables the robot to autonomously generate deceptive motion, and ending with studies on the implications of deceptive motion for human-robot interaction and the effects of iterated deception.
References
Showing 1-10 of 31 references
Autonomous helicopter control using reinforcement learning policy search methods
- Computer Science, Proceedings 2001 ICRA, IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)
- 2001
This work considers algorithms that evaluate and synthesize controllers under distributions of Markovian models, and demonstrates the learning control algorithm by flying an autonomous helicopter, showing that the learned controller is robust and delivers good performance in this real-world domain.
Reinforcement Learning for RoboCup Soccer Keepaway
- Computer Science, Adapt. Behav.
- 2005
The application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer results in agents that significantly outperform a range of benchmark policies.
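The Sarsa(λ) update with linear tile-coding function approximation described in this summary can be sketched as follows. The toy feature vectors and parameter values below are illustrative assumptions, not details taken from the cited paper.

```python
import numpy as np

def sarsa_lambda_update(w, z, phi_sa, phi_next, r,
                        alpha=0.1, gamma=0.99, lam=0.9):
    """One Sarsa(lambda) step with linear function approximation:
    compute the TD error, accumulate the eligibility trace, and move
    the weights along the error times the trace."""
    delta = r + gamma * np.dot(w, phi_next) - np.dot(w, phi_sa)
    z = gamma * lam * z + phi_sa          # accumulating eligibility trace
    w = w + alpha * delta * z             # TD(lambda) weight update
    return w, z

# Tiny usage example with 4 binary tile features (hypothetical tiling).
w = np.zeros(4)
z = np.zeros(4)
phi_sa = np.array([1.0, 0.0, 1.0, 0.0])    # active tiles for (s, a)
phi_next = np.array([0.0, 1.0, 0.0, 1.0])  # active tiles for (s', a')
w, z = sarsa_lambda_update(w, z, phi_sa, phi_next, r=1.0)
```

With zero initial weights the TD error is just the reward, so only the tiles active in (s, a) receive credit on this step.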
Combining Policy Search with Planning in Multi-agent Cooperation
- Computer Science, RoboCup
- 2008
A novel method called Policy Search Planning (PSP) is proposed, in which policy search is used to find an optimal policy for selecting plans from a plan pool; it extends an existing gradient-search method (GPOMDP) to a multi-agent system (MAS) domain.
Reinforcement Learning with Hierarchies of Machines
- Computer Science, NIPS
- 1997
This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.
Evolving Keepaway Soccer Players through Task Decomposition
- Computer Science, GECCO
- 2003
The experiments indicate that neuro-evolution can learn effective behaviors, that the less constrained coevolutionary approach outperforms the sequential approach, and that solution spaces should not be over-constrained when supplementing the learning of complex tasks with human knowledge.
Composing Functions to Speed up Reinforcement Learning in a Changing World
- Computer Science, ECML
- 1998
A system is presented that transfers the results of prior learning to speed up reinforcement learning in a changing world, yielding close to a two-orders-of-magnitude increase in learning rate over a basic reinforcement learning algorithm.
Policy Gradient Methods for Robotics
- Computer Science, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
- 2006
An overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field is given and how the most recently developed methods can significantly improve learning performance is shown.
Macro-Actions in Reinforcement Learning: An Empirical Analysis
- Computer Science
- 1998
Although eligibility traces increased the rate of convergence to the optimal value function compared to learning with macro-actions but without eligibility traces, eligibility traces did not permit the optimal policy to be learned as quickly as it was using macro-actions.
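Executing a macro-action, as studied in the empirical analysis above, amounts to running a fixed sequence of primitive actions and accumulating the discounted reward along the way. The environment interface in this sketch is an illustrative assumption.

```python
def run_macro_action(env_step, state, primitive_actions, gamma=0.9):
    """Execute primitives in order; return the macro's discounted
    return, the resulting state, and the remaining discount factor
    (used for SMDP-style discounting of the value bootstrapped at
    the macro's termination)."""
    total, discount = 0.0, 1.0
    for a in primitive_actions:
        state, r = env_step(state, a)
        total += discount * r
        discount *= gamma
    return total, state, discount

# Toy chain environment: action +1/-1 moves the state; reward 1 at state 3.
def env_step(state, a):
    state = state + a
    return state, (1.0 if state == 3 else 0.0)

ret, s, disc = run_macro_action(env_step, 0, [1, 1, 1])
```

The returned discount (gamma raised to the macro's duration) is what makes multi-step macro-actions compatible with one-step value updates.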
Reinforcement learning of motor skills with policy gradients
- Computer Science, Neural Networks
- 2008
On-line Q-learning using connectionist systems
- Computer Science
- 1994
Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.
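The contrast between the standard Q-learning update and the on-policy MCQ-L update (later known as Sarsa) discussed above can be sketched in a tabular setting. The two-state toy table and parameter values are illustrative assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # Off-policy: bootstrap from the greedy action in s'.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def mcq_l_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # On-policy: bootstrap from the action actually taken in s'.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

Q = np.zeros((2, 2))
Q[1] = [0.0, 1.0]                            # state 1 already values action 1
q_learning_update(Q, 0, 0, r=1.0, s_next=1)  # bootstraps max(Q[1]) = 1.0

Q2 = np.zeros((2, 2))
Q2[1] = [0.0, 1.0]
mcq_l_update(Q2, 0, 0, r=1.0, s_next=1, a_next=0)  # bootstraps Q2[1,0] = 0.0
```

When the behavior policy takes an exploratory action in s', the two rules diverge: Q-learning still backs up the greedy value, while MCQ-L backs up the value of the action actually chosen.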