• Corpus ID: 17845195

Designing Deception in Adversarial Reinforcement Learning

Sanjiban Choudhury, Alok Kanti Deb, Jayanta Mukherjee
In an adversarial scenario, deception is a powerful tool for earning time-delayed rewards that an agent can use to circumvent the opponent's counterattack. This paper presents deception as a policy complementary to direct objective satisfaction, and defines a framework for deceptive actions that determines their number and nature. A minimal set of these actions ensures fast learning while remaining robust enough to confront a strong opponent. To satisfy the…

Deceptive robot motion: synthesis, analysis and experiments

An analysis of deceptive motion is presented, starting with how humans deceive, moving to a mathematical model that enables a robot to autonomously generate deceptive motion, and ending with studies on the implications of deceptive motion for human-robot interaction and the effects of iterated deception.

Autonomous helicopter control using reinforcement learning policy search methods

  • J. Bagnell, J. Schneider
  • Computer Science
    Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)
  • 2001
This work considers algorithms that evaluate and synthesize controllers under distributions of Markovian models, demonstrates the presented learning control algorithm by flying an autonomous helicopter, and shows that the learned controller is robust and delivers good performance in this real-world domain.

Reinforcement Learning for RoboCup Soccer Keepaway

The application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer results in agents that significantly outperform a range of benchmark policies.
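
The core technique above can be sketched in miniature. The following is a minimal, illustrative episodic Sarsa(λ) learner with linear tile-coding function approximation; a toy 1-D reach-the-goal task stands in for keepaway, and the tiling sizes, dynamics, and hyperparameters are assumptions, not those of the paper:

```python
import random

# Minimal sketch of episodic Sarsa(lambda) with linear tile coding on a
# toy task: a point on [0, 1] must reach x >= 0.95. All constants are
# illustrative assumptions, not the keepaway paper's setup.

N_TILINGS = 4
TILES_PER_TILING = 8
ACTIONS = (-1, +1)                     # step left or step right

def tile_features(x):
    """Map a state x in [0, 1] to one active tile index per tiling."""
    feats = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_TILING)   # shifted tilings
        idx = int((x + offset) * TILES_PER_TILING) % TILES_PER_TILING
        feats.append(t * TILES_PER_TILING + idx)
    return feats

def q_value(w, x, a):
    base = ACTIONS.index(a) * N_TILINGS * TILES_PER_TILING
    return sum(w[base + f] for f in tile_features(x))

def run(episodes=200, alpha=0.1, gamma=0.99, lam=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0] * (len(ACTIONS) * N_TILINGS * TILES_PER_TILING)
    for _ in range(episodes):
        x = 0.1
        z = [0.0] * len(w)                            # eligibility traces
        a = rng.choice(ACTIONS)
        for _ in range(100):                          # step cap per episode
            x2 = min(max(x + 0.05 * a + 0.01 * rng.uniform(-1, 1), 0.0), 1.0)
            done = x2 >= 0.95
            r = 1.0 if done else -0.01
            if rng.random() < eps:                    # epsilon-greedy
                a2 = rng.choice(ACTIONS)
            else:
                a2 = max(ACTIONS, key=lambda act: q_value(w, x2, act))
            target = r + (0.0 if done else gamma * q_value(w, x2, a2))
            delta = target - q_value(w, x, a)
            base = ACTIONS.index(a) * N_TILINGS * TILES_PER_TILING
            for f in tile_features(x):
                z[base + f] = 1.0                     # replacing traces
            for i in range(len(w)):
                w[i] += alpha / N_TILINGS * delta * z[i]
                z[i] *= gamma * lam
            x, a = x2, a2
            if done:
                break
    return w

w = run()
```

After training, the learned weights should prefer stepping toward the goal from the start state; the shifted tilings give coarse generalization across nearby states while keeping the update linear and cheap.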

Combining Policy Search with Planning in Multi-agent Cooperation

A novel method called Policy Search Planning (PSP) is proposed, in which policy search is used to find an optimal policy for selecting plans from a plan pool, extending an existing gradient-search method (GPOMDP) to a multi-agent system (MAS) domain.

Reinforcement Learning with Hierarchies of Machines

This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.

Evolving Keepaway Soccer Players through Task Decomposition

The experiments indicate that neuro-evolution can learn effective behaviors and that the less constrained coevolutionary approach outperforms the sequential approach, and suggest that solution spaces should not be over-constrained when supplementing the learning of complex tasks with human knowledge.

Composing Functions to Speed up Reinforcement Learning in a Changing World

A system is presented that transfers the results of prior learning to speed up reinforcement learning in a changing world, achieving close to a two-orders-of-magnitude increase in learning rate over a basic reinforcement learning algorithm.

Policy Gradient Methods for Robotics

  • Jan Peters, S. Schaal
  • Computer Science
    2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2006
An overview of learning with policy gradient methods for robotics is given, with a strong focus on recent advances in the field, and it is shown how the most recently developed methods can significantly improve learning performance.
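
The basic member of the policy gradient family can be sketched as follows: a vanilla likelihood-ratio gradient (REINFORCE with a running baseline) on a toy two-armed bandit. The task, baseline rule, and hyperparameters are illustrative assumptions, and the natural-gradient refinements the paper emphasizes are not shown:

```python
import math
import random

# Minimal REINFORCE-style policy gradient with a softmax policy and a
# running-average baseline, on a toy two-armed bandit (illustrative only).

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=2000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]            # one logit per arm
    means = (0.2, 0.8)            # arm 1 pays more on average
    baseline = 0.0
    for _ in range(steps):
        p = softmax(theta)
        a = 0 if rng.random() < p[0] else 1
        reward = means[a] + 0.1 * rng.uniform(-1.0, 1.0)
        baseline += 0.05 * (reward - baseline)    # baseline reduces variance
        # grad of log pi(a | theta) for a softmax policy: one_hot(a) - p
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - p[i]
            theta[i] += alpha * (reward - baseline) * grad
    return softmax(theta)

probs = train()
```

The update ascends the expected reward in the direction of the score function, so the policy shifts probability mass toward the better-paying arm.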

Macro-Actions in Reinforcement Learning: An Empirical Analysis

Although eligibility traces increased the rate of convergence to the optimal value function compared to learning with macro-actions but without eligibility traces, they did not permit the optimal policy to be learned as quickly as it was using macro-actions.
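
The benefit of macro-actions can be sketched in a toy setting: Q-learning on a corridor with a hand-designed "right three times" macro, backed up with its accumulated discounted reward so credit propagates over several steps in one update. The task, macro, and hyperparameters are illustrative assumptions, not the paper's experimental setup:

```python
import random

# Minimal sketch of Q-learning with one hand-designed macro-action on a
# corridor task (illustrative only). The macro is updated SMDP-style with
# its accumulated discounted reward and a discounted bootstrap.

def learn(episodes=300, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    goal = 9
    # actions: 0 = left, 1 = right, 2 = macro "right three times"
    Q = [[0.0] * 3 for _ in range(goal + 1)]
    for _ in range(episodes):
        s = 0
        for _ in range(200):                      # step cap per episode
            if s == goal:
                break
            if rng.random() < eps:
                a = rng.randrange(3)
            else:
                a = max(range(3), key=lambda i: Q[s][i])
            n_steps = 3 if a == 2 else 1
            r_total, disc, s2 = 0.0, 1.0, s
            for _ in range(n_steps):
                s2 = max(s2 - 1, 0) if a == 0 else min(s2 + 1, goal)
                r_total += disc * (1.0 if s2 == goal else -0.1)
                disc *= gamma
                if s2 == goal:
                    break
            bootstrap = 0.0 if s2 == goal else disc * max(Q[s2])
            Q[s][a] += alpha * (r_total + bootstrap - Q[s][a])
            s = s2
    return Q

Q = learn()
```

Because a single macro backup spans three primitive steps, value estimates along the corridor improve after fewer updates than with primitive actions alone.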

Reinforcement learning of motor skills with policy gradients

On-line Q-learning using connectionist systems

Simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(λ) are more robust than standard Q-learning updates.
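
The on-line update scheme the simulations compare against backward replay can be sketched in tabular form: the one-step update is applied immediately after every transition rather than replayed at episode end. (The paper's MCQ-L and Q(λ) use connectionist, i.e. neural-network, function approximation; this tabular chain task is an illustrative assumption.)

```python
import random

# Minimal on-line tabular Q-learning on a five-state chain: the update is
# applied as soon as each (s, a, r, s') transition is observed.

def q_learn(episodes=500, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    n_states, goal = 5, 4
    Q = [[0.0, 0.0] for _ in range(n_states)]   # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        for _ in range(100):                    # step cap per episode
            if s == goal:
                break
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s2 == goal else 0.0
            # on-line one-step update, no backward replay
            target = reward + (0.0 if s2 == goal else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learn()
```

After training, the right action dominates at every non-terminal state and the value of the penultimate state approaches the goal reward.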