Deep Recurrent Q-Learning for Partially Observable MDPs
The effects of adding recurrency to a Deep Q-Network are investigated by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting network successfully integrates information through time and replicates DQN's performance on standard Atari games and on partially observed equivalents featuring flickering game screens.
- P. Stone
- Computer Science
- Encyclopedia of Machine Learning and Data Mining
Questions 1. Consider the comparison between ε-greedy methods shown in Figure 2.1 in the Sutton and Barto book. Which method will perform best in the long run in terms of cumulative rewards and…
Transfer Learning for Reinforcement Learning Domains: A Survey
- Matthew E. Taylor, P. Stone
- Computer Science, Psychology
- Journal of Machine Learning Research
- 1 December 2009
This article presents a framework that classifies transfer learning methods in terms of their capabilities and goals, and then uses it to survey the existing literature, as well as to suggest future directions for transfer learning work.
A Multiagent Approach to Autonomous Intersection Management
This article suggests an alternative mechanism for coordinating the movement of autonomous vehicles through intersections and demonstrates in simulation that this new mechanism has the potential to significantly outperform current intersection control technology (traffic lights and stop signs).
PAC Subset Selection in Stochastic Multi-armed Bandits
- Shivaram Kalyanakrishnan, Ambuj Tewari, P. Auer, P. Stone
- Computer Science
- International Conference on Machine Learning
- 26 June 2012
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.
Reinforcement Learning for RoboCup Soccer Keepaway
The application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer results in agents that significantly outperform a range of benchmark policies.
Multiagent Systems: A Survey from a Machine Learning Perspective
This survey of MAS is intended to serve as an introduction to the field and as an organizational framework, and highlights how multiagent systems can be and have been used to build complex systems.
Multiagent traffic management: a reservation-based intersection control mechanism
This paper proposes a reservation-based system for alleviating traffic congestion at intersections, under the assumption that the cars are controlled by agents, and specifies a precise metric for evaluating the quality of traffic control at an intersection.
Behavioral Cloning from Observation
- F. Torabi, Garrett Warnell, P. Stone
- Computer Science
- International Joint Conference on Artificial…
- 4 May 2018
This work proposes a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO). The agent first acquires experience in a self-supervised fashion to develop a model, then uses that model to learn a particular task by observing an expert perform it, without knowledge of the specific actions the expert took.
Interactively shaping agents via human reinforcement: the TAMER framework
Results from two domains demonstrate that lay users can train TAMER agents without defining an environmental reward function (as in an MDP) and indicate that human training within the TAMER framework can reduce sample complexity over autonomous learning algorithms.