• Publications
Deep Recurrent Q-Learning for Partially Observable MDPs
The effects of adding recurrency to a Deep Q-Network are investigated by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting network successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens.
Transfer Learning for Reinforcement Learning Domains: A Survey
This article presents a framework that classifies transfer learning methods in terms of their capabilities and goals, and then uses it to survey the existing literature, as well as to suggest future directions for transfer learning work.
A Multiagent Approach to Autonomous Intersection Management
This article suggests an alternative mechanism for coordinating the movement of autonomous vehicles through intersections and demonstrates in simulation that this new mechanism has the potential to significantly outperform current intersection control technology, namely traffic lights and stop signs.
Reinforcement Learning for RoboCup Soccer Keepaway
The application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer results in agents that significantly outperform a range of benchmark policies.
Reinforcement Learning
  • P. Stone
  • Computer Science
  • Encyclopedia of Machine Learning and Data Mining
  • 2017
PAC Subset Selection in Stochastic Multi-armed Bandits
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.
Multiagent traffic management: a reservation-based intersection control mechanism
  • K. Dresner, P. Stone
  • Computer Science
  • Proceedings of the Third International Joint…
  • 19 July 2004
This paper proposes a reservation-based system for alleviating traffic congestion, specifically at intersections, under the assumption that the cars are controlled by agents. It also specifies a precise metric for evaluating the quality of traffic control at an intersection.
Multiagent Systems: A Survey from a Machine Learning Perspective
This survey of MAS is intended to serve as an introduction to the field and as an organizational framework, and highlights how multiagent systems can be and have been used to build complex systems.
Interactively shaping agents via human reinforcement: the TAMER framework
Results from two domains demonstrate that lay users can train TAMER agents without defining an environmental reward function (as in an MDP) and indicate that human training within the TAMER framework can reduce sample complexity over autonomous learning algorithms.
Behavioral Cloning from Observation
This work proposes a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO). The agent first acquires experience in a self-supervised fashion to develop a model, which is then used to learn a particular task by observing an expert perform that task without knowledge of the specific actions taken.