• Corpus ID: 235414883

MDP Playground: A Design and Debug Testbed for Reinforcement Learning

@inproceedings{Rajan2019MDPPA,
  title={MDP Playground: A Design and Debug Testbed for Reinforcement Learning},
  author={Raghunandan Rajan and Jessica Lizeth Borja Diaz and Suresh Guttikonda and F{\'a}bio Ferreira and Andr{\'e} Biedenkapp and Jan Ole von Hartz and Frank Hutter},
  year={2019}
}
We present MDP Playground, an efficient testbed for Reinforcement Learning (RL) agents with orthogonal dimensions that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in generated environments. We consider and allow control over a wide variety of dimensions, including delayed rewards, rewardable sequences, density of rewards, stochasticity, image representations, irrelevant features, time unit, action range and more. We… 
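The abstract lists the dimensions of hardness that MDP Playground exposes as independent knobs. As a rough illustration, the sketch below shows how such a toy environment might be configured in Python; the class name RLToyEnv and the configuration keys are assumptions based on the package's documented toy-environment interface and should be checked against the MDP Playground repository.

```python
# Illustrative sketch only: configuring one of MDP Playground's toy environments.
# RLToyEnv and the config keys below are assumptions; verify against the docs.
from mdp_playground.envs import RLToyEnv

config = {
    "state_space_type": "discrete",   # discrete toy MDP (image-style observations are also supported)
    "action_space_size": 8,           # size of the discrete action/state space
    "delay": 1,                       # reward delayed by 1 time step
    "sequence_length": 3,             # reward is given for rewardable sequences of length 3
    "reward_density": 0.25,           # fraction of possible sequences that are rewardable
    "transition_noise": 0.1,          # probability of transitioning to a random next state
    "generate_random_mdp": True,      # sample a random MDP with these properties
}

env = RLToyEnv(**config)              # assumed to follow the classic Gym reset/step API
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```

Each key corresponds to one of the orthogonal dimensions named in the abstract, so agents can be stress-tested against one source of hardness at a time.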

A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in zero-shot generalisation (ZSG), and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for ZSG.

A Survey of Generalisation in Deep Reinforcement Learning

It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.

Hardness in Markov Decision Processes: Theory and Practice

A systematic survey of the theory of hardness is presented, and four agents in non-tabular versions of Colosseum environments are benchmarked, obtaining results that demonstrate the generality of tabular hardness measures.

The Sandbox Environment for Generalizable Agent Research (SEGAR)

The Sandbox Environment for Generalizable Agent Research (SEGAR) improves the ease and accountability of generalization research in RL, as generalization objectives can be easily designed by specifying task distributions, which in turn allows the researcher to measure the nature of the generalization objective.

Learning Invariant Feature Hierarchies

A number of unsupervised learning algorithms for training computer vision models that are weakly inspired by the visual cortex will be presented, based on the sparse auto-encoder concept, and the effectiveness of these algorithms for learning invariant feature hierarchies will be demonstrated.
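Since the summary invokes the sparse auto-encoder concept without detail, the following minimal PyTorch sketch (an illustration, not the paper's model) shows the basic idea: reconstruct the input while an L1 penalty keeps the learned code sparse, which is the mechanism behind sparse feature learning.

```python
# Minimal sparse auto-encoder sketch (illustrative; not the paper's architecture).
import torch
import torch.nn as nn

class SparseAutoEncoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        code = self.encoder(x)            # feature code, encouraged to be sparse
        return self.decoder(code), code   # reconstruction and code

model = SparseAutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                   # stand-in batch of flattened images
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * code.abs().mean()  # reconstruction + L1 sparsity
loss.backward()
opt.step()
```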

Human-level control through deep reinforcement learning

  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Nature, 518(7540):529–533
  • 2015

Rainbow: Combining improvements in deep reinforcement learning

  • M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver. In S. A. McIlraith and K. Q. Weinberger, editors, Proceedings of the Conference on Artificial Intelligence (AAAI'18), pages 3215–3222
  • 2018

The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)

The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.

Soft Actor-Critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor

  • T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine. In J. G. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning (ICML'18), pages 1856–1865. PMLR
  • 2018

The arcade learning environment: An evaluation platform for general agents

  • Journal of Artificial Intelligence Research,
  • 2013

Addressing function approximation error in actor-critic methods

  • S. Fujimoto, H. van Hoof, and D. Meger. In J. G. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning (ICML'18), pages 1582–1591. PMLR
  • 2018

Asynchronous methods for deep reinforcement learning

  • V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. In M. Balcan and K. Weinberger, editors, Proceedings of the 33rd International Conference on Machine Learning (ICML'16), volume 48, pages 1928–1937
  • 2016

...