MDP Playground: A Design and Debug Testbed for Reinforcement Learning
@inproceedings{Rajan2019MDPPA, title={MDP Playground: A Design and Debug Testbed for Reinforcement Learning}, author={Raghunandan Rajan and Jessica Lizeth Borja Diaz and Suresh Guttikonda and F{\'a}bio Ferreira and Andr{\'e} Biedenkapp and Jan Ole von Hartz and Frank Hutter}, year={2019} }
We present MDP Playground, an efficient testbed for Reinforcement Learning (RL) agents with orthogonal dimensions that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in generated environments. We consider and allow control over a wide variety of dimensions, including delayed rewards, rewardable sequences, density of rewards, stochasticity, image representations, irrelevant features, time unit, action range and more. We…
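The dimensions listed in the abstract correspond to entries of a configuration dictionary passed to the library's toy environment. The sketch below shows how one might combine delayed rewards, rewardable sequences, reward density and stochasticity in a single discrete environment. The `RLToyEnv` class and the config keys follow the naming used in the MDP Playground repository, but the exact keys, defaults, and Gym API version are assumptions here; consult the library's documentation before relying on them.

```python
# A minimal sketch of configuring MDP Playground's toy environment.
# NOTE: class name and config keys are assumptions based on the paper/repo;
# verify against the library's documentation.
from mdp_playground.envs import RLToyEnv

config = {
    "state_space_type": "discrete",  # tabular toy MDP
    "action_space_size": 8,          # number of discrete actions
    "delay": 3,                      # rewards arrive 3 steps late
    "sequence_length": 2,            # reward only for specific 2-state sequences
    "reward_density": 0.25,          # fraction of sequences that are rewardable
    "reward_noise": 0.1,             # std. dev. of Gaussian noise added to rewards
    "transition_noise": 0.1,         # prob. of a random transition (stochasticity)
    "terminal_state_density": 0.1,   # fraction of states that are terminal
}

env = RLToyEnv(**config)
obs = env.reset()
total_reward = 0.0
for _ in range(100):  # cap the episode for illustration
    action = env.action_space.sample()          # random policy for illustration
    obs, reward, done, info = env.step(action)  # classic Gym 4-tuple API assumed
    total_reward += reward
    if done:
        break
print("Episode return:", total_reward)
```

Because each key varies exactly one dimension, sweeping a single entry (e.g. `delay` from 0 to 8) while holding the rest fixed yields a family of environments of increasing hardness along that axis, which matches the design-and-debug workflow the abstract describes.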
4 Citations
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
- Computer Science · J. Artif. Intell. Res.
- 2023
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for ZSG.
A Survey of Generalisation in Deep Reinforcement Learning
- Computer Science · ArXiv
- 2021
It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.
Hardness in Markov Decision Processes: Theory and Practice
- Computer Science · NeurIPS
- 2022
A systematic survey of the theory of hardness is presented, and four agents are benchmarked in non-tabular versions of Colosseum environments, with results that demonstrate the generality of tabular hardness measures.
The Sandbox Environment for Generalizable Agent Research (SEGAR)
- Computer Science · ArXiv
- 2022
The Sandbox Environment for Generalizable Agent Research (SEGAR) improves the ease and accountability of generalization research in RL, as generalization objectives can be easily designed by specifying task distributions, which in turn allows the researcher to measure the nature of the generalization objective.