Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
@article{Kumar2020ImplicitUI,
  title={Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning},
  author={Aviral Kumar and Rishabh Agarwal and Dibya Ghosh and S. Levine},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.14498}
}
We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We characterize this loss of expressivity in terms of a drop in the rank of the learned value network features…
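The rank drop mentioned in the abstract can be tracked with a measure based on singular values. The sketch below is illustrative only and is not taken from the authors' code: the function name `effective_rank`, the threshold `delta`, and the cumulative-share criterion are assumptions chosen for this example, and the feature matrices are synthetic stand-ins for a value network's penultimate-layer activations.

```python
import numpy as np


def effective_rank(features, delta=0.01):
    """Return the smallest k such that the top-k singular values of
    `features` account for at least a (1 - delta) share of their sum.

    `features` is a (num_states, feature_dim) matrix. The criterion and
    the default `delta` are assumptions made for this sketch.
    """
    # Singular values are returned sorted in descending order.
    singular_values = np.linalg.svd(features, compute_uv=False)
    total = singular_values.sum()
    if total == 0.0:
        return 0  # an all-zero matrix carries no directions at all
    cumulative_share = np.cumsum(singular_values) / total
    # First index whose cumulative share reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative_share, 1.0 - delta)) + 1
    return min(k, singular_values.size)


# Synthetic comparison: a generic feature matrix versus one whose
# columns are confined to a 4-dimensional subspace.
rng = np.random.default_rng(0)
generic_features = rng.normal(size=(256, 64))
collapsed_features = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 64))

print("generic:  ", effective_rank(generic_features))
print("collapsed:", effective_rank(collapsed_features))
```

Evaluating such a measure on features collected at regular intervals during training would show whether the count falls as gradient updates accumulate, which is the kind of trend the abstract describes.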