Corpus ID: 218900501

MOPO: Model-based Offline Policy Optimization

@article{Yu2020MOPOMO,
  title={MOPO: Model-based Offline Policy Optimization},
  author={Tianhe Yu and Garrett Thomas and Lantao Yu and Stefano Ermon and James Y. Zou and Sergey Levine and Chelsea Finn and Tengyu Ma},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.13239}
}
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a batch of previously collected data. This problem setting is compelling, because it offers the promise of utilizing large, diverse, previously collected datasets to acquire policies without any costly or dangerous active exploration, but it is also exceptionally difficult, due to the distributional shift between the offline training data and the learned policy. While there has been significant progress…
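MOPO's central idea is to learn a dynamics model from the offline dataset and optimize the policy on short model-generated rollouts whose rewards are penalized by an estimate of the model's uncertainty, so the policy is discouraged from exploiting regions where the model is unreliable. Below is a minimal sketch of such uncertainty-penalized rollouts; the `ensemble`, `policy`, and `predict` interfaces are hypothetical, and simple ensemble disagreement stands in for the uncertainty estimator used in the paper.

```python
import numpy as np

# Sketch of MOPO-style uncertainty-penalized model rollouts (hypothetical API).
# Assumes each ensemble member's predict(states, actions) returns an array of
# shape (batch, state_dim + 1): predicted next state concatenated with reward.

def mopo_rollout(ensemble, policy, start_states, horizon=5, penalty_coef=1.0):
    """Generate short model rollouts with rewards penalized by model uncertainty."""
    transitions = []
    states = np.asarray(start_states)
    for _ in range(horizon):
        actions = policy(states)
        preds = np.stack([m.predict(states, actions) for m in ensemble])  # (E, B, d+1)
        next_states = preds[..., :-1].mean(axis=0)
        rewards = preds[..., -1].mean(axis=0)
        # Uncertainty heuristic: disagreement (std) across ensemble members.
        uncertainty = preds[..., :-1].std(axis=0).max(axis=-1)
        penalized_rewards = rewards - penalty_coef * uncertainty
        transitions.append((states, actions, penalized_rewards, next_states))
        states = next_states
    return transitions
```

The penalized transitions would then be mixed with the offline data and fed to an off-the-shelf policy optimizer; the penalty coefficient trades off how far the policy may stray from the data distribution.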

Citations of this paper

Representation Balancing Offline Model-based Reinforcement Learning
COMBO: Conservative Offline Model-Based Policy Optimization
Near Real-World Benchmarks for Offline Reinforcement Learning
Risk-Averse Offline Reinforcement Learning
Regularized Behavior Value Estimation
...
