Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning
@article{Wang2021LearningMF, title={Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning}, author={Kai Wang and Sanket Shah and Haipeng Chen and A. Perrault and Finale Doshi-Velez and Milind Tambe}, journal={ArXiv}, year={2021}, volume={abs/2106.03279} }
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating…
3 Citations
SimPO: Simultaneous Prediction and Optimization
- Computer Science
- 2022 IEEE International Conference on Services Computing (SCC)
- 2022
This paper proposes a formulation for the Simultaneous Prediction and Optimization (SimPO) framework, which introduces a joint weighted loss combining the loss of a decision-driven predictive ML model with an optimization objective function; the joint loss is optimized end-to-end directly through gradient-based methods.
Visualizing The Implicit Model Selection Tradeoff
- Computer Science
- SSRN Electronic Journal
- 2021
Methods for comparing predictive models in an interpretable manner are proposed that synthesize ideas from supervised learning, unsupervised learning, dimensionality reduction, and visualization, and it is demonstrated how they can be used to inform the model selection process.
Near-optimality for infinite-horizon restless bandits with many arms
- Computer Science, Economics
- ArXiv
- 2022
By replacing a global Lagrange multiplier used by the Whittle index with a sequence of Lagrange multipliers, one per time period up to a finite truncation point, a class of policies is derived that has an O(√N) optimality gap and is demonstrated to provide state-of-the-art performance on specific problems.
Sequential dynamic resource allocation in multi-beam satellite systems: A learning-based optimization method
- Chinese Journal of Aeronautics
- 2022
On Solution Functions of Optimization: Universal Approximation and Covering Number Bounds
- Computer Science
- ArXiv
- 2022
The results provide the first rigorous analysis of the approximation and learning-theoretic properties of solution functions with implications for algorithmic design and performance guarantees.
References
Objective Mismatch in Model-based Reinforcement Learning
- Computer Science
- L4DC
- 2020
It is demonstrated that the likelihood of one-step ahead predictions is not always correlated with control performance, a critical limitation in the standard MBRL framework which will require further research to be fully understood and addressed.
Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization
- Computer Science
- AAAI
- 2019
This work focuses on combinatorial optimization problems and introduces a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions, and shows that decision-focused learning often leads to improved optimization performance compared to traditional methods.
MOReL : Model-Based Offline Reinforcement Learning
- Computer Science
- NeurIPS
- 2020
Theoretically, it is shown that MOReL is minimax optimal (up to log factors) for offline RL, and through experiments, it matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
COMBO: Conservative Offline Model-Based Policy Optimization
- Computer Science
- NeurIPS
- 2021
A new model-based offline RL algorithm, COMBO, is developed that trains a value function using both the offline dataset and data generated by rollouts under the model, while additionally regularizing the value function on out-of-support state-action tuples generated via model rollouts, without requiring explicit uncertainty estimation.
Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
- Computer Science
- NeurIPS
- 2020
By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, this work achieves a large reduction in training and inference time and improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- Computer Science
- ICML
- 2017
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning…
Task-based End-to-end Model Learning in Stochastic Optimization
- Computer Science
- NIPS
- 2017
This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming.
Generalization Bounds in the Predict-then-Optimize Framework
- Computer Science
- NeurIPS
- 2019
By exploiting the structure of the SPO loss function and an additional strong convexity assumption on the feasible region, this work can dramatically improve the dependence on the dimension via an analysis and corresponding bounds that are akin to the margin guarantees in classification problems.
POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
- Computer Science
- AISTATS
- 2020
A new optimization objective is introduced that produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available.
Transfer in Reinforcement Learning: A Framework and a Survey
- Computer Science
- Reinforcement Learning
- 2012
This chapter provides a formalization of the general transfer problem, the main settings which have been investigated so far, and the most important approaches to transfer in reinforcement learning.