• Corpus ID: 235358811

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning

Kai Wang, Sanket Shah, Haipeng Chen, A. Perrault, Finale Doshi-Velez, Milind Tambe
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating… 
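The gap between an intermediate prediction loss and decision quality can be illustrated with a toy example. The sketch below is not the paper's method: the item values, the two hypothetical models (`model_a`, `model_b`), and the top-k selection used as the "optimization problem" are all assumptions chosen for illustration.

```python
def mse(pred, true):
    """Intermediate loss: mean squared error between predicted and true parameters."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def solve(params, k=1):
    """Toy optimization step (assumed): pick the k items with the largest predicted value."""
    ranked = sorted(range(len(params)), key=lambda i: params[i], reverse=True)
    return ranked[:k]


def decision_quality(pred, true, k=1):
    """True value obtained when the decision is made from the predicted parameters."""
    return sum(true[i] for i in solve(pred, k))


# Hypothetical ground-truth parameters and two hypothetical predictive models.
true_values = [1.0, 0.9, 0.0]
model_a = [0.95, 1.0, 0.0]  # small errors, but ranks the wrong item first
model_b = [2.0, 0.9, 0.0]   # large error on item 0, but ranks it correctly

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, "MSE:", mse(pred, true_values),
          "decision quality:", decision_quality(pred, true_values))
```

In this toy setting, model A has the lower prediction error while model B yields the better decision, which is the kind of mismatch that motivates training against decision quality directly.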

SimPO: Simultaneous Prediction and Optimization

This paper proposes a formulation for the Simultaneous Prediction and Optimization (SimPO) framework, which introduces a joint weighted loss of a decision-driven predictive ML model and an optimization objective function that is optimized end-to-end directly through gradient-based methods.

Visualizing The Implicit Model Selection Tradeoff

Methods for comparing predictive models in an interpretable manner are proposed that synthesize ideas from supervised learning, unsupervised learning, dimensionality reduction, and visualization, and it is demonstrated how they can be used to inform the model selection process.

Near-optimality for infinite-horizon restless bandits with many arms

By replacing a global Lagrange multiplier used by the Whittle index with a sequence of Lagrangian multipliers, one per time period up to a finite truncation point, a class of policies is derived that achieves an O(√N) optimality gap and is shown to provide state-of-the-art performance on specific problems.

On Solution Functions of Optimization: Universal Approximation and Covering Number Bounds

The results provide the first rigorous analysis of the approximation and learning-theoretic properties of solution functions with implications for algorithmic design and performance guarantees.



Objective Mismatch in Model-based Reinforcement Learning

It is demonstrated that the likelihood of one-step ahead predictions is not always correlated with control performance, a critical limitation in the standard MBRL framework which will require further research to be fully understood and addressed.

Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization

This work focuses on combinatorial optimization problems and introduces a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions, and shows that decision-focused learning often leads to improved optimization performance compared to traditional methods.

MOReL : Model-Based Offline Reinforcement Learning

Theoretically, it is shown that MOReL is minimax optimal (up to log factors) for offline RL, and through experiments, it matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.

COMBO: Conservative Offline Model-Based Policy Optimization

A new model-based offline RL algorithm, COMBO, is developed that trains a value function using both the offline dataset and data generated by rollouts under the model, while additionally regularizing the value function on out-of-support state-action tuples generated via model rollouts, without requiring explicit uncertainty estimation.

Automatically Learning Compact Quality-aware Surrogates for Optimization Problems

By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, this work achieves a large reduction in training and inference time and improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning

Task-based End-to-end Model Learning in Stochastic Optimization

This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming.

Generalization Bounds in the Predict-then-Optimize Framework

By exploiting the structure of the SPO loss function and an additional strong convexity assumption on the feasible region, this work can dramatically improve the dependence on the dimension via an analysis and corresponding bounds that are akin to the margin guarantees in classification problems.

POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning

A new optimization objective is introduced that produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available.

Transfer in Reinforcement Learning: A Framework and a Survey

A. Lazaric, Computer Science, Reinforcement Learning, 2012
This chapter provides a formalization of the general transfer problem, the main settings which have been investigated so far, and the most important approaches to transfer in reinforcement learning.