Smart "Predict, then Optimize"

@article{Elmachtoub2022SmartT,
  title={Smart "Predict, then Optimize"},
  author={Adam N. Elmachtoub and Paul Grigas},
  journal={Manag. Sci.},
  year={2022},
  volume={68},
  pages={9-26}
}
Many real-world analytics problems involve two significant challenges: prediction and optimization. Because of the typically complex nature of each challenge, the standard paradigm is predict-then-optimize. By and large, machine learning tools are intended to minimize prediction error and do not account for how the predictions will be used in the downstream optimization problem. In contrast, we propose a new and very general framework, called Smart “Predict, then Optimize” (SPO), which directly… 
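For concreteness, the core objects can be written down for a linear objective c^T w over a feasible region S (our transcription of the paper's construction): the SPO loss is the decision regret from acting on a prediction \hat{c} instead of the true cost vector c, and SPO+ is its convex surrogate:

    \ell_{SPO}(\hat{c}, c) = c^\top w^*(\hat{c}) - z^*(c), \quad \text{where } z^*(c) = \min_{w \in S} c^\top w
    \ell_{SPO+}(\hat{c}, c) = \max_{w \in S} \{ c^\top w - 2\hat{c}^\top w \} + 2\hat{c}^\top w^*(c) - z^*(c)

A subgradient of \ell_{SPO+} with respect to \hat{c} is 2(w^*(c) - w^*(2\hat{c} - c)), which requires only two calls to the optimization oracle. Below is a minimal numpy/scipy sketch of one subgradient step, assuming a toy LP feasible region; the data and the names lp_oracle, spo_plus_subgradient, and theta are illustrative, not from the paper:

import numpy as np
from scipy.optimize import linprog

def lp_oracle(c, A, b):
    # Return a minimizer w*(c) of c^T w over S = {w : A w <= b, w >= 0}.
    return linprog(c, A_ub=A, b_ub=b, method="highs").x

def spo_plus_subgradient(c_hat, c, A, b):
    # 2 * (w*(c) - w*(2*c_hat - c)) is a subgradient of the SPO+ loss in c_hat.
    return 2.0 * (lp_oracle(c, A, b) - lp_oracle(2.0 * c_hat - c, A, b))

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0]])          # toy region: w >= 0, w1 + w2 <= 1
b = np.array([1.0])
X = rng.normal(size=(50, 3))        # contextual features
C = X @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(50, 2))  # true cost vectors
theta = np.zeros((3, 2))            # linear predictor: c_hat = theta.T @ x
step = 0.05
for x, c in zip(X, C):
    g = spo_plus_subgradient(theta.T @ x, c, A, b)
    theta -= step * np.outer(x, g)  # chain rule through c_hat = theta.T @ x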

Citations

Risk Bounds and Calibration for a Smart Predict-then-Optimize Method
TLDR
Risk bounds and uniform calibration results are developed for the SPO+ loss relative to the SPO loss and are shown to provide a quantitative way to transfer excess surrogate risk to excess true risk.
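Schematically (our paraphrase, not the paper's exact statement), a uniform calibration result of this type supplies a function \delta(\epsilon) > 0 for \epsilon > 0 such that

    R_{SPO+}(f) - R^*_{SPO+} \le \delta(\epsilon) \;\implies\; R_{SPO}(f) - R^*_{SPO} \le \epsilon,

so driving the excess SPO+ (surrogate) risk to zero provably drives the excess SPO (true) risk to zero.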
Generalization Bounds in the Predict-then-Optimize Framework
TLDR
By exploiting the structure of the SPO loss function and an additional strong convexity assumption on the feasible region, this work dramatically improves the dependence on the dimension, via an analysis and corresponding bounds akin to the margin guarantees in classification problems.
Predict and Optimize: Through the Lens of Learning to Rank
TLDR
Noise-contrastive estimation can be viewed as a case of learning to rank the solution cache, and pairwise and listwise ranking loss functions are developed that can be differentiated in closed form without solving the optimization problem.
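A minimal sketch of the pairwise variant of this idea, assuming a precomputed cache of feasible solutions; the function name and margin value are illustrative, not from the paper:

import numpy as np

def pairwise_rank_loss(c_hat, c_true, cache, margin=0.1):
    # Penalize each ordered pair (wi, wj) where wi has lower true cost than wj
    # but the predicted costs fail to rank wi better by the given margin.
    loss = 0.0
    for wi in cache:
        for wj in cache:
            if c_true @ wi < c_true @ wj:
                loss += max(0.0, margin + c_hat @ wi - c_hat @ wj)
    return loss

# Example: two cached solutions of a 2-variable problem.
cache = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(pairwise_rank_loss(np.array([0.2, 0.1]), np.array([0.1, 0.9]), cache))

Because the loss touches only cached solutions, no optimization problem is solved inside the training loop.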
Smart Predict-and-Optimize for Hard Combinatorial Optimization Problems
TLDR
This work investigates the use of SPO to solve more realistic discrete optimization problems, and shows for the first time that a predict-and-optimize approach can successfully be used on large-scale combinatorial optimization problems.
The Perils of Learning Before Optimizing
TLDR
It is shown that the performance gap between a two-stage and an end-to-end approach is closely related to the price of correlation (POC) concept in stochastic optimization, and the implications of some existing POC results for the predict-then-optimize problem are derived.
A Surrogate Objective Framework for Prediction+Optimization with Soft Constraints
TLDR
A novel, analytically differentiable surrogate objective framework is proposed for real-world linear and semi-definite negative quadratic programming problems with soft linear and non-negative hard constraints; it yields a closed-form solution with respect to the predictive parameters, and thus gradients for any variable in the problem.
Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization
TLDR
This work focuses on combinatorial optimization problems and introduces a general framework for decision-focused learning, in which the machine learning model is trained directly in conjunction with the optimization algorithm to produce high-quality decisions; it shows that decision-focused learning often leads to improved optimization performance compared to traditional methods.
Fast Rates for Contextual Linear Optimization
TLDR
Surprisingly, in the case of contextual linear optimization, it is shown that the naïve plug-in approach actually achieves regret convergence rates that are significantly faster than methods that directly optimize downstream decision performance.
...

References

Showing 1-10 of 71 references
From Predictive to Prescriptive Analytics
TLDR
This paper combines ideas from machine learning (ML) and operations research and management science (OR/MS) to develop a framework for using data to prescribe optimal decisions in OR/MS problems, and develops a metric P, termed the coefficient of prescriptiveness, to measure the prescriptive content of data and the efficacy of a policy from an operations perspective.
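As we recall the construction (hedged; see the paper for the exact definition), the coefficient of prescriptiveness compares out-of-sample costs of the learned policy \hat{z}(x) against a feature-blind sample-average (SAA) baseline and a full-information oracle:

    P = 1 - \frac{\sum_i c(\hat{z}(x_i); y_i) - \sum_i c(z^*(y_i); y_i)}{\sum_i c(\hat{z}_{SAA}; y_i) - \sum_i c(z^*(y_i); y_i)}

so P = 0 for the feature-blind baseline and P = 1 under perfect foresight.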
Learning Enabled Optimization: Towards a Fusion of Statistical Learning and Stochastic Optimization
TLDR
This paper introduces several novel concepts, such as statistical optimality, hypothesis tests for model fidelity, the generalization error of stochastic optimization, and a non-parametric methodology for model selection, which together provide a formal framework for modeling, solving, validating, and reporting solutions for Learning Enabled Optimization.
On Structured Prediction Theory with Calibrated Convex Surrogate Losses
TLDR
For any task loss, a convex surrogate is constructed that can be optimized via stochastic gradient descent, and tight bounds are proved on the so-called "calibration function" relating the excess surrogate risk to the actual risk.
The Big Data Newsvendor: Practical Insights from Machine Learning
TLDR
An innovative machine-learning approach to a classic problem solved by almost every company, every day, for inventory management; the best one-step, feature-based newsvendor algorithm is shown to beat the best-practice benchmark by 24% in out-of-sample cost in a fraction of the time.
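A minimal sketch of the feature-based idea as we read it: fit a linear order-up-to rule q(x) = \beta^\top x by subgradient descent on the empirical newsvendor cost b(d-q)^+ + h(q-d)^+, which is equivalent to quantile regression at the critical fractile b/(b+h). Data and names are illustrative:

import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]   # features with intercept
d = 10 + 3 * X[:, 1] + rng.normal(size=200)          # observed demands
b_cost, h_cost, step = 2.0, 1.0, 0.01                # underage / overage unit costs
beta = np.zeros(3)
for _ in range(500):
    q = X @ beta                                     # order quantities q(x) = beta^T x
    g_q = np.where(q < d, -b_cost, h_cost)           # subgradient of the newsvendor cost in q
    beta -= step * (X.T @ g_q) / len(d)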
Data-driven inverse optimization with imperfect information
TLDR
This paper formalizes the inverse optimization problem as a distributionally robust program minimizing the worst-case risk that the predicted decision differs from the agent's actual response to a random signal, and shows that the resulting inverse optimization problems can be exactly reformulated as tractable convex programs when a new suboptimality loss function is used.
Task-based End-to-end Model Learning in Stochastic Optimization
TLDR
This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming.
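A toy sketch of the mechanism (not the paper's implementation): when the decision has a closed-form argmin, gradients of the realized task cost chain through the optimizer exactly. Here the inner problem is an unconstrained positive-definite quadratic, for which w^*(c) = -Q^{-1} c and dw^*/dc = -Q^{-1}:

import numpy as np

Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite, so the argmin is closed form
Qinv = np.linalg.inv(Q)

def w_star(c_hat):
    # Optimizer "layer": argmin_w 0.5 * w @ Q @ w + c_hat @ w = -Q^{-1} c_hat.
    return -Qinv @ c_hat

def task_grad(c_hat, c_true):
    # Gradient w.r.t. c_hat of the realized cost 0.5 w'Qw + c_true'w at w = w*(c_hat),
    # chaining through dw*/dc_hat = -Q^{-1}.
    w = w_star(c_hat)
    return -Qinv @ (Q @ w + c_true)

c_hat = np.array([1.0, -0.5])
c_hat -= 0.1 * task_grad(c_hat, np.array([0.5, 0.2]))  # one end-to-end update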
Optimization Methods for Large-Scale Machine Learning
TLDR
A major theme of this study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter, leading to a discussion of the next generation of optimization methods for large-scale machine learning.
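For reference, the method at the center of that discussion is the plain stochastic gradient step; a minimal least-squares sketch with illustrative data:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=1000)
w, step = np.zeros(5), 0.01
for t in range(5000):
    i = rng.integers(len(y))                 # sample one example per step
    w -= step * (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x_i'w - y_i)^2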
Structured Prediction by Conditional Risk Minimization
TLDR
A general approach is developed for supervised learning with structured output spaces, such as combinatorial and polyhedral sets, based on minimizing estimated conditional risk functions; in some cases this enables efficient training and inference without explicitly introducing a convex surrogate for the original loss function, even when that loss is discontinuous.
...