# Learning with Differentiable Perturbed Optimizers

@article{Berthet2020LearningWD, title={Learning with Differentiable Perturbed Optimizers}, author={Quentin Berthet and Mathieu Blondel and Olivier Teboul and Marco Cuturi and Jean-Philippe Vert and Francis R. Bach}, journal={ArXiv}, year={2020}, volume={abs/2002.08676} }

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths). Although these discrete decisions are easily computed, they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to transform optimizers into operations that are differentiable and never locally constant. Our approach…
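To illustrate why a discrete decision blocks gradients and how random perturbation can smooth it, here is a minimal sketch. It is not taken from the paper: the Gaussian noise, the Monte Carlo averaging, and the names `argmax_one_hot`, `perturbed_argmax`, `sigma`, and `num_samples` are illustrative assumptions suggested only by the title.

```python
import random


def argmax_one_hot(theta):
    """Hard argmax as a one-hot vector. Piecewise constant in theta,
    so its gradient is zero almost everywhere."""
    best = max(range(len(theta)), key=lambda i: theta[i])
    return [1.0 if i == best else 0.0 for i in range(len(theta))]


def perturbed_argmax(theta, sigma=1.0, num_samples=1000, seed=0):
    """Monte Carlo estimate of E[argmax(theta + sigma * Z)], Z ~ N(0, I).

    Assumption: Gaussian noise and plain averaging are used here purely
    for illustration. The expectation varies smoothly with theta, unlike
    the hard argmax.
    """
    rng = random.Random(seed)
    total = [0.0] * len(theta)
    for _ in range(num_samples):
        noisy = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
        for i, v in enumerate(argmax_one_hot(noisy)):
            total[i] += v
    return [v / num_samples for v in total]


theta = [1.0, 1.2, 0.5]
hard = argmax_one_hot(theta)    # a vertex of the simplex
soft = perturbed_argmax(theta)  # a point inside the simplex
```

In this sketch, `hard` jumps abruptly when two scores swap order, whereas `soft` spreads mass over all entries and shifts gradually as `theta` changes. A smaller `sigma` moves `soft` closer to `hard`.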

## 71 Citations

### Learning Linear Programs from Optimal Decisions

- Computer Science · NeurIPS
- 2020

This work proposes a flexible gradient-based framework for learning linear programs from optimal decisions, and provides a fast batch-mode PyTorch implementation of the homogeneous interior point algorithm, which supports gradients by implicit differentiation or backpropagation.

### Learning Randomly Perturbed Structured Predictors for Direct Loss Minimization

- Computer Science · ICML
- 2021

This work learns the variance of these randomized structured predictors and demonstrates empirically the effectiveness of learning the balance between the signal and the random noise in structured discrete spaces.

### Gradient Estimation with Stochastic Softmax Tricks

- Computer Science · NeurIPS
- 2020

Stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure. The framework offers a unified perspective on existing relaxed estimators for perturbation models and contains many novel relaxations.

### End-to-End Constrained Optimization Learning: A Survey

- Computer Science · IJCAI
- 2021

This paper presents a conceptual review of recent advances in the emerging area of hybrid machine learning and optimization, which aims to predict fast, approximate solutions to combinatorial problems and to enable structural logical inference.

### Differentiable Greedy Submodular Maximization: Guarantees, Gradient Estimators, and Applications

- Computer Science, Mathematics · ArXiv
- 2020

This work presents a theoretically guaranteed, versatile framework that makes the greedy algorithm for monotone submodular function maximization differentiable by smoothing it via randomization, and proves that it almost recovers the original approximation guarantees in expectation for the cases of cardinality and $\kappa$-extensible system constraints.

### Differentiable Greedy Submodular Maximization with Guarantees and Gradient Estimators

- Computer Science, Mathematics
- 2020

It is proved that the smoothed greedy algorithm almost recovers the original approximation guarantees in expectation for the cases of cardinality and $\kappa$-extensible system constraints, and it is shown that unbiased gradient estimators of any expected output-dependent quantities can be efficiently obtained by sampling outputs.

### Understanding Deep Architectures with Reasoning Layer

- Computer Science · NeurIPS 2020
- 2020

This paper takes an initial step towards an understanding of such hybrid deep architectures by showing that properties of the algorithm layers are intimately related to the approximation and generalization abilities of the end-to-end model.

### Differentiating through Log-Log Convex Programs

- Computer Science
- 2020

This work shows how to efficiently compute the derivative (when it exists) of the solution map of log-log convex programs (LLCPs) and uses the adjoint of the derivative to implement differentiable log-log convex optimization layers in PyTorch and TensorFlow.

### Learning Representations for Axis-Aligned Decision Forests through Input Perturbation

- Computer Science · ArXiv
- 2020

This work presents a novel but intuitive proposal to achieve representation learning for decision forests without imposing new restrictions or requiring structural changes. The proposal applies to any decision forest and allows the use of arbitrary deep neural networks for representation learning.

### Differentiable Greedy Algorithm for Monotone Submodular Maximization: Guarantees, Gradient Estimators, and Applications

- Computer Science, Mathematics · AISTATS
- 2021

This paper presents a theoretically guaranteed differentiable greedy algorithm for monotone submodular function maximization, and proves that it almost recovers original approximation guarantees in expectation for the cases of cardinality and κ-extendible system constraints.

## 57 References

### Differentiable Learning of Submodular Models

- Computer Science · NIPS 2017
- 2017

This paper provides an easily computable approximation to the Jacobian complemented with a complete theoretical analysis that lets us experimentally learn probabilistic log-supermodular models via a bi-level variational inference formulation.

### Differentiable Dynamic Programming for Structured Prediction and Attention

- Computer Science · ICML
- 2018

Theoretically, this work provides a new probabilistic perspective on backpropagating through these DP operators and relates them to inference in graphical models. It derives two particular instantiations of the framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment.

### Differentiation of Blackbox Combinatorial Solvers

- Computer Science · ICLR
- 2020

This work presents a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions, and incorporates the Gurobi MIP solver, Blossom V algorithm, and Dijkstra's algorithm into architectures that extract suitable features from raw inputs for the traveling salesman problem, the min-cost perfect matching problem and the shortest path problem.

### Differentiable Convex Optimization Layers

- Computer Science · NeurIPS
- 2019

This paper introduces disciplined parametrized programming, a subset of disciplined convex programming, and demonstrates how to efficiently differentiate through each of these components, allowing for end-to-end analytical differentiation through the entire convex program.

### Learning with Fenchel-Young Losses

- Computer Science · J. Mach. Learn. Res.
- 2020

This paper introduces Fenchel-Young losses, a generic way to construct a convex loss function for a regularized prediction function, and provides an in-depth study of their properties in a very broad setting, covering all the aforementioned supervised learning tasks and revealing new connections between sparsity, generalized entropies, and separation margins.

### Differentiable Ranking and Sorting using Optimal Transport

- Computer Science · NeurIPS
- 2019

This work proposes an algorithmically differentiable framework for sorting elements, with operators called S-sorts, S-CDFs and S-quantiles. It uses them in various learning settings, with applications to quantile regression and differentiable formulations of the top-k accuracy that deliver state-of-the-art performance.

### Ranking via Sinkhorn Propagation

- Computer Science · ArXiv
- 2011

This paper examines the class of rank-linear objective functions, which includes popular metrics such as precision and discounted cumulative gain, and proposes a technique for learning DSM-based ranking functions using an iterative projection operator known as Sinkhorn normalization, or SinkProp.

### SparseMAP: Differentiable Sparse Structured Inference

- Computer Science · ICML
- 2018

This work introduces SparseMAP, a new method for sparse structured inference, and its natural loss function, which reveals competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.

### A Smoother Way to Train Structured Prediction Models

- Computer Science · NeurIPS
- 2018

The experimental results show that the proposed framework allows us to build upon efficient inference algorithms to develop large-scale optimization algorithms for structured prediction which can achieve competitive performance on the two real-world problems.

### Optimization with Sparsity-Inducing Penalties

- Computer Science · Found. Trends Mach. Learn.
- 2012

This monograph covers proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provides an extensive set of experiments to compare various algorithms from a computational point of view.