Corpus ID: 238634302

Learning with Algorithmic Supervision via Continuous Relaxations

Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen
The integration of algorithmic components into neural architectures has gained increased attention recently, as it allows training neural networks with new forms of supervision such as ordering constraints or silhouettes instead of using ground truth labels. Many approaches in the field focus on the continuous relaxation of a specific task and show promising results in this context. But the focus on single tasks also limits the applicability of the proposed concepts to a narrow range of… 


Monotonic Differentiable Sorting Networks
A novel relaxation of conditional swap operations that guarantees monotonicity in differentiable sorting networks is proposed; a family of sigmoid functions is introduced and proved to produce differentiable sorting networks that are monotonic.
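The relaxed conditional swap at the heart of differentiable sorting networks can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses the plain logistic sigmoid (the paper's point is precisely that the choice of sigmoid function determines whether the relaxation is monotonic), and the names `soft_cswap` and `soft_sort` are mine.

```python
import math

def soft_cswap(a, b, tau=0.1):
    """Sigmoid-relaxed conditional swap: returns a soft (min, max) pair.

    s approximates P(a <= b); as tau -> 0 this recovers the hard swap.
    """
    s = 1.0 / (1.0 + math.exp(-(b - a) / tau))
    soft_min = s * a + (1.0 - s) * b
    soft_max = s * b + (1.0 - s) * a
    return soft_min, soft_max

def soft_sort(values, tau=0.1):
    """Tiny sorting network with bubble-sort wiring, built from soft swaps."""
    v = list(values)
    n = len(v)
    for _ in range(n):
        for i in range(n - 1):
            v[i], v[i + 1] = soft_cswap(v[i], v[i + 1], tau)
    return v
```

At low temperature, `soft_sort([3.0, 1.0, 2.0], tau=0.01)` is numerically close to `[1.0, 2.0, 3.0]`, while every output remains differentiable in every input.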
GenDR: A Generalized Differentiable Renderer
This work presents and studies a generalized family of differentiable renderers, which subsumes existing differentiable renderers such as SoftRas and DIB-R and offers an array of smoothing distributions to cover a large spectrum of reasonable settings.
Differentiable Dynamic Programming for Structured Prediction and Attention
Theoretically, this work provides a new probabilistic perspective on backpropagating through these DP operators and relates them to inference in graphical models; it derives two particular instantiations of the framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment.
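A smoothed DP operator of this kind can be illustrated with a soft-DTW-style recursion, where the hard `min` in the alignment recurrence is replaced by a temperature-smoothed soft minimum. This is a minimal sketch under a squared-error local cost; the names and the `gamma` parameterization follow the common soft-DTW convention, not necessarily this paper's exact formulation.

```python
import math

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), differentiable for gamma > 0."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(x, y, gamma=0.1):
    """Soft-DTW-style alignment cost between two 1-D sequences (squared-error cost)."""
    n, m = len(x), len(y)
    INF = float("inf")
    R = [[INF] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Smoothed version of min(insertion, deletion, match).
            R[i][j] = cost + softmin([R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]], gamma)
    return R[n][m]
```

As `gamma -> 0` the recursion approaches hard DTW; for `gamma > 0` the cost is differentiable in the inputs, which is what makes it usable as a training loss.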
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
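Adam's update rule fits in a few lines; this scalar sketch follows the paper's moment-estimate equations (exponential moving averages of the gradient and its square, with bias correction), with hyperparameter values chosen for illustration.

```python
import math

def adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimal scalar Adam optimizer."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment (uncentered) estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_star = adam(lambda x: 2 * (x - 3), x0=0.0)
```

The per-coordinate scaling by `sqrt(v_hat)` is what makes the step size roughly invariant to the gradient's magnitude, one of the adaptive properties the regret analysis relies on.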
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.
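The Transformer's core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A dependency-free sketch with list-of-lists matrices (helper names are mine):

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out
```

When a query strongly matches one key, the softmax weights concentrate on the corresponding value row, which is the "soft lookup" behavior attention provides.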
Stochastic Optimization of Sorting Networks via Continuous Relaxations
This work proposes NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, which permits straight-through optimization of any computational graph involving a sorting operation.
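The NeuralSort relaxation maps a score vector s to a unimodal row-stochastic matrix via row-wise softmaxes. Below is a minimal sketch of that construction from memory of the published formula; as `tau -> 0`, the rows approach the permutation matrix that sorts s in decreasing order.

```python
import math

def neural_sort(s, tau=1.0):
    """NeuralSort-style relaxed permutation matrix for score vector s.

    Row i is softmax(((n + 1 - 2i) * s_j - sum_k |s_j - s_k|) / tau), i = 1..n.
    """
    n = len(s)
    # Row sums of the pairwise absolute-difference matrix A_s.
    A_row = [sum(abs(sj - sk) for sk in s) for sj in s]
    P = []
    for i in range(1, n + 1):
        logits = [((n + 1 - 2 * i) * s[j] - A_row[j]) / tau for j in range(n)]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        P.append([e / z for e in exps])
    return P
```

Each row is a proper probability distribution, and the row-wise argmax at low temperature picks out the i-th largest score, which is what "unimodal row-stochastic" refers to.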
A Fully Differentiable Beam Search Decoder
It is shown that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model, while implicitly learning a language model.
Task Loss Estimation for Sequence Prediction
This work proposes another method for deriving differentiable surrogate losses that provably meet the requirement of consistency with the task loss, and focuses on the broad class of models that define a score for every input-output pair.
Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder
A novel latent-variable generative model for semi-supervised syntactic dependency parsing is proposed; it relies on differentiable dynamic programming over stochastically perturbed edge scores and introduces a differentiable relaxation to obtain approximate samples and to compute gradients with respect to the parser parameters.
Learning Latent Permutations with Gumbel-Sinkhorn Networks
A collection of new methods for end-to-end learning in models with latent permutations is introduced; they approximate discrete maximum-weight matching using the continuous Sinkhorn operator.
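The Sinkhorn operator referred to here exponentiates a score matrix and then alternates row and column normalization; a small dependency-free sketch (the iteration count and temperature are illustrative choices):

```python
import math

def sinkhorn(X, tau=1.0, iters=50):
    """Sinkhorn operator: exp(X / tau) followed by alternating normalizations.

    Converges toward a doubly-stochastic matrix; as tau -> 0 the result
    approaches the permutation matrix of the maximum-weight matching.
    """
    n = len(X)
    S = [[math.exp(x / tau) for x in row] for row in X]
    for _ in range(iters):
        S = [[v / sum(row) for v in row] for row in S]                   # row normalize
        cols = [sum(S[i][j] for i in range(n)) for j in range(n)]
        S = [[S[i][j] / cols[j] for j in range(n)] for i in range(n)]    # column normalize
    return S
```

Because every step is differentiable, gradients can flow through the (relaxed) matching, which is what enables end-to-end learning of latent permutations.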
Differentiation of Blackbox Combinatorial Solvers
This work presents a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions, and incorporates the Gurobi MIP solver, Blossom V algorithm, and Dijkstra's algorithm into architectures that extract suitable features from raw inputs for the traveling salesman problem, the min-cost perfect matching problem and the shortest path problem.
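The backward pass of this blackbox-differentiation scheme can be sketched with a toy linear-objective solver: the gradient with respect to the solver's input costs is obtained from a single extra solver call at perturbed costs. The `argmin_solver` and the interpolation constant `lam` below are illustrative stand-ins, not the paper's benchmark setup.

```python
def argmin_solver(w):
    """Toy combinatorial solver: one-hot indicator of the min-cost item (linear objective)."""
    i = min(range(len(w)), key=lambda j: w[j])
    return [1.0 if j == i else 0.0 for j in range(len(w))]

def blackbox_grad(solver, w, dL_dy, lam=1.5):
    """Gradient of the loss w.r.t. costs w, via one perturbed solver call.

    Follows the interpolation idea: grad = -(solver(w + lam * dL/dy) - solver(w)) / lam.
    """
    y = solver(w)
    w_pert = [wi + lam * gi for wi, gi in zip(w, dL_dy)]
    y_pert = solver(w_pert)
    return [-(yp - yi) / lam for yp, yi in zip(y_pert, y)]
```

The solver itself is treated as a black box (here a trivial argmin; in the paper, Gurobi, Blossom V, or Dijkstra): only forward calls are needed, yet an informative gradient signal is returned.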
Categorical Reparameterization with Gumbel-Softmax
It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.
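A Gumbel-Softmax sample is simply the softmax of the logits perturbed with Gumbel(0, 1) noise, divided by a temperature; a minimal sketch (the clamping of the uniform draw is only a numerical guard):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one Gumbel-Softmax sample: softmax((logits + Gumbel noise) / tau).

    As tau -> 0, samples approach one-hot draws from Categorical(softmax(logits));
    larger tau gives smoother, lower-variance (but more biased) samples.
    """
    g = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]  # Gumbel(0,1)
    z = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]
```

Because the sample is a deterministic, differentiable function of the logits given the noise, gradients pass through the reparameterization rather than requiring a score-function estimator.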