Corpus ID: 234337602

Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision

Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen
Sorting and ranking supervision is a method for training neural networks end-to-end based on ordering constraints: the ground-truth order of sets of samples is known, while their absolute values remain unsupervised. To this end, we propose differentiable sorting networks by relaxing their pairwise conditional swap operations. To address the problems of vanishing gradients and extensive blurring that arise with larger numbers of layers, we propose mapping activations to regions with… 
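The relaxed pairwise swap at the heart of this approach can be sketched in a few lines (a minimal NumPy illustration, not the paper's implementation; the `steepness` parameter and the plain logistic sigmoid are assumptions):

```python
import numpy as np

def soft_swap(a, b, steepness=10.0):
    """Differentiable conditional swap: a sigmoid of the difference replaces
    the hard comparison, so gradients flow through the (soft) min and max."""
    alpha = 1.0 / (1.0 + np.exp(-steepness * (b - a)))  # ~1 if a <= b
    lo = alpha * a + (1.0 - alpha) * b  # soft minimum
    hi = alpha * b + (1.0 - alpha) * a  # soft maximum
    return lo, hi
```

Wiring `soft_swap` into the comparator pattern of a fixed sorting network (e.g. odd-even transposition sort) yields a fully differentiable sorter; as `steepness` grows, the relaxation approaches the hard swap.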
Monotonic Differentiable Sorting Networks
A novel relaxation of conditional swap operations that guarantees monotonicity in differentiable sorting networks is proposed, and a family of sigmoid functions is introduced and proved to produce differentiable sorting networks that are monotonic.
Relational Surrogate Loss Learning
This paper shows that directly maintaining the relation of models between surrogate losses and metrics suffices, and proposes a rank correlation-based optimization method to maximize this relation and learn surrogate losses.
Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities
The proposed rank-consistent ordinal regression framework (CORN) achieves rank consistency through a novel training scheme that uses conditional training sets to obtain the unconditional rank probabilities by applying the chain rule for conditional probability distributions.
Differentiable Top-k Classification Learning
This work proposes a differentiable top-k cross-entropy classification loss and finds that relaxing k not only produces better top-5 accuracies but also leads to top-1 accuracy improvements.
Stochastic Optimization of Sorting Networks via Continuous Relaxations
This work proposes NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, which permits straight-through optimization of any computational graph involving a sorting operation.
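NeuralSort's relaxation admits a compact sketch, assuming the published formulation P̂[i, :] = softmax(((n+1−2i)·s − A·1)/τ), where A holds pairwise absolute differences of the scores (NumPy version for illustration only):

```python
import numpy as np

def neural_sort(s, tau=1.0):
    """Continuous relaxation of argsort: returns an (n, n) unimodal
    row-stochastic matrix; as tau -> 0 it approaches the permutation
    matrix that sorts s in descending order."""
    n = len(s)
    A = np.abs(s[:, None] - s[None, :])        # pairwise |s_i - s_j|
    B = A.sum(axis=1)                          # A @ 1
    scaling = n + 1 - 2 * np.arange(1, n + 1)  # (n + 1 - 2i) for i = 1..n
    P = scaling[:, None] * s[None, :] - B[None, :]
    P = np.exp((P - P.max(axis=1, keepdims=True)) / tau)  # stable row softmax
    return P / P.sum(axis=1, keepdims=True)
```

With small τ, `neural_sort(s) @ s` approximates `s` sorted in descending order, while remaining differentiable in `s`.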
Differentiable Ranking and Sorting using Optimal Transport
This work proposes an algorithmically differentiable framework for sorting elements, calls the resulting operators S-sorts, S-CDFs and S-quantiles, and uses them in various learning settings: it proposes applications to quantile regression and introduces differentiable formulations of top-k accuracy that deliver state-of-the-art performance.
Fast Differentiable Sorting and Ranking
This paper proposes the first differentiable sorting and ranking operators with O(n log n) time and space complexity, and achieves this feat by constructing differentiable operators as projections onto the permutahedron, the convex hull of permutations, and using a reduction to isotonic optimization.
Ranking via Sinkhorn Propagation
This paper examines the class of rank-linear objective functions, which includes popular metrics such as precision and discounted cumulative gain, and proposes a technique for learning DSM-based ranking functions using an iterative projection operator known as Sinkhorn normalization, or SinkProp.
Differentiable Top-k with Optimal Transport
This work proposes a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator, which approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem.
Learning Latent Permutations with Gumbel-Sinkhorn Networks
A collection of new methods is introduced for end-to-end learning in such models, approximating discrete maximum-weight matching with the continuous Sinkhorn operator.
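The Sinkhorn operator used by these permutation-learning approaches reduces to alternating row and column normalization (a log-space NumPy sketch; the iteration count is an assumption, and a logsumexp would be more numerically robust):

```python
import numpy as np

def sinkhorn(log_alpha, n_iters=50):
    """Alternately normalize rows and columns in log space; by Sinkhorn's
    theorem the result converges to a doubly-stochastic matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - np.log(np.exp(log_alpha).sum(axis=1, keepdims=True))
        log_alpha = log_alpha - np.log(np.exp(log_alpha).sum(axis=0, keepdims=True))
    return np.exp(log_alpha)
```

Dividing `log_alpha` by a temperature before iterating pushes the output toward a hard permutation matrix, which is how the Gumbel-Sinkhorn construction approximates matchings.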
A Box-Constrained Approach for Hard Permutation Problems
It is demonstrated that for most problems in QAPLIB and for a class of synthetic QAP problems, the sorting-network formulation returns solutions that are competitive with the FAQ algorithm, often in significantly less computing time.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
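The core transform is small (a training-mode NumPy sketch; `gamma`, `beta`, and the `eps` default follow the paper's description, the rest is illustrative and omits the running statistics used at inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch dimension, then apply a
    learned affine transform (training-mode statistics)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta
```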
Learning to rank for information retrieval
Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches, the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures are analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
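A single Adam update, following the moment estimates and bias correction described in the paper (NumPy sketch with the published default hyperparameters):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    squared gradient (v), bias-corrected, determine the step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction, t >= 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Example: minimize f(x) = x^2 (gradient 2x) starting from x = 5
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
```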