Corpus ID: 230770198

Minibatch optimal transport distances; analysis and applications

@article{Fatras2021MinibatchOT,
  title={Minibatch optimal transport distances; analysis and applications},
  author={Kilian Fatras and Younes Zine and Szymon Majewski and R{\'e}mi Flamary and R{\'e}mi Gribonval and Nicolas Courty},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.01792}
}
Optimal transport distances have become a classic tool to compare probability distributions and have found many applications in machine learning. Yet, despite recent algorithmic developments, their complexity prevents their direct use on large-scale datasets. To overcome this challenge, a common workaround is to compute these distances on minibatches, i.e., to average the outcome of several smaller optimal transport problems. We propose in this paper an extended analysis of this practice, which…
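The abstract describes minibatch optimal transport as averaging the outcomes of several smaller OT problems. Below is a minimal sketch of that estimator, assuming the POT library (`ot`) for the inner exact solves; the batch size `m` and number of batches `k` are illustrative parameters, not the paper's settings.

```python
import numpy as np
import ot  # POT: Python Optimal Transport


def minibatch_ot(x, y, m=64, k=100, rng=None):
    """Average the exact OT cost over k pairs of size-m minibatches.

    x, y: (n, d) and (n', d) arrays of samples from the two distributions.
    Sketch of the minibatch estimator described in the abstract; m and k
    are arbitrary illustrative choices.
    """
    rng = np.random.default_rng(rng)
    costs = []
    for _ in range(k):
        xb = x[rng.choice(len(x), size=m, replace=False)]
        yb = y[rng.choice(len(y), size=m, replace=False)]
        M = ot.dist(xb, yb)             # squared Euclidean cost matrix
        a = b = np.full(m, 1.0 / m)     # uniform weights on the minibatch
        costs.append(ot.emd2(a, b, M))  # exact OT cost on the minibatch
    return float(np.mean(costs))
```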
Unbalanced minibatch Optimal Transport; applications to Domain Adaptation
TLDR
The experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
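The unbalanced variant mentioned here relaxes the hard marginal constraints of each inner mini-batch problem. A minimal sketch of one such inner solve, assuming POT's unbalanced Sinkhorn solver; the values of `reg` and `reg_m` are illustrative, not taken from the paper.

```python
import numpy as np
import ot

# One inner mini-batch problem of the unbalanced variant: the marginal
# constraints are only penalized (KL term weighted by reg_m), not enforced.
# All numerical values below are illustrative.
xb = np.random.randn(64, 2)
yb = np.random.randn(64, 2) + 1.0
M = ot.dist(xb, yb)
a = b = np.full(64, 1.0 / 64)
cost = ot.unbalanced.sinkhorn_unbalanced2(a, b, M, reg=0.05, reg_m=1.0)
```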
An Efficient Mini-batch Method via Partial Transportation
  • Khai Nguyen, Dang Nguyen, Tung Pham, Nhat Ho
  • Computer Science
  • ArXiv
  • 2021
TLDR
It is observed that m-POT is better than m-OT in deep domain adaptation applications while having comparable performance with m-UOT, and on other applications, such as deep generative models, gradient flow, and color transfer, m-POT yields more favorable performance than both m-OT and m-UOT.
On Transportation of Mini-batches: A Hierarchical Approach
  • Khai Nguyen, Dang Nguyen, +5 authors Nhat Ho
  • Mathematics, Computer Science
  • 2021
TLDR
A novel mini-batching scheme for optimal transport, named Batch of Mini-batches Optimal Transport (BoMb-OT), that achieves favorable performance in deep learning models such as deep generative models and deep domain adaptation, and also yields either a lower quantitative result or a better qualitative result than m-OT.
MICo: Learning improved representations via sampling-based state similarity for Markov decision processes
TLDR
A new behavioural distance over the state space of a Markov decision process is presented, along with empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.
Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings
TLDR
This work proposes using Optimal Transport as an alignment objective during fine-tuning to further improve multilingual contextualized representations for downstream cross-lingual transfer and allows different types of mappings due to soft matching between source and target sentences.
BoMb-OT: On Batch of Mini-batches Optimal Transport
TLDR
The proposed Batch of Mini-batches Optimal Transport (BoMb-OT) is a novel mini-batching scheme for optimal transport that can be formulated as a well-defined distance on the space of probability measures and provides a better objective loss than m-OT for doing approximate Bayesian computation, estimating parameters of interest in parametric generative models, and learning non-parametric generative models with gradient flow.
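Both BoMb-OT entries describe transporting between mini-batches rather than simply averaging their costs. The sketch below is one reading of that two-level idea based only on these summaries, not the authors' implementation: the ground cost between two mini-batches is itself an OT cost, and an outer OT problem is solved over the collection of mini-batches.

```python
import numpy as np
import ot


def bomb_ot_sketch(x_batches, y_batches):
    """Two-level ('batch of mini-batches') OT sketch, as read from the summary.

    x_batches, y_batches: lists of (m, d) arrays (mini-batches from each dataset).
    Inner level: exact OT cost between every pair of mini-batches.
    Outer level: OT between uniform distributions over mini-batches,
    using the inner costs as ground cost.
    """
    k_x, k_y = len(x_batches), len(y_batches)
    C = np.zeros((k_x, k_y))
    for i, xb in enumerate(x_batches):
        for j, yb in enumerate(y_batches):
            a = np.full(len(xb), 1.0 / len(xb))
            b = np.full(len(yb), 1.0 / len(yb))
            C[i, j] = ot.emd2(a, b, ot.dist(xb, yb))  # inner mini-batch OT cost
    w_x = np.full(k_x, 1.0 / k_x)
    w_y = np.full(k_y, 1.0 / k_y)
    return ot.emd2(w_x, w_y, C)                       # outer OT over mini-batches
```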
Improving Mini-batch Optimal Transport via Partial Transportation
  • Khai Nguyen, Dang Nguyen, Tung Pham, Nhat Ho
  • Mathematics, Computer Science
  • 2021
TLDR
It is observed that m-POT is better than m-OT in deep domain adaptation applications while having comparable performance with m-UOT, and on other applications, such as deep generative models and color transfer, m-POT yields more favorable performance than m-OT while m-UOT is non-trivial to apply.
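Both m-POT entries replace the inner solver with partial transportation, which only moves a prescribed fraction of the mass between mini-batches. A minimal sketch of one inner solve, assuming POT's partial OT routine; the transported fraction `s` is an illustrative choice.

```python
import numpy as np
import ot

# One inner mini-batch problem with partial OT: only a fraction s of the total
# mass is transported, the rest may stay put. s = 0.75 is an illustrative value.
xb = np.random.randn(64, 2)
yb = np.random.randn(64, 2) + 1.0
M = ot.dist(xb, yb)
a = b = np.full(64, 1.0 / 64)
s = 0.75
cost = ot.partial.partial_wasserstein2(a, b, M, m=s)  # transport mass m = s <= 1
```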
MICo: Improved representations via sampling-based state similarity for Markov decision processes
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep…
Quantized Gromov-Wasserstein
TLDR
Quantized Gromov-Wasserstein (qGW) is defined: a metric that treats parts as fundamental objects and fits into a hierarchy of theoretical upper bounds for the GW problem, motivating a new algorithm for approximating optimal GW matchings that yields algorithmic speedups and reductions in memory complexity.
Subspace Detours Meet Gromov-Wasserstein
In the context of optimal transport methods, the subspace detour approach was recently presented by Muzellec and Cuturi (2019). It consists in building a nearly optimal transport plan in the measures…

References

Stochastic Optimization for Large-scale Optimal Transport
TLDR
A new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications, based on entropic regularization of the primal OT problem, which results in a smooth dual optimization that can be addressed with algorithms that have provably faster convergence.
Sliced Gromov-Wasserstein
TLDR
A novel OT discrepancy is defined that can deal with large-scale distributions via a slicing approach and is demonstrated to be able to tackle similar problems as GW while being several orders of magnitude faster to compute.
Stochastic Optimization for Regularized Wasserstein Estimators
TLDR
This work introduces an algorithm to solve a regularized version of the problem of Wasserstein estimators, with a time per step that is sublinear in the natural dimensions of the problem, and optimizes it with stochastic gradient steps that can be computed directly from samples, without solving additional optimization problems at each step.
Tree-Sliced Variants of Wasserstein Distances
TLDR
The tree-sliced Wasserstein distance is proposed, computed by averaging the Wasserstein distance between measures under random tree metrics, built adaptively in either low- or high-dimensional spaces.
Learning Generative Models with Sinkhorn Divergences
TLDR
This paper presents the first tractable computational method to train large-scale generative models using an optimal transport loss, and tackles three issues by relying on two key ideas: entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed-point iterations; and algorithmic (automatic) differentiation of these iterations.
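The entropic smoothing mentioned in this summary turns the OT loss into one computable by Sinkhorn fixed-point iterations, which are simple enough to differentiate through. A minimal numpy sketch of those iterations; the regularization `eps` and the iteration count are illustrative.

```python
import numpy as np


def sinkhorn_cost(a, b, M, eps=0.1, n_iter=200):
    """Entropic-regularized OT cost via Sinkhorn fixed-point iterations.

    a, b: marginal weights; M: cost matrix; eps: entropic regularization.
    Each iteration rescales rows and columns of the Gibbs kernel K = exp(-M/eps)
    so the coupling u[:, None] * K * v[None, :] matches the marginals a and b.
    """
    K = np.exp(-M / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # approximate optimal coupling
    return float(np.sum(P * M))
```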
Sinkhorn AutoEncoders
TLDR
It is proved that optimizing the encoder over any class of universal approximators, such as deterministic neural networks, is enough to come arbitrarily close to the optimum, and this framework, which holds for any metric space and prior, is advertised as a sweet spot of current generative autoencoding objectives.
Large Scale Optimal Transport and Mapping Estimation
TLDR
This paper proposes a stochastic dual approach to regularized OT, shows empirically that it scales better than a recent related approach when the number of samples is very large, and estimates a Monge map as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan.
Sliced Wasserstein Generative Models
  • Jiqing Wu, Z. Huang, +4 authors L. Gool
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
This paper proposes to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion and designs two types of differentiable SWD blocks to equip modern generative frameworks: Auto-Encoders and Generative Adversarial Networks.
Sliced Wasserstein Kernels for Probability Distributions
  • S. Kolouri, Yang Zou, G. Rohde
  • Computer Science, Mathematics
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
TLDR
This work provides a new perspective on the application of optimal transport flavored distances through kernel methods in machine learning tasks and provides a family of provably positive definite kernels based on the Sliced Wasserstein distance.
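Both sliced-Wasserstein entries above rest on the same construction: project the samples onto random one-dimensional directions and average the closed-form 1D Wasserstein distances. A minimal numpy sketch for two equal-size point clouds; the number of projections is an illustrative choice.

```python
import numpy as np


def sliced_wasserstein(x, y, n_proj=50, p=2, rng=None):
    """Sliced Wasserstein distance between equal-size point clouds x, y of shape (n, d).

    Each random direction gives a 1D OT problem, solved in closed form by sorting;
    the results are averaged over n_proj directions (an illustrative number).
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px = np.sort(x @ theta.T, axis=0)                      # sorted 1D projections
    py = np.sort(y @ theta.T, axis=0)
    return float(np.mean(np.abs(px - py) ** p) ** (1.0 / p))
```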
Hierarchical Optimal Transport for Multimodal Distribution Alignment
TLDR
This work introduces a hierarchical formulation of OT which leverages clustered structure in data to improve alignment in noisy, ambiguous, or multimodal settings and demonstrates that when clustered structure exists in datasets, and is consistent across trials or time points, a hierarchical alignment strategy can provide significant improvements in cross-domain alignment.