Corpus ID: 29153004

Wasserstein regularization for sparse multi-task regression

@article{Janati2019WassersteinRF,
  title={Wasserstein regularization for sparse multi-task regression},
  author={Hicham Janati and Marco Cuturi and Alexandre Gramfort},
  journal={ArXiv},
  year={2019},
  volume={abs/1805.07833}
}
We focus in this paper on high-dimensional regression problems where each regressor can be associated with a location in a physical space, or more generally in a generic geometric space. Such problems often employ sparse priors, which promote models using a small subset of regressors. To increase statistical power, so-called multi-task techniques were proposed, which consist in estimating several related models simultaneously. Combined with sparsity assumptions, this leads to models enforcing…
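To make the shared-support idea in the abstract concrete, here is a minimal sketch of the classical multi-task sparse baseline that Wasserstein regularization builds on, using scikit-learn's MultiTaskLasso (a mixed ℓ2/ℓ1 penalty); the synthetic data and the choice of alpha are purely illustrative, and this is not the paper's own estimator.

    # Three regression tasks whose true coefficients share the same small support;
    # the mixed-norm penalty selects or discards each feature jointly across tasks.
    import numpy as np
    from sklearn.linear_model import MultiTaskLasso

    rng = np.random.default_rng(0)
    n_samples, n_features, n_tasks = 100, 50, 3

    W_true = np.zeros((n_tasks, n_features))
    W_true[:, :5] = rng.normal(size=(n_tasks, 5))   # shared support: features 0-4

    X = rng.normal(size=(n_samples, n_features))
    Y = X @ W_true.T + 0.1 * rng.normal(size=(n_samples, n_tasks))

    model = MultiTaskLasso(alpha=0.1).fit(X, Y)
    # coef_ has one row per task; whole columns are zeroed, so the estimated
    # support is identical for every task.
    print(np.flatnonzero(np.any(model.coef_ != 0, axis=0)))

As the title suggests, the paper replaces this hard notion of identical supports with an optimal-transport-based penalty that only asks the supports of the different tasks to be close in the underlying geometric space.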
Citations

Decentralised Sparse Multi-Task Regression
We consider a sparse multi-task regression framework for fitting a collection of related sparse models. Representing models as nodes in a graph with edges between related models, a framework that…
Sinkhorn Regression
This paper proposes an efficient algorithm to solve the relaxed model and establishes its complete statistical guarantees under mild conditions, and leverages the Kullback-Leibler divergence to relax the proposed model with marginal constraints into its unbalanced formulation, so as to accommodate more types of features.
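Several of the works listed here rely on entropic (Sinkhorn-type) regularization of optimal transport; as generic background, a textbook Sinkhorn iteration is sketched below in plain NumPy. It is not code from the cited paper, and the regularization strength and iteration count are arbitrary.

    import numpy as np

    def sinkhorn_plan(a, b, C, epsilon=0.05, n_iter=200):
        # Entropy-regularized OT between histograms a (n,) and b (m,) with
        # cost matrix C (n, m): alternately rescale the Gibbs kernel
        # K = exp(-C / epsilon) to match the two marginals.
        K = np.exp(-C / epsilon)
        u = np.ones_like(a)
        for _ in range(n_iter):
            v = b / (K.T @ u)               # match column marginal b
            u = a / (K @ v)                 # match row marginal a
        return u[:, None] * K * v[None, :]  # transport plan diag(u) K diag(v)

The unbalanced formulation mentioned in the Sinkhorn Regression summary relaxes the hard marginal constraints with KL penalties, which softens these scaling updates.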
Sliced Multi-Marginal Optimal Transport
The sliced multi-marginal discrepancy is massively scalable for a large number of probability measures with very large supports, and can be applied to problems such as barycentric averaging, multi-task density estimation and multi-task reinforcement learning.
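For readers unfamiliar with sliced transport, the ordinary (two-marginal) sliced Wasserstein distance can be estimated with random one-dimensional projections as sketched below; the multi-marginal discrepancy in the paper generalizes this construction. The code is a generic illustration and assumes, for simplicity, that both point clouds have the same number of points.

    import numpy as np

    def sliced_w2(X, Y, n_proj=50, seed=0):
        # Monte-Carlo sliced 2-Wasserstein distance between point clouds
        # X, Y of shape (n, d); both clouds must have the same size n here.
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        total = 0.0
        for _ in range(n_proj):
            theta = rng.normal(size=d)
            theta /= np.linalg.norm(theta)     # random direction on the sphere
            x, y = np.sort(X @ theta), np.sort(Y @ theta)
            total += np.mean((x - y) ** 2)     # 1-D squared W2 via sorted matching
        return np.sqrt(total / n_proj)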
Multi-source Deep Gaussian Process Kernel Learning
The approximation of the prior-posterior DGP can be considered a novel kernel composition which blends the kernels in different layers and has explicit dependence on the data, suggesting that data-informed approximate DGPs are a powerful tool for integrating data across sources.
A Principled Approach for Learning Task Similarity in Multitask Learning
An upper bound on the generalization error of multitask learning is provided, showing the benefit of explicit and implicit task similarity knowledge, and a new training algorithm is proposed to learn the task relation coefficients and neural network parameters iteratively.
Manifold optimization for non-linear optimal transport problems
This work discusses optimization-related ingredients that allow modeling the OT problem on smooth Riemannian manifolds by exploiting the geometry of the search space, and makes available the Manifold optimization-based Optimal Transport (MOT) repository, with code useful for solving OT problems in Python and Matlab.
Feature-Robust Optimal Transport (2020)
Optimal transport is a machine learning problem with applications including distribution comparison, feature selection, and generative adversarial networks. In this paper, we propose feature-robust…
Multi-subject MEG/EEG source imaging with sparse multi-task regression
This analysis of a multimodal dataset shows how multi-subject source localization reduces the gap between MEG and fMRI for brain mapping, and proposes the Minimum Wasserstein Estimates (MWE), a new joint regression method based on optimal transport metrics that promotes spatial proximity on the cortical mantle.
Estimation of Wasserstein distances in the Spiked Transport Model
We propose a new statistical model, the spiked transport model, which formalizes the assumption that two probability distributions differ only on a low-dimensional subspace. We study the minimax rate…
Manifold optimization for optimal transport
This work discusses optimization-related ingredients that allow modeling the OT problem on smooth Riemannian manifolds by exploiting the geometry of the search space, and makes available the Manifold optimization-based Optimal Transport repository, with code useful for solving OT problems in Python and Matlab.

References

Showing 1-10 of 36 references
A Dirty Model for Multi-task Learning
We consider multi-task learning in the setting of multiple linear regression, where some relevant features could be shared across the tasks. Recent research has studied the use of ℓ1/ℓq norm…
Multi-level Lasso for Sparse Multi-task Regression
The approach is based on an intuitive decomposition of the regression coefficients into a product between a component that is common to all tasks and another component that captures task-specificity, which yields the Multi-level Lasso objective.
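Schematically, the decomposition described in the summary above can be written as follows; the notation is chosen here for illustration, and the cited paper should be consulted for the exact objective and constraints:

    \beta_j^{(k)} = \theta_j \, \gamma_j^{(k)}, \qquad \theta_j \ge 0,
    \qquad
    \min_{\theta,\gamma} \; \sum_{k} \big\| y^{(k)} - X^{(k)} \beta^{(k)} \big\|_2^2
        + \lambda_1 \sum_{j} \theta_j
        + \lambda_2 \sum_{j,k} \big| \gamma_j^{(k)} \big|

Here \theta captures the part common to all tasks and \gamma the task-specific part, so feature j is dropped from every task whenever \theta_j = 0.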
Multi-Task Feature Learning
The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common to all the tasks, and develops an iterative algorithm for solving it.
A Convex Feature Learning Formulation for Latent Task Structure Discovery
The main contribution is a convex formulation that employs a graph-based regularizer and simultaneously discovers a few groups of related tasks, having close-by task parameters, as well as the feature space shared within each group.
Multi-task feature selection
We address the problem of joint feature selection across a group of related classification or regression tasks. We propose a novel type of joint regularization of the model parameters in order to…
Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1,∞-regularization
Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports. This set-up suggests the use of ℓ1/ℓ∞-regularized…
Learning with a Wasserstein Loss
An efficient learning algorithm based on this regularization is developed, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures; the loss can encourage smoothness of the predictions with respect to a chosen metric on the output space.
Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning
A new nonlinear dictionary learning method for histograms in the probability simplex is proposed that leverages optimal transport theory, relying on Wasserstein barycenters instead of the usual matrix product between dictionary and codes, and allowing for nonlinear relationships between atoms and the reconstruction of input data.
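In symbols, the reconstruction described above swaps the linear map of standard dictionary learning for a barycenter; the notation below is schematic rather than the paper's exact formulation:

    \text{standard: } x_i \approx D \lambda_i
    \qquad \longrightarrow \qquad
    \text{Wasserstein: } x_i \approx \mathrm{Bar}_W(d_1, \dots, d_K;\, \lambda_i)

where \mathrm{Bar}_W denotes the Wasserstein barycenter of the atoms d_1, \dots, d_K weighted by the code \lambda_i in the probability simplex.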
Sparse Group Lasso: Consistency and Climate Applications
In this paper, the theoretical statistical consistency of estimators with tree-structured norm regularizers is proved, and it is shown that the SGL model provides better predictive performance than the current state of the art, remains climatologically interpretable, and is robust in its variable selection.
Learning Tree Structure in Multi-Task Learning
A TAsk Tree (TAT) model for MTL is developed, which devises sequential constraints to make the distance between the parameters in the component matrices corresponding to each pair of tasks decrease over layers; hence the component parameters remain fused up to the topmost layer once they become fused in a layer.