Pareto Invariant Risk Minimization

Yongqiang Chen, Kaiwen Zhou, Yatao Bian, Binghui Xie, Kaili Ma, Yonggang Zhang, Han Yang, Bo Han, James Cheng
Despite the success of invariant risk minimization (IRM) in tackling the out-of-distribution (OOD) generalization problem, IRM can compromise optimality when applied in practice. Practical variants of IRM, e.g., IRMv1, have been shown to have significant gaps from IRM and thus can fail to capture invariance even in simple problems. Moreover, the optimization procedure in IRMv1 involves two intrinsically conflicting objectives and often requires careful tuning of the objective weights…
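To make the two conflicting objectives concrete, here is a minimal numpy sketch of the IRMv1 surrogate for squared loss: the per-environment risks plus a penalty on the gradient of each risk with respect to a fixed "dummy" scalar classifier at w = 1 (computed in closed form here rather than by automatic differentiation; the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def irmv1_objective(preds_per_env, ys_per_env, lam=1.0):
    """IRMv1 surrogate for squared loss: the sum of per-environment
    risks plus lam times the squared gradient of each environment's
    risk w.r.t. a scalar classifier w, evaluated at w = 1."""
    total_risk, total_penalty = 0.0, 0.0
    for f, y in zip(preds_per_env, ys_per_env):
        risk = np.mean((f - y) ** 2)
        # d/dw E[(w*f - y)^2] at w = 1  ->  E[2*f*(f - y)]
        grad_w = np.mean(2.0 * f * (f - y))
        total_risk += risk
        total_penalty += grad_w ** 2
    return total_risk + lam * total_penalty
```

The weight `lam` trades off the risk term against the invariance penalty; the abstract's point is that these two terms pull in conflicting directions, so a single fixed `lam` is hard to tune.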

Does Invariant Risk Minimization Capture Invariance?
It is shown that the Invariant Risk Minimization (IRM) formulation can fail to capture “natural” invariances, at least when used in its practical “linear” form, and even on very simple problems which directly follow the motivating examples for IRM.
Empirical or Invariant Risk Minimization? A Sample Complexity Perspective
This work analyzes the IRM and ERM frameworks from the perspective of sample complexity, finding that depending on the type of data generation mechanism, the two approaches might have very different finite sample and asymptotic behavior.
Efficient Continuous Pareto Exploration in Multi-Task Learning
This work proposes a sample-based sparse linear system to which standard Hessian-free solvers from machine learning can be applied; the method reveals the primary directions in local Pareto sets for trade-off balancing, efficiently finds more solutions with different trade-offs, and scales well to tasks with millions of parameters.
Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization
This work develops the first gradient-based multi-objective MTL algorithm that combines multiple gradient descent with carefully controlled ascent to traverse the Pareto front in a principled manner, which also makes it robust to initialization.
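For intuition on gradient-based multi-objective steps like the ones these works build on, here is a minimal sketch of the classical two-task MGDA update (not this paper's controlled-ascent algorithm): the minimum-norm point in the convex hull of the two task gradients, which is a common descent direction for both tasks whenever it is nonzero:

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Minimum-norm convex combination alpha*g1 + (1-alpha)*g2 of two
    task gradients (two-task MGDA); the closed-form alpha is clipped
    to [0, 1] to stay inside the convex hull."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:          # identical gradients: either one works
        return g1.copy()
    alpha = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return alpha * g1 + (1.0 - alpha) * g2
```

Following the negative of this direction decreases both objectives; preference-guided methods like the one above replace the pure descent step with controlled ascent to reach specific regions of the Pareto front.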
The Risks of Invariant Risk Minimization
The first analysis of classification under the IRM objective is presented, finding that IRM and its alternatives fundamentally do not improve over standard empirical risk minimization (ERM).
Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences
This work develops efficient solution methods for a robust empirical risk minimization problem designed to give calibrated confidence intervals on performance and optimal trade-offs between bias and variance, solving the resulting minimax problems at nearly the same computational cost as stochastic gradient descent.
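As a concrete instance of the inner maximization in such problems, here is a minimal sketch for the KL-divergence special case, where the worst-case sample weights have a closed exponential-tilting form (the paper treats general f-divergences and stochastic solvers; the function name and `temperature` parameter are illustrative):

```python
import numpy as np

def kl_dro_weights(losses, temperature=1.0):
    """Worst-case sample weights for KL-constrained DRO: the inner sup
    over reweightings tilts mass exponentially toward high-loss samples,
    with `temperature` playing the role of the dual variable."""
    z = np.asarray(losses, dtype=float) / temperature
    z -= z.max()              # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()        # normalize to a probability distribution
```

With equal losses this reduces to uniform weighting (i.e., plain ERM); as losses spread out, weight concentrates on the hardest samples.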
Understanding Why Generalized Reweighting Does Not Improve Over ERM
This work posits the class of generalized reweighting (GRW) algorithms, a broad category of approaches that iteratively update model parameters based on iterative reweighting of the training samples, and shows that overparameterized models trained under GRW end up close to those obtained by ERM.
Pareto Domain Adaptation
This work proposes a Pareto Domain Adaptation approach that controls the overall optimization direction to cooperatively optimize all training objectives, together with a dynamic preference mechanism that guides the cooperative optimization using the gradient of a surrogate loss on a held-out unlabeled target dataset.
Pareto Multi-Task Learning
Experimental results confirm that the proposed Pareto MTL algorithm can generate well-representative solutions and outperform some state-of-the-art algorithms on many multi-task learning applications.