Corpus ID: 235368011

Robust Generalization despite Distribution Shift via Minimum Discriminating Information

@article{Sutter2021RobustGD,
  title={Robust Generalization despite Distribution Shift via Minimum Discriminating Information},
  author={Tobias Sutter and Andreas Krause and Daniel Kuhn},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.04443}
}
Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to the limited samples. By leveraging large deviation… 
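As a rough sketch of the two ingredients named in the abstract (an illustration under generic assumptions, not the paper's exact formulation): the minimum discriminating information principle selects, among all distributions consistent with the structural knowledge about the shifted test distribution, the one closest in relative entropy to a reference distribution, and distributionally robust optimization then minimizes the worst-case expected loss over an ambiguity set reflecting the limited samples. The constraint map f, target c, loss ℓ, and ambiguity set are generic placeholders, not notation taken from the paper.

% Illustrative notation (assumptions, not the paper's): the constraint E_Q[f(X)] = c
% encodes the structural knowledge; \widehat{P}_N is the empirical distribution of the
% N training samples; \mathcal{Q}(\widehat{P}_N) is an ambiguity set built around it.
\[
  Q^\star \;=\; \operatorname*{arg\,min}_{Q:\, \mathbb{E}_Q[f(X)] = c}
    D_{\mathrm{KL}}(Q \,\|\, P)
  \qquad\text{and}\qquad
  \min_{\theta}\; \sup_{Q \in \mathcal{Q}(\widehat{P}_N)}
    \mathbb{E}_Q\!\big[\ell(\theta; X)\big].
\]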

References

(Showing 10 of 100 references.)
Learning Models with Uniform Performance via Distributionally Robust Optimization
A distributionally robust stochastic optimization framework that learns a model providing good performance against perturbations to the data-generating distribution is developed, and a convex formulation for the problem is given, providing several convergence guarantees.
Distributionally Robust Bayesian Optimization
This paper presents a novel distributionally robust Bayesian optimization algorithm (DRBO), which provably obtains sub-linear robust regret in various settings that differ in how the uncertain covariate is observed.
Stable Prediction across Unknown Environments
This paper proposes a novel Deep Global Balancing Regression (DGBR) algorithm to jointly optimize a deep auto-encoder model for feature selection and a global balancing model for stable prediction across unknown environments, and demonstrates that the algorithm outperforms state-of-the-art methods for stable prediction across unknown environments.
Robust Classification Under Sample Selection Bias
This work develops a framework for learning a robust bias-aware (RBA) probabilistic classifier that adapts to different sample selection biases using a minimax estimation formulation and demonstrates the behavior and effectiveness of the approach on binary classification tasks.
Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations
It is demonstrated that distributionally robust optimization problems over Wasserstein balls can in fact be reformulated as finite convex programs, and in many interesting cases even as tractable linear programs (a standard form of the underlying duality is sketched after the reference list).
Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning
This tutorial argues that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.
Distributionally Robust Optimization and Generalization in Kernel Methods
It is shown that MMD DRO is roughly equivalent to regularization by the Hilbert norm, which, as a byproduct, reveals deep connections to classic results in statistical learning.
Robust Covariate Shift Regression
This work proposes a robust approach for regression under covariate shift that embraces the uncertainty resulting from sample selection bias by producing regression models that are explicitly robust to it.
Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport
It is proved that the surgery estimator finds stable relationships in strictly more scenarios than previous approaches which only consider conditional relationships, and performs competitively against entirely data-driven approaches.
Regularization via Mass Transportation
This paper introduces new regularization techniques using ideas from distributionally robust optimization, and gives new probabilistic interpretations to existing techniques by showing that they minimize the worst-case expected loss, where the worst case is taken over the ball of all distributions within a bounded transportation distance of the empirical distribution.
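To make the worst-case expectations appearing in several of the entries above concrete, here is the standard Wasserstein DRO duality in generic notation, a textbook-style sketch rather than a statement copied from any single reference: for a loss ℓ, ground metric d, empirical distribution built from N samples, and radius ε, under suitable regularity conditions on ℓ,

% Generic notation (assumptions): W is the type-1 Wasserstein distance induced by the
% ground metric d, and \widehat{\xi}_1, \ldots, \widehat{\xi}_N are the training samples.
\[
  \sup_{Q:\, W(Q, \widehat{P}_N) \le \varepsilon} \mathbb{E}_Q\big[\ell(\xi)\big]
  \;=\;
  \inf_{\lambda \ge 0}\;
  \Big\{ \lambda \varepsilon
    + \frac{1}{N}\sum_{i=1}^{N} \sup_{\xi}\,
      \big( \ell(\xi) - \lambda\, d(\xi, \widehat{\xi}_i) \big) \Big\}.
\]

The right-hand side involves only finitely many inner suprema, one per training sample, which is what makes the finite convex (and sometimes linear) reformulations highlighted above possible.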