Corpus ID: 237491713

On Tilted Losses in Machine Learning: Theory and Applications

Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM—tilted empirical risk minimization (TERM)—which uses exponential tilting to flexibly tune the impact…
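The tilted objective can be sketched concretely: TERM replaces the average loss with a log-sum-exp aggregate, roughly (1/t) log((1/N) Σᵢ exp(t ℓᵢ(θ))), where t → 0 recovers ERM, t → +∞ the max loss, and t → −∞ the min loss. A minimal NumPy sketch (function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def tilted_loss(losses, t):
    """Tilted empirical risk: (1/t) * log(mean(exp(t * losses))).

    t -> 0 recovers the average loss (ERM), t -> +inf the max loss,
    and t -> -inf the min loss. Computed via the log-sum-exp trick
    for numerical stability.
    """
    z = t * np.asarray(losses, dtype=float)
    m = z.max()  # shift by the max so np.exp never overflows
    return (m + np.log(np.mean(np.exp(z - m)))) / t
```

Positive t upweights high-loss (hard or rare) examples; negative t suppresses them, which is the mechanism behind TERM's robustness and fairness applications.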
An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
  • Qi Qi, Zhishuai Guo, Yi Xu, Rong Jin, Tianbao Yang
  • Computer Science, Mathematics
  • 2020
A class of DRO problems with a KL-divergence regularization on the dual variables is considered; the min-max problem is transformed into a compositional minimization problem, and practical duality-free online stochastic methods that do not require a large mini-batch size are proposed.
Attentional Biased Stochastic Gradient for Imbalanced Classification
The method is a simple modification to momentum SGD in which an attentional mechanism assigns an individual importance weight to each gradient in the mini-batch; the scaling factor is interpreted as the regularization parameter in the framework of information-regularized distributionally robust optimization.
Model Projection: Theory and Applications to Fair Machine Learning
The model projection formulation can be directly used to design fair models according to different group fairness metrics and generalizes existing approaches within the fair machine learning literature.
Learning to Reweight Examples for Robust Deep Learning
This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions; the method can be easily implemented on any type of deep network, does not require additional hyperparameter tuning, and achieves impressive performance on class-imbalance and corrupted-label problems where only a small amount of clean validation data is available.
Better generalization with less data using robust gradient descent
A technique that uses a cheap, robust iterative estimate of the risk gradient, which can be fed into any steepest-descent procedure, illustrating that more efficient and reliable learning is possible without prior knowledge of the loss tails.
On Human-Aligned Risk Minimization
This paper studies a class of human-aligned risk measures inspired by cumulative prospect theory and empirically demonstrates their improved performance on desiderata such as fairness, in contrast to the traditional workhorse of expected-loss minimization.
Adaptive Normalized Risk-Averting Training for Deep Neural Networks
In practice, it is shown how this method improves the training of deep neural networks on visual recognition tasks using the MNIST and CIFAR-10 datasets, and it provides a new perspective on the non-convex optimization problem in DNNs.
Fairness Without Demographics in Repeated Loss Minimization
This paper develops an approach based on distributionally robust optimization (DRO), which minimizes the worst-case risk over all distributions close to the empirical distribution, and proves that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice.
Learning with Average Top-k Loss
The average top-k (ATk) loss is a natural generalization of two widely used aggregate losses, the average loss and the maximum loss; it combines their advantages and mitigates their drawbacks to better adapt to different data distributions.
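The aggregate described above is simply the mean of the k largest per-example losses: k = n recovers the average loss and k = 1 the maximum loss. A minimal sketch (illustrative only, not the authors' implementation):

```python
import numpy as np

def average_top_k(losses, k):
    # Mean of the k largest per-example losses.
    # k = len(losses) gives the average loss; k = 1 gives the max loss.
    sorted_losses = np.sort(np.asarray(losses, dtype=float))
    return float(np.mean(sorted_losses[-k:]))
```

Intermediate values of k interpolate between the two extremes, focusing training on the hardest k examples without being dominated by a single outlier.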
Rényi Fair Inference
This paper uses Rényi correlation as a measure of fairness of machine learning models and develops a general training framework to impose fairness, proposing a min-max formulation that balances accuracy and fairness when solved to optimality.
Biased Importance Sampling for Deep Neural Network Training
The loss value can be used as an alternative importance metric, and it can be efficiently approximated for a deep model using a small model trained in parallel for that purpose; this allows, in particular, a biased gradient estimate that implicitly optimizes a soft max-loss and leads to better generalization performance.
Can gradient clipping mitigate label noise?
It is proved that, for the common problem of label noise in classification, standard gradient clipping does not in general provide robustness, and it is shown that a simple variant of gradient clipping is provably robust and corresponds to suitably modifying the underlying loss function.
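For reference, standard clip-by-norm (the baseline the paper proves is not, in general, robust to label noise; the paper's provably robust variant instead modifies the loss itself) can be sketched as follows:

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    # Standard gradient clipping: rescale the gradient so its
    # Euclidean norm is at most max_norm; smaller gradients pass
    # through unchanged.
    g = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g
```

Clipping bounds each update's magnitude but preserves its direction, which is why a single mislabeled example can still steer training; the loss-level modification analyzed in the paper avoids this failure mode.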