• Corpus ID: 4703661

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

@inproceedings{Chen2018GradNormGN,
  title={GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks},
  author={Zhao Chen and Vijay Badrinarayanan and Chen-Yu Lee and Andrew Rabinovich},
  booktitle={ICML},
  year={2018}
}
Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. [...] Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
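The balancing rule the abstract alludes to can be stated concretely: learnable task weights are adjusted so that each task's gradient norm on the shared weights tracks a common scale, modulated by that task's relative training rate. The sketch below is a minimal PyTorch-style rendering of that update, not the authors' released code; names such as shared_layer, initial_losses, and weight_optimizer are placeholders, and the default alpha is illustrative.

import torch

def gradnorm_step(task_weights, task_losses, initial_losses, shared_layer,
                  weight_optimizer, alpha=1.5):
    # One update of the learnable task weights w_i (a 1-D parameter tensor).
    T = len(task_losses)
    shared_params = list(shared_layer.parameters())

    # G_i = || d(w_i * L_i)/dW ||_2 on the shared parameters.
    grad_norms = []
    for i, L_i in enumerate(task_losses):
        g = torch.autograd.grad(task_weights[i] * L_i, shared_params,
                                retain_graph=True, create_graph=True)
        grad_norms.append(torch.norm(torch.cat([x.reshape(-1) for x in g])))
    grad_norms = torch.stack(grad_norms)

    # Relative inverse training rates r_i = (L_i / L_i(0)) / mean_j(L_j / L_j(0)).
    loss_ratios = torch.stack([L_i.detach() / L0
                               for L_i, L0 in zip(task_losses, initial_losses)])
    inv_rates = loss_ratios / loss_ratios.mean()

    # Common target for each task's gradient norm, treated as a constant.
    target = (grad_norms.mean() * inv_rates ** alpha).detach()

    # L_grad pulls each gradient norm toward its target; only the task
    # weights receive this gradient.
    grad_loss = torch.abs(grad_norms - target).sum()
    task_weights.grad = torch.autograd.grad(grad_loss, task_weights,
                                            retain_graph=True)[0]
    weight_optimizer.step()

    # Renormalize so the weights sum to the number of tasks.
    with torch.no_grad():
        task_weights *= T / task_weights.sum()

In the surrounding training loop, the network parameters would still be updated from the weighted sum of task losses, sum_i w_i * L_i; the routine above only adjusts the weights themselves.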
Regularizing Deep Multi-Task Networks using Orthogonal Gradients
TLDR
This work proposes a novel gradient regularization term that minimizes task interference by enforcing near-orthogonal gradients and encourages task-specific decoders to optimize different parts of the feature extractor, thus reducing competition.
A Comparison of Loss Weighting Strategies for Multi task Learning in Deep Neural Networks
TLDR
It is found that multi-task learning typically does not improve performance for a user-defined combination of tasks, and that careful selection of both task pairs and weighting strategies is required to equal or exceed the performance of single-task learning.
Learning to Branch for Multi-Task Learning
TLDR
This work proposes a novel tree-structured design space that casts the tree branching operation as a Gumbel-Softmax sampling procedure, enabling differentiable network splitting that is end-to-end trainable.
RotoGrad: Dynamic Gradient Homogenization
  • 2020
GradNorm (Chen et al., 2018) is a broadly used gradient-based approach for training multitask networks, where different tasks share, and thus compete for, the network parameters during learning.
RotoGrad: Gradient Homogenization in Multitask Learning
TLDR
RotoGrad is introduced, an algorithm that tackles negative transfer as a whole: it jointly homogenizes gradient magnitudes and directions while ensuring training convergence, and is shown to outperform competing methods on complex problems.
A Closer Look at Loss Weighting in Multi-Task Learning
  • Baijiong Lin, Feiyang Ye, Yu Zhang
  • Computer Science
    ArXiv
  • 2021
TLDR
It is surprisingly found that training an MTL model with random weights sampled from a distribution can achieve performance comparable to state-of-the-art baselines; the proposed Random Loss Weighting (RLW) can be implemented in only one additional line of code over existing works.
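As a rough illustration of the random-weighting idea in that summary, the sketch below resamples weights from a standard normal at every step and normalizes them with a softmax before summing the task losses; the choice of distribution here is an assumption, since the paper studies several.

import torch

def random_loss_weighting(task_losses):
    # Resample the loss weights at every training step (RLW-style).
    weights = torch.softmax(torch.randn(len(task_losses)), dim=0)
    return sum(w * L for w, L in zip(weights, task_losses))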
Towards Impartial Multi-task Learning
TLDR
This paper proposes an impartial multi-task learning (IMTL) method that can be trained end-to-end without any heuristic hyper-parameter tuning and is general enough to be applied to all kinds of losses without any distribution assumption.
Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference
TLDR
The reparameterization enables the model to learn new tasks without adversely affecting the performance of existing ones, achieves state-of-the-art results on two challenging multi-task learning benchmarks, PASCAL-Context and NYUD, and also demonstrates superior incremental learning capability compared to its close competitors.
Optimization Strategies in Multi-Task Learning: Averaged or Independent Losses?
TLDR
This work investigates the benefits of alternating independent gradient descent steps on the different task-specific objective functions, formulates a novel way to combine this approach with state-of-the-art optimizers, and proposes random task grouping as a trade-off between better optimization and computational efficiency.

References

Showing 1-10 of 38 references
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Learning Multiple Tasks with Deep Relationship Networks
TLDR
This work proposes a novel Deep Relationship Network (DRN) architecture for multi-task learning that discovers correlated tasks based on multiple task-specific layers of a deep convolutional neural network and yields state-of-the-art classification results on standard multi-domain object recognition datasets.
Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
TLDR
A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
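For context, that uncertainty-based weighting is commonly implemented with one learnable log-variance per task; the sketch below uses a widely seen simplified form of the objective, not the authors' exact formulation, and the class name is illustrative.

import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    # One learnable log-variance s_i per task; the combined loss is
    # sum_i exp(-s_i) * L_i + s_i, so noisier tasks are down-weighted
    # while the additive s_i term keeps the variances from growing freely.
    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for s, L in zip(self.log_vars, task_losses):
            total = total + torch.exp(-s) * L + s
        return total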
Learning Multiple Tasks with Multilinear Relationship Networks
TLDR
Multilinear Relationship Networks (MRN) are presented that discover task relationships based on novel tensor normal priors over the parameter tensors of multiple task-specific layers in deep convolutional networks and yield state-of-the-art results on three multi-task learning datasets.
Multitask Learning
  • R. Caruana
  • Computer Science
    Encyclopedia of Machine Learning and Data Mining
  • 1998
TLDR
Suggestions for how to get the most out of multitask learning in artificial neural nets are presented, an algorithm for multitask learning with case-based methods like k-nearest neighbor and kernel regression is presented, and algorithms for multitask learning in decision trees are sketched.
Cross-Stitch Networks for Multi-task Learning
TLDR
This paper proposes a principled approach to learning shared representations in convolutional networks for multi-task learning with a new sharing unit, the "cross-stitch" unit, which combines the activations from multiple networks and can be trained end-to-end.
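To make the sharing unit concrete, the sketch below implements a two-task cross-stitch operation with per-layer scalar mixing weights, the simplest form described in the paper; the class name and initialization are illustrative.

import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    # Learns a 2x2 mixing matrix alpha; each task's outgoing activation is a
    # linear combination of both tasks' incoming activations.
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.eye(2))  # identity = no sharing at init

    def forward(self, x_a, x_b):
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b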
Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification
TLDR
Evaluation on person attribute classification tasks involving facial and clothing attributes suggests that the models produced by the proposed method are fast and compact and can closely match or exceed the state-of-the-art accuracy of strong baselines built on much more expensive models.
UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory
  • I. Kokkinos
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
In this work we train in an end-to-end manner a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture. [...]
Task Clustering and Gating for Bayesian Multitask Learning
TLDR
A Bayesian approach is adopted in which some of the model parameters are shared and others are more loosely connected through a joint prior distribution that can be learned from the data, combining the best parts of the statistical multilevel approach and the neural network machinery.
Deep Architecture for Traffic Flow Prediction: Deep Belief Networks With Multitask Learning
TLDR
It is shown that MTL can improve the generalization performance of shared tasks, and a grouping method based on the weights in the top layer is proposed to take full advantage of weight sharing in the deep architecture and make MTL more effective.