Corpus ID: 4703661

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

@inproceedings{Chen2018GradNormGN,
  title={GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks},
  author={Zhao Chen and Vijay Badrinarayanan and Chen-Yu Lee and Andrew Rabinovich},
  booktitle={ICML},
  year={2018}
}
Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. [...] Key Result: Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
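The idea named in the title is to tune the per-task loss weights so that each task's gradient, measured at a shared layer, is pulled toward a common scale adjusted by how fast that task is learning. Below is a minimal PyTorch-style sketch of a GradNorm-style weight update, assuming the weighted total task loss is backpropagated separately in the training loop; names such as `gradnorm_step`, `shared_params`, and `weight_optimizer` are illustrative, and details (choice of shared layer, value of `alpha`) follow the paper's description only loosely.

```python
import torch

def gradnorm_step(task_losses, initial_losses, weights, shared_params,
                  weight_optimizer, alpha=1.5):
    """One GradNorm-style update of the task weights (sketch).

    task_losses      -- list of scalar task losses L_i at the current step
    initial_losses   -- list of L_i(0) values recorded at the first step
    weights          -- 1-D tensor of task weights w_i with requires_grad=True
    shared_params    -- list of parameters of the last shared layer
    weight_optimizer -- optimizer over [weights] only
    """
    num_tasks = len(task_losses)

    # G_i = || grad_W (w_i * L_i) ||_2 for each task, kept differentiable in w_i
    grad_norms = []
    for i, loss_i in enumerate(task_losses):
        g = torch.autograd.grad(weights[i] * loss_i, shared_params,
                                retain_graph=True, create_graph=True)
        grad_norms.append(torch.norm(torch.cat([t.flatten() for t in g])))
    grad_norms = torch.stack(grad_norms)

    # Relative inverse training rates r_i = (L_i / L_i(0)) / mean_j(L_j / L_j(0))
    loss_ratios = torch.stack([loss_i.detach() / l0
                               for loss_i, l0 in zip(task_losses, initial_losses)])
    r = loss_ratios / loss_ratios.mean()

    # Target gradient norms, treated as constants
    target = (grad_norms.mean() * r.pow(alpha)).detach()

    # Minimize sum_i |G_i - target_i| with respect to the task weights only
    gradnorm_loss = torch.abs(grad_norms - target).sum()
    weights.grad = torch.autograd.grad(gradnorm_loss, weights)[0]
    weight_optimizer.step()

    # Renormalize the weights to sum to the number of tasks
    weights.data.mul_(num_tasks / weights.data.sum())
```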
Regularizing Deep Multi-Task Networks using Orthogonal Gradients
TLDR
This work proposes a novel gradient regularization term that minimizes task interference by enforcing near-orthogonal gradients and encourages task-specific decoders to optimize different parts of the feature extractor, thus reducing competition.
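A minimal sketch of how such an orthogonality term could be computed, assuming a PyTorch model whose shared (feature-extractor) parameters are available as a list; the exact penalty used in the paper (which layers it is applied to, squared versus absolute cosine similarity) may differ.

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(task_losses, shared_params):
    """Sum of squared cosine similarities between task gradients (sketch).

    task_losses   -- list of scalar task losses
    shared_params -- list of shared feature-extractor parameters
    Returns a differentiable penalty that is small when the task
    gradients on the shared parameters are close to orthogonal.
    """
    flat_grads = []
    for loss in task_losses:
        g = torch.autograd.grad(loss, shared_params,
                                retain_graph=True, create_graph=True)
        flat_grads.append(torch.cat([t.flatten() for t in g]))

    penalty = torch.zeros((), device=flat_grads[0].device)
    for i in range(len(flat_grads)):
        for j in range(i + 1, len(flat_grads)):
            cos = F.cosine_similarity(flat_grads[i], flat_grads[j], dim=0)
            penalty = penalty + cos.pow(2)
    return penalty
```

The penalty would be added, scaled by a coefficient, to the sum of task losses before the backward pass.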
A Comparison of Loss Weighting Strategies for Multi task Learning in Deep Neural Networks
TLDR
It is found that multi-task learning typically does not improve performance for a user-defined combination of tasks, and requires careful selection of both task pairs and weighting strategies to equal or exceed the performance of single-task learning.
Learning to Branch for Multi-Task Learning
TLDR
This work proposes a novel tree-structured design space that casts a tree branching operation as a Gumbel-Softmax sampling procedure, enabling differentiable network splitting that is end-to-end trainable.
RotoGrad: Dynamic Gradient Homogenization
  • 2020
GradNorm (Chen et al., 2018) is a broadly used gradient-based approach for training multitask networks, in which different tasks share, and thus compete during learning for, the network parameters.
RotoGrad: Gradient Homogenization in Multitask Learning
TLDR
RotoGrad is introduced, an algorithm that tackles negative transfer as a whole: it jointly homogenizes gradient magnitudes and directions while ensuring training convergence, and is shown to outperform competing methods on complex problems.
A Closer Look at Loss Weighting in Multi-Task Learning
  • Baijiong Lin, Feiyang Ye, Yu Zhang
  • Computer Science
  • ArXiv
  • 2021
TLDR
It is surprisingly found that training an MTL model with random weights sampled from a distribution can achieve performance comparable to state-of-the-art baselines; this approach, termed Random Loss Weighting (RLW), can be implemented with only one additional line of code over existing works.
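A minimal sketch of the random-weighting idea, assuming a list of per-task scalar losses; sampling from a standard normal followed by a softmax is one possible choice of distribution, not necessarily the one used in the paper.

```python
import torch

def random_loss_weighting(task_losses):
    """Combine task losses with freshly sampled random weights (sketch)."""
    # Standard normal samples pushed through a softmax; the paper studies
    # several sampling distributions, this is just one illustrative choice.
    weights = torch.softmax(torch.randn(len(task_losses)), dim=0)
    return sum(w * loss for w, loss in zip(weights, task_losses))
```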
Towards Impartial Multi-task Learning
TLDR
This paper proposes impartial multi-task learning (IMTL), which can be trained end-to-end without any heuristic hyper-parameter tuning and is general enough to be applied to all kinds of losses without any distribution assumption.
Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference
TLDR
The reparameterization enables the model to learn new tasks without adversely affecting the performance of existing ones, achieves state-of-the-art results on two challenging multi-task learning benchmarks, PASCAL-Context and NYUD, and demonstrates superior incremental learning capability compared to its close competitors.
Optimization Strategies in Multi-Task Learning: Averaged or Independent Losses?
TLDR
This work investigates the benefits of alternating independent gradient descent steps on the different task-specific objective functions, formulates a novel way to combine this approach with state-of-the-art optimizers, and proposes random task grouping as a trade-off between better optimization and computational efficiency.
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
TLDR
This work presents Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency, and discusses how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
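A simplified sketch of a GradDrop-style masking rule, assuming the per-task gradients at some activation are already available (in practice this would sit inside a backward hook); the leak parameter and other details from the paper are omitted.

```python
import torch

def grad_drop(task_grads):
    """Merge per-task gradients with sign-consistency masking (sketch).

    task_grads -- list of same-shaped gradient tensors, one per task,
                  taken at some activation layer.
    """
    stacked = torch.stack(task_grads)                 # shape [T, ...]
    total = stacked.sum(dim=0)
    abs_total = stacked.abs().sum(dim=0).clamp(min=1e-8)

    # Sign purity in [0, 1]: 1 means every task pushes in the positive direction
    purity = 0.5 * (1.0 + total / abs_total)

    # One uniform draw per element decides which sign survives this step
    keep_positive = (torch.rand_like(purity) < purity).float()

    mask = (stacked > 0).float() * keep_positive + \
           (stacked < 0).float() * (1.0 - keep_positive)
    return (stacked * mask).sum(dim=0)
```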

References

Showing 1-10 of 38 references
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Learning Multiple Tasks with Deep Relationship Networks
TLDR
This work proposes a novel Deep Relationship Network (DRN) architecture for multi-task learning by discovering correlated tasks based on multiple task-specific layers of a deep convolutional neural network that yields state-of-the-art classification results on standard multi-domain object recognition datasets.
Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
TLDR
A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
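A minimal sketch of this uncertainty-based weighting, assuming one learnable log-variance per task; constant factors that differ between regression and classification losses in the paper are omitted.

```python
import torch

class UncertaintyWeighting(torch.nn.Module):
    """Homoscedastic-uncertainty loss combination (sketch).

    Learns s_i = log(sigma_i^2) per task and returns
    sum_i exp(-s_i) * L_i + s_i.
    """
    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros((), device=self.log_vars.device)
        for loss, s in zip(task_losses, self.log_vars):
            total = total + torch.exp(-s) * loss + s
        return total
```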
Learning Multiple Tasks with Multilinear Relationship Networks
TLDR
Multilinear Relationship Networks (MRN) are presented that discover task relationships based on novel tensor normal priors over the parameter tensors of multiple task-specific layers in deep convolutional networks, and yield state-of-the-art results on three multi-task learning datasets.
Multitask Learning
  • R. Caruana
  • Computer Science
  • Encyclopedia of Machine Learning and Data Mining
  • 1998
TLDR
Suggestions for how to get the most out of multitask learning in artificial neural nets are presented, an algorithm for multitask learning with case-based methods like k-nearest neighbor and kernel regression is presented, and algorithms for multitask learning in decision trees are sketched.
Cross-Stitch Networks for Multi-task Learning
TLDR
This paper proposes a principled approach to learning shared representations in Convolutional Networks via multi-task learning, based on a new sharing unit, the "cross-stitch" unit, which combines the activations from multiple networks and can be trained end-to-end.
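A minimal sketch of a cross-stitch-style unit for two tasks, using a single 2x2 mixing matrix; the paper learns such combinations per layer (and per channel), so this scalar version is only illustrative.

```python
import torch

class CrossStitchUnit(torch.nn.Module):
    """Two-task cross-stitch unit (sketch): learned linear mixing of activations."""

    def __init__(self):
        super().__init__()
        # Start near the identity so each task initially keeps its own features
        self.alpha = torch.nn.Parameter(torch.tensor([[0.9, 0.1],
                                                      [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b
```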
Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification
TLDR
Evaluation on person attribute classification tasks involving facial and clothing attributes suggests that the models produced by the proposed method are fast and compact, and can closely match or exceed the state-of-the-art accuracy of much more expensive baseline models.
UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory
  • I. Kokkinos
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
In this work we train in an end-to-end manner a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture. Such a network can act like [...]
Task Clustering and Gating for Bayesian Multitask Learning
TLDR
A Bayesian approach is adopted in which some of the model parameters are shared and others more loosely connected through a joint prior distribution that can be learned from the data, to combine the best parts of both the statistical multilevel approach and the neural network machinery.
Deep Architecture for Traffic Flow Prediction: Deep Belief Networks With Multitask Learning
TLDR
It is shown that MTL can improve the generalization performance of shared tasks, and a grouping method based on the weights in the top layer is proposed to make MTL more effective and take full advantage of weight sharing in the deep architecture.