• Corpus ID: 238260065

Optimization Strategies in Multi-Task Learning: Averaged or Independent Losses?

  title={Optimization Strategies in Multi-Task Learning: Averaged or Independent Losses?},
  author={Lucas Pascal and Pietro Michiardi and Xavier Bost and Benoit Huet and Maria A. Zuluaga},
In Multi-Task Learning (MTL), it is a common practice to train multi-task networks by optimizing an objective function, which is a weighted average of the task-specific objective functions. Although the computational advantages of this strategy are clear, the complexity of the resulting loss landscape has not been studied in the literature. Arguably, its optimization may be more difficult than a separate optimization of the constituting task-specific objectives. In this work, we investigate the… 

Figures and Tables from this paper


Multi-Task Learning as Multi-Objective Optimization
This paper proposes an upper bound for the multi-objective loss and shows that it can be optimized efficiently, and proves that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions.
Many Task Learning With Task Routing
This paper introduces Many Task Learning (MaTL) as a special case of MTL where more than 20 tasks are performed by a single model and applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform a large number of tasks.
MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning
This work proposes a multi-stream multi-task network to take advantage of using feature representations from preceding frames in a video sequence for joint learning of segmentation, depth, and motion in order to better handle the difference in convergence rates of different tasks.
End-To-End Multi-Task Learning With Attention
The proposed Multi-Task Attention Network (MTAN) consists of a single shared network containing a global feature pool, together with a soft-attention module for each task, which allows learning of task-specific feature-level attention.
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
A gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes is presented, showing that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks.
Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
Attentive Single-Tasking of Multiple Tasks
In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as
Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels
This paper proposes "stochastic filter groups" (SFG), a mechanism to assign convolution kernels in each layer to "specialist" and "generalist" groups, which are specific to and shared across different tasks, respectively.
Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
This work learns binary masks that “piggyback” on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task, and shows performance comparable to dedicated fine-tuned networks for a variety of classification tasks.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.