• Corpus ID: 250627454

Improved optimization strategies for deep Multi-Task Networks

Lucas Pascal, Pietro Michiardi, Xavier Bost, Benoit Huet, Maria A. Zuluaga
In Multi-Task Learning (MTL), it is common practice to train multi-task networks by optimizing a single objective function that is a weighted average of the task-specific objective functions. Although the computational advantages of this strategy are clear, the complexity of the resulting loss landscape has not been studied in the literature. Arguably, its optimization may be more difficult than a separate optimization of the constituting task-specific objectives. In this work, we investigate the…
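
As a minimal illustration of the weighted-average objective the abstract describes, the sketch below combines per-task losses into one scalar. The function name, the example loss values, and the weights are all hypothetical; the abstract does not specify how the weights are chosen.

```python
def weighted_average_loss(task_losses, weights):
    """Combine per-task losses into one scalar objective (weighted average)."""
    if len(task_losses) != len(weights):
        raise ValueError("one weight per task is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by the total weight makes this an average rather than a sum.
    return sum(w * l for w, l in zip(weights, task_losses)) / total_weight


# Hypothetical losses for three tasks at one training step.
losses = [0.9, 0.3, 1.5]
print(weighted_average_loss(losses, [1.0, 1.0, 1.0]))  # uniform weights
print(weighted_average_loss(losses, [2.0, 1.0, 1.0]))  # first task emphasized
```

In a real network the same scalar would be back-propagated through the shared parameters, so every task's gradient is mixed into a single update.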

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Extensive empirical studies about IMP reveal the following key insights: 1) performing gradient descent updates by alternating on diverse heterogeneous modalities, loss functions, and tasks, while also varying input resolutions, efficiently improves multimodal understanding.
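
The alternating-update idea in the summary above can be sketched on a toy problem. This is not the IMP training procedure: it assumes a single shared scalar parameter, two hypothetical quadratic task losses, and a fixed round-robin schedule, purely to show one task's gradient being applied per step.

```python
def alternating_gradient_descent(grad_fns, x0, lr=0.1, steps=200):
    """Update a shared parameter using one task's gradient per step (round-robin)."""
    x = x0
    for step in range(steps):
        # Pick the task for this step; the other tasks are ignored until their turn.
        grad = grad_fns[step % len(grad_fns)](x)
        x = x - lr * grad
    return x


# Hypothetical tasks: loss_1 = (x - 1)^2 and loss_2 = (x - 3)^2.
task_grads = [
    lambda x: 2.0 * (x - 1.0),
    lambda x: 2.0 * (x - 3.0),
]
x_final = alternating_gradient_descent(task_grads, x0=0.0)
print(x_final)  # ends between the two task optima
```

With a constant learning rate the parameter does not settle on a single point but cycles between the task optima, which is one reason schedules and step sizes matter for alternating schemes.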

Multi-task deep learning for glaucoma detection from color fundus images

This work designs and trains a novel multi-task deep learning model that leverages the similarities of related eye-fundus tasks and measurements used in glaucoma diagnosis; its performance is on par with trained experts while using ∼3.5 times fewer parameters than training each task separately.

FAMO: Fast Adaptive Multitask Optimization

  • Bo Liu, Yihao Feng, Peter Stone, Qiang Liu
  • Computer Science
  • 2023
This work introduces Fast Adaptive Multitask Optimization (FAMO), a dynamic weighting method that decreases task losses in a balanced way using O(1) space and time and shows comparable or superior performance to state-of-the-art gradient manipulation techniques.

Multi-Task Learning for Dense Prediction Tasks: A Survey

This survey provides a well-rounded view of state-of-the-art deep learning approaches for MTL in computer vision, with explicit emphasis on dense prediction tasks.

Multi-Task Learning as Multi-Objective Optimization

This paper proposes an upper bound for the multi-objective loss and shows that it can be optimized efficiently, and proves that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions.
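
The summary above does not describe the solver. As a related illustration, the sketch below shows the standard two-gradient minimum-norm combination used in multiple-gradient descent methods: it finds the convex combination of two task gradients with the smallest norm. Treating this as representative of the paper's procedure is an assumption, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_two_tasks(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||, and the result."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: any alpha gives the same vector
    else:
        # Unconstrained minimizer of the quadratic in alpha, then clipped to [0, 1].
        alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
        alpha = min(1.0, max(0.0, alpha))
    combined = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, combined


# Hypothetical gradients of two tasks with respect to shared parameters.
alpha, direction = min_norm_two_tasks([1.0, 0.0], [0.0, 1.0])
print(alpha, direction)
```

When one gradient is much smaller and points the same way as the other, the clipping selects the smaller gradient outright, so the update never increases either task's loss to first order.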

Many Task Learning With Task Routing

This paper introduces Many Task Learning (MaTL) as a special case of MTL where more than 20 tasks are performed by a single model and applies a conditional feature-wise transformation over the convolutional activations that enables a model to successfully perform a large number of tasks.

End-To-End Multi-Task Learning With Attention

The proposed Multi-Task Attention Network (MTAN) consists of a single shared network containing a global feature pool, together with a soft-attention module for each task, which allows learning of task-specific feature-level attention.

MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning

This work proposes a multi-stream multi-task network to take advantage of using feature representations from preceding frames in a video sequence for joint learning of segmentation, depth, and motion in order to better handle the difference in convergence rates of different tasks.

Gradient Surgery for Multi-Task Learning

This work identifies a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develops a simple yet general approach for avoiding such interference between task gradients.
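
One way to picture avoiding interference between task gradients is projection: when two gradients conflict (negative inner product), the conflicting component of one is removed. The sketch below is a simplified illustration with hypothetical names; it visits the other tasks in a fixed order, which is an assumption made here for reproducibility.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(grads):
    """For each task gradient, remove components that conflict with other tasks."""
    projected = []
    for i, g_i in enumerate(grads):
        g = list(g_i)
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            overlap = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            # Only project when the gradients point in conflicting directions.
            if overlap < 0 and norm_sq > 0:
                g = [a - (overlap / norm_sq) * b for a, b in zip(g, g_j)]
        projected.append(g)
    return projected


# Hypothetical conflicting gradients for two tasks.
g1, g2 = [1.0, 0.0], [-1.0, 1.0]
p1, p2 = project_conflicting([g1, g2])
update = [a + b for a, b in zip(p1, p2)]  # summed direction applied to shared weights
print(p1, p2, update)
```

After projection each modified gradient is orthogonal to the gradient it conflicted with, so applying it no longer pushes directly against the other task.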

Attentive Single-Tasking of Multiple Tasks

In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as

GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

A gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes is presented, showing that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks.
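
To make "dynamically tuning gradient magnitudes" concrete, the sketch below computes per-task target gradient norms from relative training rates and the resulting L1 mismatch, in the spirit of GradNorm. It assumes the gradient norms and losses are already measured; the function names, the default `alpha`, and the input values are hypothetical, and the step that updates the task weights from this mismatch is omitted.

```python
def gradnorm_targets(grad_norms, losses, initial_losses, alpha=1.5):
    """Target gradient norm per task: mean norm scaled by relative training rate."""
    # Loss ratio: how much of the initial loss remains (smaller = faster training).
    loss_ratios = [l / l0 for l, l0 in zip(losses, initial_losses)]
    mean_ratio = sum(loss_ratios) / len(loss_ratios)
    relative_rates = [r / mean_ratio for r in loss_ratios]
    mean_norm = sum(grad_norms) / len(grad_norms)
    # Slower tasks (rate > 1) receive a larger target norm.
    return [mean_norm * rate ** alpha for rate in relative_rates]


def gradnorm_loss(grad_norms, targets):
    """L1 distance between actual and target gradient norms."""
    return sum(abs(g - t) for g, t in zip(grad_norms, targets))


# Hypothetical measurements for two tasks.
norms = [2.0, 4.0]
targets = gradnorm_targets(norms, losses=[0.5, 1.0], initial_losses=[1.0, 1.0])
print(targets, gradnorm_loss(norms, targets))
```

The exponent `alpha` controls how strongly slower tasks are favoured; with `alpha = 0` every task is pushed toward the same gradient norm.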

Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels

This paper proposes "stochastic filter groups" (SFG), a mechanism to assign convolution kernels in each layer to "specialist" and "generalist" groups, which are specific to and shared across different tasks, respectively.

Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
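
A common way to express this kind of uncertainty-based weighting is to give each task a learnable log-variance and scale its loss by the inverse variance, with a regularizing term that stops the variance from growing without bound. The sketch below uses the regression-style form with a factor of one half; the function name and example values are hypothetical, and in practice the log-variances would be trained jointly with the network.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Sum of 0.5 * exp(-s_i) * L_i + 0.5 * s_i, where s_i = log(sigma_i^2)."""
    return sum(
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(task_losses, log_variances)
    )


# Hypothetical losses on very different scales (e.g. different units).
losses = [2.0, 40.0]
print(uncertainty_weighted_loss(losses, [0.0, 0.0]))  # unit variance for both tasks
print(uncertainty_weighted_loss(losses, [math.log(2.0), math.log(40.0)]))
```

For a fixed task loss, the expression is minimized when the variance equals that loss, so tasks with larger losses are automatically down-weighted.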