Learning Sparse Sharing Architectures for Multiple Tasks

@article{Sun2020LearningSS,
  title={Learning Sparse Sharing Architectures for Multiple Tasks},
  author={Tianxiang Sun and Yunfan Shao and Xiaonan Li and Pengfei Liu and Hang Yan and Xipeng Qiu and Xuanjing Huang},
  journal={ArXiv},
  year={2020},
  volume={abs/1911.05034}
}
Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. However, choosing a suitable sharing mechanism depends on the relations among the tasks, which is not easy to determine since the underlying shared factors among these tasks are difficult to understand. In this paper, we propose a novel parameter sharing mechanism, named \emph{Sparse Sharing}. Given multiple tasks, our approach automatically finds a sparse sharing …
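The core idea is compact enough to sketch: every task keeps its own binary mask over one set of shared base weights, so two tasks share exactly the parameters where their masks overlap. The following is a minimal, hypothetical PyTorch illustration of that mechanism, not the authors' code; the names (SparseSharedEncoder, masks, keep_ratio) are invented, and the random masks stand in for the task-specific pruning the paper derives them from.

# Minimal sketch of sparse sharing: per-task binary masks over one shared
# weight matrix. Random masks stand in for masks obtained by per-task pruning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseSharedEncoder(nn.Module):
    def __init__(self, in_dim=64, hidden=128, num_tasks=2, keep_ratio=0.5):
        super().__init__()
        self.shared = nn.Linear(in_dim, hidden)  # shared base parameters
        # One fixed binary mask per task (illustrative; each mask would come
        # from pruning a task-specific copy of the base network).
        masks = torch.stack([
            (torch.rand(hidden, in_dim) < keep_ratio).float()
            for _ in range(num_tasks)
        ])
        self.register_buffer("masks", masks)

    def forward(self, x, task_id):
        # Each task sees only the weights its mask selects; masked-out entries
        # receive zero gradient, so a task updates only its own subnetwork.
        w = self.shared.weight * self.masks[task_id]
        return torch.relu(F.linear(x, w, self.shared.bias))

During joint training, a batch from task t then touches only the entries selected by mask t, so tasks interact precisely where their subnetworks intersect.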
Multi-Task Learning with Deep Neural Networks: A Survey
TLDR
An overview of multi-task learning methods for deep neural networks is given, with the aim of summarizing both the well-established and most recent directions within the field.
Learning Twofold Heterogeneous Multi-Task by Sharing Similar Convolution Kernel Pairs
TLDR
A simple and effective multi-task adaptive learning (MTAL) network is proposed to learn multiple tasks in such a THMTL setting, exploring and utilizing the inherent relationship between tasks so that knowledge is shared through similar convolution kernels in individual layers of the MTAL network.
Progressive Multi-task Learning with Controlled Information Flow for Joint Entity and Relation Extraction
TLDR
A multi-task learning architecture, referred to as the Progressive Multitask learning model with Explicit Interactions (PMEI), is proposed based on the observation that correlations exist between the outputs of some related tasks and that these correlations reflect the relevant features that need to be extracted from the input.
Neuron-Connection Level Sharing for Multi-Task Learning in Video Conversion Rate Prediction
Xuanji Xiao, 2021
Click-through rate (CTR) and post-click conversion rate (CVR) predictions are two fundamental modules in industrial ranking systems such as recommender systems, advertising, and search engines. Most …
Finding Sparse Structures for Domain Specific Neural Machine Translation
TLDR
PRUNE-TUNE is a novel domain adaptation method via gradual pruning that alleviates the over-fitting and the degradation problem without model modification, and is able to sequentially learn a single network with multiple disjoint domain-specific sub-networks for multiple domains.
Multi-Task Learning for Sentiment Analysis with Hard-Sharing and Task Recognition Mechanisms
TLDR
A task recognition mechanism is designed to reduce the interference of the hard-shared feature space and to enhance the correlation between multiple tasks, providing a new solution for reducing interference in the shared feature space for sentiment analysis.
Learning Language Specific Sub-network for Multilingual Machine Translation
TLDR
The proposed approach learns a Language-Specific Sub-network (LaSS) for each language pair within a single unified multilingual MT model to counter parameter interference, and shows strong generalization, extending easily to new language pairs and to zero-shot translation.
Finding Sparse Structure for Domain Specific Neural Machine Translation
TLDR
Empirical experiment results show that PRUNE-TUNE outperforms several strong competitors in the target domain test set without the quality degradation of the general domain in both single and multiple domain settings.
AdaptCL: Efficient Collaborative Learning with Dynamic and Adaptive Pruning
TLDR
A novel and efficient collaborative learning framework named AdaptCL is proposed, which dynamically generates an adaptive sub-model from the global base model for each data holder without any prior information about worker capability, achieving time savings of more than 41% on average and improved accuracy in a low-heterogeneity environment.
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
TLDR
It is demonstrated that adaptively sharing the super tickets across tasks benefits multi-task learning, and that the phase transition is task- and model-dependent: as the model size becomes larger and the training data set becomes smaller, the transition becomes more pronounced.

References

Showing 1-10 of 37 references
Latent Multi-Task Architecture Learning
TLDR
This work presents an approach that learns a latent multi-task architecture that jointly addresses (a)-(c), consistently outperforms previous approaches to learning latent architectures for multi-task problems, and achieves up to 15% average error reductions over common approaches to MTL.
Meta Multi-Task Learning for Sequence Modeling
TLDR
A shared meta-network is used to capture the meta-knowledge of semantic composition and to generate the parameters of the task-specific semantic composition models, in a new scheme for sharing the composition function across multiple tasks.
Cross-Stitch Networks for Multi-task Learning
TLDR
This paper proposes a principled approach to learning shared representations in convolutional networks via multi-task learning, using a new sharing unit, the "cross-stitch" unit, which combines the activations from multiple networks and can be trained end-to-end.
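As a rough illustration (a hypothetical sketch, not the paper's reference implementation), a cross-stitch unit can be read as a small learned mixing matrix applied element-wise to two tasks' activations:

# Hypothetical sketch of a cross-stitch unit for two tasks; the 2x2 mixing
# matrix alpha is learned end-to-end along with the rest of the networks.
import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    def __init__(self):
        super().__init__()
        # Initialised near identity so each task starts mostly from its own features.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        # Linearly recombine the two tasks' activations, element-wise.
        y_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        y_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return y_a, y_b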
Adversarial Multi-task Learning for Text Classification
TLDR
This paper proposes an adversarial multi-task learning framework that prevents the shared and private latent feature spaces from interfering with each other, and conducts extensive experiments on 16 different text classification tasks, demonstrating the benefits of the approach.
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
TLDR
A hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection, and Relation Extraction, without hand-engineered features or external NLP tools like syntactic parsers.
Learning Multi-Task Communication with Message Passing for Sequence Learning
TLDR
This work adopts the idea from message-passing graph neural networks, and proposes a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way.
Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
TLDR
A principled approach to multi-task deep learning is proposed which weighs multiple loss functions by considering the homoscedastic uncertainty of each task, allowing us to simultaneously learn various quantities with different units or scales in both classification and regression settings.
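A hedged sketch of that weighting scheme, assuming one learnable log-variance per task (the names UncertaintyWeighting and log_vars are invented for illustration): each task loss is scaled by the inverse of its learned variance, and a log-variance penalty keeps the model from declaring every task infinitely uncertain.

# Hypothetical sketch of homoscedastic-uncertainty loss weighting:
# total = sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2) learnable.
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # one s_i per task

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            # Down-weight noisier tasks, penalise large learned uncertainty.
            total = total + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
        return total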
Deep multi-task learning with low level tasks supervised at lower layers
TLDR
It is consistently better to have POS supervision at the innermost rather than the outermost layer, and it is argued that "low-level" tasks are better kept at the lower layers, enabling the higher-level tasks to make use of the shared representation of the lower-level tasks.
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
TLDR
A joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks and uses a simple regularization term to allow for optimizing all model weights to improve one task’s loss without exhibiting catastrophic interference of the other tasks.
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
TLDR
This work presents a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model and demonstrates that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods.