Empirical evaluation of multi-task learning in deep neural networks for natural language processing

@article{Li2020EmpiricalEO,
  title={Empirical evaluation of multi-task learning in deep neural networks for natural language processing},
  author={Jianquan Li and Xiaokang Liu and Wenpeng Yin and Min Yang and Liqun Ma},
  journal={Neural Computing and Applications},
  year={2020},
  volume={33},
  pages={4417-4428}
}
Multi-task learning (MTL) aims at boosting the overall performance of each individual task by leveraging useful information contained in multiple related tasks. It has shown great success in natural language processing (NLP). Currently, a number of MTL architectures and learning mechanisms have been proposed for various NLP tasks, including exploring linguistic hierarchies, orthogonality constraints, adversarial learning, gate mechanisms, and label embedding. However, there is no systematic…
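As background for the architectures the paper evaluates, a minimal hard-parameter-sharing setup (one shared encoder, one lightweight head per task, a joint loss) can be sketched as follows. This is an illustrative PyTorch sketch, not the paper's implementation; the module sizes, task names, and training loop are assumptions.

```python
# Minimal hard-parameter-sharing MTL sketch (illustrative only; sizes and
# task names are assumptions, not taken from the paper).
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Shared sentence encoder reused by every task."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        return h_n[-1]                      # final hidden state as sentence vector

class MultiTaskModel(nn.Module):
    """One shared encoder, one lightweight classification head per task."""
    def __init__(self, task_num_classes, hidden_dim=256):
        super().__init__()
        self.encoder = SharedEncoder(hidden_dim=hidden_dim)
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n_cls)
            for task, n_cls in task_num_classes.items()
        })

    def forward(self, token_ids, task):
        return self.heads[task](self.encoder(token_ids))

# One training step for one task; in joint training, tasks are sampled in turn
# and every step updates both the shared encoder and that task's head.
model = MultiTaskModel({"sentiment": 2, "topic": 4})
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 10000, (8, 20))   # dummy batch of token ids
labels = torch.randint(0, 2, (8,))
loss = loss_fn(model(tokens, task="sentiment"), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```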
Deeper Task-Specificity Improves Joint Entity and Relation Extraction
TLDR
This work suggests that previous solutions to joint NER and RE undervalue task-specificity and demonstrates the importance of correctly balancing the number of shared and task-specific parameters for MTL approaches in general.
A Brief Review of Deep Multi-task Learning and Auxiliary Task Learning
TLDR
A brief review of recent deep multi-task learning approaches is provided, followed by methods for selecting useful auxiliary tasks that can be used in dMTL to improve the performance of the model on the main task.
Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models
TLDR
This work focuses on the transformer encoder-decoder model for the open-domain dialogue response generation task, and finds that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training.
Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2
TLDR
This work takes advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2, to solve this challenge by performing text summarization on this dataset and provides abstractive and comprehensive information based on keywords extracted from the original articles.
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models
TLDR
This work focuses on the transformer encoder-decoder model for the open-domain dialogue response generation task and proposes an intuitive fine-tuning strategy named "mix-review", finding that mix-review effectively regularizes the fine-tuning process and largely alleviates the forgetting problem.

References

SHOWING 1-10 OF 40 REFERENCES
Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR
A Multi-Task Deep Neural Network (MT-DNN) is proposed for learning representations across multiple natural language understanding (NLU) tasks; it allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
Sluice networks: Learning what to share between loosely related tasks
TLDR
Sluice networks are introduced, a general framework for multi-task learning in which trainable parameters control the amount of sharing; it is shown that (a) label entropy is predictive of gains in sluice networks, confirming findings for hard parameter sharing, and (b) while sluice networks easily fit noise, they are robust across domains in practice.
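The core sluice/cross-stitch idea, a small trainable matrix that decides how much two tasks' layer outputs are mixed before feeding the next layer, could be sketched roughly as below. This is not the authors' code; the 2x2 mixing matrix, its near-identity initialisation, and the tensor shapes are assumptions.

```python
# Sketch of a sluice/cross-stitch-style sharing unit: a trainable 2x2 matrix
# mixes the hidden representations of two tasks (assumed shapes and init).
import torch
import torch.nn as nn

class SharingUnit(nn.Module):
    """Learned linear combination of two tasks' hidden representations."""
    def __init__(self):
        super().__init__()
        # Initialised close to the identity, i.e. close to "no sharing".
        self.alpha = nn.Parameter(torch.eye(2) + 0.01 * torch.randn(2, 2))

    def forward(self, h_task_a, h_task_b):
        stacked = torch.stack([h_task_a, h_task_b], dim=0)   # (2, batch, dim)
        mixed = torch.einsum('ij,jbd->ibd', self.alpha, stacked)
        return mixed[0], mixed[1]

# Usage: mix the outputs of two task-specific layers before the next layer.
unit = SharingUnit()
h_a, h_b = torch.randn(4, 64), torch.randn(4, 64)
h_a_shared, h_b_shared = unit(h_a, h_b)
```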
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Latent Multi-Task Architecture Learning
TLDR
This work presents an approach that learns a latent multi-task architecture that jointly addresses (a)-(c), consistently outperforms previous approaches to learning latent architectures for multi-task problems, and achieves up to 15% average error reductions over common approaches to MTL.
An Overview of Multi-Task Learning in Deep Neural Networks
TLDR
This article seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks, particularly in deep neural networks.
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
TLDR
A joint many-task model is presented together with a strategy for successively growing its depth to solve increasingly complex tasks; a simple regularization term allows optimizing all model weights to improve one task's loss without exhibiting catastrophic interference with the other tasks.
Gated Multi-Task Network for Text Classification
TLDR
This paper introduces a gate mechanism into multi-task CNNs and proposes a new Gated Sharing Unit, which can filter the feature flows between tasks and greatly reduce interference.
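A gated sharing unit of this kind can be approximated by a sigmoid gate that filters the features flowing from one task into another. The sketch below is an assumption-laden illustration, not the paper's architecture; the fusion rule and dimensions are guesses.

```python
# Sketch of a gate that filters features flowing from the other task into this
# task, in the spirit of a gated sharing unit (names and sizes are assumptions).
import torch
import torch.nn as nn

class GatedShare(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_own, h_other):
        # Gate values in (0, 1) decide how much of the other task's features pass.
        g = torch.sigmoid(self.gate(torch.cat([h_own, h_other], dim=-1)))
        return h_own + g * h_other

share = GatedShare(dim=64)
fused = share(torch.randn(4, 64), torch.randn(4, 64))
```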
Deep multi-task learning with low level tasks supervised at lower layers
TLDR
It is consistently better to have POS supervision at the innermost rather than the outermost layer, and it is argued that "low-level" tasks are better kept at the lower layers, enabling the higher-level tasks to make use of the shared representation of the lower-level tasks.
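The layered-supervision idea, attaching a low-level POS head to a lower layer and a higher-level head (e.g. chunking) to an upper layer, might look roughly like this. The tagset sizes and layer widths are assumptions, not values from the paper.

```python
# Sketch of supervising a low-level task (POS) at a lower layer and a
# higher-level task (chunking) at an upper layer (assumed sizes).
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    def __init__(self, vocab=10000, emb=100, hidden=128, n_pos=17, n_chunk=9):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lower = nn.LSTM(emb, hidden, batch_first=True)      # supervised with POS
        self.upper = nn.LSTM(hidden, hidden, batch_first=True)   # supervised with chunking
        self.pos_head = nn.Linear(hidden, n_pos)
        self.chunk_head = nn.Linear(hidden, n_chunk)

    def forward(self, token_ids):
        low, _ = self.lower(self.embed(token_ids))
        high, _ = self.upper(low)
        return self.pos_head(low), self.chunk_head(high)

model = HierarchicalTagger()
pos_logits, chunk_logits = model(torch.randint(0, 10000, (2, 15)))
# pos_logits: (2, 15, 17), chunk_logits: (2, 15, 9)
```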
Adversarial Multi-task Learning for Text Classification
TLDR
This paper proposes an adversarial multi-task learning framework that prevents the shared and private latent feature spaces from interfering with each other, and conducts extensive experiments on 16 different text classification tasks, which demonstrate the benefits of the approach.
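The adversarial ingredient is commonly realised with a task discriminator on the shared features; one way to implement it is a gradient-reversal layer, sketched below. This specific mechanism is a stand-in under stated assumptions, not necessarily the authors' exact formulation, and all sizes are made up.

```python
# Sketch of adversarial sharing: a task discriminator tries to predict which
# task a shared feature came from, while a gradient-reversal layer pushes the
# shared encoder to make that prediction hard (details are assumptions).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output          # flip gradients flowing into the shared encoder

class AdversarialShared(nn.Module):
    def __init__(self, in_dim=64, hidden=64, n_tasks=16):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.discriminator = nn.Linear(hidden, n_tasks)

    def forward(self, x):
        s = self.shared(x)
        task_logits = self.discriminator(GradReverse.apply(s))
        return s, task_logits        # s feeds task classifiers; logits feed the adversarial loss

model = AdversarialShared()
shared_feat, task_logits = model(torch.randn(8, 64))
adv_loss = nn.CrossEntropyLoss()(task_logits, torch.randint(0, 16, (8,)))
```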
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
TLDR
The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.