BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

@article{Stickland2019BERTAP,
  title={BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning},
  author={Asa Cooper Stickland and Iain Murray},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.02671}
}
Multi-task learning allows the sharing of useful information between multiple related tasks. In natural language processing, several recent approaches have successfully leveraged unsupervised pre-training on large amounts of data to perform well on various tasks, such as those in the GLUE benchmark. These results are based on fine-tuning on each task separately. We explore the multi-task learning setting for the recent BERT model on the GLUE benchmark, and how to best add task-specific…
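
The abstract turns on one architectural idea, named in the title: keep a single shared BERT model and add small task-specific "projected attention layers" (PALs) so that each task contributes only a few extra parameters. Below is a minimal PyTorch sketch of that idea, assuming standard module names and illustrative sizes; it is one reading of the abstract, not the authors' implementation, and the projection size, head count, and parallel placement around a shared encoder layer are assumptions made for the example.

import torch
import torch.nn as nn


class ProjectedAttentionLayer(nn.Module):
    """Task-specific adapter: project hidden states down to a small
    dimension, run multi-head self-attention there, project back up."""

    def __init__(self, hidden_size=768, proj_size=204, num_heads=12):
        super().__init__()
        self.down = nn.Linear(hidden_size, proj_size, bias=False)  # shared size -> small
        self.up = nn.Linear(proj_size, hidden_size, bias=False)    # small -> shared size
        self.attn = nn.MultiheadAttention(proj_size, num_heads, batch_first=True)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size)
        x = self.down(hidden_states)
        x, _ = self.attn(x, x, x, need_weights=False)
        return self.up(x)


class AdaptedEncoderLayer(nn.Module):
    """Wraps one shared (pre-trained) encoder layer and adds the
    task-specific module in parallel, so almost all parameters
    stay shared across tasks."""

    def __init__(self, shared_layer, hidden_size=768):
        super().__init__()
        self.shared_layer = shared_layer                  # shared by every task
        self.pal = ProjectedAttentionLayer(hidden_size)   # one copy per task
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, hidden_states):
        return self.norm(self.shared_layer(hidden_states) + self.pal(hidden_states))

In a multi-task setup along these lines, each task would select its own ProjectedAttentionLayer (and output head) while reusing the same shared encoder weights, which is what keeps the per-task parameter overhead small.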
