BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

@inproceedings{Stickland2019BERTAP,
  title={BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning},
  author={Asa Cooper Stickland and Iain Murray},
  booktitle={ICML},
  year={2019}
}
Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using…
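
The abstract describes sharing a single BERT model across tasks while adding only a small number of task-specific parameters per task. As a rough illustration of that idea (a sketch, not the authors' code), the following PyTorch snippet shows a low-rank, task-specific attention module run in parallel with a shared transformer layer; the class names, the 204-dimensional projection size, and the head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProjectedAttentionLayer(nn.Module):
    """Small task-specific attention module operating in a projected, low-dimensional space."""
    def __init__(self, hidden_size=768, pal_size=204, num_heads=12):
        super().__init__()
        self.down = nn.Linear(hidden_size, pal_size, bias=False)  # project down to the small space
        self.attn = nn.MultiheadAttention(pal_size, num_heads, batch_first=True)
        self.up = nn.Linear(pal_size, hidden_size, bias=False)    # project back up

    def forward(self, hidden_states):
        x = self.down(hidden_states)
        x, _ = self.attn(x, x, x)   # self-attention in the projected space
        return self.up(x)

class AdaptedLayer(nn.Module):
    """Output of a shared (task-agnostic) layer plus the task-specific PAL output."""
    def __init__(self, shared_layer, pal):
        super().__init__()
        self.shared_layer = shared_layer  # any module mapping hidden states to hidden states
        self.pal = pal                    # small task-specific module as above

    def forward(self, hidden_states):
        return self.shared_layer(hidden_states) + self.pal(hidden_states)
```

A full multi-task model along these lines would add one such module per task at each shared encoder layer; in the paper's setup the down/up projections are shared across layers for a given task, keeping the added parameter count small relative to a separately fine-tuned BERT per task.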