BAM! Born-Again Multi-Task Networks for Natural Language Understanding

@inproceedings{Clark2019BAMBM,
  title={BAM! Born-Again Multi-Task Networks for Natural Language Understanding},
  author={Kevin Clark and Minh-Thang Luong and Urvashi Khandelwal and Christopher D. Manning and Quoc V. Le},
  booktitle={ACL},
  year={2019}
}
It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
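
The teacher-annealing idea above amounts to training the multi-task student against a target that interpolates between a single-task teacher's soft predictions and the gold labels, with the weight on the gold labels increasing as training progresses. The snippet below is a minimal PyTorch-style sketch of such a loss, not the authors' released implementation; the function name teacher_annealed_loss and the linear schedule over total_steps are illustrative assumptions.

    import torch.nn.functional as F

    def teacher_annealed_loss(student_logits, teacher_probs, gold_labels, step, total_steps):
        # lam moves linearly from 0 (pure distillation) to 1 (pure supervised learning)
        lam = step / total_steps
        gold_probs = F.one_hot(gold_labels, num_classes=student_logits.size(-1)).float()
        # Mixed target: lam * gold labels + (1 - lam) * teacher predictions
        target = lam * gold_probs + (1.0 - lam) * teacher_probs
        log_probs = F.log_softmax(student_logits, dim=-1)
        # Cross-entropy of the student against the soft mixed target
        return -(target * log_probs).sum(dim=-1).mean()

With a linear schedule like this, early updates are dominated by distillation from the single-task teacher while later updates rely mostly on the supervised signal, matching the transition described in the abstract.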
