Corpus ID: 207847573

What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning

@article{Lee2019WhatWE,
  title={What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning},
  author={Jin-Chun Lee and Raphael Tang and Jimmy Lin},
  journal={ArXiv},
  year={2019},
  volume={abs/1911.03090}
}
Pretrained transformer-based language models have achieved state of the art across countless tasks in natural language processing. These models are highly expressive, comprising at least a hundred million parameters and a dozen layers. Recent evidence suggests that only a few of the final layers need to be fine-tuned for high quality on downstream tasks. Naturally, a subsequent research question is, "how many of the last layers do we need to fine-tune?" In this paper, we precisely answer this…
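
In practice, the setup the abstract describes amounts to freezing the embeddings and all but the last k encoder layers before training, so that only those final layers (plus any task head) are updated. The sketch below shows one way to arrange this with the HuggingFace Transformers library; the helper name, the "bert-base-uncased" checkpoint, k=2, and the learning rate are illustrative assumptions rather than values taken from the paper.

    # Minimal sketch (not the authors' released code): fine-tune only the
    # last k encoder layers of a pretrained transformer, freeze the rest.
    # Checkpoint name, helper name, k, and learning rate are assumptions.
    import torch
    from transformers import BertModel

    def freeze_all_but_last_k(model: BertModel, k: int) -> None:
        # Freeze the embedding layer; it stays at its pretrained weights.
        for param in model.embeddings.parameters():
            param.requires_grad = False
        # Freeze every encoder layer except the final k.
        num_layers = len(model.encoder.layer)  # 12 for bert-base
        for layer in model.encoder.layer[: num_layers - k]:
            for param in layer.parameters():
                param.requires_grad = False

    model = BertModel.from_pretrained("bert-base-uncased")
    freeze_all_but_last_k(model, k=2)

    # Only the unfrozen parameters are handed to the optimizer, so the
    # frozen layers keep their pretrained weights during fine-tuning.
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=2e-5
    )

Varying k in such a setup, from fine-tuning only a task head up to fine-tuning the full network, is the kind of sweep the paper's question implies.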
