Corpus ID: 231749529

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

Yuhan Liu, Saurabh Agarwal, Shivaram Venkataraman
With the rapid adoption of machine learning (ML), many domains now fine-tune models pre-trained on a large corpus of data. However, our experiments show that even fine-tuning models like BERT can take many hours when using GPUs. While prior work proposes limiting the number of layers that are fine-tuned, e.g., freezing all layers but the last, we find that such static approaches reduce accuracy. We propose AutoFreeze, a system that uses an…
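The static freezing baseline the abstract contrasts against (freezing every layer except the last) can be sketched in a few lines of PyTorch. The `nn.Sequential` stack of linear layers below is a toy stand-in for BERT's encoder blocks, not the actual AutoFreeze implementation:

```python
import torch.nn as nn

# Toy 4-block "model"; in practice these would be Transformer encoder blocks.
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

# Static freezing: disable gradients for all blocks except the last one,
# so the optimizer updates only the final block's parameters.
for layer in model[:-1]:
    for p in layer.parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
# Only the last Linear(8, 8) remains trainable: 64 weights + 8 biases = 72.
```

AutoFreeze's contribution is deciding *adaptively* which blocks to freeze as training progresses, rather than fixing this split up front as the sketch does.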
Train Deep Neural Networks in 40-D Subspaces
A Dynamic Linear Dimensionality Reduction (DLDR) is proposed that dramatically reduces the parameter space to a variable subspace of significantly lower dimension, and a quasi-Newton-based algorithm is developed to train the variables obtained by DLDR rather than the original parameters of the neural network.
HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
A novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks, adding fewer trainable parameters in multi-task learning while achieving superior performance and transfer ability compared to state-of-the-art methods.
Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning
This work presents two approaches for improving the performance of BERT-based bi-encoders: replacing the full fine-tuning step with lightweight fine-tuning, and developing semi-Siamese models in which queries and documents are handled with a limited amount of difference.
Enriching Language Models with Visually-grounded Word Vectors and the Lancaster Sensorimotor Norms
It is found that enriching language models with the Lancaster norms and image vectors improves results in both tasks, with some implications for robust language models that capture holistic linguistic meaning in a language learning context.
Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets
This work casts DTL model selection in the presence of frozen layers as an instance of multi-query optimization and proposes two optimizations that reduce redundant computation and training overhead.


What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
This paper examines two recent pretrained language models, BERT and RoBERTa, across standard tasks in textual entailment, semantic similarity, sentiment analysis, and linguistic acceptability, and shows that only a fourth of the final layers need to be fine-tuned to achieve 90% of the original quality.
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
This work proposes a progressive layer dropping method that speeds up the training of Transformer-based language models, gaining efficiency from changes to the model architecture and training technique rather than from additional hardware resources.
Parameter-Efficient Transfer Learning for NLP
To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark; adapters attain near state-of-the-art performance while adding only a few parameters per task.
Reducing Transformer Depth on Demand with Structured Dropout
LayerDrop, a form of structured dropout, is explored; it has a regularization effect during training and allows efficient pruning at inference time, showing that sub-networks of any depth can be selected from one large network without fine-tuning and with limited impact on performance.
The Right Tool for the Job: Matching Model and Instance Complexities
This work proposes a modification to contextual representation fine-tuning which allows for an early (and fast) “exit” from neural network calculations for simple instances, and late (and accurate) exit for hard instances during inference.
FastBERT: a Self-distilling BERT with Adaptive Inference Time
A novel speed-tunable FastBERT with adaptive inference time that can speed up inference by a factor of 1 to 12 over BERT, given different speedup thresholds for the speed-performance tradeoff.
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
This work proposes a simple but effective method, DeeBERT, to accelerate BERT inference, which allows samples to exit earlier without passing through the entire model, and provides new ideas to efficiently apply deep transformer-based models to downstream tasks.
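The early-exit idea shared by the three entries above (instance-complexity matching, FastBERT, DeeBERT) can be sketched in plain Python: each layer gets a small classifier, and inference stops as soon as one classifier is confident enough. The `layers` and `classifiers` callables here are hypothetical stand-ins, not any of these papers' actual APIs:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_forward(x, layers, classifiers, threshold=0.3):
    """Run layers in order; exit as soon as a per-layer classifier
    produces a low-entropy (confident) prediction.

    Returns the prediction and the number of layers actually executed.
    """
    probs = None
    for i, (layer, clf) in enumerate(zip(layers, classifiers)):
        x = layer(x)
        probs = clf(x)
        if entropy(probs) < threshold:
            return probs, i + 1  # easy instance: skip remaining layers
    return probs, len(layers)    # hard instance: used the full model
```

Easy instances exit after a few layers while hard ones traverse the whole stack, which is how these methods trade a little accuracy for large average-case speedups.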
Efficient Training of BERT by Progressively Stacking
This paper proposes a stacking algorithm to transfer knowledge from a shallow model to a deep one, then applies stacking progressively to accelerate BERT training; models trained with this strategy achieve performance similar to models trained from scratch, but the algorithm is much faster.
Q8BERT: Quantized 8Bit BERT
This work shows how to perform quantization-aware training during the fine-tuning phase of BERT to compress it by 4x with minimal accuracy loss; the resulting quantized model can accelerate inference on hardware that supports 8-bit integer operations.
PyTorch: An Imperative Style, High-Performance Deep Learning Library
This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.