Corpus ID: 209072953

Faster and Just As Accurate: A Simple Decomposition for Transformer Models

@inproceedings{Cao2019FasterAJ,
  title={Faster and Just As Accurate: A Simple Decomposition for Transformer Models},
  author={Qingqing Cao and Harsh Trivedi and Aruna Balasubramanian and Niranjan Balasubramanian},
  year={2019}
}

Topics from this paper

Extreme Model Compression

Training with Quantization Noise for Extreme Model Compression
We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training (Jacob et al., 2018).
TLDR: This paper proposes to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights, and establishes new state-of-the-art trade-offs between accuracy and model size in both natural language processing and image classification.
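As a concrete illustration of the Quant-Noise idea summarized in the TLDR above, here is a minimal PyTorch-style sketch; it is not the authors' released implementation, and the symmetric int8 quantizer, the noise rate p, and the name quant_noise_forward are assumptions chosen for brevity.

import torch

def quant_noise_forward(weight: torch.Tensor, p: float = 0.1, num_bits: int = 8) -> torch.Tensor:
    # Fake-quantize only a random fraction p of the weights on this forward pass.
    if p <= 0.0:
        return weight
    # Simple symmetric uniform quantizer (an illustrative stand-in for the
    # int8 / product-quantization schemes studied in the paper).
    qmax = 2 ** (num_bits - 1) - 1
    scale = weight.detach().abs().max().clamp(min=1e-8) / qmax
    quantized = torch.clamp(torch.round(weight / scale), -qmax, qmax) * scale
    # Bernoulli mask: which weights see quantization noise on this forward pass.
    mask = (torch.rand_like(weight) < p).to(weight.dtype)
    # Straight-through estimator: masked weights take their quantized values in the
    # forward pass while gradients still reach the full-precision parameters;
    # the unmasked weights are left untouched, so their gradients stay unbiased.
    noise = (quantized - weight).detach()
    return weight + mask * noise

During training one would apply this to a layer's weight just before the matrix multiply, for example torch.nn.functional.linear(x, quant_noise_forward(self.weight), self.bias), and at inference time quantize all weights with the full quantizer instead.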
Analyzing Redundancy in Pretrained Transformer Models
TLDR: An efficient feature-based transfer learning procedure is presented that maintains 97% of performance while using at most 10% of the model's original neurons, and the analysis reveals interesting insights about redundancy.
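A rough sketch of the kind of feature-based transfer summarized above follows; the greedy correlation-based neuron selection, the 10% budget, and the helper name select_neurons are illustrative assumptions, not the paper's exact recipe.

import numpy as np
from sklearn.linear_model import LogisticRegression

def select_neurons(features: np.ndarray, keep_fraction: float = 0.1,
                   corr_threshold: float = 0.9) -> np.ndarray:
    # Greedily keep high-variance neurons that are not strongly correlated
    # with neurons that were already kept, up to the given budget.
    order = np.argsort(-features.var(axis=0))
    corr = np.corrcoef(features, rowvar=False)
    budget = max(1, int(keep_fraction * features.shape[1]))
    kept = []
    for i in order:
        if all(abs(corr[i, j]) < corr_threshold for j in kept):
            kept.append(int(i))
        if len(kept) >= budget:
            break
    return np.array(kept)

# Usage with features extracted once from a frozen pretrained model:
# idx = select_neurons(train_features)
# clf = LogisticRegression(max_iter=1000).fit(train_features[:, idx], train_labels)
# accuracy = clf.score(test_features[:, idx], test_labels)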
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
TLDR: This systematic study identifies the state of the art in compression for each part of BERT, clarifies current best practices for compressing large-scale Transformer models, and provides insights into the inner workings of various methods.