Corpus ID: 243861553

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

@inproceedings{Frantar2021MFACEM,
  title={M-FAC: Efficient Matrix-Free Approximations of Second-Order Information},
  author={Elias Frantar and Eldar Kurtic and Dan Alistarh},
  booktitle={NeurIPS},
  year={2021}
}
Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which limits their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the… 
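
For illustration, here is a minimal NumPy sketch of the kind of matrix-free IHVP computation the paper builds on, assuming the Hessian is replaced by the damped empirical Fisher F = λI + (1/m) Σ g_k g_kᵀ and applying the Sherman-Morrison formula one gradient at a time. The function names (`build_ihvp`, `ihvp`) and the damping value are illustrative; M-FAC itself reorganizes this recursion into precomputed scalar coefficients to reduce the setup and query costs.

```python
import numpy as np

def build_ihvp(grads, lam=1e-4):
    """Precompute Sherman-Morrison quantities for the damped empirical Fisher
    F = lam*I + (1/m) * sum_k g_k g_k^T, given per-sample gradients `grads` (m, d).
    Returns us[k] = F_k^{-1} g_k and denoms[k] = m + g_k^T us[k], where F_k is the
    Fisher built from the first k gradients only."""
    m, _ = grads.shape
    us = np.empty_like(grads)
    denoms = np.empty(m)
    for k in range(m):
        w = grads[k] / lam                          # F_0^{-1} g_k with F_0 = lam*I
        for j in range(k):                          # fold in gradients 0..k-1
            w -= (grads[j] @ w / denoms[j]) * us[j]
        us[k] = w
        denoms[k] = m + grads[k] @ w
    return us, denoms

def ihvp(v, grads, us, denoms, lam=1e-4):
    """Apply F^{-1} to an arbitrary vector v without ever forming the d x d matrix."""
    w = v / lam
    for k in range(len(denoms)):
        w -= (grads[k] @ w / denoms[k]) * us[k]
    return w
```

With m gradients of dimension d, this naive variant costs O(m²d) to set up and O(md) per query while never materializing the d×d matrix, which is the regime M-FAC improves on.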

SPDY: Accurate Pruning with Speedup Guarantees

SPDY, a new compression method that automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system while minimizing accuracy loss, is introduced.

How Well Do Sparse ImageNet Models Transfer?

This study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, while also enabling significant inference and even training speedups.

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

GPTQ is proposed, a new one-shot weight quantization method based on approximate second-order information that is both highly accurate and highly efficient: it can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight with negligible accuracy degradation relative to the uncompressed baseline.

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

A new compression framework is introduced that covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves on the practical performance of existing post-training methods.

oViT: An Accurate Second-Order Pruning Framework for Vision Transformers

The results show for the first time that ViT-family models can in fact be pruned to high sparsity levels; the oViT method is also shown to be compatible with structured pruning and quantization, and to yield speedups on a sparsity-aware inference engine.

References

SHOWING 1-10 OF 61 REFERENCES

A Kronecker-factored approximate Fisher matrix for convolution layers

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, since it requires inverting the full Fisher information matrix.

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

K-FAC is an efficient method for approximating natural gradient descent in neural networks, based on an efficiently invertible approximation of the network's Fisher information matrix that is neither diagonal nor low-rank and is, in some cases, completely non-sparse.
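
As a reminder of why a Kronecker factorization makes the Fisher efficiently invertible, the tiny NumPy check below verifies the identity (A ⊗ G)⁻¹ vec(V) = vec(G⁻¹ V A⁻¹) for symmetric factors, so each layer's natural-gradient step only requires inverting two small matrices rather than one huge one. The matrices here are synthetic placeholders, not the activation/gradient covariances K-FAC actually estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3                                   # tiny layer: W has shape (n_out, n_in)

# Synthetic symmetric positive-definite "Kronecker factors" (placeholders for the
# input-activation and output-gradient covariances K-FAC estimates).
A = np.cov(rng.standard_normal((n_in, 50))) + 1e-3 * np.eye(n_in)
G = np.cov(rng.standard_normal((n_out, 50))) + 1e-3 * np.eye(n_out)

V = rng.standard_normal((n_out, n_in))               # a layer gradient

# Naive route: invert the full (n_in*n_out) x (n_in*n_out) Kronecker product.
slow = np.linalg.solve(np.kron(A, G), V.flatten(order="F"))

# Factored route: (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1}) for symmetric A (column-major vec).
fast = np.linalg.solve(G, V @ np.linalg.inv(A)).flatten(order="F")

assert np.allclose(slow, fast)
```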

WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

WoodFisher is demonstrated to significantly outperform popular state-of-the-art methods for one-shot pruning; the method can also be extended to take first-order information into account, to automatically set layer-wise pruning thresholds, and to perform compression in the limited-data regime.
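
For context, the classical Optimal Brain Surgeon quantities that WoodFisher-style pruners evaluate through IHVPs can be written as follows (this is a standard formulation, with notation chosen here rather than copied from the paper):

$$\hat F = \lambda I + \frac{1}{m}\sum_{i=1}^{m}\nabla\ell_i\,\nabla\ell_i^{\top},\qquad
\rho_q = \frac{w_q^{2}}{2\,[\hat F^{-1}]_{qq}},\qquad
\delta\mathbf{w} = -\,\frac{w_q}{[\hat F^{-1}]_{qq}}\,\hat F^{-1}\mathbf{e}_q .$$

The weight with the smallest saliency ρ_q is removed and the remaining weights are adjusted by δw, so the only expensive ingredient is exactly the inverse-Fisher-vector product F̂⁻¹e_q.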

Distributed Second-Order Optimization using Kronecker-Factored Approximations

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

ADAHESSIAN is a new stochastic optimization algorithm that directly incorporates approximate curvature information from the loss function and includes several novel performance-improving features, among them a fast Hutchinson-based method for approximating the curvature matrix with low computational overhead.
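
The "Hutchinson-based" diagonal estimate mentioned above is the standard randomized trick E[z ⊙ Hz] = diag(H) for Rademacher z. A minimal sketch, assuming a user-supplied Hessian-vector-product callback `hvp` (ADAHESSIAN adds spatial averaging and momentum on top of this):

```python
import numpy as np

def hutchinson_diag(hvp, dim, n_samples=100, rng=None):
    """Hutchinson estimator of the Hessian diagonal using only Hessian-vector
    products: E[z * (H z)] = diag(H) for Rademacher-distributed z."""
    if rng is None:
        rng = np.random.default_rng(0)
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe vector
        est += z * hvp(z)                       # z ⊙ Hz, averages to diag(H)
    return est / n_samples
```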

Kronecker-factored Curvature Approximations for Recurrent Neural Networks

This work extends the K-FAC method to handle RNNs by introducing a novel approximation to the Fisher information matrix (FIM) for RNNs, and demonstrates that this method significantly outperforms general-purpose state-of-the-art optimizers such as SGD with momentum and Adam on several challenging RNN training tasks.

An Evaluation of Fisher Approximations Beyond Kronecker Factorization

Two coarser approximations on top of a Kronecker factorization of the Fisher information matrix are studied in order to scale natural gradient up to deep and wide convolutional neural networks (CNNs), yielding a further block-diagonal approximation tailored to CNNs.

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Surprisingly, experimental results demonstrate that updating only 1-4% of the weights at each backpropagation pass is sufficient, and that the accuracy of the resulting models is actually improved rather than degraded; a detailed analysis is also given.
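
A minimal sketch of the top-k gradient sparsification idea behind this result, keeping only a small fraction of the largest-magnitude entries (the function name and the exact place in backpropagation where meProp applies it are simplified here):

```python
import numpy as np

def topk_sparsify(grad, keep_frac=0.02):
    """Keep only the top-k largest-magnitude entries of a gradient and zero
    the rest, as in meProp-style sparsified backpropagation (~1-4% kept)."""
    flat = grad.ravel()
    k = max(1, int(keep_frac * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest |g|
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(grad.shape)
```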

Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks

This paper presents a large-batch stochastic optimization algorithm that is faster than widely used algorithms for a fixed amount of computation and that also scales substantially better as more computational resources become available.
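
The title refers to the Neumann-series view of inverse-Hessian-vector products. A minimal sketch of that generic idea, assuming a Hessian-vector-product callback `hvp` and a step size `alpha` chosen so that ||I − αH|| < 1 (the paper's optimizer adds considerably more machinery):

```python
import numpy as np

def neumann_ihvp(hvp, v, n_terms=20, alpha=0.1):
    """Approximate H^{-1} v with a truncated Neumann series:
    H^{-1} = alpha * sum_{k>=0} (I - alpha*H)^k, valid when ||I - alpha*H|| < 1."""
    term = np.array(v, dtype=float)       # k = 0 term: v itself
    total = term.copy()
    for _ in range(n_terms):
        term = term - alpha * hvp(term)   # multiply previous term by (I - alpha*H)
        total += term
    return alpha * total
```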

Efficient Full-Matrix Adaptive Regularization

Preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and indicate that the more carefully preconditioned steps sometimes lead to a better solution.
...