HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression

@article{Gu2022HEATHA,
  title={HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression},
  author={Jiaqi Gu and Ben Keller and Jean Kossaifi and Anima Anandkumar and Brucek Khailany and David Z. Pan},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.16749},
  url={https://api.semanticscholar.org/CorpusID:254096167}
}
A hardware-aware tensor decomposition framework is proposed that enables efficient exploration of the exponential space of possible decompositions, automates the choice of tensorization shape and decomposition rank through hardware-aware co-optimization, and jointly investigates tensor contraction path optimization and a fused Einsum mapping strategy.
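
As a rough illustration of the contraction-path aspect, the sketch below uses numpy's einsum_path to compare contraction orders for a small tensorized linear layer; all shapes are hypothetical and not taken from the paper.

import numpy as np

# Hypothetical tensorized linear layer: input and output dimensions are split
# into two modes each, and the weight is stored as four small factors.
batch = 64
x = np.random.randn(batch, 16, 32)   # activations with 16*32 = 512 features
a = np.random.randn(16, 8)           # factor tensors of the decomposition
b = np.random.randn(32, 8)
c = np.random.randn(8, 16)
d = np.random.randn(8, 32)

expr = "bij,ir,js,rk,sl->bkl"        # contract input modes, produce output modes

# einsum_path searches over contraction orders and reports the FLOP count of
# each candidate; this cost is the kind of quantity a hardware-aware search
# would trade off against accuracy.
path, info = np.einsum_path(expr, x, a, b, c, d, optimize="optimal")
print(info)

y = np.einsum(expr, x, a, b, c, d, optimize=path)
print(y.shape)                       # (64, 16, 32)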

Learning Low-Rank Tensor Cores with Probabilistic ℓ0-Regularized Rank Selection for Model Compression

A novel automatic rank selection method for deep model compression is proposed that learns model weights and decomposition ranks simultaneously and can be incorporated with arbitrary tensor decompositions and neural network layers such as linear, convolutional, and embedding layers.
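
A minimal sketch of the general idea, assuming one sigmoid gate per rank-one component and an expected-l0 penalty rather than the paper's exact probabilistic formulation; the GatedCPLinear name, sizes, and rank are made up for illustration.

import torch
import torch.nn as nn

class GatedCPLinear(nn.Module):
    """Low-rank linear layer whose rank-one components can be switched off by
    learnable gates. Loose sketch only; the paper's probabilistic l0 relaxation
    differs from this simple sigmoid gating."""
    def __init__(self, in_features, out_features, max_rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(in_features, max_rank) * 0.02)
        self.V = nn.Parameter(torch.randn(max_rank, out_features) * 0.02)
        self.gate_logits = nn.Parameter(torch.zeros(max_rank))

    def forward(self, x):
        g = torch.sigmoid(self.gate_logits)      # soft keep-probability per component
        return (x @ self.U) * g @ self.V         # gated rank-one components

    def expected_l0(self):
        # Expected number of active components: the sparsity penalty on the rank.
        return torch.sigmoid(self.gate_logits).sum()

layer = GatedCPLinear(256, 256, max_rank=64)
x = torch.randn(8, 256)
task_loss = layer(x).pow(2).mean()               # stand-in for a real task loss
loss = task_loss + 1e-3 * layer.expected_l0()    # weights and ranks learned jointly
loss.backward()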

Partial Tensorized Transformers for Natural Language Processing

This work focuses on both embedding-layer compression and partial tensorization of neural networks (PTNN) through an algorithmic approach, and significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments.

Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection

This paper presents a unified framework that simultaneously applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints, and maintains the performance of highly compressed models on par with their original counterparts.

ESPACE: Dimensionality Reduction of Activations for Model Compression

A comparison with related work on compressing Llama2-7B via matrix factorization shows that ESPACE is a first step toward advancing the state of the art in tensor decomposition compression of LLMs.
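
A loose sketch of activation-space dimensionality reduction under stated assumptions: a PCA-style projection is computed from calibration activations and folded into the weight. The shapes and the eigendecomposition recipe are illustrative, not ESPACE's actual procedure.

import numpy as np

# W x ~= (W P) (P^T x): project activations onto their top principal
# directions and fold the projection into the weight offline.
d_in, d_out, n_calib, k = 1024, 1024, 4096, 256
X = np.random.randn(n_calib, d_in)           # calibration activations (random stand-in)
W = np.random.randn(d_out, d_in) * 0.02

cov = X.T @ X / n_calib
eigvals, eigvecs = np.linalg.eigh(cov)
P = eigvecs[:, -k:]                          # top-k activation directions

W_low = W @ P                                # (d_out, k), computed once offline
x = np.random.randn(d_in)
y_approx = W_low @ (P.T @ x)                 # two skinny matmuls at inference
y_exact = W @ x

# With random calibration data the error is large; real activations are
# strongly correlated, which is what makes this kind of projection useful.
print(np.linalg.norm(y_exact - y_approx) / np.linalg.norm(y_exact))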

CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization

CoMERA achieves rank-adaptive tensor-compressed (pre-)training via a multi-objective optimization formulation and improves the training procedure to provide both a high compression ratio and excellent accuracy during training.

Quantum-Inspired Tensor Network for Earth Science

A quantum-inspired tensor network is employed to compress the trainable parameters of physics-informed neural networks (PINNs) in Earth science, and the spectral resolution of remotely sensed images is improved by employing tensor decomposition.

Smartformer: An intelligent transformer compression framework for time-series modeling

An intelligent model compression framework, Smartformer, is proposed that incorporates reinforcement learning and CP-decomposition techniques to satisfy three design objectives; it can mitigate overfitting and thus improve the accuracy of existing time-series models in all scenarios.

Gradient-Free Structured Pruning with Unlabeled Data

This paper proposes a gradient-free structured pruning framework that uses only unlabeled data and shows that the original FLOP count can be reduced by up to 40% with less than a 4% accuracy loss across all tasks considered.
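
A rough sketch of the flavor of gradient-free structured pruning with unlabeled data, assuming a simple activation-magnitude score over FFN hidden units; the scoring rule and sizes are assumptions, not the paper's exact criterion.

import torch
import torch.nn as nn

# Score the hidden units of an FFN block on unlabeled data, then drop the
# lowest-scoring units by slicing the weight matrices (no gradients needed).
d_model, d_ff, keep = 256, 1024, 768
fc1, fc2 = nn.Linear(d_model, d_ff), nn.Linear(d_ff, d_model)

unlabeled = torch.randn(512, d_model)              # unlabeled calibration batch
with torch.no_grad():
    h = torch.relu(fc1(unlabeled))
    scores = h.abs().mean(dim=0)                   # one score per hidden unit
    idx = scores.topk(keep).indices.sort().values  # units to keep

    pruned_fc1 = nn.Linear(d_model, keep)
    pruned_fc2 = nn.Linear(keep, d_model)
    pruned_fc1.weight.copy_(fc1.weight[idx])
    pruned_fc1.bias.copy_(fc1.bias[idx])
    pruned_fc2.weight.copy_(fc2.weight[:, idx])
    pruned_fc2.bias.copy_(fc2.bias)

print(sum(p.numel() for p in (pruned_fc1.weight, pruned_fc2.weight)))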

Transformers in Speech Processing: A Survey

By consolidating findings from across the speech technology landscape, this paper provides a valuable resource for researchers interested in harnessing the power of transformers to advance the field.

Deeply Tensor Compressed Transformers for End-to-End Object Detection

This paper proposes to deeply compress transformers with low-rank tensor decomposition to obtain a compact end-to-end detection framework, and introduces a gated multi-head attention (GMHA) module to mitigate the accuracy drop of tensor-compressed DETR models.
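
A minimal sketch of what gating attention heads could look like, assuming one learnable sigmoid gate per head applied to that head's output; this is an assumption about the flavor of GMHA, not the paper's exact module.

import torch
import torch.nn as nn

class GatedMHA(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.gates = nn.Parameter(torch.ones(n_heads))    # one learnable gate per head

    def forward(self, x):                                 # x: (batch, seq, d_model)
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, s, self.h, self.dk).transpose(1, 2) for t in (q, k, v))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)
        heads = att @ v                                    # (batch, heads, seq, dk)
        heads = heads * torch.sigmoid(self.gates).view(1, -1, 1, 1)   # gate each head
        return self.proj(heads.transpose(1, 2).reshape(b, s, d))

x = torch.randn(2, 10, 256)
print(GatedMHA()(x).shape)   # torch.Size([2, 10, 256])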

TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network

A computation-efficient inference scheme for TT-format DNNs is presented that enjoys two key merits: 1) it achieves the theoretical minimum number of multiplications, eliminating all redundant computations; and 2) its multi-stage processing scheme reduces intensive memory access to the tensor cores, bringing significant energy savings.

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

The promising potential of Tensor Train decomposition for DLRMs (TT-Rec) is demonstrated, the effect of the weight initialization distribution on DLRM accuracy is studied, and initializing the tensor cores of TT-Rec from a sampled Gaussian distribution is proposed.
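
A small sketch of a TT-factorized embedding table with Gaussian-initialized cores; the factorization of the vocabulary and embedding sizes, the TT ranks, and the standard deviation below are illustrative choices, not TT-Rec's exact recipe.

import numpy as np

vocab_factors = (50, 40, 25)        # 50*40*25 = 50,000 rows
emb_factors = (4, 4, 8)             # 4*4*8   = 128-dimensional embeddings
ranks = (1, 16, 16, 1)

# Each TT core is initialized from a sampled Gaussian (std chosen ad hoc here).
cores = [
    np.random.normal(0.0, 0.1, size=(ranks[k], vocab_factors[k], emb_factors[k], ranks[k + 1]))
    for k in range(3)
]
print("TT parameters:", sum(c.size for c in cores))
print("dense parameters:", 50 * 40 * 25 * 4 * 4 * 8)

def lookup(flat_idx):
    # Map the flat row index to a multi-index, then multiply core slices.
    i0, r = divmod(flat_idx, 40 * 25)
    i1, i2 = divmod(r, 25)
    row = cores[0][:, i0, :, :]                       # (1, e0, r1)
    for core, i in ((cores[1], i1), (cores[2], i2)):
        row = np.einsum("aeb,bfc->aefc", row, core[:, i, :, :]).reshape(1, -1, core.shape[-1])
    return row.reshape(-1)                            # 128-dim embedding vector

print(lookup(12345).shape)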

Tensor Methods in Computer Vision and Deep Learning

This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning, with a particular focus on visual data analysis and computer vision applications.

Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination

This work provides the first general-purpose rank-adaptive framework for end-to-end tensorized training of neural networks and develops a scalable stochastic variational inference solver to estimate the posterior density of large-scale problems in training.

A Tensorized Transformer for Language Modeling

A novel self-attention model (namely Multi-linear attention) based on Block-Term Tensor Decomposition (BTD) is proposed, which can not only largely compress the model parameters but also obtain performance improvements.
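
For reference, a minimal numpy sketch of a Block-Term Decomposition, i.e. a sum of small Tucker terms (core plus factor matrices); dimensions, rank, and block count are illustrative and unrelated to the attention construction in the paper.

import numpy as np

dims, rank, n_blocks = (32, 32, 32), 4, 3

# Each block is a small Tucker term: a core tensor and one factor matrix per mode.
blocks = []
for _ in range(n_blocks):
    core = np.random.randn(rank, rank, rank)
    factors = [np.random.randn(d, rank) for d in dims]
    blocks.append((core, factors))

def reconstruct(blocks):
    T = 0
    for core, (A, B, C) in blocks:
        # Tucker term: core contracted with a factor matrix along each mode.
        T = T + np.einsum("abc,ia,jb,kc->ijk", core, A, B, C)
    return T

T = reconstruct(blocks)
full_params = int(np.prod(dims))
btd_params = n_blocks * (rank ** 3 + sum(d * rank for d in dims))
print(T.shape, full_params, btd_params)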

MiniViT: Compressing Vision Transformers with Weight Multiplexing

MiniViT is a new compression framework that achieves parameter reduction in vision transformers while retaining the same performance; it shares weights across layers while imposing a transformation on the weights to increase diversity.
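
A rough sketch of the weight-multiplexing idea under stated assumptions: one shared weight is reused in every layer, and a cheap per-layer transformation (a diagonal scaling here, which is an assumed form, not MiniViT's exact scheme) restores diversity.

import torch
import torch.nn as nn

class MultiplexedLinear(nn.Module):
    def __init__(self, d_model, n_layers):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(d_model, d_model) * 0.02)  # shared across layers
        self.scales = nn.Parameter(torch.ones(n_layers, d_model))         # cheap per-layer transform

    def forward(self, x, layer_idx):
        W = self.shared * self.scales[layer_idx].unsqueeze(1)   # layer-specific weight
        return x @ W.t()

block = MultiplexedLinear(d_model=192, n_layers=12)
x = torch.randn(4, 192)
print(block(x, layer_idx=3).shape)   # torch.Size([4, 192])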

Tensor Decomposition for Compressing Recurrent Neural Network

This paper utilizes several tensor decomposition methods, including CANDECOMP/PARAFAC, Tucker decomposition, and Tensor Train, to re-parameterize the Gated Recurrent Unit (GRU) RNN, reducing the number of parameters while maintaining the expressive power of the RNN.
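
A small sketch of CP re-parameterization of a single GRU weight, assuming the input-to-hidden matrix is reshaped into a 4-way tensor and stored as a sum of rank-one terms; shapes and rank are illustrative, not the paper's settings.

import numpy as np

hidden, inp, rank = 128, 96, 20
shape = (3 * hidden // 8, 8, inp // 8, 8)        # 4-way reshape of the (384, 96) gate weight

# CP format: one factor matrix per mode, summed over a shared rank index.
factors = [np.random.randn(d, rank) * 0.1 for d in shape]
W_tensor = np.einsum("ar,br,cr,dr->abcd", *factors)
W = W_tensor.reshape(3 * hidden, inp)

x = np.random.randn(inp)
print((W @ x).shape)                             # stands in for the dense gate pre-activation

dense_params = 3 * hidden * inp
cp_params = sum(d * rank for d in shape)
print(dense_params, cp_params)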

Tensorized Embedding Layers

A novel way of parameterizing embedding layers based on the Tensor Train decomposition is introduced, which allows compressing the model significantly at the cost of a negligible drop or even a slight gain in performance.

Tensorizing Neural Networks

This paper converts the dense weight matrices of the fully-connected layers to the Tensor Train format such that the number of parameters is reduced by a huge factor and at the same time the expressive power of the layer is preserved.
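
A minimal sketch of the TT-format fully-connected layer: the weight is stored as a chain of 4-way cores and applied by tensor contractions instead of a dense matrix multiply. Mode sizes and TT ranks below are illustrative choices, not values from the paper.

import numpy as np

m = n = (8, 8, 8)                    # 512 = 8*8*8 on both the input and output side
ranks = (1, 16, 16, 1)
c1, c2, c3 = (
    np.random.randn(ranks[k], m[k], n[k], ranks[k + 1]) * 0.1 for k in range(3)
)

x = np.random.randn(8 * 8 * 8)

# TT forward pass: contract the cores with the tensorized input directly.
y_tt = np.einsum(
    "pabq,qcdr,refs,bdf->ace", c1, c2, c3, x.reshape(n), optimize=True
).reshape(-1)

# Dense reference, built only to check the sketch.
W = np.einsum("pabq,qcdr,refs->acebdf", c1, c2, c3).reshape(512, 512)
print(np.allclose(W @ x, y_tt))                                # True
print("dense params:", W.size, "TT params:", c1.size + c2.size + c3.size)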