Tensorized Embedding Layers

@inproceedings{Hrinchuk2020TensorizedEL,
  title={Tensorized Embedding Layers},
  author={Oleksii Hrinchuk and Valentin Khrulkov and Leyla Mirvakhabova and Elena Orlova and Ivan Oseledets},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2020},
  year={2020}
}
The embedding layers transforming input words into real vectors are the key components of deep neural networks used in natural language processing. However, when the vocabulary is large, the corresponding weight matrices can be enormous, which precludes their deployment in a limited resource setting. We introduce a novel way of parameterizing embedding layers based on the Tensor Train decomposition, which allows compressing the model significantly at the cost of a negligible drop or even a… 
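
As a rough illustration of the approach described in the abstract, below is a minimal sketch of a Tensor-Train (TT) factorized embedding lookup. It is not the authors' implementation: the factorization of the vocabulary and embedding sizes, the TT ranks, and all names are chosen purely for illustration.

import numpy as np

# Hypothetical shapes: vocabulary V = 25*20*20 = 10,000, embedding dim D = 4*8*8 = 256.
v_factors = [25, 20, 20]
d_factors = [4, 8, 8]
ranks = [1, 16, 16, 1]        # TT ranks (boundary ranks are 1)

rng = np.random.default_rng(0)
# One TT core per factor, of shape (r_{k-1}, v_k, d_k, r_k).
cores = [rng.normal(0.0, 0.02, size=(ranks[k], v_factors[k], d_factors[k], ranks[k + 1]))
         for k in range(3)]

def tt_embedding_row(word_idx):
    """Reconstruct one embedding row by contracting the TT cores."""
    # Express word_idx in the mixed radix given by v_factors.
    digits = []
    for v in reversed(v_factors):
        digits.append(word_idx % v)
        word_idx //= v
    digits.reverse()

    # Contract the chain; after step k the result has shape (1, d_1*...*d_k, r_k).
    out = np.ones((1, 1, 1))
    for core, i in zip(cores, digits):
        slice_k = core[:, i, :, :]                       # (r_{k-1}, d_k, r_k)
        out = np.einsum('abr,rds->abds', out, slice_k)   # attach the next core
        out = out.reshape(1, -1, slice_k.shape[-1])
    return out.reshape(-1)                               # length D = 256

print(tt_embedding_row(1234).shape)                      # (256,)

With these hypothetical shapes the three cores hold roughly 45K parameters in place of the 10,000 x 256 = 2,560,000 entries of a dense embedding table, which is where the compression comes from.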

Citations

Exploring Extreme Parameter Compression for Pre-trained Language Models
TLDR
This work explores larger compression ratios for PLMs via tensor decomposition, a potential but under-investigated approach, and shows that the proposed method is orthogonal to existing compression methods like knowledge distillation.
Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices
TLDR
In Transformer models, the approach leads to a more than ten-fold reduction in the total number of trainable parameters, including the embedding, attention, and feed-forward layers, with little degradation in on-task performance.
Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination
TLDR
A Bayesian model is developed that supports various low-rank tensor formats and reduces neural network parameters with automatic rank determination during training, together with a customized Bayesian solver for training large-scale tensorized neural networks.
Block-wise Word Embedding Compression Revisited: Better Weighting and Structuring
TLDR
A discriminative word embedding compression algorithm is constructed that finds word weights more effectively than competitors in most cases and can serve as a framework by cooperating successfully with quantization.
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
TLDR
This work proposes a systematic approach to computing optimal quantization of the retained gradients of pointwise nonlinear functions with only a few bits per element, and shows that such approximation can be achieved by computing an optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming (a worked sketch of this idea follows this list).
Towards Green AI with tensor networks - Sustainability and innovation enabled by efficient algorithms
TLDR
This paper presents a promising tool for sustainable and thus Green AI: tensor networks (TNs), an established tool from multilinear algebra that can improve efficiency without compromising accuracy, and argues that better algorithms should be evaluated in terms of both accuracy and efficiency.
Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing
TLDR
Random Operation Access Specific Tile (ROAST) hashing is a model-agnostic, cache-friendly model compression approach that introduces global weight sharing, which is empirically and theoretically superior to the local weight sharing of HashedNet and can be of independent interest in itself.
Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity
TLDR
A data-driven approach is taken to show the inter-dependence of data and system heterogeneity in real-world data and its impact on overall model quality and fairness, and it is shown that modeling realistic system-induced data heterogeneity is essential to achieving fair federated recommendation learning.
Deep tensor networks with matrix product operators
TLDR
Deep tensor networks, exponentially wide neural networks based on a tensor-network representation of the weight matrices, are introduced, and it is shown that they generalise well to different input sizes.
Compression of Deep Learning Models for NLP
TLDR
This tutorial organizes compression-related work done by the ‘deep learning for NLP’ community in the past few years and presents it as a coherent story, with the aim of enabling the deployment of compressed models in real industry NLP projects.
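
Returning to the Few-Bit Backward summary above: below is a minimal sketch, written for this note rather than taken from that paper, of fitting an optimal k-piece piecewise-constant approximation to an activation derivative on a fixed grid via dynamic programming. The choice of GELU, the grid, and the number of pieces are assumptions made for illustration.

import numpy as np

def gelu_grad(x):
    # Derivative of the tanh-approximated GELU (form assumed for illustration).
    c = np.sqrt(2.0 / np.pi)
    t = np.tanh(c * (x + 0.044715 * x ** 3))
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t ** 2) * c * (1.0 + 3 * 0.044715 * x ** 2)

def optimal_piecewise_constant(y, k):
    """Split y into k contiguous segments minimizing squared error;
    the best constant for each segment (under L2) is its mean."""
    n = len(y)
    p1 = np.concatenate([[0.0], np.cumsum(y)])        # prefix sums
    p2 = np.concatenate([[0.0], np.cumsum(y * y)])    # prefix sums of squares

    def seg_cost(i, j):  # squared error of the best constant over y[i:j]
        s, s2, cnt = p1[j] - p1[i], p2[j] - p2[i], j - i
        return s2 - s * s / cnt

    INF = float("inf")
    dp = np.full((k + 1, n + 1), INF)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for t in range(j - 1, i):
                c = dp[j - 1, t] + seg_cost(t, i)
                if c < dp[j, i]:
                    dp[j, i], cut[j, i] = c, t

    bounds, i = [n], n          # recover the segment boundaries
    for j in range(k, 0, -1):
        i = cut[j, i]
        bounds.append(i)
    return sorted(bounds)

x = np.linspace(-4.0, 4.0, 256)
print(optimal_piecewise_constant(gelu_grad(x), k=8))   # 8 levels ~ 3 bits per element

Storing only the segment index (here 3 bits) per activation instead of the full value is what reduces the memory footprint of the backward pass.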

References

Showing 1–10 of 55 references
Tensorizing Neural Networks
TLDR
This paper converts the dense weight matrices of the fully-connected layers to the Tensor Train format such that the number of parameters is reduced by a huge factor and at the same time the expressive power of the layer is preserved.
Wide Compression: Tensor Ring Nets
TLDR
This work introduces Tensor Ring Networks (TR-Nets), which significantly compress both the fully connected layers and the convolutional layers of deep neural networks, and shows promise in scientific computing and deep learning, especially for emerging resource-constrained devices such as smartphones, wearables and IoT devices.
West: Word Encoded Sequence Transducers
TLDR
WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain-dependent sub-units, is proposed, and it is demonstrated that this transduction can lead to significant compression without compromising performance.
Adaptive Input Representations for Neural Language Modeling
TLDR
Adaptive input representations for neural language modeling, which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity, are introduced, and a systematic comparison of popular choices for a self-attentional architecture is performed.
Ultimate tensorization: compressing convolutional and FC layers alike
TLDR
This paper combines the proposed approach with previous work to compress both the convolutional and fully-connected layers of a network, achieving an 80x network compression rate with a 1.1% accuracy drop on the CIFAR-10 dataset.
Tensor-Train Recurrent Neural Networks for Video Classification
TLDR
A new, more general and efficient approach is proposed that factorizes the input-to-hidden weight matrix using the Tensor-Train decomposition, trained simultaneously with the remaining weights, providing a novel and fundamental building block for modeling high-dimensional sequential data with RNN architectures.
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
TLDR
GroupReduce, a novel compression method for neural language models based on vocabulary-partitioned low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words), is proposed (a simplified sketch of this idea appears after this list).
Using the Output Embedding to Improve Language Models
TLDR
The topmost weight matrix of neural network language models is studied; it is shown that this matrix constitutes a valid word embedding, and a new method of regularizing the output embedding is offered.
A Tensorized Transformer for Language Modeling
TLDR
A novel self-attention model (namely Multi-linear attention) based on Block-Term Tensor Decomposition (BTD) combined with the tensor-train decomposition is proposed, which can not only largely compress the model parameters but also obtain performance improvements.
Compressing recurrent neural network with tensor train
TLDR
This paper proposes an alternative RNN model that significantly reduces the number of parameters by representing the weight parameters in the Tensor Train (TT) format, and implements the TT-format representation for several RNN architectures such as the simple RNN and the Gated Recurrent Unit (GRU).
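
Regarding the GroupReduce entry above: the sketch below is a simplified, hypothetical illustration of frequency-aware block-wise low-rank compression, not the authors' algorithm. It partitions the rows of an embedding matrix (assumed sorted by descending word frequency) into blocks and gives frequent blocks a higher SVD rank than rare ones.

import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 8000, 256
emb = rng.normal(size=(vocab, dim))     # stand-in for a trained embedding matrix

def block_low_rank(emb, n_blocks=4, base_rank=64):
    """Split rows into equal-size blocks; block b keeps rank base_rank // 2**b,
    so the rarest words get the most aggressive compression."""
    factors = []
    for b, idx in enumerate(np.array_split(np.arange(len(emb)), n_blocks)):
        rank = max(1, base_rank // 2 ** b)
        U, S, Vt = np.linalg.svd(emb[idx], full_matrices=False)
        factors.append((idx, U[:, :rank] * S[:rank], Vt[:rank]))   # block ~= A @ B
    return factors

def reconstruct(factors, shape):
    out = np.zeros(shape)
    for idx, A, B in factors:
        out[idx] = A @ B
    return out

factors = block_low_rank(emb)
approx = reconstruct(factors, emb.shape)
params = sum(A.size + B.size for _, A, B in factors)
print(f"params: {params} vs dense {emb.size}, "
      f"relative error {np.linalg.norm(emb - approx) / np.linalg.norm(emb):.3f}")

A random matrix is used only to keep the sketch self-contained; a trained embedding matrix has far more low-rank structure, so the reconstruction error would be much lower in practice.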