Statistical Model Compression for Small-Footprint Natural Language Understanding

@article{Strimel2018StatisticalMC,
  title={Statistical Model Compression for Small-Footprint Natural Language Understanding},
  author={Grant P. Strimel and Kanthashree Mysore Sathyendra and Stanislav Peshterliev},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.07520}
}
In this paper we investigate statistical model compression applied to natural language understanding (NLU) models. Small-footprint NLU models are important for enabling offline systems on hardware-restricted devices, and for decreasing on-demand model loading latency in cloud-based systems. To compress NLU models, we present two main techniques: parameter quantization and perfect feature hashing. These techniques are complementary to existing model pruning strategies such as L1 regularization…
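
As a rough illustration of the first of these techniques, below is a minimal sketch of uniform (min-max) linear parameter quantization; the paper's exact scheme may differ, and the function names are illustrative. (Perfect feature hashing is sketched under the related references below.)

import numpy as np

def quantize(weights, bits=8):
    # Map float weights onto 2**bits evenly spaced levels between min and max.
    lo, hi = float(weights.min()), float(weights.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((weights - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    # Recover approximate float weights from the integer codes.
    return codes.astype(np.float32) * scale + lo

w = np.random.randn(10000).astype(np.float32)
codes, scale, lo = quantize(w)                # 1 byte per weight instead of 4
print("max abs error:", np.abs(w - dequantize(codes, scale, lo)).max())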

SmallER: Scaling Neural Entity Resolution for Edge Devices

TLDR
This paper introduces SmallER, a scalable neural entity resolution system capable of running directly on edge devices and uses compressed tries to reduce the space required to store catalogs and a novel implementation of spatial partitioning trees to strike a balance between reducing runtime latency and preserving recall relative to full catalog search.
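
As a generic illustration of why tries shrink catalog storage, here is a minimal uncompressed trie in Python; SmallER's compressed tries and spatial partitioning trees go well beyond this sketch, and the catalog entries are invented.

class TrieNode:
    def __init__(self):
        self.children = {}     # shared prefixes are stored once, saving space
        self.terminal = False  # marks the end of a complete catalog entry

def insert(root, name):
    node = root
    for ch in name:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True

def contains(root, name):
    node = root
    for ch in name:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.terminal

root = TrieNode()
for entry in ["kitchen light", "kitchen lamp", "kids room light"]:
    insert(root, entry)
print(contains(root, "kitchen lamp"))   # True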

Fast Intent Classification for Spoken Language Understanding

TLDR
The experiments show that the BranchyNet scheme provides gains in computational complexity without compromising model accuracy; analytical studies are conducted on the improvements in computational cost, the distribution of utterances that egress from various exit points, and the impact of adding more complexity to models with the BranchyNet scheme.
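
A minimal sketch of BranchyNet-style early exit, assuming hypothetical stage and head functions and an illustrative confidence threshold:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_with_early_exit(x, exits, threshold=0.9):
    # exits: list of (stage_fn, head_fn); stage_fn advances the hidden state,
    # head_fn maps it to class logits at that exit point.
    h = x
    for stage_fn, head_fn in exits:
        h = stage_fn(h)
        probs = softmax(head_fn(h))
        if probs.max() >= threshold:    # confident: stop early, save compute
            return int(probs.argmax())
    return int(probs.argmax())          # otherwise use the final exit

rng = np.random.default_rng(0)
exits = [(lambda h, W=rng.normal(size=(8, 8)): np.tanh(W @ h),
          lambda h, V=rng.normal(size=(4, 8)): V @ h) for _ in range(3)]
print(classify_with_early_exit(rng.normal(size=8), exits))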

Learning a Neural Diff for Speech Models

TLDR
This work presents neural update approaches for releasing subsequent speech model generations within a data budget, and details two architecture-agnostic methods that learn compact representations for transmission to devices.
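
A caricature of the diff idea using simple magnitude-based sparsification (the paper's methods are learned, not heuristic; `budget` here is a hypothetical parameter count):

import numpy as np

def sparse_diff(old, new, budget):
    # Keep only the `budget` largest-magnitude weight changes; zero the rest.
    delta = (new - old).ravel()
    keep = np.argsort(np.abs(delta))[-budget:]
    patch = np.zeros_like(delta)
    patch[keep] = delta[keep]
    return patch.reshape(old.shape)          # ship this instead of the full model

old = np.random.randn(64, 64).astype(np.float32)
new = old + 0.01 * np.random.randn(64, 64).astype(np.float32)
patch = sparse_diff(old, new, budget=100)    # sparse, hence cheap to transmit
updated = old + patch                        # on-device reconstruction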

F10-SGD: Fast Training of Elastic-net Linear Models for Text Classification and Named-entity Recognition

TLDR
F10-SGD is developed, a fast optimizer for text classification and NER elastic-net linear models, which provides a 4x reduction in training time compared to the OWL-QN optimizer without loss of accuracy or increase in model size.
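
F10-SGD optimizes the standard elastic-net objective; below is a textbook sketch of one SGD step with an L1 proximal (soft-thresholding) update, which is what such optimizers accelerate. Learning rate and regularizer strengths are illustrative.

import numpy as np

def elastic_net_sgd_step(w, grad, lr=0.1, l1=1e-4, l2=1e-4):
    # Ridge (L2) term folds into the gradient; the L1 term is applied as a
    # proximal soft-threshold, which drives weights exactly to zero (sparsity).
    w = w - lr * (grad + l2 * w)
    return np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)

w = np.random.randn(100)
w = elastic_net_sgd_step(w, grad=np.random.randn(100))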

Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform

TLDR
Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models, builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of the most pressing systems challenges of global DP: running out of privacy budget and the privacy-utility tradeoff.
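
"Running out of privacy budget" can be illustrated with basic sequential composition, where per-model epsilons simply add up; Sage's block-level accounting is considerably more refined than this sketch.

class PrivacyBudget:
    # Track cumulative epsilon spent across models trained on one dataset.
    # Basic sequential composition only; illustrative, not Sage's accountant.
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def request(self, epsilon):
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.request(0.3)   # train model A
budget.request(0.3)   # train model B
# budget.request(0.5) would raise: only 0.4 remains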

Voice Command Interaction for Medical Visualization

In this demonstration, we show a practical use case for speech recognition in a sterile operating room. Because of medical hygiene constraints, all the…

References

Showing 1-10 of 29 references

Model Compression Applied to Small-Footprint Keyword Spotting

TLDR
Two ways to improve deep neural network acoustic models for keyword spotting without increasing CPU usage are investigated: using low-rank weight matrices throughout the DNN, and distilling knowledge from an ensemble of much larger DNNs used only during training.
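
The distillation half of this recipe follows the standard soft-target objective; a minimal sketch, with illustrative temperature and mixing weight:

import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - (z / T).max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Blend cross-entropy on hard labels with cross-entropy against the
    # teacher ensemble's temperature-softened outputs.
    soft_t = softmax(teacher_logits, T)
    soft_s = softmax(student_logits, T)
    soft = -(soft_t * np.log(soft_s + 1e-12)).sum(axis=-1).mean()
    hard_p = softmax(student_logits)[np.arange(len(labels)), labels]
    hard = -np.log(hard_p + 1e-12).mean()
    return alpha * soft + (1.0 - alpha) * hard

loss = distillation_loss(np.random.randn(4, 10), np.random.randn(4, 10),
                         labels=np.array([0, 3, 7, 2]))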

FastText.zip: Compressing text classification models

TLDR
This work proposes a method built upon product quantization to store the word embeddings. The resulting text classifier, derived from the fastText approach, requires at test time only a fraction of the memory of the original model, without noticeably sacrificing classification accuracy.
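
A minimal sketch of product quantization over an embedding table (plain k-means per slice; fastText.zip's full pipeline also prunes the vocabulary and retrains):

import numpy as np

def product_quantize(emb, n_sub=4, n_centroids=16, iters=10):
    # Split each embedding into n_sub sub-vectors and run k-means per slice.
    # Storage becomes one small codebook per slice plus a tiny code per
    # sub-vector, instead of full float vectors.
    n, d = emb.shape
    sub = emb.reshape(n, n_sub, d // n_sub)
    codebooks, codes = [], []
    for s in range(n_sub):
        x = sub[:, s, :]
        cent = x[np.random.choice(n, n_centroids, replace=False)]
        for _ in range(iters):   # plain k-means on this slice
            assign = np.argmin(((x[:, None] - cent[None]) ** 2).sum(-1), axis=1)
            for c in range(n_centroids):
                if (assign == c).any():
                    cent[c] = x[assign == c].mean(axis=0)
        codebooks.append(cent)
        codes.append(assign)
    return codebooks, np.stack(codes, axis=1)   # codes: n x n_sub small ints

emb = np.random.randn(1000, 32).astype(np.float32)
codebooks, codes = product_quantize(emb)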

Personalized speech recognition on mobile devices

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5…

Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting

TLDR
This paper proposes to apply singular value decomposition (SVD) to further reduce TDNN complexity, and results show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network baseline.
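
The SVD step itself is compact enough to sketch: keep the top singular directions and replace one dense matrix with two thin factors (the rank and sizes here are illustrative):

import numpy as np

def svd_compress(W, rank):
    # Replace dense W (m x n) with two thin factors of total size (m + n) * rank.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # m x rank
    B = Vt[:rank, :]                    # rank x n
    return A, B

W = np.random.randn(512, 512)
A, B = svd_compress(W, rank=64)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))   # relative error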

Randomized Language Models via Perfect Hash Functions

We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters. The…
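
The core trick can be sketched as hash-and-displace minimal perfect hashing plus short fingerprints: only fingerprints and values are stored, never the n-grams, so unseen keys are misrecognized with probability about 2^-b. A toy construction (the paper's encoding is far more compact):

import hashlib

def h(key, seed, m):
    d = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % m

def build(items, fp_bits=8):
    # Per first-level bucket, search for a seed under which the bucket's keys
    # land in distinct free slots; store only a fingerprint and value per slot.
    m = len(items)
    buckets = [[] for _ in range(m)]
    for k in items:
        buckets[h(k, 0, m)].append(k)
    seeds = [0] * m
    table = [None] * m
    for b in sorted(range(m), key=lambda bb: -len(buckets[bb])):
        if not buckets[b]:
            continue
        seed = 1
        while True:
            slots = [h(k, seed, m) for k in buckets[b]]
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                break
            seed += 1
        seeds[b] = seed
        for k, s in zip(buckets[b], slots):
            table[s] = (h(k, -1, 1 << fp_bits), items[k])   # keys discarded
    return seeds, table

def lookup(key, seeds, table, fp_bits=8):
    m = len(seeds)
    fp, value = table[h(key, seeds[h(key, 0, m)], m)]
    # Unseen keys are rejected unless their fingerprint collides (~2**-fp_bits).
    return value if fp == h(key, -1, 1 << fp_bits) else None

lm = {"the cat": -1.2, "cat sat": -0.7, "sat on": -0.9}
seeds, table = build(lm)
print(lookup("cat sat", seeds, table))   # -0.7
print(lookup("dog ran", seeds, table))   # almost surely None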

Small Statistical Models by Random Feature Mixing

The application of statistical NLP systems to resource-constrained devices is limited by the need to maintain parameters for a large number of features and an alphabet mapping features to parameters.
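
Random feature mixing is the hashing trick: features are hashed straight to parameter indices, so no alphabet is stored; a minimal sketch (bucket count is illustrative):

import zlib
import numpy as np

def hashed_features(tokens, n_buckets=2**18):
    # Hash each feature string straight to an index: no feature-to-index
    # alphabet is stored, and collisions randomly mix features together.
    x = np.zeros(n_buckets, dtype=np.float32)
    for tok in tokens:
        x[zlib.crc32(tok.encode()) % n_buckets] += 1.0
    return x

x = hashed_features("play some jazz music".split())

The perfect feature hashing in the main paper removes the collisions among seen features that this plain scheme incurs.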

A Maximum Entropy Approach to Natural Language Processing

TLDR
A maximum-likelihood approach for automatically constructing maximum entropy models is presented, along with an efficient implementation, using several problems in natural language processing as examples.
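
The models in question have the log-linear form p(y|x) ∝ exp(Σ_i λ_i f_i(x, y)); a minimal sketch of scoring with a dense feature matrix (shapes are illustrative):

import numpy as np

def maxent_probs(feature_matrix, lambdas):
    # feature_matrix[y, i] = f_i(x, y) for a fixed input x; lambdas[i] are the
    # learned weights. Normalizing the exponentiated scores yields p(y | x).
    scores = feature_matrix @ lambdas
    e = np.exp(scores - scores.max())
    return e / e.sum()

f = np.random.rand(5, 20)        # 5 candidate labels, 20 features
p = maxent_probs(f, np.random.randn(20))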

Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding

TLDR
The design of the machine learning architecture that underlies the Alexa Skills Kit (ASK) is presented, which was the first Spoken Language Understanding Software Development Kit (SDK) for a virtual digital assistant, as far as the authors are aware.

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TLDR
This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
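
As a small sketch of how a trained linear-chain CRF is decoded (Viterbi over emission and transition scores; the scores here are random placeholders):

import numpy as np

def viterbi(emissions, transitions):
    # emissions: T x K per-position label scores; transitions: K x K scores
    # for moving between consecutive labels. Returns the best label sequence.
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(np.random.randn(6, 4), np.random.randn(4, 4)))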

Accurate and compact large vocabulary speech recognition on mobile devices

TLDR
An accurate, small-footprint, large-vocabulary speech recognizer for mobile devices is described: a compact system that runs well below real-time on a Nexus 4 Android phone.