Accelerating Natural Language Understanding in Task-Oriented Dialog

by Ojas Ahuja and Shrey Desai
Task-oriented dialog models typically leverage complex neural architectures and large-scale, pre-trained Transformers to achieve state-of-the-art performance on popular natural language understanding benchmarks. However, these models frequently have in excess of tens of millions of parameters, making them impossible to deploy on-device where resource-efficiency is a major concern. In this work, we show that a simple convolutional model compressed with structured pruning achieves largely… 
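As a rough illustration of the structured pruning idea in the abstract, here is a minimal NumPy sketch that removes whole convolutional filters by L2 norm. This is not the authors' implementation; the function name, tensor shapes, and magnitude-based ranking criterion are illustrative assumptions.

```python
import numpy as np

def prune_filters(weights, sparsity):
    """Structured pruning sketch: zero out entire convolutional filters
    (output channels) whose L2 norms are smallest.

    weights:  array of shape (num_filters, in_channels, kernel_width)
    sparsity: fraction of filters to prune, in [0, 1)
    """
    num_filters = weights.shape[0]
    num_pruned = int(num_filters * sparsity)
    # Rank each filter by the L2 norm of all its weights.
    norms = np.linalg.norm(weights.reshape(num_filters, -1), axis=1)
    pruned = weights.copy()
    if num_pruned > 0:
        weakest = np.argsort(norms)[:num_pruned]
        pruned[weakest] = 0.0  # whole filters removed, not scattered weights
    return pruned

# Toy example: prune half of 8 filters in a 1-D conv layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16, 3))
w_pruned = prune_filters(w, sparsity=0.5)
```

Because entire filters are zeroed rather than individual weights, the pruned channels can simply be dropped from the layer, shrinking the model without requiring sparse-matrix kernels.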


SoDA: On-device Conversational Slot Extraction
We propose a novel on-device neural sequence labeling model that uses embedding-free projections and character information to construct compact word representations for learning a sequence model…


TinyBERT: Distilling BERT for Natural Language Understanding
A novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models; by leveraging this new KD method, much of the knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT.
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
This paper proposes distilling knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM (and its siamese counterpart for sentence-pair tasks), achieving results comparable to ELMo.
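The knowledge-distillation setup behind work like this can be sketched as a loss blending temperature-softened teacher predictions with the gold labels. This is a generic NumPy sketch of the standard distillation objective, not the specific formulation of any paper above; the function names and default hyperparameters are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of (a) KL divergence between temperature-softened teacher and
    student distributions and (b) cross-entropy on the gold labels."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    soft = (T ** 2) * kl.mean()
    # Ordinary hard-label cross-entropy at T = 1.
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

Raising the temperature T spreads out the teacher's probability mass, exposing the "dark knowledge" in its ranking of incorrect classes to the smaller student.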
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Q8BERT: Quantized 8Bit BERT
This work shows how to perform quantization-aware training during the fine-tuning phase of BERT, compressing it by 4× with minimal accuracy loss; the resulting quantized model can accelerate inference on hardware that supports 8-bit integer operations.
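The core operation underlying 8-bit compression schemes like this can be sketched as symmetric linear quantization: map floats to int8 with a per-tensor scale, then recover approximate floats on the way back. This is a generic sketch, not Q8BERT's training procedure; the function names are illustrative.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float tensor to int8.
    Returns the quantized values and the scale needed to recover them."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to (approximate) floats."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Each element lands within half a quantization step of its original value; quantization-aware training goes further by simulating this rounding during fine-tuning so the model learns weights that survive it.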
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performances on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks, including part-of-speech tagging, chunking, and named entity recognition…
A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding
A novel Stack-Propagation framework for SLU that better incorporates intent information to guide slot filling, achieving state-of-the-art performance and outperforming previous methods by a large margin.
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely; it generalizes well to other tasks, applied successfully to English constituency parsing with both large and limited training data.
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling, concluding that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Self-Governing Neural Networks for On-Device Short Text Classification
The findings show that SGNNs are effective at capturing low-dimensional semantic text representations, while maintaining high accuracy, and extensive evaluation on dialog act classification shows significant improvement over state-of-the-art results.