ConveRT: Efficient and Accurate Conversational Representations from Transformers

@article{Henderson2020ConveRTEA,
  title={ConveRT: Efficient and Accurate Conversational Representations from Transformers},
  author={Matthew Henderson and I{\~n}igo Casanueva and Nikola Mrk{\v{s}}i{\'c} and Pei-Hao Su and Tsung-Hsien Wen and Ivan Vuli{\'c}},
  journal={ArXiv},
  year={2020},
  volume={abs/1911.03688}
}
General-purpose pretrained sentence encoders such as BERT are not ideal for real-world conversational AI applications; they are computationally heavy, slow, and expensive to train. We propose ConveRT (Conversational Representations from Transformers), a pretraining framework for conversational tasks satisfying all the following requirements: it is effective, affordable, and quick to train. We pretrain using a retrieval-based response selection task, effectively leveraging quantization and… 
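
The abstract is cut off above, but the pretraining signal it names, retrieval-based response selection with a dual encoder, is typically trained with in-batch negatives. The snippet below is a minimal sketch of that objective under assumed shapes and names; it is illustrative, not ConveRT's actual implementation.

```python
import torch
import torch.nn.functional as F

def response_selection_loss(context_vecs, response_vecs):
    """In-batch-negatives loss for dual-encoder response selection (sketch).

    context_vecs, response_vecs: (batch, dim) encodings from two
    (possibly weight-shared) encoders. Row i of response_vecs is the
    true response for row i of context_vecs; every other row in the
    batch serves as a negative.
    """
    # scores[i, j] = similarity between context i and candidate response j.
    scores = context_vecs @ response_vecs.t()
    # The correct response for context i sits on the diagonal (index i).
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)


# Illustrative usage with random vectors standing in for encoder outputs.
batch, dim = 8, 512
ctx = F.normalize(torch.randn(batch, dim), dim=-1)
rsp = F.normalize(torch.randn(batch, dim), dim=-1)
loss = response_selection_loss(ctx, rsp)
```

The practical point is that scoring reduces to a dot product, so candidate response encodings can be precomputed (and, as the abstract hints, quantized) for cheap retrieval at inference time.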

Citations

ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
TLDR
This work demonstrates that full-blown conversational pretraining is not required and that LMs can be quickly transformed into effective conversational encoders with much smaller amounts of unannotated data, and validates the robustness and versatility of the ConvFiT framework with similarity-based inference on standard intent detection (ID) evaluation sets.
ConVEx: Data-Efficient and Few-Shot Slot Labeling
TLDR
ConVEx’s reduced pretraining times and cost, along with its efficient fine-tuning and strong performance, promise wider portability and scalability for data-efficient sequence-labeling tasks in general.
Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations
TLDR
Span-ConveRT, a light-weight model for dialog slot-filling that frames the task as turn-based span extraction, is introduced, and it is shown that leveraging the conversational knowledge encoded in large pretrained conversational models such as ConveRT is especially useful in few-shot learning scenarios.
Efficient Intent Detection with Dual Sentence Encoders
TLDR
The usefulness and wide applicability of the proposed intent detectors are demonstrated, showing that they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets.
Example-Driven Intent Prediction with Observers
TLDR
This paper focuses on the intent classification problem, which aims to identify user intents given utterances addressed to the dialog system, and proposes two approaches for improving the generalizability of utterance classification models: observers and example-driven training.
Building an Efficient and Effective Retrieval-based Dialogue System via Mutual Learning
TLDR
A fast bi-encoder is employed to replace the traditional feature-based pre-retrieval model, while the response reranking model uses a more complicated architecture (such as a cross-encoder), combining the best of both worlds to build a retrieval system.
CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models
TLDR
A novel loss function, CORAL, is proposed that directly optimizes recently proposed estimates of human preference for generated responses and can train dialog generation models without assuming that no valid response exists other than the ground truth.
Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
TLDR
Results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system are presented.
Sentence encoding for Dialogue Act classification
In this study, we investigate the process of generating single-sentence representations for the purpose of Dialogue Act (DA) classification, including several aspects of text pre-processing and…
Distilling Knowledge for Fast Retrieval-based Chat-bots
TLDR
This paper proposes a new cross-encoder architecture and transfers knowledge from this model to a bi-encoder model using distillation, which effectively boosts bi-encoder performance at no cost during inference time.
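
The entry above pairs an expressive cross-encoder teacher with a fast bi-encoder student; a standard way to realise such transfer is temperature-scaled soft-label distillation over candidate scores, sketched below. The function name, temperature, and KL formulation are generic assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def ranking_distillation_loss(student_scores, teacher_scores, temperature=2.0):
    """Soft-label distillation from a cross-encoder teacher to a bi-encoder student (sketch).

    Both tensors have shape (batch, num_candidates): each row holds the
    scores a model assigns to the candidate responses for one context.
    The student is trained to match the teacher's temperature-softened
    distribution over candidates via KL divergence.
    """
    student_logp = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_p = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * temperature ** 2


# Illustrative usage with random scores standing in for model outputs.
student = torch.randn(4, 10)   # bi-encoder dot-product scores
teacher = torch.randn(4, 10)   # cross-encoder relevance scores
loss = ranking_distillation_loss(student, teacher)
```

Only the bi-encoder is kept at inference time, so response representations can be precomputed while the student inherits some of the teacher's ranking quality.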

References

Showing 1-10 of 95 references
Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring
TLDR
This work develops a new transformer architecture, the Poly-encoder, that learns global rather than token-level self-attention features and achieves state-of-the-art results on three existing tasks (a toy sketch of the Poly-encoder scoring scheme appears after this reference list).
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations
TLDR
Span-ConveRT, a light-weight model for dialog slot-filling that frames the task as turn-based span extraction, is introduced, and it is shown that leveraging the conversational knowledge encoded in large pretrained conversational models such as ConveRT is especially useful in few-shot learning scenarios.
Efficient Intent Detection with Dual Sentence Encoders
TLDR
The usefulness and wide applicability of the proposed intent detectors are demonstrated, showing that they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets.
DIET: Lightweight Language Understanding for Dialogue Systems
Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like…
A Neural Conversational Model
TLDR
A simple approach to conversational modeling that uses the recently proposed sequence-to-sequence framework and is able to extract knowledge both from a domain-specific dataset and from a large, noisy, general-domain dataset of movie subtitles.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Comparison of Transfer-Learning Approaches for Response Selection in Multi-Turn Conversations
TLDR
This paper compares three transfer-learning approaches to response selection in dialogs, as part of the Dialog System Technology Challenge 7 (DSTC7) Track 1, and shows that BERT performed best, followed by the GPT model and then the MTEE model.
Universal Sentence Encoder
TLDR
It is found that transfer learning using sentence embeddings tends to outperform word-level transfer, achieving surprisingly good performance with minimal amounts of supervised training data for a transfer task.
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
TLDR
This paper proposes to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks, and achieves results comparable to ELMo.
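
As noted in the Poly-encoders entry at the top of this reference list, that architecture scores a candidate response against a small set of learned global context codes instead of running full token-level cross-attention. The module below is a toy sketch of that scoring head; the class name, shapes, number of codes, and pooling choices are illustrative assumptions, not the published implementation.

```python
import torch
import torch.nn.functional as F

class PolyEncoderScorer(torch.nn.Module):
    """Toy Poly-encoder-style scoring head (illustrative only)."""

    def __init__(self, dim, num_codes=64):
        super().__init__()
        # m learned "codes" that query the context token representations.
        self.codes = torch.nn.Parameter(torch.randn(num_codes, dim))

    def forward(self, context_tokens, candidate_emb):
        # context_tokens: (batch, seq_len, dim) token-level context representations
        # candidate_emb:  (batch, dim) pooled candidate-response embedding
        attn = F.softmax(self.codes @ context_tokens.transpose(1, 2), dim=-1)          # (batch, m, seq_len)
        global_ctx = attn @ context_tokens                                              # (batch, m, dim)
        # The candidate attends over the m global context vectors...
        w = F.softmax((global_ctx @ candidate_emb.unsqueeze(-1)).squeeze(-1), dim=-1)   # (batch, m)
        ctx_vec = (w.unsqueeze(-1) * global_ctx).sum(dim=1)                             # (batch, dim)
        # ...and the final score is a single dot product.
        return (ctx_vec * candidate_emb).sum(dim=-1)                                    # (batch,)


# Illustrative usage with random tensors standing in for encoder outputs.
scorer = PolyEncoderScorer(dim=256, num_codes=16)
scores = scorer(torch.randn(4, 20, 256), torch.randn(4, 256))
```

Because the candidate only interacts with the context through these few attention steps and one dot product, candidate embeddings can still be precomputed, which is what keeps the Poly-encoder faster than a full cross-encoder.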