ConveRT: Efficient and Accurate Conversational Representations from Transformers

@article{Henderson2020ConveRTEA,
  title={ConveRT: Efficient and Accurate Conversational Representations from Transformers},
  author={Matthew Henderson and I{\~n}igo Casanueva and Nikola Mrk{\v{s}}i{\'c} and Pei-Hao Su and Tsung-Hsien Wen and Ivan Vuli{\'c}},
  journal={ArXiv},
  year={2020},
  volume={abs/1911.03688}
}
General-purpose pretrained sentence encoders such as BERT are not ideal for real-world conversational AI applications; they are computationally heavy, slow, and expensive to train. We propose ConveRT (Conversational Representations from Transformers), a pretraining framework for conversational tasks satisfying all the following requirements: it is effective, affordable, and quick to train. We pretrain using a retrieval-based response selection task, effectively leveraging quantization and… 
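As a rough, hypothetical illustration of the retrieval-based response selection objective mentioned above (not ConveRT's actual architecture or weights), the sketch below scores context-response pairs with a dot product between independently encoded vectors; the encode function is a random-projection placeholder standing in for a trained transformer encoder.

```python
import numpy as np

def encode(texts, dim=512, seed=0):
    """Placeholder for a trained sentence encoder (hypothetical):
    returns one L2-normalised vector per input text."""
    rng = np.random.default_rng(seed)
    vecs = rng.standard_normal((len(texts), dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def response_selection_scores(contexts, responses):
    """Dual-encoder scoring: contexts and responses are encoded
    independently, then matched with a dot product."""
    context_vecs = encode(contexts, seed=1)    # context side of the dual encoder
    response_vecs = encode(responses, seed=2)  # response side of the dual encoder
    return context_vecs @ response_vecs.T      # shape: [num_contexts, num_responses]

contexts = ["how do I reset my password?", "what time do you open?"]
responses = ["Click 'Forgot password' on the login page.", "We open at 9am."]
print(response_selection_scores(contexts, responses))
```

In a trained dual encoder of this kind, the diagonal of the score matrix (true context-response pairs) is pushed up and the off-diagonal entries (in-batch negatives) pushed down, for example with a softmax cross-entropy loss over each row.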


ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
TLDR
This work demonstrates that full-blown conversational pretraining is not required, and that LMs can be quickly transformed into effective conversational encoders with much smaller amounts of unannotated data, and validates the robustness and versatility of the ConvFiT framework with similarity-based inference on the standard intent detection (ID) evaluation sets.
Efficient Intent Detection with Dual Sentence Encoders
TLDR
The usefulness and wide applicability of the proposed intent detectors are demonstrated, showing that they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets.
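As a loose sketch of the "fixed black-box encoder" usage described above, the toy example below freezes a stand-in sentence encoder (hashed character trigrams, purely illustrative) and trains nothing more than per-intent centroids on top of it; a real setup would plug in pretrained sentence embeddings from a dual encoder and a light classifier instead.

```python
import numpy as np

def frozen_encoder(text, dim=128):
    """Toy stand-in for a fixed (frozen) sentence encoder:
    hashed character trigrams, L2-normalised. Purely illustrative."""
    vec = np.zeros(dim)
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[hash(padded[i:i + 3]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def fit_intent_centroids(labelled_examples):
    """Average the frozen embeddings per intent label (the only 'training' step)."""
    return {label: np.mean([frozen_encoder(t) for t in texts], axis=0)
            for label, texts in labelled_examples.items()}

def predict_intent(text, centroids):
    query = frozen_encoder(text)
    return max(centroids, key=lambda label: float(query @ centroids[label]))

data = {
    "book_flight": ["book me a flight to Paris", "I need a plane ticket"],
    "check_balance": ["what is my account balance", "how much money is in my account"],
}
centroids = fit_intent_centroids(data)
print(predict_intent("can I get a flight to Rome", centroids))
```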
Building an Efficient and Effective Retrieval-based Dialogue System via Mutual Learning
TLDR
A fast bi-encoder is employed to replace the traditional feature-based pre-retrieval model, while the response reranking model uses a more complex architecture (such as a cross-encoder), combining the best of both worlds to build the retrieval system.
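The pattern this summary describes (cheap bi-encoder pre-retrieval followed by an expensive reranker over a short list) can be sketched as below; both scorers are placeholder functions, so only the two-stage control flow is meaningful here.

```python
import numpy as np

def bi_encode(texts, dim=64, seed=1):
    """Fast bi-encoder stand-in: one normalised vector per text (placeholder weights)."""
    rng = np.random.default_rng(seed)
    vecs = rng.standard_normal((len(texts), dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def cross_encoder_score(context, response):
    """Slow cross-encoder stand-in: jointly scores one (context, response) pair."""
    return -abs(len(context) - len(response))  # toy relevance signal only

def retrieve_then_rerank(context, candidates, k=5):
    # Stage 1: cheap pre-retrieval, scoring all candidates with dot products.
    cand_vecs = bi_encode(candidates, seed=2)
    query_vec = bi_encode([context], seed=3)[0]
    top_k = np.argsort(cand_vecs @ query_vec)[::-1][:k]
    # Stage 2: expensive reranking, run only on the k pre-retrieved candidates.
    reranked = sorted(top_k,
                      key=lambda i: cross_encoder_score(context, candidates[i]),
                      reverse=True)
    return [candidates[i] for i in reranked]

candidates = [f"candidate response number {i}" for i in range(1000)]
print(retrieve_then_rerank("hello, how can I renew my passport?", candidates))
```

Encoding the candidate pool once and reranking only the top-k is what keeps the cross-encoder's cost independent of the pool size.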
Sentence encoding for Dialogue Act classification
In this study, we investigate the process of generating single-sentence representations for the purpose of Dialogue Act (DA) classification, including several aspects of text pre-processing and…
Distilling Knowledge for Fast Retrieval-based Chat-bots
TLDR
This paper proposes a new cross-encoder architecture and transfers knowledge from this model to a bi-encoder model using distillation, which effectively boosts bi-encoder performance at no cost during inference time.
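One minimal way to picture the distillation step described here, under the assumption of a softened-score KL objective (other losses, such as MSE on the logits, are equally plausible): the teacher is a cross-encoder scoring each candidate jointly with the context, the student is a bi-encoder producing dot products, and the student is trained to match the teacher's score distribution.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL divergence between the softened teacher (cross-encoder) and
    student (bi-encoder) score distributions over candidate responses."""
    p = softmax(np.asarray(teacher_scores, dtype=float) / temperature)
    q = softmax(np.asarray(student_scores, dtype=float) / temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical scores for one context over four candidate responses.
teacher = [4.1, 1.3, 0.2, -0.5]   # cross-encoder (expensive, accurate)
student = [2.8, 1.9, 0.4, 0.1]    # bi-encoder dot products (cheap, fast)
print(distillation_loss(teacher, student))  # minimised with respect to the student
```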
DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances
TLDR
To efficiently capture the discourse-level coherence among utterances, DialogBERT is presented, a novel conversational response generation model that enhances previous PLM-based dialogue models and employs a hierarchical Transformer architecture.
LexFit: Lexical Fine-Tuning of Pretrained Language Models
TLDR
It is shown that it is possible to expose and enrich lexical knowledge from the LMs, and to specialize them to serve as effective and universal “decontextualized” word encoders even when fed input words “in isolation” (i.e., without any context).
DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog
TLDR
This work investigates the effects of domain specialization of pretrained language models (PLMs) for task-oriented dialog and proposes a resource-efficient and modular domain specialization by means of domain adapters – additional parameter-light layers in which to encode the domain knowledge.
Structural Pre-training for Dialogue Comprehension
TLDR
SPIDER (Structural PretraIned DialoguE Reader) is presented to capture dialogue-exclusive features, and two training objectives are proposed in addition to the original LM objectives, which regularize the model to improve the factual correctness of summarized subject-verb-object triplets.
GenSF: Simultaneous Adaptation of Generative Pre-trained Models and Slot Filling
TLDR
GENSF (Generative Slot Filling), which leverages a generative pre-trained open-domain dialog model for slot filling, achieves state-of-the-art results on two slot filling datasets with strong gains in few-shot and zero-shot settings.

References

Showing 1-10 of 96 references
Efficient Intent Detection with Dual Sentence Encoders
TLDR
The usefulness and wide applicability of the proposed intent detectors are demonstrated, showing that they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
DIET: Lightweight Language Understanding for Dialogue Systems
Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like…
A Neural Conversational Model
TLDR
A simple approach to conversational modeling which uses the recently proposed sequence to sequence framework, and is able to extract knowledge from both a domain specific dataset, and from a large, noisy, and general domain dataset of movie subtitles.
Comparison of Transfer-Learning Approaches for Response Selection in Multi-Turn Conversations
TLDR
This paper compares three transfer-learning approaches to response selection in dialogs, as part of the Dialog System Technology Challenge 7 (DSTC7) Track 1, and shows that BERT performed best, followed by the GPT model and then the MTEE model.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Universal Sentence Encoder
TLDR
It is found that transfer learning using sentence embeddings tends to outperform word level transfer with surprisingly good performance with minimal amounts of supervised training data for a transfer task.
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
TLDR
This paper proposes to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks, achieving results comparable to ELMo.
Extreme Language Model Compression with Optimal Subwords and Shared Projections
TLDR
This work introduces a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions and employs a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary.
Q8BERT: Quantized 8Bit BERT
TLDR
This work shows how to perform quantization-aware training during the fine-tuning phase of BERT in order to compress BERT by 4x with minimal accuracy loss; the resulting quantized model can also accelerate inference when run on hardware that supports 8-bit integer operations.
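For intuition only, the snippet below shows plain post-hoc symmetric int8 quantization and the 4x memory reduction it yields; Q8BERT itself goes further and simulates quantization inside the forward pass during fine-tuning (quantization-aware training), which this sketch does not reproduce.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric linear quantization of a float tensor to int8.
    Returns the quantized values plus the scale needed to recover them."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((768, 768)).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", float(np.abs(w - dequantize(q, scale)).max()))
print("bytes: float32 =", w.nbytes, "-> int8 =", q.nbytes, "(4x smaller)")
```

The speedup reported for such models additionally depends on executing the matrix multiplications in int8, which requires hardware and kernel support rather than just the storage trick shown here.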