The Cascade Transformer: an Application for Efficient Answer Sentence Selection

@inproceedings{Soldaini2020TheCT,
  title={The Cascade Transformer: an Application for Efficient Answer Sentence Selection},
  author={Luca Soldaini and Alessandro Moschitti},
  booktitle={ACL},
  year={2020}
}
Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective…
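The abstract is truncated above, but its central mechanism (intermediate classifiers that prune a fraction of the candidate batch at selected layers, so later layers only process the survivors) can be illustrated with a toy PyTorch sketch. Everything below (layer counts, exit layers, the 30% drop rate, the classifier heads) is an illustrative assumption rather than the paper's exact configuration.

```python
# Toy sketch of cascaded batch pruning inside a transformer encoder.
import torch
import torch.nn as nn

class ToyCascadeRanker(nn.Module):
    def __init__(self, hidden=64, layers=12, exit_layers=(4, 6, 8, 10), drop_rate=0.3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True) for _ in range(layers)]
        )
        self.heads = nn.ModuleDict({str(l): nn.Linear(hidden, 1) for l in exit_layers})
        self.final_head = nn.Linear(hidden, 1)
        self.drop_rate = drop_rate

    @torch.no_grad()
    def rank(self, x):
        # x: (num_candidates, seq_len, hidden), already-embedded question/candidate pairs.
        keep = torch.arange(x.size(0))                      # indices of surviving candidates
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)                                    # partial encodings are reused downstream
            if str(i) in self.heads and x.size(0) > 1:
                scores = self.heads[str(i)](x[:, 0]).squeeze(-1)
                n_keep = max(1, int(x.size(0) * (1 - self.drop_rate)))
                top = scores.topk(n_keep).indices           # drop the lowest-scoring candidates
                x, keep = x[top], keep[top]
        return keep, self.final_head(x[:, 0]).squeeze(-1)   # final scores for the survivors

surviving, scores = ToyCascadeRanker().rank(torch.randn(32, 24, 64))  # 32 candidates for one question
print(surviving.shape, scores.shape)
```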
Citations

Reranking for Efficient Transformer-based Answer Selection
It is shown that standard and efficient neural rerankers can be used to reduce the number of sentence candidates fed to Transformer models without hurting accuracy, thus improving efficiency by up to four times.
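As a plain illustration of this two-stage idea, the sketch below uses word overlap as a stand-in for the paper's efficient rerankers; only the shortlisted candidates reach the (here dummy) expensive Transformer scorer. All names and the cutoff k are made up for the example.

```python
# Cheap reranker shortlists candidates; only the shortlist is scored by the expensive model.
def cheap_score(question, candidate):
    q, c = set(question.lower().split()), set(candidate.lower().split())
    return len(q & c) / (len(q) or 1)

def two_stage_rank(question, candidates, expensive_score, k=8):
    shortlist = sorted(candidates, key=lambda c: cheap_score(question, c), reverse=True)[:k]
    return sorted(shortlist, key=lambda c: expensive_score(question, c), reverse=True)

best = two_stage_rank(
    "who wrote hamlet",
    ["Hamlet was written by William Shakespeare.",
     "Hamlet is a village in several countries.",
     "Shakespeare wrote many tragedies."],
    expensive_score=lambda q, c: float(len(c)),  # dummy stand-in for a Transformer cross-encoder
)
print(best[0])
```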
Modeling Context in Answer Sentence Selection Systems on a Latency Budget
The best approach, which leverages a multi-way attention architecture to efficiently encode context, improves 6% to 11% over the non-contextual state of the art in AS2, with minimal impact on system latency.
A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection
This paper shows that, by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, its model achieves the highest accuracy among cost-efficient models, with two orders of magnitude fewer parameters than the current state of the art.
Answer Generation for Retrieval-based Question Answering Systems
This work proposes to generate answers from the top candidates of an AS2 model by training a sequence-to-sequence transformer on the question and its candidate set.
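A minimal sketch of the generation step with an off-the-shelf seq2seq model (t5-small via Hugging Face Transformers); the input format that concatenates the question with its candidates is an assumption for illustration, not the paper's exact setup.

```python
# Generate an answer from a question plus its top AS2 candidates with a seq2seq model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

question = "who wrote hamlet"
candidates = [
    "Hamlet was written by William Shakespeare around 1600.",
    "Hamlet is one of the most quoted works in English.",
]
# Concatenate the question and the candidate set into a single source sequence.
source = "question: " + question + " " + " ".join(f"candidate: {c}" for c in candidates)

inputs = tokenizer(source, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```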
Answer Sentence Selection Using Local and Global Context in Transformer Models
The results on three different benchmarks show that the combination of local and global context in a Transformer model significantly improves the accuracy of Answer Sentence Selection.
Pretrained Transformers for Text Ranking: BERT and Beyond
This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.
Learning to Rank in the Age of Muppets: Effectiveness–Efficiency Tradeoffs in Multi-Stage Ranking
It is well known that rerankers built on pretrained transformer models such as BERT have dramatically improved retrieval effectiveness in many tasks. However, these gains have come at substantial…
The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models
A design pattern for tackling text ranking problems, dubbed “Expando-Mono-Duo”, that has been empirically validated for a number of ad hoc retrieval tasks in different domains and is open-sourced in the Pyserini IR toolkit and PyGaggle neural reranking library.
Early Exiting BERT for Efficient Document Ranking
Early exiting BERT is introduced for document ranking: with a slight modification, BERT becomes a model with multiple output paths, each inference sample can exit early from one of these paths, and computation can be effectively allocated among samples.
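A small sketch of the per-sample early-exit idea: each intermediate head scores the sample and inference stops once the head is confident enough either way. The layer count, the sigmoid confidence rule, and the 0.9 threshold are assumptions for illustration.

```python
# Per-sample early exit from intermediate output heads.
import torch
import torch.nn as nn

class ToyEarlyExitScorer(nn.Module):
    def __init__(self, hidden=64, layers=6, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True) for _ in range(layers)]
        )
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(layers)])
        self.threshold = threshold

    @torch.no_grad()
    def score(self, x):
        # x: (1, seq_len, hidden), a single already-embedded query/document pair.
        for layer, head in zip(self.layers, self.heads):
            x = layer(x)
            p = torch.sigmoid(head(x[:, 0])).item()
            if p > self.threshold or p < 1 - self.threshold:
                return p          # exit early from this output path
        return p                  # fall through to the last head

print(ToyEarlyExitScorer().score(torch.randn(1, 16, 64)))
```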

References

Showing 1-10 of 60 references
TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection
The proposed TANDA technique for fine-tuning pre-trained Transformer models on natural language tasks generates more stable and robust models, reduces the effort required to select optimal hyper-parameters, and makes the adaptation step more robust to noise.
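The transfer-then-adapt recipe amounts to fine-tuning the same model twice, first on a large general answer-selection dataset and then on the small target dataset. The sketch below uses toy tensors and a linear stand-in for the Transformer classifier; the datasets, learning rates, and the fine_tune helper are illustrative placeholders.

```python
# Sequential fine-tuning: transfer on a large general dataset, then adapt on the target dataset.
import torch

def fine_tune(model, batches, lr, epochs=1):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for features, labels in batches:
            opt.zero_grad()
            loss = loss_fn(model(features).squeeze(-1), labels)
            loss.backward()
            opt.step()
    return model

model = torch.nn.Linear(16, 1)   # stand-in for a pre-trained Transformer classifier
general = [(torch.randn(8, 16), torch.randint(0, 2, (8,)).float()) for _ in range(10)]
target = [(torch.randn(8, 16), torch.randint(0, 2, (8,)).float()) for _ in range(3)]

model = fine_tune(model, general, lr=1e-4)   # 1) transfer step on the large general dataset
model = fine_tune(model, target, lr=1e-5)    # 2) adapt step on the target dataset
```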
A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection
This paper proposes a novel attention mechanism, named Dynamic-Clip Attention, which is directly integrated into the Compare-Aggregate framework and filters out noise in the attention matrix in order to better mine the semantic relevance of word-level vectors.
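A toy version of clipping the attention matrix: weights below a threshold are zeroed and each row is renormalized, filtering out noisy word alignments. The fixed-threshold rule shown here is one illustrative variant, not necessarily the paper's exact clipping scheme.

```python
# Zero out small attention weights and renormalize before aggregation.
import torch

def clipped_attention(q, k, v, threshold=0.05):
    # q: (m, d); k, v: (n, d) word-level vectors of the two sentences.
    attn = torch.softmax(q @ k.T / k.size(-1) ** 0.5, dim=-1)        # (m, n)
    attn = torch.where(attn >= threshold, attn, torch.zeros_like(attn))
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return attn @ v                                                  # compared representation

out = clipped_attention(torch.randn(5, 32), torch.randn(7, 32), torch.randn(7, 32))
print(out.shape)  # torch.Size([5, 32])
```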
Sharing Attention Weights for Fast Transformer
This paper speeds up the Transformer via a fast and lightweight attention model that shares attention weights across adjacent layers, enabling the efficient re-use of hidden states in a vertical manner.
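The weight-sharing idea can be sketched as computing the attention distribution once and reusing it in the adjacent layer, which then only applies its own value projection. The two-layer grouping and dimensions below are assumptions for illustration.

```python
# Compute attention weights once, reuse them in the next layer.
import torch
import torch.nn as nn

class SharedAttnBlock(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.q, self.k = nn.Linear(d, d), nn.Linear(d, d)
        self.v1, self.v2 = nn.Linear(d, d), nn.Linear(d, d)

    def forward(self, x):
        # First layer: compute the attention distribution.
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        x = attn @ self.v1(x)
        # Second layer: reuse the same weights, skipping the query/key projections.
        return attn @ self.v2(x)

print(SharedAttnBlock()(torch.randn(2, 10, 64)).shape)
```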
A Compare-Aggregate Model with Latent Clustering for Answer Selection
A novel method for sentence-level answer selection, a fundamental problem in natural language processing, that adopts a pretrained language model and proposes a novel latent clustering method to compute additional information within the target corpus.
Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction
This paper proposes Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains, and shows that MCAN achieves state-of-the-art performance.
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
This work proposes a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence; it consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
CEDR: Contextualized Embeddings for Document Ranking
This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad hoc document ranking, proposes a joint approach that incorporates BERT's classification vector into existing neural models, and shows that it outperforms state-of-the-art ad hoc ranking baselines.
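The joint approach can be pictured as concatenating the encoder's classification ([CLS]) vector with the feature vector of an existing neural ranker before the final scoring layer; everything in the sketch below is a toy stand-in with assumed dimensions.

```python
# Score a query-document pair from the [CLS] vector plus an existing ranker's features.
import torch
import torch.nn as nn

class JointRanker(nn.Module):
    def __init__(self, cls_dim=768, feat_dim=32):
        super().__init__()
        self.score = nn.Linear(cls_dim + feat_dim, 1)

    def forward(self, cls_vec, ranker_feats):
        return self.score(torch.cat([cls_vec, ranker_feats], dim=-1)).squeeze(-1)

scores = JointRanker()(torch.randn(4, 768), torch.randn(4, 32))
print(scores.shape)  # one relevance score per query-document pair
```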
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Integrating Question Classification and Deep Learning for improved Answer Selection
The experiments show that Question Classes are a strong signal to Deep Learning models for Answer Selection, and enable the system to outperform the current state of the art in all variations of the experiments except one.