The Cascade Transformer: an Application for Efficient Answer Sentence Selection

Luca Soldaini and Alessandro Moschitti
Annual Meeting of the Association for Computational Linguistics
Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective… 

Reranking for Efficient Transformer-based Answer Selection

It is shown that standard and efficient neural rerankers can be used to reduce the number of sentence candidates fed to Transformer models without hurting accuracy, thus improving efficiency up to four times.
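
As a rough illustration of the idea summarized above, the sketch below shortlists candidates with a cheap scorer and only then runs an expensive scorer on the survivors. Both scorers are hypothetical stand-ins (a word-overlap score and a placeholder where a Transformer model would go); the names `cheap_overlap_score`, `rerank_then_score`, and `keep` are assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def cheap_overlap_score(question: str, candidate: str) -> float:
    """Hypothetical cheap reranker: fraction of question words found in the candidate."""
    q_words = set(question.lower().split())
    c_words = set(candidate.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def rerank_then_score(
    question: str,
    candidates: List[str],
    expensive_score: Callable[[str, str], float],
    keep: int,
) -> List[Tuple[str, float]]:
    """Keep the top `keep` candidates by the cheap score, then rank them with the expensive scorer."""
    shortlisted = sorted(
        candidates, key=lambda c: cheap_overlap_score(question, c), reverse=True
    )[:keep]
    scored = [(c, expensive_score(question, c)) for c in shortlisted]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    calls = []

    def fake_transformer(question: str, candidate: str) -> float:
        # Placeholder for an expensive Transformer model (assumption);
        # it only records how often it is called.
        calls.append(candidate)
        return float(len(candidate))

    question = "who wrote hamlet"
    candidates = [
        "Hamlet was written by William Shakespeare.",
        "The weather is nice today.",
        "Shakespeare wrote many plays.",
        "Bananas are yellow.",
    ]
    ranked = rerank_then_score(question, candidates, fake_transformer, keep=2)
    print(ranked[0][0])
    print(len(calls))  # the expensive scorer ran on 2 of the 4 candidates
```

The saving comes from `keep`: the expensive scorer is called once per shortlisted candidate instead of once per candidate, at the risk of discarding a correct answer that the cheap scorer ranks low.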

Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems

The proposed Multiple Heads Student architecture is an efficient neural network designed to distill an ensemble of large transformers into a single smaller model, rivaling the state-of-the-art large AS2 models that have 2…

Modeling Context in Answer Sentence Selection Systems on a Latency Budget

The best approach, which leverages a multi-way attention architecture to efficiently encode context, improves by 6% to 11% over the non-contextual state of the art in AS2 with minimal impact on system latency.

A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection

This paper argues that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, this model achieves the highest accuracy among the cost-efficient models, with two orders of magnitude fewer parameters than the current state of the art.

Answer Generation for Retrieval-based Question Answering Systems

This work proposes training a sequence-to-sequence transformer model to generate an answer from a set of top AS2 candidates.

Answer Sentence Selection Using Local and Global Context in Transformer Models

The results on three different benchmarks show that the combination of the local and global context in a Transformer model significantly improves the accuracy in Answer Sentence Selection.

Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection

This paper proposes three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, to improve the performance of transformers for AS2, and mitigate the requirement of large labeled datasets.

Paragraph-based Transformer Pre-training for Multi-Sentence Inference

This paper shows that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks, and proposes a new pre-training objective that models the paragraph-level semantics across multiple input sentences.

Pretrained Transformers for Text Ranking: BERT and Beyond

This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.

Pretrained Transformers for Text Ranking: BERT and Beyond

This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example, and lays out the foundations of pretrained transformers for text ranking.



TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection

The approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving impressive MAP scores, and confirms the positive impact of TANDA in an industrial setting, using domain-specific datasets subject to different types of noise.
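
For readers unfamiliar with the MAP metric mentioned above, the following is a minimal sketch of Mean Average Precision over ranked candidate lists. It assumes binary relevance labels given in ranked order and counts a question with no correct candidate as 0; evaluation scripts for WikiQA and TREC-QA may instead drop such questions, so this is not necessarily the exact protocol used in the paper. The function names are chosen for this example.

```python
from typing import List


def average_precision(labels: List[int]) -> float:
    """Average precision for one question; `labels` are 0/1 relevance flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[List[int]]) -> float:
    """Mean of the per-question average precision over all questions."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two questions: correct answers at ranks 1 and 3, and at rank 2.
    print(mean_average_precision([[1, 0, 1], [0, 1]]))
```

In the example, the first question scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3.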

A Compare-Aggregate Model with Dynamic-Clip Attention for Answer Selection

This paper proposes a novel attention mechanism named Dynamic-Clip Attention, which is directly integrated into the Compare-Aggregate framework and focuses on filtering out noise in the attention matrix in order to better mine the semantic relevance of word-level vectors.

Sharing Attention Weights for Fast Transformer

This paper speeds up the Transformer via a fast and lightweight attention model that shares attention weights across adjacent layers and enables the efficient re-use of hidden states in a vertical manner.

A Compare-Aggregate Model with Latent Clustering for Answer Selection

A novel method for sentence-level answer selection, a fundamental problem in natural language processing, that adopts a pretrained language model and proposes a novel latent clustering method to compute additional information within the target corpus.

Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction

This paper proposes Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains and shows that MCAN achieves state-of-the-art performance.

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.

CEDR: Contextualized Embeddings for Document Ranking

This work investigates how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking, proposes a joint approach that incorporates BERT's classification vector into existing neural models, and shows that it outperforms state-of-the-art ad-hoc ranking baselines.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn tasks such as question answering, machine translation, reading comprehension, and summarization without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Integrating Question Classification and Deep Learning for improved Answer Selection

The experiments show that Question Classes are a strong signal to Deep Learning models for Answer Selection, and enable the system to outperform the current state of the art in all variations of the experiments except one.