Corpus ID: 233307381

Transductive Learning for Abstractive News Summarization

@article{Bravzinskas2021TransductiveLF,
  title={Transductive Learning for Abstractive News Summarization},
  author={Arthur Bra{\v{z}}inskas and Mengwen Liu and Ramesh Nallapati and Sujith Ravi and Markus Dreyer},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.09500}
}
Pre-trained language models have recently advanced abstractive summarization. These models are further fine-tuned on human-written references before generating summaries at test time. In this work, we propose the first application of transductive learning to summarization. In this paradigm, a model can learn from the test set’s input before inference. To perform transduction, we propose to utilize the input documents’ summarizing sentences to construct references for learning at test time. These…
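The abstract describes the approach only at a high level. As a rough illustration of the idea, a minimal transductive fine-tuning loop might look like the sketch below; the base checkpoint, the lead-sentence heuristic standing in for the paper's summarizing-sentence selection, and all hyperparameters are assumptions made for illustration, not the authors' actual setup.

```python
# Hypothetical sketch of transductive fine-tuning for abstractive summarization:
# build pseudo-references from the *test* inputs themselves, fine-tune on them,
# then generate. Model name, heuristic, and hyperparameters are assumptions.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/bart-large-cnn"  # assumed pre-trained summarizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def pseudo_reference(document: str, k: int = 3) -> str:
    """Crude stand-in for selecting summarizing sentences: take the lead-k
    sentences of the document as a pseudo-reference."""
    sentences = document.split(". ")
    return ". ".join(sentences[:k])


def transductive_finetune(test_documents, steps_per_doc: int = 2, lr: float = 1e-5):
    """Adapt the model on pseudo-references built from the test inputs
    (no human-written summaries are used) before inference."""
    optimizer = AdamW(model.parameters(), lr=lr)
    model.train()
    for doc in test_documents:
        inputs = tokenizer(doc, truncation=True, max_length=1024, return_tensors="pt")
        labels = tokenizer(pseudo_reference(doc), truncation=True, max_length=128,
                           return_tensors="pt").input_ids
        for _ in range(steps_per_doc):
            loss = model(**inputs, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()


def summarize(document: str) -> str:
    """Generate a summary with the (transductively adapted) model."""
    model.eval()
    inputs = tokenizer(document, truncation=True, max_length=1024, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(**inputs, num_beams=4, max_length=128)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```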


References

Showing 1-10 of 39 references.
Summary Level Training of Sentence Rewriting for Abstractive Summarization
A novel training signal is presented that directly maximizes summary-level ROUGE scores through reinforcement learning, and BERT is incorporated into the model to make good use of its ability in natural language understanding.
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
An accurate and fast summarization model is proposed that first selects salient sentences and then rewrites them abstractively to generate a concise overall summary; it achieves a new state of the art on all metrics on the CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores.
A Deep Reinforced Model for Abstractive Summarization
A neural network model is proposed with a novel intra-attention that attends over the input and the continuously generated output separately, together with a new training method that combines standard supervised word prediction and reinforcement learning (RL) to produce higher-quality summaries.
Text Summarization with Pretrained Encoders
This paper introduces a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences, and proposes a new fine-tuning schedule that adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two.
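As a side note on the fine-tuning schedule mentioned in the entry above: giving the pre-trained encoder and the freshly initialized decoder separate optimizers reduces to two parameter groups with different learning rates. A minimal PyTorch sketch, with the toy modules, names, and rates being illustrative assumptions rather than that paper's reported configuration:

```python
# Minimal sketch: separate optimizers (and learning rates) for a pre-trained
# encoder and a randomly initialized decoder. All values are assumptions.
import torch
from torch import nn
from torch.optim import Adam

class ToySummarizer(nn.Module):
    def __init__(self, d: int = 32):
        super().__init__()
        self.encoder = nn.Linear(d, d)  # stands in for a pre-trained BERT encoder
        self.decoder = nn.Linear(d, d)  # stands in for a new, randomly initialized decoder

    def forward(self, x):
        return self.decoder(torch.tanh(self.encoder(x)))

model = ToySummarizer()
enc_opt = Adam(model.encoder.parameters(), lr=2e-5)  # small steps for pre-trained weights
dec_opt = Adam(model.decoder.parameters(), lr=1e-4)  # larger steps for fresh weights

# One illustrative training step with random data.
x, y = torch.randn(8, 32), torch.randn(8, 32)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
enc_opt.step(); dec_opt.step()
enc_opt.zero_grad(); dec_opt.zero_grad()
```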
Bottom-Up Abstractive Summarization
This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary, and shows that this approach improves the ability to compress text while still generating fluent summaries.
Abstractive Document Summarization with a Graph-Based Attentional Neural Model
A novel graph-based attention mechanism is proposed in the sequence-to-sequence framework to address the saliency factor of summarization, which has been overlooked by prior works; the resulting model is competitive with state-of-the-art extractive methods.
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
This work proposes several novel models that address critical problems in summarization not adequately modeled by the basic architecture, such as modeling keywords, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
A Neural Attention Model for Abstractive Sentence Summarization
This work proposes a fully data-driven approach to abstractive sentence summarization, utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.
A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss
By training the model end-to-end with the inconsistency loss and the original losses of the extractive and abstractive models, the model achieves state-of-the-art ROUGE scores while producing the most informative and readable summaries on the CNN/Daily Mail dataset in a solid human evaluation.
Get To The Point: Summarization with Pointer-Generator Networks
A novel architecture is proposed that augments the standard sequence-to-sequence attentional model in two orthogonal ways, using a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information while retaining the ability to produce novel words through the generator.
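The copy mechanism described in the last entry amounts to mixing a vocabulary (generation) distribution with attention mass scattered onto source-token ids. A toy sketch of that final distribution, where shapes, names, and numbers are assumptions for illustration:

```python
# Toy sketch of a pointer-generator output distribution:
#   P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_{i: source[i] == w} attention[i]
import torch

def pointer_generator_distribution(p_vocab, attention, source_ids, p_gen):
    """p_vocab: (batch, vocab) generation distribution; attention: (batch, src_len)
    attention over source tokens; source_ids: (batch, src_len) source token ids;
    p_gen: (batch, 1) probability of generating rather than copying."""
    copy_dist = torch.zeros_like(p_vocab)
    # Scatter attention mass onto the vocabulary ids of the source tokens.
    copy_dist.scatter_add_(1, source_ids, attention)
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist

# Usage with made-up numbers; the mixture stays a valid distribution.
p_vocab = torch.softmax(torch.randn(1, 10), dim=-1)
attention = torch.softmax(torch.randn(1, 4), dim=-1)
source_ids = torch.tensor([[2, 5, 5, 7]])
p_gen = torch.tensor([[0.8]])
final = pointer_generator_distribution(p_vocab, attention, source_ids, p_gen)
assert torch.allclose(final.sum(), torch.tensor(1.0))
```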