BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle

Peter West, Ari Holtzman, Jan Buys, Yejin Choi
The principle of the Information Bottleneck (Tishby et al. 1999) is to produce a summary of information X optimized to predict some other relevant information Y. In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach seeks a compressed sentence that can best predict the next sentence. Our iterative algorithm under the Information Bottleneck…
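
The objective described above (find a shorter sentence that still predicts the next sentence well) can be illustrated with a minimal sketch. This is not the paper's algorithm: the scorer `toy_next_sentence_logprob` is a hypothetical word-overlap stand-in for a real language model's log p(next sentence | summary), the fixed `target_len` is a crude stand-in for the compression side of the trade-off, and the greedy word-deletion search is an assumption chosen for brevity.

```python
import math
from typing import Callable, List

Scorer = Callable[[List[str], List[str]], float]


def toy_next_sentence_logprob(summary: List[str], next_sentence: List[str]) -> float:
    """Hypothetical stand-in for log p(next_sentence | summary).

    A real system would query a language model here. This toy version
    gives each next-sentence word a higher probability when the word
    also appears in the candidate summary.
    """
    summary_words = set(summary)
    total = 0.0
    for word in next_sentence:
        prob = 0.5 if word in summary_words else 0.05
        total += math.log(prob)
    return total


def compress(
    sentence: List[str],
    next_sentence: List[str],
    target_len: int,
    score: Scorer = toy_next_sentence_logprob,
) -> List[str]:
    """Greedily delete one word at a time until target_len words remain.

    At each step, keep the single-word deletion whose result best
    predicts the next sentence under `score` (ties go to the earliest
    deletion position).
    """
    current = list(sentence)
    while len(current) > target_len:
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        current = max(candidates, key=lambda cand: score(cand, next_sentence))
    return current
```

For example, calling `compress("the storm closed the coastal highway on monday".split(), "crews reopened the highway after the storm passed".split(), target_len=3)` keeps only words that help predict the following sentence. Swapping in a real conditional language model only requires passing a different `score` function.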

Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction

This work reports a new state of the art for unsupervised sentence summarization according to ROUGE scores, and demonstrates that the commonly reported ROUGE F1 metric is sensitive to summary length.

Unsupervised Extractive Summarization using Pointwise Mutual Information

This work proposes new metrics of relevance and redundancy using pointwise mutual information (PMI) between sentences, which can be easily computed by a pre-trained language model; the resulting method outperforms similarity-based methods on datasets in a range of domains.
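
For reference, the standard definition of pointwise mutual information between two sentences x and y is given below. Reading p(y | x) and p(y) as probabilities assigned by a pre-trained language model is an assumption based on the summary above, not a statement of that paper's exact estimator.

```latex
\mathrm{pmi}(x; y) = \log \frac{p(x, y)}{p(x)\, p(y)} = \log \frac{p(y \mid x)}{p(y)}
```

A positive value means that seeing x makes y more likely than it would be on its own; a value near zero means the two sentences are close to independent.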

Leveraging Information Bottleneck for Scientific Document Summarization

An unsupervised extractive approach for summarizing long scientific documents, based on the Information Bottleneck principle, with two separate steps that can be extended to a multi-view framework by different signals.

Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization

This work proposes a Non-Autoregressive Unsupervised Summarization (NAUS) approach, which does not require parallel data for training: it first performs edit-based search towards a heuristically defined score and generates a summary as pseudo-groundtruth.

Unsupervised Opinion Summarization as Copycat-Review Generation

A generative model for a review collection is defined which capitalizes on the intuition that when generating a new review given a set of other reviews of a product, the authors should be able to control the “amount of novelty” going into the new review or, equivalently, vary the extent to which it deviates from the input.

The Summary Loop: Learning to Write Abstractive Summaries Without Examples

This work introduces a novel method that encourages the inclusion of key terms from the original document in the summary; it attains higher levels of abstraction, with copied passages roughly two times shorter than in prior work, and learns to compress and merge sentences without supervision.

RepSum: Unsupervised Dialogue Summarization based on Replacement Strategy

The proposed strategy, RepSum, is applied to generate both extractive and abstractive summaries with the guidance of the following n-th utterance generation and classification tasks, and demonstrates the superiority of the proposed model over state-of-the-art methods.

EASE: Extractive-Abstractive Summarization End-to-End using the Information Bottleneck Principle

EASE is proposed, an extractive-abstractive framework that generates concise abstractive summaries that can be traced back to an extractive summary; the generated summaries are shown to be better than strong extractive and extractive-abstractive baselines.

Improving Unsupervised Extractive Summarization with Facet-Aware Modeling

Experimental results show that the novel facet-aware centrality-based ranking model consistently outperforms strong baselines, especially in long- and multi-document scenarios, and even performs comparably to some supervised models.

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

This paper shows that it is possible to better manage the trade-off between rationale conciseness and task performance by optimizing a bound on the Information Bottleneck (IB) objective, and derives a learning objective that allows direct control of mask sparsity levels through a tunable sparse prior.



Neural Summarization by Extracting Sentences and Words

This work develops a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor that allows for different classes of summarization models which can extract sentences or words.

Simple Unsupervised Summarization by Contextual Matching

An unsupervised method for sentence summarization using only language modeling that employs two language models, one that is generic (i.e. pretrained) and one that is specific to the target domain, combined using a product-of-experts criterion.

Unsupervised Sentence Compression using Denoising Auto-Encoders

Although the models underperform supervised models on ROUGE scores, they are competitive with a supervised baseline in human evaluation of grammatical correctness and retention of meaning.

A Neural Attention Model for Abstractive Sentence Summarization

This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.

Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond

This work proposes several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.

Get To The Point: Summarization with Pointer-Generator Networks

A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways, using a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator.

Language as a Latent Variable: Discrete Generative Models for Sentence Compression

This work formulates a variational auto-encoder for inference in a deep generative model of text in which the latent representation of a document is itself drawn from a discrete language model distribution, and shows that generative formulations of both abstractive and extractive compression yield state-of-the-art results when trained on a large amount of supervised data.

Headline Generation Based on Statistical Translation

This paper presents results of experiments with a statistical translation approach to headline generation, in which statistical models of term selection and term ordering are jointly applied to produce summaries in a style learned from a training corpus.

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Deep Recurrent Generative Decoder for Abstractive Text Summarization

A new framework for abstractive text summarization based on a sequence-to-sequence oriented encoder-decoder model equipped with a deep recurrent generative decoder (DRGN) achieves improvements over the state-of-the-art methods.