Mixture Content Selection for Diverse Sequence Generation

@article{Cho2019MixtureCS,
  title={Mixture Content Selection for Diverse Sequence Generation},
  author={Jaemin Cho and Minjoon Seo and Hannaneh Hajishirzi},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.01953}
}
Generating diverse sequences is important in many NLP applications such as question generation or summarization that exhibit semantically one-to-many relationships between the source and target sequences. The paper proposes to explicitly separate diversification from generation in a two-stage approach. The diversification stage uses a mixture of experts to sample different binary masks on the source sequence for diverse content selection. The generation stage uses a standard encoder-decoder model given each selected content from the source sequence.
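
To make the two-stage idea concrete, below is a minimal PyTorch sketch of the diversification stage, assuming a hypothetical GRU-based selector expert (module names and sizes are ours, not the authors' released code): each of K experts predicts per-token keep-probabilities and samples a binary focus mask, and a standard encoder-decoder (not shown) is then run once per mask to produce K diverse outputs.

import torch
import torch.nn as nn

class Selector(nn.Module):
    """One selector expert: predicts a keep-probability for every source token."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, src_ids):
        h, _ = self.rnn(self.embed(src_ids))           # (B, T, 2*hidden)
        return torch.sigmoid(self.out(h)).squeeze(-1)  # (B, T) keep-probabilities

def diversify(src_ids, experts):
    """Diversification stage: each expert samples a different binary focus mask."""
    return [torch.bernoulli(expert(src_ids)) for expert in experts]

# Usage: K = 3 experts yield 3 different focus masks over a toy batch of source ids.
experts = nn.ModuleList([Selector(vocab_size=1000) for _ in range(3)])
src = torch.randint(0, 1000, (2, 12))
masks = diversify(src, experts)
print([m.shape for m in masks])  # three masks of shape (2, 12)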

Citations

Diversify Question Generation with Continuous Content Selectors and Question Type Modeling

This paper relates contextual focuses to content selectors, which are modeled by a continuous latent variable with a conditional variational auto-encoder (CVAE), and achieves a better trade-off between generation quality and diversity than existing approaches.

A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation

Experiments on summarization and question generation demonstrate that Composition Sampling is currently the best available decoding strategy for generating diverse meaningful outputs.

Exploring Explainable Selection to Control Abstractive Generation

This paper adopts a select-and-generate paradigm to enhance the capability of selecting explainable content and then guiding and controlling abstractive generation, and proposes a newly designed pair-wise extractor to capture sentence-pair interactions and their centrality.

Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors

A novel latent structured variable model is presented that generates high-quality texts by enriching contextual representation learning in encoder-decoder models, and a variational inference approach is proposed to approximate the posterior distribution of the random context variables.

SA-HAVE: A Self-Attention based Hierarchical VAEs Network for Abstractive Summarization

A Self-Attention based word embedding and Hierarchical Variational AutoEncoder (SA-HVAE) model is proposed that introduces self-attention into the LSTM encoder to alleviate information decay during encoding, and accomplishes summarization with deep structural information inference through hierarchical VAEs.

Focus Attention: Promoting Faithfulness and Diversity in Summarization

Focus Attention Mechanism is introduced, a simple yet effective method to encourage decoders to proactively generate tokens that are similar or topical to the input document, and a Focus Sampling method is proposed to enable generation of diverse summaries, an area currently understudied in summarization.

TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge

This model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages, respectively, on top of pretrained language models (PLMs), so that the model can learn what and how to generate.

A Survey of Knowledge-enhanced Text Generation

A comprehensive review of the research on knowledge-enhanced text generation over the past five years is presented, which includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data.

Focused Questions and Answer Generation by Key Content Selection

This paper proposes a method of automatically generating answers and diversified sequences corresponding to those answers by introducing a new module called the "Focus Generator", which guides the decoder in an existing "encoder-decoder" model to generate questions based on selected focus contents.

Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy

This paper proposes a universal SiMT model with a Mixture-of-Experts Wait-k Policy that achieves the best translation quality under arbitrary latency with only one trained model, outperforming all strong baselines across different latency levels, including the state-of-the-art adaptive policy.
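
For context, here is a toy sketch of the wait-k read/write schedule that such policies build on: the model reads k source tokens before emitting anything, then alternates one write per additional read. The function name and signature are illustrative, not from the paper.

def waitk_available(target_step, k, src_len):
    """Number of source tokens visible when emitting target token `target_step` (1-indexed)."""
    return min(k + target_step - 1, src_len)

print([waitk_available(t, k=3, src_len=8) for t in range(1, 9)])
# -> [3, 4, 5, 6, 7, 8, 8, 8]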

References

SHOWING 1-10 OF 60 REFERENCES

Mixture Models for Diverse Machine Translation: Tricks of the Trade

It is found that disabling dropout noise in the responsibility computation is critical to successful training, and that certain types of mixture models are more robust and offer the best trade-off between translation quality and diversity compared to variational models and diverse decoding approaches.

Get To The Point: Summarization with Pointer-Generator Networks

A novel architecture is presented that augments the standard sequence-to-sequence attentional model in two orthogonal ways: a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information while retaining the ability to produce novel words through the generator, and a coverage mechanism that keeps track of what has been summarized to discourage repetition.
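
As a rough illustration of the copy mechanism, here is a sketch of the final mixing step, assuming the decoder has already produced a generation probability p_gen, a vocabulary distribution, and source attention weights; variable names are ours and this is not the paper's code.

import torch

def final_distribution(p_gen, vocab_dist, attn, src_ids, extended_vocab_size):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * attention mass on source occurrences of w."""
    batch = vocab_dist.size(0)
    dist = torch.zeros(batch, extended_vocab_size)
    dist[:, :vocab_dist.size(1)] = p_gen * vocab_dist      # generator part
    dist.scatter_add_(1, src_ids, (1.0 - p_gen) * attn)    # copy part routed to source token ids
    return dist

# Toy example: vocab of 5 plus one source-only (OOV) id = 5 in the extended vocab.
p_gen = torch.tensor([[0.7]])
vocab_dist = torch.softmax(torch.randn(1, 5), dim=-1)
attn = torch.softmax(torch.randn(1, 3), dim=-1)
src_ids = torch.tensor([[2, 5, 4]])
print(final_distribution(p_gen, vocab_dist, attn, src_ids, extended_vocab_size=6))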

Guiding Generation for Abstractive Text Summarization Based on Key Information Guide Network

A guiding generation model is proposed that combines the extractive and abstractive methods and introduces a Key Information Guide Network (KIGN), which encodes keywords into a key information representation to guide the generation process.

Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations

This work proposes an objective that transfers supervision from neighboring examples, and develops a method to evaluate using standard task-specific metrics and measures of output diversity, finding consistent improvements over standard maximum likelihood training and other baselines.

Diverse Beam Search for Improved Description of Complex Scenes

Diverse Beam Search is proposed, a diversity-promoting alternative to beam search (BS) for approximate inference that produces sequences that are significantly different from each other by incorporating diversity constraints within groups of candidate sequences during decoding; moreover, it achieves this with minimal computational or memory overhead.
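
To illustrate the group-wise decoding idea, a sketch of the per-step Hamming diversity penalty follows: before a group extends its beams, log-probabilities of tokens already chosen by earlier groups at the same time step are penalized. Variable names are ours, not the authors' implementation.

import torch

def penalize(step_logprobs, prev_group_tokens, diversity_strength=0.5):
    """Discourage the current group from repeating tokens picked by earlier groups this step."""
    penalized = step_logprobs.clone()
    for tok in prev_group_tokens:
        penalized[:, tok] -= diversity_strength
    return penalized

step_logprobs = torch.log_softmax(torch.randn(2, 10), dim=-1)   # (beams in group, vocab)
print(penalize(step_logprobs, prev_group_tokens=[3, 7]).shape)  # torch.Size([2, 10])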

Bottom-Up Abstractive Summarization

This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary, and shows that this approach improves the ability to compress text, while still generating fluent summaries.

Sequence to Sequence Mixture Model for Diverse Machine Translation

A novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model is developed.

Analyzing Uncertainty in Neural Machine Translation

This study proposes tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations and shows that search works remarkably well but that models tend to spread too much probability mass over the hypothesis space.

Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

This work addresses the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture by formulating this task as a multiple-output structured-output prediction problem with a loss-function that effectively captures the setup of the problem.
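
Below is a sketch of the multiple-choice ("oracle") objective described above, assuming per-hypothesis losses are already computed: for each example only the lowest-loss hypothesis among the K predictors receives the training signal, which pushes the predictors to specialize. Names are illustrative.

import torch

def multiple_choice_loss(per_expert_losses):
    """per_expert_losses: (batch, K) tensor; return the batch mean of the per-example minimum."""
    best, _ = per_expert_losses.min(dim=1)
    return best.mean()

losses = torch.rand(4, 3)  # 4 examples, 3 hypotheses each
print(multiple_choice_loss(losses))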

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

This work presents a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder, uses latent variables to learn a distribution over potential conversational intents, and generates diverse responses using only greedy decoders.
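
A minimal sketch of the latent step in such a conditional VAE dialog model: a network maps the dialog context to a Gaussian over a latent intent variable, sampled with the reparameterization trick. For brevity the KL term is taken against a standard-normal prior rather than the learned conditional prior used in the paper; module names are ours.

import torch
import torch.nn as nn

class LatentIntent(nn.Module):
    def __init__(self, ctx_dim, z_dim):
        super().__init__()
        self.to_mu = nn.Linear(ctx_dim, z_dim)
        self.to_logvar = nn.Linear(ctx_dim, z_dim)

    def forward(self, context):
        mu, logvar = self.to_mu(context), self.to_logvar(context)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()

z, kl = LatentIntent(ctx_dim=16, z_dim=8)(torch.randn(4, 16))
print(z.shape, kl.item())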
...