PoMo: Generating Entity-Specific Post-Modifiers in Context

Jun Seok Kang, Robert L. Logan IV, Zewei Chu, Yang Chen, Dheeru Dua, Kevin Gimpel, Sameer Singh, Niranjan Balasubramanian
We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, “Barack Obama, _______, supported the #MeToo movement.”, the phrase “a father of two girls” is a contextually relevant post-modifier. To this end, we build PoMo, a post-modifier dataset created automatically… 

The ApposCorpus: a new multilingual, multi-domain dataset for factual appositive generation

An extensive analysis of the data and the task is carried out, pointing to the various modeling challenges the task poses; results obtained with standard language generation methods show that the task is indeed non-trivial and leaves plenty of room for improvement.

FRUIT: Faithfully Reflecting Updated Information in Text

This work introduces the novel generation task of *faithfully reflecting updated information in text* (FRUIT), where the goal is to update an existing article given new evidence, and shows that building models that can update articles faithfully requires new capabilities for neural generation models, opening the door to many new applications.

WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections

Qualitative analysis shows that the best approaches can generate fluent and high-quality texts but struggle with coherence and factuality, showing the potential of the WIKITABLET dataset to inspire future work on long-form generation.

IGA: An Intent-Guided Authoring Assistant

An interactive writing assistant is built that generates and rephrases text according to fine-grained author specifications, fine-tuning a language model on a dataset heuristically labeled with author intent.

Characterizing Collective Attention via Descriptor Context: A Case Study of Public Discussions of Crisis Events

A large-scale analysis of public online discussions of breaking news events on Facebook and Twitter finds that the use of contextual descriptors is indeed associated with proxies for social and informational expectations, including macro-level factors like the location's global salience and micro-level factors like audience engagement.

Generating Wikipedia Article Sections from Diverse Data Sources

This work creates a large-scale dataset, WIKITABLET, that pairs Wikipedia sections with their corresponding tabular data and various metadata, and shows that the best approaches can generate fluent and high-quality texts but sometimes struggle with coherence.

Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification

This work introduces a new annotated dataset of 1.3K instances of elaborative simplification and analyzes how entities, ideas, and concepts are elaborated through the lens of contextual specificity, and establishes baselines for elaboration generation using large scale pre-trained language models.

Characterizing Collective Attention via Descriptor Context in Public Discussions of Crisis Events

A large-scale language analysis of public online discussions of breaking crisis events on Facebook and Twitter finds that authors' references to locations are influenced by both macro-level factors such as the location's global importance and micro-level social factors like audience characteristics, and there is a decrease in descriptor context use over time.

A Global Model for Concept-to-Text Generation

A joint model that captures content selection and surface realization in an unsupervised, domain-independent fashion is presented, along with a decoding algorithm that allows intersecting the grammar with additional information capturing fluency and syntactic well-formedness constraints.

Reference-Aware Language Models

Experiments on three representative applications show the coreference model variants outperform models based on deterministic attention and standard language modeling baselines.

What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

An end-to-end, domain-independent neural encoder-aligner-decoder model for selective generation, i.e., the joint task of content selection and surface realization, achieves the best selection and generation results reported to date on the benchmark WeatherGov dataset, despite using no specialized features or linguistic resources.

Table-to-text Generation by Structure-aware Seq2seq Learning

Attention visualizations and case studies show that the novel structure-aware seq2seq architecture, which consists of a field-gating encoder and a description generator with dual attention, is capable of generating coherent and informative descriptions based on a comprehensive understanding of both the content and the structure of a table.
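The dual-attention idea, attending to table cells through both their content and their field (column) names, can be illustrated with a small sketch. This is an illustrative reconstruction, not the paper's implementation; the vectors, the dot-product scoring, and the multiply-and-renormalize combination rule are assumptions made for clarity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dual_attention(query, word_keys, field_keys, values):
    """Combine word-level and field-level attention over table cells.

    Each cell i has a content embedding word_keys[i] and a field (column)
    embedding field_keys[i]. The two attention distributions are multiplied
    and renormalized, so a cell scores highly only when both its content
    and its field match the decoder query.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    word_attn = softmax([dot(query, k) for k in word_keys])
    field_attn = softmax([dot(query, k) for k in field_keys])
    combined = [w * f for w, f in zip(word_attn, field_attn)]
    norm = sum(combined)
    combined = [c / norm for c in combined]
    # Context vector: attention-weighted sum of the cell value vectors.
    dim = len(values[0])
    context = [sum(combined[i] * values[i][d] for i in range(len(values)))
               for d in range(dim)]
    return combined, context
```

In the paper the combined weights feed the decoder at every step; here the multiplicative combination simply encodes "both views must agree" in the simplest possible form.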

Collective Content Selection for Concept-to-Text Generation

This work presents an efficient method for automatically learning content selection rules from a corpus and its related database and treats content selection as a collective classification problem, thus allowing it to capture contextual dependencies between input items.
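The collective view of content selection, where each record's keep/drop decision depends on the decisions for related records, can be sketched with a simple iterative scheme. The paper formulates this as a global optimization over the link structure; the iterative re-scoring below is a stand-in for that optimization, and all names, scores, and the 0.5 threshold are illustrative assumptions.

```python
def collective_select(items, neighbors, base_score, link_bonus, iters=5):
    """Collective content selection via iterative re-scoring (sketch).

    Start from independent per-item scores, then repeatedly re-score each
    item with a bonus for every currently selected neighbor, so related
    database records tend to be selected or dropped together.
    """
    # Initial decisions from the independent scores alone.
    selected = {i: base_score[i] > 0.5 for i in items}
    for _ in range(iters):
        for i in items:
            score = base_score[i] + link_bonus * sum(
                1 for j in neighbors.get(i, []) if selected[j])
            selected[i] = score > 0.5
    return selected
```

A borderline record (score just under the threshold) gets pulled in when a strongly selected neighbor links to it, which is exactly the contextual dependency the abstract describes.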

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn natural language processing tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Challenges in Data-to-Document Generation

A new, large-scale corpus of data records paired with descriptive documents is introduced, a series of extractive evaluation methods for analyzing performance are proposed, and baseline results are obtained using current neural generation methods.

Neural Text Generation from Structured Data with Application to the Biography Domain

A neural model for concept-to-text generation that scales to large, rich domains is introduced; it significantly outperforms a classical Kneser-Ney language model adapted to this task by nearly 15 BLEU.

Building applied natural language generation systems

An overview of Natural Language Generation is given from an applied system-building perspective, with emphasis on established techniques that can be used to build simple but practical working systems now.

A Neural Knowledge Language Model

A Neural Knowledge Language Model (NKLM) is proposed that combines symbolic knowledge provided by a knowledge graph with an RNN language model; the NKLM significantly improves perplexity while generating far fewer unknown words.
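The mechanism by which a knowledge-aware LM avoids unknown words can be sketched as a mixture between the vocabulary distribution and a copy distribution over retrieved facts. This is a heavy simplification of the NKLM, not its actual architecture; the probabilities, the example fact, and the single-token copy are all hypothetical.

```python
def knowledge_lm_step(p_copy, vocab_probs, fact_probs, facts):
    """One decoding step of a knowledge-aware LM, sketched.

    With probability p_copy the model copies the object of a knowledge-graph
    fact (chosen according to fact_probs) instead of sampling from the fixed
    vocabulary, so rare entity names need not fall back to <unk>.
    """
    # Mixture distribution over vocabulary words and fact objects.
    word_dist = {w: (1 - p_copy) * p for w, p in vocab_probs.items()}
    for (subj, rel, obj), p in zip(facts, fact_probs):
        word_dist[obj] = word_dist.get(obj, 0.0) + p_copy * p
    best = max(word_dist, key=word_dist.get)
    return best, word_dist
```

With a high copy probability and a relevant fact retrieved, the fact's object dominates the mixture even when it is absent from the vocabulary, which is the intuition behind the reduced unknown-word rate.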