Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification

Neha Srikanth, Junyi Jessy Li
Much of modern-day text simplification research focuses on sentence-level simplification, transforming original, more complex sentences into simplified versions. However, adding content can often be useful when difficult concepts and reasoning need to be explained. In this work, we present the first data-driven study of content addition in document simplification, which we call elaborative simplification. We introduce a new annotated dataset of 1.3K instances of elaborative simplification and… 

Generating Scientific Definitions with Controllable Complexity

A novel reranking approach is introduced; human evaluations find that it offers superior fluency while also controlling complexity, compared to several controllable generation baselines.

Definition Modelling for Appropriate Specificity

The proposed method addresses the over- and under-specificity problems by leveraging a pre-trained encoder-decoder model, the Text-to-Text Transfer Transformer (T5), and introducing a re-ranking mechanism to model specificity in definitions.

Text Simplification with Autoregressive Models

This paper presents the second-place solution on the public leaderboard and the fifth-place solution, based on different uses of RuGPT3 models for Russian text simplification, comparable with novel state-of-the-art approaches.

A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification

This work uses a non-autoregressive model to iteratively edit an input sequence and incorporates lexical complexity information seamlessly into the refinement process to generate simplifications that better match the desired output complexity than strong autoregressive baselines.

Paragraph-level Simplification of Medical Texts

A new corpus of parallel texts in English is introduced, comprising technical and lay summaries of all published evidence pertaining to different clinical topics. A new metric based on likelihood scores from a masked language model pretrained on scientific texts is also proposed; this automated measure better differentiates between technical and lay summaries than existing heuristics.

PoMo: Generating Entity-Specific Post-Modifiers in Context

PoMo, a post-modifier dataset created automatically from news articles, is introduced; it reflects a journalistic need to incorporate entity information relevant to a particular news event.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
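The masked-language-model pre-training described above corrupts the input before asking the model to recover it. A minimal sketch of that corruption scheme (the 15% target rate and the 80/10/10 split are from the BERT paper; the token list, `mask_token` string, and function name below are illustrative placeholders, not BERT's actual WordPiece vocabulary or tokenizer):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", vocab=None, mask_prob=0.15, rng=None):
    """Sketch of BERT-style masked-LM corruption: roughly `mask_prob` of
    positions become prediction targets; of those, 80% are replaced with
    the mask token, 10% with a random token, and 10% are left unchanged."""
    rng = rng or random.Random(0)
    vocab = vocab or list(tokens)          # placeholder "vocabulary"
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok               # position the model must predict
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_token  # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)  # 10%: random token
            # else: 10%: keep the original token
    return corrupted, targets
```

Keeping 10% of targets unchanged is what forces the model to maintain a contextual representation of every token, not only the visibly masked ones.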

Problems in Current Text Simplification Research: New Data Can Help

This opinion paper argues that focusing on Wikipedia limits simplification research, introduces a new simplification dataset that is a significant improvement over Simple Wikipedia, and presents a novel quantitative-comparative approach to studying the quality of simplification data resources.

Bleu: a Method for Automatic Evaluation of Machine Translation

This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

Inter-Coder Agreement for Computational Linguistics

This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as
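As a rough illustration of the kind of coefficient the survey analyzes, here is a minimal sketch of Krippendorff's alpha for nominal data, using the coincidence-matrix formulation; the data layout and names are our own:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha(items):
    """Krippendorff's alpha for nominal data (sketch). `items` is a list
    of coded units; each unit is the list of labels the coders assigned.
    Units with fewer than two labels are skipped, which is how the
    coefficient accommodates missing codings."""
    coincidence = Counter()
    for labels in items:
        m = len(labels)
        if m < 2:
            continue
        # Each ordered pair of labels within a unit contributes 1/(m-1).
        for i, j in permutations(range(m), 2):
            coincidence[(labels[i], labels[j])] += 1.0 / (m - 1)
    marginals = Counter()
    for (c, _k), v in coincidence.items():
        marginals[c] += v
    n = sum(marginals.values())
    # alpha = 1 - observed disagreement / expected-by-chance disagreement
    observed = sum(v for (c, k), v in coincidence.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c in marginals for k in marginals if c != k)
    return 1.0 - (n - 1) * observed / expected
```

Unlike raw percent agreement, the chance-correction in the denominator is what lets alpha distinguish genuine reliability from agreement produced by skewed label distributions.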

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems that learn to perform tasks from their naturally occurring demonstrations.

ToTTo: A Controlled Table-To-Text Generation Dataset

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table

Learning to Update Natural Language Comments Based on Code Changes

This work proposes an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications.

Unsupervised Commonsense Question Answering with Self-Talk

An unsupervised framework based on self-talk as a novel alternative to multiple-choice commonsense tasks, inspired by inquiry-based discovery learning, which improves performance on several benchmarks and competes with models that obtain knowledge from external KBs.

Data-Driven Sentence Simplification: Survey and Benchmark

Research on sentence simplification (SS) is surveyed, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays.