Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification

@inproceedings{Srikanth2021ElaborativeSC,
  title={Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification},
  author={Neha Srikanth and Junyi Jessy Li},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2021},
  year={2021}
}
Much of modern-day text simplification research focuses on sentence-level simplification, transforming original, more complex sentences into simplified versions. However, adding content can often be useful when difficult concepts and reasoning need to be explained. In this work, we present the first data-driven study of content addition in document simplification, which we call elaborative simplification. We introduce a new annotated dataset of 1.3K instances of elaborative simplification and…
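
The released dataset is not excerpted here, but to make the task concrete, here is a minimal Python sketch of what one elaborative simplification instance could look like. Field names and example text are invented for illustration, not drawn from the actual corpus:

    # Hypothetical shape of a single instance: a complex source span, its
    # simplified counterpart, and the elaboration the simplifier added.
    instance = {
        "original": "The bill died in committee after a filibuster.",
        "simplified": "The bill did not become a law.",
        "elaboration": "A filibuster is when lawmakers talk for a long time to delay a vote.",
    }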
Generating Scientific Definitions with Controllable Complexity
TLDR
A novel reranking approach is introduced and it is found in human evaluations that it offers superior fluency while also controlling complexity, compared to several controllable generation baselines.
Definition Modelling for Appropriate Specificity
TLDR
The proposed method addresses the over/under-specificity problems by leveraging a pre-trained encoder-decoder model, namely Text-to-Text Transfer Transformer, and introducing a re-ranking mechanism to model specificity in definitions.
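
Both reranking papers above share the same recipe: overgenerate candidates with a sequence-to-sequence model, then pick the candidate whose score best matches the target. A minimal sketch with the Hugging Face transformers library, assuming an off-the-shelf t5-small checkpoint (an untuned stand-in for the papers' fine-tuned generators) and a crude mean-word-length proxy in place of their learned complexity/specificity scorers:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    inputs = tok("define: photosynthesis", return_tensors="pt")
    outs = model.generate(**inputs, num_beams=8, num_return_sequences=8,
                          max_new_tokens=48)
    candidates = tok.batch_decode(outs, skip_special_tokens=True)

    # Rerank: prefer the candidate with the shortest average word length
    # as a crude complexity proxy; the papers use learned scorers instead.
    best = min(candidates,
               key=lambda c: sum(len(w) for w in c.split()) / max(1, len(c.split())))
    print(best)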
Text Simplification with Autoregressive Models
TLDR
This paper presents the second-place solution on the public leaderboard and the fifth-place solution, based on different usages of RuGPT3 models for Russian, with results comparable to novel state-of-the-art approaches.
A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification
TLDR
This work uses a non-autoregressive model to iteratively edit an input sequence and incorporates lexical complexity information seamlessly into the refinement process to generate simplifications that better match the desired output complexity than strong autoregressive baselines.
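
To illustrate the edit-based idea in the simplest possible terms, here is a toy Python loop (emphatically not the paper's non-autoregressive model) that repeatedly swaps out the rarest word, using an invented frequency table and substitution lexicon as stand-ins for learned lexical complexity signals:

    # Toy iterative editing: replace the rarest substitutable token with a
    # simpler synonym until every remaining token is frequent enough.
    FREQ = {"utilize": 2, "use": 90, "commence": 1, "start": 80,
            "the": 100, "work": 70}          # hypothetical word frequencies
    SIMPLER = {"utilize": "use", "commence": "start"}  # hypothetical lexicon

    def simplify(tokens, max_steps=10, min_freq=50):
        for _ in range(max_steps):
            hard = [t for t in tokens if FREQ.get(t, 0) < min_freq and t in SIMPLER]
            if not hard:
                break                        # nothing complex left to edit
            worst = min(hard, key=lambda t: FREQ.get(t, 0))
            tokens = [SIMPLER[worst] if t == worst else t for t in tokens]
        return tokens

    print(simplify("commence the work".split()))  # -> ['start', 'the', 'work']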
Paragraph-level Simplification of Medical Texts
TLDR
A new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics is introduced, and a new metric based on likelihood scores from a masked language model pretrained on scientific texts is proposed, showing that this automated measure better differentiates between technical and lay summaries than existing heuristics.
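
The metric described in that TLDR belongs to the family of masked-language-model scoring (pseudo-log-likelihood): mask each token in turn and sum the log-probability the model assigns to the true token. A sketch of that general technique with transformers and the public allenai/scibert_scivocab_uncased checkpoint; the paper's exact formulation may differ:

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    name = "allenai/scibert_scivocab_uncased"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name).eval()

    def pseudo_log_likelihood(text):
        ids = tok(text, return_tensors="pt")["input_ids"][0]
        total = 0.0
        for i in range(1, len(ids) - 1):      # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tok.mask_token_id     # hide the i-th token
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
        return total                          # higher = more "expected" text

    # A plain lay sentence should typically score higher than dense jargon.
    print(pseudo_log_likelihood("The treatment reduced pain."))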

References

PoMo: Generating Entity-Specific Post-Modifiers in Context
TLDR
PoMo, a post-modifier dataset created automatically from news articles reflecting a journalistic need for incorporating entity information that is relevant to a particular news event, is built.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
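
The "one additional output layer" in the BERT TLDR is, in practice, a classification head on the pooled [CLS] representation. A minimal fine-tuning sketch with transformers; the two-label task here is invented for illustration:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)   # adds the one extra output layer

    batch = tok(["a readable sentence", "an abstruse formulation"],
                padding=True, return_tensors="pt")
    labels = torch.tensor([0, 1])            # hypothetical simple/complex labels
    loss = model(**batch, labels=labels).loss
    loss.backward()                          # an optimizer step would follow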
Problems in Current Text Simplification Research: New Data Can Help
TLDR
This opinion paper argues that focusing on Wikipedia limits simplification research, and introduces a new simplification dataset that is a significant improvement over Simple Wikipedia, and presents a novel quantitative-comparative approach to study the quality of simplification data resources.
Bleu: a Method for Automatic Evaluation of Machine Translation
TLDR
This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
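
Since BLEU underlies much of the evaluation in this literature, a compact reference implementation of sentence-level BLEU is worth having in view: clipped n-gram precision combined with a brevity penalty. The original paper aggregates statistics at corpus level, and production toolkits add smoothing; this sketch keeps only the core formula:

    import math
    from collections import Counter

    def bleu(candidate, reference, max_n=4):
        """Sentence-level BLEU with uniform weights and brevity penalty."""
        cand, ref = candidate.split(), reference.split()
        log_precisions = []
        for n in range(1, max_n + 1):
            cand_ngrams = Counter(tuple(cand[i:i+n]) for i in range(len(cand) - n + 1))
            ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
            # clip each n-gram's count by its count in the reference
            clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
            total = max(1, sum(cand_ngrams.values()))
            if clipped == 0:
                return 0.0                    # real toolkits smooth instead
            log_precisions.append(math.log(clipped / total))
        bp = min(1.0, math.exp(1 - len(ref) / len(cand)))   # brevity penalty
        return bp * math.exp(sum(log_precisions) / max_n)

    print(round(bleu("the cat sat on the mat", "the cat sat on a mat"), 3))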
Inter-Coder Agreement for Computational Linguistics
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa.
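
Of the coefficients the survey covers, Cohen's kappa is the easiest to show in full: observed agreement corrected for the agreement two coders would reach by chance. A self-contained sketch for two coders over nominal labels (Krippendorff's alpha generalizes this to more coders, missing data, and other distance metrics):

    def cohens_kappa(a, b):
        """Chance-corrected agreement between two coders' label lists."""
        assert len(a) == len(b)
        n = len(a)
        p_o = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
        labels = set(a) | set(b)
        p_e = sum((a.count(l) / n) * (b.count(l) / n)         # chance agreement
                  for l in labels)
        return (p_o - p_e) / (1 - p_e)

    # Two hypothetical coders labelling six items:
    print(cohens_kappa(["y", "y", "n", "y", "n", "n"],
                       ["y", "n", "n", "y", "n", "y"]))       # -> 0.333...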
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
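
The zero-shot behavior described above is easy to poke at directly; the paper itself elicited summaries by appending "TL;DR:" to an article. A sketch using the transformers text-generation pipeline with the public gpt2 checkpoint (outputs from the small model will be rough):

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    article = "A long news article would go here ..."
    # The "TL;DR:" suffix prompts the LM to continue with a summary,
    # with no summarization-specific training.
    out = generator(article + "\nTL;DR:", max_new_tokens=40, do_sample=False)
    print(out[0]["generated_text"])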
ToTTo: A Controlled Table-To-Text Generation Dataset
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.
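
The controlled task becomes clearer with the input format in mind: the highlighted cells, their headers, and the page title are linearized into a source string for a seq2seq model. A hypothetical linearizer in that spirit; ToTTo's released preprocessing differs in its exact markup:

    def linearize(table, highlighted, page_title):
        """Flatten highlighted cells (row, col) plus headers into one string."""
        cells = " ".join(
            f"<cell> {table[r][c]} <col_header> {table[0][c]} </cell>"
            for r, c in highlighted
        )
        return f"<page_title> {page_title} </page_title> {cells}"

    table = [["Year", "Medal"], ["1996", "Gold"], ["2000", "Silver"]]
    print(linearize(table, [(1, 1)], "Hypothetical Athlete"))
    # -> <page_title> Hypothetical Athlete </page_title>
    #    <cell> Gold <col_header> Medal </cell>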
Learning to Update Natural Language Comments Based on Code Changes
TLDR
This work proposes an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications.
Unsupervised Commonsense Question Answering with Self-Talk
TLDR
An unsupervised framework based on self-talk as a novel alternative to multiple-choice commonsense tasks, inspired by inquiry-based discovery learning, which improves performance on several benchmarks and competes with models that obtain knowledge from external KBs.
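
A toy rendering of the self-talk loop, heavily simplified from the paper: the same language model first completes a clarification question about the context, then answers its own question, and the resulting Q/A pair serves as extra context for the downstream choice. Sketch with the gpt2 pipeline; the prompts and prefixes here are illustrative:

    from transformers import pipeline

    lm = pipeline("text-generation", model="gpt2")

    def self_talk(context, question_prefix="What is"):
        # 1) the LM completes a clarification question about the context
        q = lm(f"{context}\nQ: {question_prefix}",
               max_new_tokens=10, do_sample=True)[0]["generated_text"]
        # 2) the same LM answers its own question
        enriched = lm(f"{q}\nA:", max_new_tokens=10,
                      do_sample=True)[0]["generated_text"]
        # 3) a multiple-choice scorer would now condition on `enriched`
        return enriched

    print(self_talk("A lion is pacing in its cage."))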
Data-Driven Sentence Simplification: Survey and Benchmark
TLDR
Research on SS is surveyed, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays.
...