Discourse Level Factors for Sentence Deletion in Text Simplification

@inproceedings{Zhong2020DiscourseLF,
  title={Discourse Level Factors for Sentence Deletion in Text Simplification},
  author={Yang Zhong and Chao Jiang and Wei Xu and Junyi Jessy Li},
  booktitle={AAAI},
  year={2020}
}
This paper presents a data-driven study focusing on analyzing and predicting sentence deletion — a prevalent but understudied phenomenon in document simplification — on a large English text simplification corpus. We inspect various document and discourse factors associated with sentence deletion, using a new manually annotated sentence alignment corpus we collected. We reveal that professional editors utilize different strategies to meet readability standards of elementary and middle schools… 
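The study's framing of sentence deletion as a prediction problem over document and discourse factors can be illustrated with a minimal sketch. Everything below is hypothetical: the feature names, weights, and threshold are illustrative stand-ins, not the authors' actual features or model.

```python
# Hypothetical sketch: sentence deletion as binary classification over
# simple discourse-level features. Weights and thresholds are invented
# for illustration and do not reflect the paper's model.

CONNECTIVES = {"however", "moreover", "therefore", "meanwhile"}

def deletion_features(sentences, i):
    """Extract toy document/discourse features for sentence i."""
    tokens = sentences[i].lower().split()
    return {
        # Relative position in the document (later content is cut more often).
        "rel_position": i / max(len(sentences) - 1, 1),
        "length": len(tokens),
        # Sentences opening with a connective are tied to preceding discourse.
        "starts_with_connective": tokens[0].strip(",.") in CONNECTIVES,
    }

def predict_delete(features, pos_weight=1.5, len_threshold=8):
    """Toy scoring rule: prefer deleting short, late, loosely connected sentences."""
    score = pos_weight * features["rel_position"]
    score += 0.5 if features["length"] < len_threshold else 0.0
    score -= 0.5 if features["starts_with_connective"] else 0.0
    return score > 1.0

doc = [
    "The city council approved the new budget on Monday.",
    "However, several members raised concerns about school funding.",
    "A final vote is expected next month.",
]
flags = [predict_delete(deletion_features(doc, i)) for i in range(len(doc))]
```

In this toy run only the short, document-final sentence is flagged for deletion; a trained model would of course learn such preferences from aligned corpora rather than hand-set weights.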
On the Helpfulness of Document Context to Sentence Simplification
TLDR
This paper is the first to investigate the helpfulness of document context for sentence simplification, applying it to a sequence-to-sequence model and proposing a new model that makes full use of the context information.
Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification
TLDR
This work introduces a new annotated dataset of 1.3K instances of elaborative simplification and analyzes how entities, ideas, and concepts are elaborated through the lens of contextual specificity, and establishes baselines for elaboration generation using large scale pre-trained language models.
Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks
TLDR
This work proposes the new and domain-independent NLG task of structuring and ordering a (possibly large) set of EDUs, and presents a solution for this task that combines neural dependency tree induction with pointer networks, and can be trained on large discourse treebanks that have only recently become available.
Controllable Text Simplification with Explicit Paraphrasing
TLDR
A novel hybrid approach is proposed that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles and establishes a new state-of-the-art for the task.
Paragraph-level Simplification of Medical Texts
TLDR
A new corpus of parallel texts in English, comprising technical and lay summaries of all published evidence pertaining to different clinical topics, is introduced, and a new metric based on likelihood scores from a masked language model pretrained on scientific texts is proposed; this automated measure differentiates between technical and lay summaries better than existing heuristics.
Contextualized Embeddings for Connective Disambiguation in Shallow Discourse Parsing
TLDR
A novel model is studied that integrates contextualized word embeddings to predict whether a connective candidate is part of a discourse relation, showing the benefit of jointly training connective disambiguation and sense classification.
Multitask Models for Controlling the Complexity of Neural Machine Translation
TLDR
A novel dataset of news articles available in English and Spanish and written for diverse reading grade levels is collected to train multitask sequence-to-sequence models that translate Spanish into English targeted at an easier reading grade level than the original Spanish.
Personalized Education in the Artificial Intelligence Era: What to Expect Next
TLDR
The challenges of AI/ML-based personalized education are investigated, while providing a brief review of state-of-the-art research, and potential solutions are discussed.
Predicting Sentence Deletions for Text Simplification Using a Functional Discourse Structure
TLDR
This work focuses on sentence deletions for text simplification and uses a news genre-specific functional discourse structure, which categorizes sentences based on their contents and their function roles in telling a news story, for predicting sentence deletion.
The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error
TLDR
It is found that non-news datasets are slightly easier to transfer to than news datasets when the training and test sets are very different, and a statistic from the theoretical domain adaptation literature that can be directly tied to the error gap is proposed.

References

Showing 1-10 of 53 references.
Text Simplification from Professionally Produced Corpora
TLDR
This work investigates the application of the recently created Newsela corpus, the largest collection of professionally written simplifications available, in TS tasks, and shows that the corpus can be used to learn sentence simplification patterns in more effective ways than corpora used in previous work.
Learning When to Simplify Sentences for Natural Text Simplification
TLDR
A binary classifier is applied to decide in which circumstances a sentence should or should not be split, which is the most important syntactic simplification operation, so that the resulting simplified text is natural and not oversimplified.
Corpus-based Sentence Deletion and Split Decisions for Spanish Text Simplification
TLDR
This study addresses the automatic simplification of texts in Spanish in order to make them more accessible to people with cognitive disabilities, by identifying and quantifying relevant operations to be implemented in a text simplification system.
Neural CRF Model for Sentence Alignment in Text Simplification
TLDR
A novel neural CRF alignment model is proposed which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity.
Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints
TLDR
A discriminative model for single-document summarization that integrally combines compression and anaphoricity constraints that outperforms prior work on both ROUGE as well as on human judgments of linguistic quality.
A Monolingual Tree-based Translation Model for Sentence Simplification
TLDR
A Tree-based Simplification Model (TSM) is proposed which, to the authors' knowledge, is the first statistical simplification model to integrally cover splitting, dropping, reordering, and substitution.
Preserving Discourse Structure when Simplifying Text
TLDR
This paper presents and evaluates techniques for detecting and correcting disruptions in discourse structure caused by syntactic restructuring and looks at the issues of preserving the rhetorical relationships between the original clauses and phrases and preserving the anaphoric link structure of the text.
CATS: A Tool for Customized Alignment of Text Simplification Corpora
TLDR
This paper presents a freely available, language-independent tool for sentence alignment from parallel/comparable TS resources (document-aligned resources), which additionally offers the possibility for filtering sentences depending on the level of their semantic overlap.
Reducing Text Complexity through Automatic Lexical Simplification: an Empirical Study for Spanish
TLDR
The word length and frequency distribution of two sets of texts that make up a parallel corpus are observed, and a lexical simplification module of an automatic simplification system for Spanish is developed, intended for readers with cognitive disabilities.
EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
TLDR
This work presents the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach, and is judged by humans to produce overall better and simpler output sentences.
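The explicit edit-operation view used by this line of work can be made concrete with a small interpreter: a program of KEEP/DELETE/ADD actions rewrites the source tokens. The interpreter below is a hedged sketch of that idea only; the cited model learns to *predict* such programs with a neural programmer-interpreter, and its exact operation semantics may differ.

```python
# Minimal sketch of edit-operation simplification: a sequence of
# KEEP / DELETE / ADD:<word> actions rewrites the source token sequence.
# Illustrative only; not the cited model's actual implementation.

def apply_edits(tokens, program):
    """Apply an edit program to source tokens: KEEP and DELETE each
    consume one source token; ADD inserts a new word without consuming."""
    out, i = [], 0
    for op in program:
        if op == "KEEP":
            out.append(tokens[i]); i += 1
        elif op == "DELETE":
            i += 1
        elif op.startswith("ADD:"):
            out.append(op.split(":", 1)[1])
        else:
            raise ValueError(f"unknown op {op!r}")
    out.extend(tokens[i:])  # any leftover source tokens are kept as-is
    return out

src = ["the", "committee", "subsequently", "ratified", "the", "proposal"]
prog = ["KEEP", "KEEP", "DELETE", "DELETE", "ADD:approved", "KEEP", "KEEP"]
simplified = apply_edits(src, prog)
```

Dropping the adverb and substituting a simpler verb via DELETE + ADD yields "the committee approved the proposal", illustrating why an explicit edit program is easy to inspect compared with free-form rewriting.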