ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences

Yanjun Gao, Ting-Hao 'Kenneth' Huang, and R. Passonneau. Annual Meeting of the Association for Computational Linguistics (ACL).
Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source, and a novel problem formulation… 


BiSECT: Learning to Split and Rephrase Sentences with Bitexts

A novel dataset and model for the 'split and rephrase' task; the corpus contains higher-quality training examples than previous Split and Rephrase corpora, and models trained on BiSECT perform a wider variety of split operations and improve on previous state-of-the-art approaches in automatic and human evaluations.

Target-Level Sentence Simplification as Controlled Paraphrasing

This work investigates a novel formulation of sentence simplification as paraphrasing with controlled decoding, which aims to alleviate the major burden of relying on large amounts of in-domain parallel training data, while at the same time allowing for modular and adaptive simplification.

Zero-Shot Opinion Summarization with GPT-3

This paper explores several pipeline methods for applying GPT-3 to summarize a large collection of user reviews in a zero-shot fashion, and evaluates against several new measures targeting faithfulness, factuality, and genericity to contrast these different methods.

Learning Clause Representation from Dependency-Anchor Graph for Connective Prediction

A novel clause embedding method that applies graph learning to a data structure the authors call a dependency-anchor graph; it significantly outperforms tree-based models, confirming the importance of emphasizing the subject and verb phrase.

MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

A new sentence splitting corpus of 203K pairs of aligned complex source and simplified target sentences, useful for developing approaches that learn to transform sentences with complex linguistic structure into fine-grained sequences of short sentences with a simple, more regular structure.

Split and Rephrase

A new sentence simplification task (Split-and-Rephrase) whose aim is to split a complex sentence into a meaning-preserving sequence of shorter sentences; such splitting could serve as a preprocessing step that improves the performance of parsers, semantic role labellers, and machine translation systems.

A Cascade Model for Proposition Extraction in Argumentation

A cascade model is presented for a fundamental but understudied problem in computational argumentation, proposition extraction, handling anaphora resolution, text segmentation, reported speech, questions, imperatives, missing subjects, and revision.

SegBot: A Generic Neural Text Segmentation Model with Pointer Network

This work proposes a generic end-to-end segmentation model called SegBot, which outperforms state-of-the-art models on both topic and EDU segmentation tasks.

Deep Biaffine Attention for Neural Dependency Parsing

This paper uses a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels, and shows which hyperparameter choices have a significant effect on parsing accuracy, achieving large gains over other graph-based approaches.
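The biaffine arc scorer behind this parser can be sketched compactly. The snippet below is an illustrative NumPy sketch, not the authors' implementation: it assumes head and dependent representations (in the paper these come from separate MLPs over BiLSTM states, with a learned weight matrix U and bias vector w) and computes the pairwise head-selection scores s[i, j] = dep_i · U · head_j + w · head_j.

```python
import numpy as np

rng = np.random.default_rng(0)

def biaffine_scores(heads, deps, U, w):
    """Biaffine arc scoring: s[i, j] = deps[i] @ U @ heads[j] + w @ heads[j].

    heads, deps: (n, d) token representations (hypothetically from two MLPs).
    Returns an (n, n) matrix where row i scores every candidate head for token i.
    """
    bilinear = deps @ U @ heads.T      # (n, n) pairwise dep-head affinity
    head_prior = heads @ w             # (n,) how likely each token is a head at all
    return bilinear + head_prior[None, :]  # broadcast the prior across rows

# Toy dimensions: 5 tokens, 8-dimensional representations.
n, d = 5, 8
heads = rng.standard_normal((n, d))
deps = rng.standard_normal((n, d))
U = rng.standard_normal((d, d))
w = rng.standard_normal(d)

S = biaffine_scores(heads, deps, U, w)
pred_heads = S.argmax(axis=1)  # greedy head choice per dependent token
```

The linear term lets the model express that some tokens (e.g. finite verbs) are good heads regardless of the dependent, while the bilinear term scores the specific dep-head pairing; label prediction uses an analogous biaffine over the predicted arcs.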

Sentence Simplification with Deep Reinforcement Learning

This work addresses the simplification problem with an encoder-decoder model coupled with a deep reinforcement learning framework, and explores the space of possible simplifications while learning to optimize a reward function that encourages outputs which are simple, fluent, and preserve the meaning of the input.

QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

A novel representation of discourse relations as QA pairs is proposed, which in turn enables crowd-sourcing of wide-coverage data annotated with discourse relations, via an intuitively appealing interface for composing such questions and answers.

Split and Rephrase: Better Evaluation and Stronger Baselines

A new train-development-test data split and neural models augmented with a copy-mechanism are presented, outperforming the best reported baseline by 8.68 BLEU and fostering further progress on the task.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.