ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences

Yanjun Gao, Ting-Hao 'Kenneth' Huang, and R. Passonneau. Annual Meeting of the Association for Computational Linguistics.
Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source, and a novel problem formulation… 


BiSECT: Learning to Split and Rephrase Sentences with Bitexts

Introduces a novel dataset and a new model for the ‘split and rephrase’ task. The dataset contains higher-quality training examples than previous Split and Rephrase corpora, and models trained on BiSECT perform a wider variety of split operations and improve upon previous state-of-the-art approaches in automatic and human evaluations.

Zero-Shot Opinion Summarization with GPT-3

This paper explores several pipeline methods for applying GPT-3 to summarize a large collection of user reviews in a zero-shot fashion, and evaluates against several new measures targeting faithfulness, factuality, and genericity to contrast these different methods.

MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

A new sentence splitting corpus of 203K pairs of aligned complex source and simplified target sentences. It is useful for developing sentence splitting approaches that learn to transform sentences with complex linguistic structure into a fine-grained representation of short sentences with a simpler, more regular structure.

Split and Rephrase

A new sentence simplification task (Split-and-Rephrase) whose aim is to split a complex sentence into a meaning-preserving sequence of shorter sentences. This could serve as a preprocessing step that facilitates and improves the performance of parsers, semantic role labellers, and machine translation systems.

A Cascade Model for Proposition Extraction in Argumentation

Presents a model for a fundamental but understudied problem in computational argumentation, proposition extraction, which handles anaphora resolution, text segmentation, reported speech, questions, imperatives, missing subjects, and revision.

SegBot: A Generic Neural Text Segmentation Model with Pointer Network

This work proposes a generic end-to-end segmentation model called SegBot, which outperforms state-of-the-art models on both topic and EDU segmentation tasks.

Deep Biaffine Attention for Neural Dependency Parsing

This paper uses a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. It also shows which hyperparameter choices had a significant effect on parsing accuracy, allowing it to achieve large gains over other graph-based approaches.
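As a rough illustration of what a biaffine arc classifier computes, the sketch below scores every (dependent i, head j) pair as d_iᵀ U h_j + h_jᵀ u: a bilinear term for the pair plus a head-only prior term. It is a minimal, hypothetical example in plain Python; the function names, toy vectors, and greedy head selection are assumptions for exposition, not the paper's implementation, where the vectors come from a trained BiLSTM encoder.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m: Matrix, v: Vector) -> Vector:
    # Multiply a (d x d) matrix by a length-d vector.
    return [dot(row, v) for row in m]


def biaffine_arc_scores(deps: List[Vector], heads: List[Vector],
                        U: Matrix, u: Vector) -> Matrix:
    """scores[i][j] = deps[i]^T U heads[j] + heads[j]^T u (hypothetical helper).

    The bilinear term scores dependent i attaching to head j; the linear
    term is a prior on how likely position j is to be a head at all.
    """
    return [[dot(d, mat_vec(U, h)) + dot(h, u) for h in heads] for d in deps]


def predict_heads(scores: Matrix) -> List[int]:
    # Greedy decoding: each dependent takes its highest-scoring head.
    # This does not guarantee a well-formed tree (no MST decoding step).
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy example: 2 dependents, 3 candidate heads, 2-dimensional vectors.
deps = [[1, 0], [0, 1]]
heads = [[1, 0], [0, 1], [1, 1]]
U = [[1, 0], [0, 1]]
u = [0.0, 0.5]
scores = biaffine_arc_scores(deps, heads, U, u)
print(scores)                 # [[1.0, 0.5, 1.5], [0.0, 1.5, 1.5]]
print(predict_heads(scores))  # [2, 1]
```

In a real parser the same bilinear form is applied to whole matrices at once, and labels are predicted with a second biaffine classifier over the chosen arcs.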

Sentence Simplification with Deep Reinforcement Learning

This work addresses the simplification problem with an encoder-decoder model coupled with a deep reinforcement learning framework, and explores the space of possible simplifications while learning to optimize a reward function that encourages outputs which are simple, fluent, and preserve the meaning of the input.

QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

A novel representation of discourse relations as QA pairs is proposed, which in turn allows us to crowdsource wide-coverage data annotated with discourse relations via an intuitively appealing interface for composing such questions and answers.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Incorporating Copying Mechanism in Sequence-to-Sequence Learning

This paper incorporates copying into neural Seq2Seq learning and proposes CopyNet, an encoder-decoder model that integrates regular word generation in the decoder with a copying mechanism that selects sub-sequences of the input sequence and places them at the appropriate positions in the output sequence.
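To make the combination of generation and copying concrete, the sketch below assumes the decoder has already produced unnormalized generate-mode scores over a vocabulary and copy-mode scores over source positions. Both sets of scores share one softmax normalizer, and a word's final probability sums its generate mass and the copy mass of every source position holding that word, so a source word outside the vocabulary can still be emitted. This is a minimal, hypothetical sketch. The function and argument names are illustrative, and the scoring networks are omitted.

```python
import math
from typing import Dict, List


def copy_generate_distribution(gen_scores: Dict[str, float],
                               source_tokens: List[str],
                               copy_scores: List[float]) -> Dict[str, float]:
    """Mix generate-mode and copy-mode scores under one shared softmax.

    gen_scores:   unnormalized score per vocabulary word (generate mode).
    source_tokens/copy_scores: one unnormalized score per source position.
    """
    # Shared normalizer over both modes; subtract the max for stability.
    m = max(list(gen_scores.values()) + copy_scores)
    exp_gen = {w: math.exp(s - m) for w, s in gen_scores.items()}
    exp_copy = [math.exp(s - m) for s in copy_scores]
    z = sum(exp_gen.values()) + sum(exp_copy)

    probs: Dict[str, float] = {w: e / z for w, e in exp_gen.items()}
    # Add copy mass; repeated source tokens accumulate probability.
    for token, e in zip(source_tokens, exp_copy):
        probs[token] = probs.get(token, 0.0) + e / z
    return probs


# "Gao" is not in the toy vocabulary, so it can only be produced by copying.
p = copy_generate_distribution({"the": 0.0, "cat": 0.0}, ["Gao", "cat"], [0.0, 0.0])
print(p)  # {'the': 0.25, 'cat': 0.5, 'Gao': 0.25}
```

Because "cat" appears both in the vocabulary and in the source, it receives probability from both modes, which is the behavior the shared normalizer is meant to capture.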

The Penn Discourse Treebank

A preliminary analysis of inter-annotator agreement is presented, covering both the level of agreement and the types of inter-annotator variation.