Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory

@article{Chen2020FindingFA,
  title={Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory},
  author={Hannah Chen and Yangfeng Ji and David E. Evans},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.01856}
}
Most NLP datasets are manually labeled, so they suffer from inconsistent labeling or limited size. We propose methods for automatically improving datasets by viewing them as graphs with expected semantic properties. We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property. We use structural balance theory to identify likely mislabelings in the graph, and flip…
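
The two graph ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `infer_labels` and `unbalanced_triangles`, the `(sentence_a, sentence_b, label)` input format, and the rule that a triangle with exactly two paraphrase edges and one non-paraphrase edge counts as "unbalanced" are all assumptions made here for illustration. Sentences connected by paraphrase labels are grouped with a union-find structure; unlabeled pairs inside a group are inferred as paraphrases, and a non-paraphrase label between two groups is extended to every cross-group pair.

```python
from itertools import combinations


def infer_labels(pairs):
    """Infer labels for unlabeled sentence pairs via transitivity.

    pairs: iterable of (sentence_a, sentence_b, label), where label 1 means
    "paraphrase" and 0 means "not a paraphrase" (hypothetical input format).
    Returns a dict mapping frozenset({a, b}) to an inferred label.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    labeled = {}
    for a, b, label in pairs:
        if a == b:
            continue  # ignore self-pairs
        labeled[frozenset((a, b))] = label
        root_a, root_b = find(a), find(b)
        if label == 1:
            parent[root_a] = root_b  # merge paraphrase clusters

    # Group sentences by cluster representative.
    clusters = {}
    for sentence in parent:
        clusters.setdefault(find(sentence), set()).add(sentence)

    inferred = {}
    # Paraphrase of a paraphrase is assumed to be a paraphrase.
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            key = frozenset((a, b))
            if key not in labeled:
                inferred[key] = 1

    # A non-paraphrase label between two clusters is extended to all cross pairs.
    for key, label in labeled.items():
        if label != 0:
            continue
        a, b = tuple(key)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # contradicts the cluster; nothing is inferred from it
        for x in clusters[root_a]:
            for y in clusters[root_b]:
                cross = frozenset((x, y))
                if cross not in labeled:
                    inferred[cross] = 0
    return inferred


def unbalanced_triangles(pairs):
    """List fully labeled triangles with two paraphrase edges and one
    non-paraphrase edge, i.e. triangles that violate transitivity."""
    sign = {frozenset((a, b)): label for a, b, label in pairs if a != b}
    nodes = sorted({s for key in sign for s in key})
    found = []
    for a, b, c in combinations(nodes, 3):  # cubic scan; fine for a sketch
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in sign for e in edges) and sum(sign[e] for e in edges) == 2:
            found.append((a, b, c))
    return found
```

A triangle returned by `unbalanced_triangles` contains at least one label that conflicts with the other two, so it is a natural candidate for review; which edge to flip is a separate decision that this sketch does not attempt.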

Data Augmentation Methods for Anaphoric Zero Pronouns
TLDR
Five data augmentation methods are used to generate and detect anaphoric zero pronouns automatically, and the augmented data serve as additional training material for two anaphoric zero pronoun systems for Arabic.
