• Corpus ID: 9895822

Semi-Markov Phrase-Based Monolingual Alignment

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, Peter Clark
We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive…

Neural semi-Markov CRF for Monolingual Word Alignment

A novel neural semi-Markov CRF alignment model that unifies word and phrase alignments through variable-length spans, generalizes well to three out-of-domain datasets, and shows great utility in two downstream applications: automatic text simplification and sentence pair classification.

Neural Network Alignment for Sentential Paraphrases

We present a monolingual alignment system for long, sentence- or clause-level alignments, and demonstrate that systems designed for word- or short phrase-based alignment are ill-suited for these

Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence

We present a simple, easy-to-replicate monolingual aligner that demonstrates state-of-the-art performance while relying on almost no supervision and a very small number of external resources. Based

Monolingual Phrase Alignment Based on Word Embedding

We present a word embedding-based monolingual phrase aligner. In monolingual phrase alignment, an aligner identifies the set of phrasal paraphrases in a sentence pair. Previous methods required

Feature-Rich Two-Stage Logistic Regression for Monolingual Alignment

A top-performing supervised aligner for short text snippets that employs a large feature set to encode similarities among semantic units in context and addresses cooperation and competition for alignment among units in the same snippet.

Iterative Paraphrastic Augmentation with Discriminative Span Alignment

A novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment that allows for the large-scale expansion of existing datasets or the rapid creation of new datasets using a small, manually produced seed corpus.

SAPPHIRE: Simple Aligner for Phrasal Paraphrase with Hierarchical Representation

Experimental results using the standard dataset for phrase alignment evaluation show that SAPPHIRE outperforms the previous method and establishes the state-of-the-art performance.

SPADE: Evaluation Dataset for Monolingual Phrase Alignment

This is the first dataset to shed light on syntactic and phrasal paraphrases under a linguistically motivated grammar. Benchmarks showing the performance of humans and of the state-of-the-art method are presented as a reference for future SPADE users.

Extracting Lexically Divergent Paraphrases from Twitter

A new model suited to identifying paraphrases within short messages on Twitter and a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter are presented.

A Joint Model for Answer Sentence Ranking and Answer Extraction

A simple and intuitive joint probabilistic model is proposed that addresses both answer sentence ranking and answer extraction via joint computation but task-specific application of that quantity.



A Joint Phrasal and Dependency Model for Paraphrase Alignment

A new model for monolingual alignment is presented in which the score of an alignment decomposes over both the set of aligned phrases and a set of aligned dependency arcs.

A Phrase-Based Hidden Semi-Markov Approach to Machine Translation

A latent-variable phrase-based translation model, inspired by hidden semi-Markov models, that does not degrade the system. Baum-Welch and Viterbi training are observed to obtain the very same result, suggesting that most of the probability mass is gathered into one single bilingual segmentation.

Gappy Phrasal Alignment By Agreement

A principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems, that shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

A Lightweight and High Performance Monolingual Word Aligner

A discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment, with features drawn from source and target sentences, to give state-of-the-art results.

Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

This work examines a state-of-the-art structured prediction model for the alignment task, which uses a phrase-based representation and is forced to decode alignments using an approximate search approach, and proposes a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed.

Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment

This work applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pairs and automatically determines how to apply these patterns to rewrite new sentences.

A Phrase-Based Alignment Model for Natural Language Inference

The MANLI system is presented, a new NLI aligner designed to address the alignment problem, which uses a phrase-based alignment representation, exploits external lexical resources, and capitalizes on a new set of supervised training data.

Discriminative Word Alignment with Conditional Random Fields

A novel approach for inducing word alignments from sentence aligned data using a Conditional Random Field, a discriminative model, which is estimated on a small supervised training set, and which has efficient training and decoding processes which both find globally optimal solutions.

What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA

A probabilistic quasi-synchronous grammar, inspired by one proposed for machine translation, and parameterized by mixtures of a robust nonlexical syntax/alignment model with a(n optional) lexical-semantics-driven log-linear model is proposed.

Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering

This work captures the alignment by using a novel probabilistic model that models tree-edit operations on dependency parse trees and treats alignments as structured latent variables, and offers a principled framework for incorporating complex linguistic features.