• Corpus ID: 9895822

Semi-Markov Phrase-Based Monolingual Alignment

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, Peter Clark
We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive…


Neural semi-Markov CRF for Monolingual Word Alignment
A novel neural semi-Markov CRF alignment model, which unifies word and phrase alignments through variable-length spans and demonstrates good generalizability to three out-of-domain datasets and shows great utility in two downstream applications: automatic text simplification and sentence pair classification tasks.
Neural Network Alignment for Sentential Paraphrases
We present a monolingual alignment system for long, sentence- or clause-level alignments, and demonstrate that systems designed for word- or short phrase-based alignment are ill-suited for these
Iterative Paraphrastic Augmentation with Discriminative Span Alignment
A novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment that allows for the large-scale expansion of existing datasets or the rapid creation of new datasets using a small, manually produced seed corpus.
Extracting Lexically Divergent Paraphrases from Twitter
A new model suited to identifying paraphrases within the short messages on Twitter, and a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter, are presented.
A Joint Model for Answer Sentence Ranking and Answer Extraction
A simple and intuitive joint probabilistic model is proposed that addresses both answer sentence ranking and answer extraction via joint computation but task-specific application of that quantity.
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
Methods are examined that take pre-trained contextualized word embeddings derived from multilingually trained language models and fine-tune them on parallel text with objectives designed to improve alignment quality, and methods to effectively extract alignments from these fine-tuned models are proposed.
Multi-Structured Models for Transforming and Aligning Text
Higher-order Lexical Semantic Models for Non-factoid Answer Reranking
This work introduces a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations.
Hungarian Layer: A Novel Neural Layer for Paraphrase Identification
This paper empowers a neural architecture with the Hungarian algorithm to extract the aligned unmatched parts, and applies a BiLSTM to parse the input sentences into hidden representations; the resulting model outperforms other baselines substantially and significantly.


A Joint Phrasal and Dependency Model for Paraphrase Alignment
A new model for monolingual alignment is presented in which the score of an alignment decomposes over both the set of aligned phrases and a set of aligned dependency arcs.
Gappy Phrasal Alignment By Agreement
A principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems, that shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
This work examines a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach, and proposes a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed.
Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
This work applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pairs and automatically determines how to apply these patterns to rewrite new sentences.
A Phrase-Based Alignment Model for Natural Language Inference
The MANLI system is presented, a new NLI aligner designed to address the alignment problem, which uses a phrase-based alignment representation, exploits external lexical resources, and capitalizes on a new set of supervised training data.
Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
This work constructs a large corpus resource of comparable texts, including an evaluation set with manual predicate alignments, and presents a novel approach for aligning predicates across comparable texts using graph-based clustering with Mincuts.
Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering
This work captures the alignment by using a novel probabilistic model that models tree-edit operations on dependency parse trees and treats alignments as structured latent variables, and offers a principled framework for incorporating complex linguistic features.
Constructing Corpora for the Development and Evaluation of Paraphrase Systems
A definition of paraphrase based on word alignments is adopted and it is shown that it yields high inter-annotator agreement, and an alternative agreement statistic is employed which is appropriate for structured alignment tasks.
Learning Alignments and Leveraging Natural Logic
An approach to textual inference that improves alignments at both the typed dependency level and at a deeper semantic level is described, and a complementary semantic component based on natural logic shows an added gain of 3.13% accuracy on the RTE3 test set.
HMM-Based Word Alignment in Statistical Translation
A new model for word alignment in statistical translation is presented, using a first-order hidden Markov model of the kind used successfully in speech recognition for the time alignment problem.