Corpus ID: 2391673

A Lightweight and High Performance Monolingual Word Aligner

@inproceedings{Yao2013ALA,
  title={A Lightweight and High Performance Monolingual Word Aligner},
  author={Xuchen Yao and Benjamin Van Durme and Chris Callison-Burch and P. Clark},
  booktitle={ACL},
  year={2013}
}
Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while…
Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence
We present a simple, easy-to-replicate monolingual aligner that demonstrates state-of-the-art performance while relying on almost no supervision and a very small number of external resources. Based…
Neural Network Alignment for Sentential Paraphrases
We present a monolingual alignment system for long, sentence- or clause-level alignments, and demonstrate that systems designed for word- or short phrase-based alignment are ill-suited for these…
Semi-Markov Phrase-Based Monolingual Alignment
We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets…
Neural semi-Markov CRF for Monolingual Word Alignment
TLDR
A novel neural semi-Markov CRF alignment model, which unifies word and phrase alignments through variable-length spans and demonstrates good generalizability to three out-of-domain datasets and shows great utility in two downstream applications: automatic text simplification and sentence pair classification tasks.
Feature-Rich Two-Stage Logistic Regression for Monolingual Alignment
TLDR
A top-performing supervised aligner that operates on short text snippets, employs a large feature set to encode similarities among semantic units in context, and addresses cooperation and competition for alignment among units in the same snippet.
Iterative Paraphrastic Augmentation with Discriminative Span Alignment
TLDR
A novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment that allows for the large-scale expansion of existing datasets or the rapid creation of new datasets using a small, manually produced seed corpus.
An Optimal Quadratic Approach to Monolingual Paraphrase Alignment
TLDR
This work models the problem of monolingual textual alignment as a Quadratic Assignment Problem (QAP) which simultaneously maximizes the global lexicosemantic and syntactic similarities of two sentence-level texts and proposes a branch-and-bound approach to efficiently find an optimal solution.
Features: String Similarity Feature, Distributional Feature, POS Tags Feature, Positional Feature, Distortion Feature, Contextual Feature, WordNet
Mapping between the source words and the target words in a set of parallel sentences is a crucial part of Question Answering (QA) systems. If an accurate aligner is used in QA systems then the…
I do not disagree: leveraging monolingual alignment to detect disagreement in dialogue
TLDR
This work introduces semantic environment features derived by comparing quote and response sentences which align well and shows that this method improves classifier accuracy relative to the baseline method namely in the retrieval of disagreeing pairs, which improves from 69% to 77%.
Computación y Sistemas, Vol. 22, No. 4, 2018
Mapping between the source words and the target words in a set of parallel sentences is a crucial part of Question Answering (QA) systems. If an accurate aligner is used in QA systems then the…

References

SHOWING 1-10 OF 23 REFERENCES
Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
TLDR
This work examines a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach and proposes a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed.
Discriminative Word Alignment with Conditional Random Fields
TLDR
A novel approach for inducing word alignments from sentence aligned data using a Conditional Random Field, a discriminative model, which is estimated on a small supervised training set, and which has efficient training and decoding processes which both find globally optimal solutions.
A Phrase-Based Alignment Model for Natural Language Inference
TLDR
The MANLI system is presented, a new NLI aligner designed to address the alignment problem, which uses a phrase-based alignment representation, exploits external lexical resources, and capitalizes on a new set of supervised training data.
Gappy Phrasal Alignment By Agreement
TLDR
A principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems, that shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
TLDR
This work constructs a large corpus resource of comparable texts, including an evaluation set with manual predicate alignments, and presents a novel approach for aligning predicates across comparable texts using graph-based clustering with Mincuts.
Constructing Corpora for the Development and Evaluation of Paraphrase Systems
TLDR
A definition of paraphrase based on word alignments is adopted and shown to yield high inter-annotator agreement, and an alternative agreement statistic is employed which is appropriate for structured alignment tasks.
Tailoring Word Alignments to Syntactic Machine Translation
TLDR
This work proposes a novel model for unsupervised word alignment which explicitly takes into account target language constituent structure, while retaining the robustness and efficiency of the HMM alignment model.
Time-Efficient Creation of an Accurate Sentence Fusion Corpus
TLDR
This paper presents a methodology for collecting fusions of similar sentence pairs using Amazon's Mechanical Turk, selecting the input pairs in a semi-automated fashion, and evaluates the results using a novel technique for automatically selecting a representative sentence from multiple responses.
Learning Alignments and Leveraging Natural Logic
TLDR
An approach to textual inference that improves alignments at both the typed dependency level and at a deeper semantic level is described, and a complementary semantic component based on natural logic shows an added gain of 3.13% accuracy on the RTE3 test set.
Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering
TLDR
This work captures the alignment by using a novel probabilistic model that models tree-edit operations on dependency parse trees and treats alignments as structured latent variables, and offers a principled framework for incorporating complex linguistic features.