PPDB: The Paraphrase Database
TLDR
The 1.0 release of the paraphrase database, PPDB, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 140 million paraphrase patterns, which capture many meaning-preserving syntactic transformations.
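PPDB's paraphrases are scored via bilingual pivoting: two English phrases are treated as paraphrases when they share foreign-language translations, and the paraphrase probability marginalizes over those pivot phrases, p(e2 | e1) = Σ_f p(e2 | f) · p(f | e1). A minimal sketch of that computation, using invented toy translation tables purely for illustration:

```python
# Toy sketch of bilingual pivoting for paraphrase scoring.
# The phrase pairs and probabilities below are made up for illustration;
# real PPDB scores come from large bilingual parallel corpora.

# p(f | e): probability of a foreign phrase given an English phrase
p_f_given_e = {
    "thrown into jail": {"festgenommen": 0.6, "inhaftiert": 0.4},
    "imprisoned":       {"festgenommen": 0.3, "inhaftiert": 0.7},
}

# p(e | f): probability of an English phrase given a foreign phrase
p_e_given_f = {
    "festgenommen": {"thrown into jail": 0.5, "imprisoned": 0.5},
    "inhaftiert":   {"thrown into jail": 0.3, "imprisoned": 0.7},
}

def paraphrase_prob(e1: str, e2: str) -> float:
    """p(e2 | e1) = sum over foreign pivots f of p(e2 | f) * p(f | e1)."""
    return sum(
        p_e_given_f.get(f, {}).get(e2, 0.0) * pf
        for f, pf in p_f_given_e.get(e1, {}).items()
    )

score = paraphrase_prob("thrown into jail", "imprisoned")
# 0.6 * 0.5 + 0.4 * 0.7 = 0.58
```

In practice the pivot sum runs over millions of aligned phrase pairs, and PPDB 1.0 attaches additional heuristic features to each pair rather than relying on this single probability.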
PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
TLDR
PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings.
cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars.
The Multilingual Paraphrase Database
TLDR
A massive expansion of the paraphrase database (PPDB) is released that now includes a collection of paraphrases in 23 different languages, derived from large volumes of bilingual parallel data.
Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
TLDR
This work extends bilingual paraphrase extraction to syntactic paraphrases and demonstrates its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization.
Joshua 4.0: Packing, PRO, and Paraphrases
TLDR
The main contributions in this release are the introduction of a compact grammar representation based on packed tries and the integration of J-PRO, an implementation of pairwise ranking optimization.
Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation
TLDR
Joshua implements all of the algorithms required for translation via synchronous context-free grammars: chart parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction, as well as suffix-array grammar extraction and minimum error rate training.
Triplet Lexicon Models for Statistical Machine Translation
TLDR
A lexical trigger model for statistical machine translation using triplets incorporating long-distance dependencies that can go beyond the local context of phrases or n-gram based language models is described.
Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor
TLDR
The main focus of this release is Thrax, a flexible, open source synchronous context-free grammar extractor that is built on Apache Hadoop for efficient distributed performance and can easily be extended with support for new grammars, feature functions, and output formats.
Domain-Specific Paraphrase Extraction
TLDR
A novel method is developed for extracting domain-specific paraphrases by adapting the bilingual pivoting paraphrase method to bias the training data to be more like the target domain of biology.