PPDB: The Paraphrase Database
The 1.0 release of the paraphrase database, PPDB, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 140 million paraphrase patterns, which capture many meaning-preserving syntactic transformations.
PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
- Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, Chris Callison-Burch
- Computer Science, ACL
- 1 July 2015
PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings.
cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models…
The Multilingual Paraphrase Database
A massive expansion of the paraphrase database (PPDB) is released that now includes a collection of paraphrases in 23 different languages, derived from large volumes of bilingual parallel data.
Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
- Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles, Benjamin Van Durme
- Computer Science, EMNLP
- 27 July 2011
This work extends bilingual paraphrase extraction to syntactic paraphrases and demonstrates its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization.
Joshua 4.0: Packing, PRO, and Paraphrases
- Juri Ganitkevitch, Yuan Cao, J. Weese, Matt Post, Chris Callison-Burch
- Computer Science, WMT@NAACL-HLT
- 7 June 2012
The main contributions in this release are a compact grammar representation based on packed tries and the integration of J-PRO, an implementation of pairwise ranking optimization.
Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation
Joshua implements all of the algorithms required for translation via synchronous context-free grammars: chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. It also implements suffix-array grammar extraction and minimum error rate training.
Triplet Lexicon Models for Statistical Machine Translation
A lexical trigger model for statistical machine translation using triplets incorporating long-distance dependencies that can go beyond the local context of phrases or n-gram based language models is described.
Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor
- J. Weese, Juri Ganitkevitch, Chris Callison-Burch, Matt Post, Adam Lopez
- Computer Science, WMT@EMNLP
- 30 July 2011
The main focus is describing Thrax, a flexible, open source synchronous context-free grammar extractor that is built on Apache Hadoop for efficient distributed performance and can easily be extended with support for new grammars, feature functions, and output formats.
Domain-Specific Paraphrase Extraction
- Ellie Pavlick, Juri Ganitkevitch, Tsz Ping Chan, Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch
- Computer Science, ACL
- 1 July 2015
A novel method is developed for extracting domain-specific paraphrases by adapting the bilingual pivoting paraphrase method to bias the training data to be more like the target domain of biology.