• Publications
Hierarchical Phrase-Based Translation
We present a statistical machine translation model that uses hierarchical phrases, that is, phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a…
A Hierarchical Phrase-Based Model for Statistical Machine Translation
The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information, which can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment.
Online Large-Margin Training of Syntactic and Structural Translation Features
This work explores the MIRA algorithm of Crammer et al. as an alternative to MERT and shows that, with parallel processing and by exploiting more of the parse forest, MIRA can match or surpass MERT in both translation quality and computational cost.
Statistical Parsing with an Automatically-Extracted Tree Adjoining Grammar
This work describes the induction of a probabilistic LTAG model from the Penn Treebank and finds that this induction method improves on the EM-based method of Hwa (1998), and that the induced model yields results comparable to lexicalized PCFG.
Decoding with Large-Scale Neural Language Models Improves Translation
This work develops a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and incorporates it into a machine translation system both by reranking k-best lists and by direct integration into the decoder.
Better k-best Parsing
This work shows how the improved output of efficient algorithms for k-best trees, developed in the framework of hypergraph parsing, has the potential to improve results from parse reranking systems and other applications.
DyNet: The Dynamic Neural Network Toolkit
DyNet is a toolkit for implementing neural network models based on dynamic declaration of network structure. It has an optimized C++ backend and a lightweight graph representation, and is designed to let users implement their models in a way that is idiomatic in their preferred programming language.
Hope and Fear for Discriminative Training of Statistical Translation Models
This work compares several learning algorithms and describes in detail some novel extensions suited to properties of the translation task: no single correct output, a large space of structured outputs, and slow inference.
11,001 New Features for Statistical Machine Translation
On a large-scale Chinese-English translation task, the Margin Infused Relaxed Algorithm is used to add a large number of new features to two machine translation systems: the Hiero hierarchical phrase-based translation system and the syntax-based translation system.
Word Sense Disambiguation Improves Statistical Machine Translation
It is shown for the first time that integrating a WSD system improves the performance of a state-of-the-art statistical MT system on an actual translation task, and the improvement is statistically significant.