We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, …
We present an extension of phrase-based statistical machine translation models that enables the straightforward integration of additional annotation at the word level, be it linguistic markup or automatically generated word classes. In a number of experiments we show that factored translation models lead to better translation performance, both in terms …
The 2006 Language Engineering Workshop Open Source Toolkit for Statistical Machine Translation aimed to advance the state of the art in statistical machine translation through richer input and richer annotation of the training data. The workshop focused on three topics: factored translation models, confusion network decoding, and the …
In this paper we provide the largest published comparison of translation quality for phrase-based SMT and neural machine translation across 30 translation directions. For ten directions we also include hierarchical phrase-based MT. Experiments are performed for the recently published United Nations Parallel Corpus v1.0 and its large six-way sentence-aligned …
The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrase-based …
Despite many differences between phrase-based, hierarchical, and syntax-based translation models, their training and testing pipelines are strikingly similar. Drawing on this fact, we extend the Moses toolkit to implement hierarchical and syntactic models, making it the first open source toolkit with end-to-end support for all three of these popular models …
Translation systems are complex, and most metrics do little to pinpoint causes of error or isolate system differences. We use a simple technique to discover induction errors, which occur when good translations are absent from model search spaces. Our results show that a common pruning heuristic drastically increases induction error, and also strongly …
Environmental exposures affect gamete function and fertility, but the mechanisms are poorly understood. Here, we show that pheromones sensed by ciliated neurons in the Caenorhabditis elegans nose alter the lipid microenvironment within the oviduct, thereby affecting sperm motility. In favorable environments, pheromone-responsive sensory neurons secrete a …