Preordering of source-side sentences has proved useful for improving statistical machine translation. Most work has used a parser for the source language along with rules to map the source-language word order into the target-language word order. The requirement of a source-language parser is a major drawback, which we seek to overcome in this …
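To make the parser-plus-rules paradigm concrete, here is a minimal sketch assuming a toy dependency parse and a single hand-written rule (both hypothetical, not any particular paper's method): for an SOV source language translated into an SVO target language, the rule moves a clause-final verb in front of its object.

```python
# Illustrative preordering sketch (hypothetical rule and data structures):
# reorder an SOV source sentence into SVO target order using a dependency parse.

def preorder_sov_to_svo(tokens, heads, labels):
    """tokens: words; heads[i]: index of token i's head (-1 for root);
    labels[i]: dependency label of token i."""
    order = list(range(len(tokens)))
    for i, label in enumerate(labels):
        if label == "obj":                          # an object...
            verb = heads[i]                         # ...governed by a verb
            if verb > i:                            # verb follows object (SOV)
                order.remove(verb)
                order.insert(order.index(i), verb)  # move verb before object
    return [tokens[j] for j in order]

# Toy SOV sentence "John apples eats" becomes SVO "John eats apples":
print(preorder_sov_to_svo(["John", "apples", "eats"],
                          [2, 2, -1],
                          ["nsubj", "obj", "root"]))
```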
We present a new model called LATTICERNN, which generalizes recurrent neural networks (RNNs) to process weighted lattices as input instead of sequences. A LATTICERNN can encode the complete structure of a lattice into a dense representation, which makes it suitable for a variety of problems, including rescoring, classifying, parsing, or translating …
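The core idea can be illustrated with a short sketch (assumed data structures and pooling scheme, not the authors' implementation): visit the lattice nodes in topological order and form each node's hidden state as the arc-weight-normalized sum of RNN updates propagated along its incoming arcs, so the final node's state is a dense encoding of the whole lattice.

```python
import numpy as np

# LATTICERNN-style sketch (illustrative only): each arc carries an input
# embedding and a weight; a node's state pools the RNN updates along all
# incoming arcs, weighted and normalized by the arc weights.

def lattice_rnn(n_nodes, arcs, W_x, W_h, h0):
    """arcs: list of (src, dst, x, weight); nodes 0..n_nodes-1 are assumed
    to be numbered in topological order, with node 0 the start node."""
    h = [None] * n_nodes
    h[0] = h0
    for v in range(1, n_nodes):
        incoming = [(s, x, w) for (s, d, x, w) in arcs if d == v]
        total = sum(w for _, _, w in incoming)
        h[v] = sum(w * np.tanh(W_x @ x + W_h @ h[s])
                   for s, x, w in incoming) / total
    return h[-1]  # dense representation of the complete lattice

dim = 4
rng = np.random.default_rng(0)
W_x, W_h = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
arcs = [(0, 1, rng.normal(size=dim), 0.7),  # two competing first words
        (0, 1, rng.normal(size=dim), 0.3),
        (1, 2, rng.normal(size=dim), 1.0)]
print(lattice_rnn(3, arcs, W_x, W_h, h0=np.zeros(dim)))
```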
Recent work has shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition. Prior work has shown that these models obtain lower perplexity and word error rate (WER) than both standard n-gram language models (LMs) and more advanced language models, including maximum entropy and random …
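For reference, the perplexity that such comparisons rest on is the exponentiated average negative log-probability the model assigns to the test words; a lower value means the model found the text less surprising. A minimal computation, with toy probabilities that are not taken from any paper:

```python
import math

# Perplexity of a language model over a test sequence: exp of the average
# negative log-probability of the words. Lower is better.
def perplexity(word_probs):
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# Toy per-word probabilities p(w_i | history) for a 4-word sentence:
print(perplexity([0.2, 0.1, 0.5, 0.05]))  # ≈ 6.69
```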
For low-resource languages, collecting sufficient training data to build acoustic and language models is time consuming and often expensive. But large amounts of text data, such as online newspapers, web forums, or online encyclopedias, usually exist for languages that have a large population of native speakers. This text data can be easily collected from …
For resource-rich languages, recent work has shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition, outperforming standard n-gram language models (LMs). For low-resource languages, however, the performance of NNLMs has not been well explored. In this paper, we evaluate the effectiveness …
We demonstrate that statistical machine translation (SMT) can be improved substantially by imposing clause-based reordering constraints during decoding. Our analysis of clause-wise translation of different types of clauses shows that it is beneficial to apply these constraints for finite clauses, but not for non-finite clauses. In our experiments in …
In particular for “low resource” Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of …
Answer extraction from discussion boards is an extensively studied problem. Most existing work focuses on supervised methods that extract answers using similarity features and forum-specific features. Although this works well for the domain or forum data the models were trained on, it is difficult to use the same models for a domain where the …
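As one concrete example of the kind of similarity feature such supervised extractors commonly use (an illustration, not the feature set of the models discussed here), cosine similarity between bag-of-words vectors of the question and a candidate answer:

```python
import math
from collections import Counter

# A typical answer-extraction similarity feature (illustrative): cosine
# similarity between bag-of-words vectors of question and candidate answer.
def cosine_similarity(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

question = "how do i reset the router password"
candidate = "hold the reset button to clear the router password"
print(cosine_similarity(question, candidate))  # higher = better candidate
```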
Phrase-based systems for machine translation are limited by the phrases they see during training. For highly inflected languages, it is uncommon to see all forms of a word in the parallel corpora used during training. This problem is amplified for verbs in highly inflected languages, where the correct form of the word depends on factors like …
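A toy illustration of the underlying problem and one way around it (hypothetical inflection table, not the paper's approach): generate the verb's surface form from its lemma plus morphological factors, rather than requiring the phrase table to have seen that exact inflected form during training.

```python
# Hypothetical factor-based verb generation (toy Hindi entries for 'jana',
# 'to go'): look up the surface form by lemma plus morphological factors.
FORMS = {
    ("jana", "masc", "sg", "past"): "gaya",
    ("jana", "fem",  "sg", "past"): "gayi",
    ("jana", "masc", "pl", "past"): "gaye",
}

def inflect(lemma, gender, number, tense):
    # Back off to the bare lemma if the inflected form was never observed.
    return FORMS.get((lemma, gender, number, tense), lemma)

print(inflect("jana", "fem", "sg", "past"))  # -> gayi
```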
Phrase-based machine translation, like other data-driven approaches, is often plagued by irregularities in the translations of words in morphologically rich languages. The phrase pairs and the language models are unable to capture the long-range dependencies that decide the inflection. This paper makes the first attempt at learning constraints between the …