Bayesian approaches have been shown to reduce the amount of overfitting that occurs when running the EM algorithm, by placing prior probabilities on the model parameters. We apply one such Bayesian technique, vari-ational Bayes, to the IBM models of word alignment for statistical machine translation. We show that using variational Bayes improves the… (More)
This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are " hallucinated " for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating… (More)
We present our work on semi-supervised learning of discrim-inative models where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities, specifically, word, sub-word, syllable and phone levels. We experiment with different language models and various sampling strategies to select… (More)
Bayesian approaches have been shown to reduce the amount of overfitting that occurs when running the EM algorithm, by placing prior probabilities on the model parameters. We apply one such Bayesian technique, variational Bayes, to GIZA++, a widely-used piece of software that computes word alignments for statistical machine translation. We show that using… (More)
Discriminative language modeling is a structured classification problem. Log-linear models have been previously used to address this problem. In this paper, the standard dot-product feature representation used in log-linear models is replaced by a non-linear function parameterized by a neural network. Embeddings are learned for each word and features are… (More)
The perceptron algorithm was used in  to estimate discrim-inative language models which correct errors in the output of ASR systems. In its simplest version, the algorithm simply increases the weight of n-gram features which appear in the correct (oracle) hypothesis and decreases the weight of n-gram features which appear in the 1-best hypothesis. In… (More)
4 Conversation-oriented semi-supervised features 33 4.