We describe a methodology for rapid experimentation in statistical machine translation, which we use to add a large number of features to a baseline system, exploiting features from a wide range of levels of syntactic representation. Feature values were combined in a log-linear model to select the highest-scoring candidate translation from an n-best list.
Acknowledgments. I owe my thanks to a number of people, each of whom contributed in their own way towards this research and in the preparation of this document. First of all, I thank Prof. Aravind Joshi for his continued support during the period of this research. I have benefited significantly from his deep insights and his passion for subtle details which…
We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers…
In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering techniques. TroFi uses sentential context instead of selectional constraint violations or paths in semantic hierarchies. It also uses literal and nonliteral seed…
This paper describes the application of discriminative reranking techniques to the problem of machine translation. For each sentence in the source language, we obtain a ranked n-best list of candidate translations in the target language from a baseline statistical machine translation system. We introduce two novel perceptron-inspired reranking algorithms…
This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly unsupervised approach, co-training, in which two parsers are iteratively retrained on each other's output, and a semi-supervised approach, corrected co-training, in which a human corrects each parser's output…
Statistical machine translation (SMT) models need large bilingual corpora for training, which are unavailable for some language pairs. This paper provides the first serious experimental study of active learning for SMT. We use active learning to improve the quality of a phrase-based SMT system, and show significant improvements in translation compared to a…
Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text in the target language. In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with…