Learn More
We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature values were combined in a log-linear model to select the highest scoring candidate translation from an n-best list.(More)
In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering techniques. TroFi uses sentential context instead of selectional constraint violations or paths in semantic hierarchies. It also uses literal and nonliteral seed(More)
We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition , we consider the problem of boot-strapping parsers(More)
Statistical machine translation (SMT) models need large bilingual corpora for training , which are unavailable for some language pairs. This paper provides the first serious experimental study of active learning for SMT. We use active learning to improve the quality of a phrase-based SMT system, and show significant improvements in translation compared to a(More)
This paper describes the application of discrim-inative reranking techniques to the problem of machine translation. For each sentence in the source language, we obtain from a baseline statistical machine translation system, a ranked Ò-best list of candidate translations in the target language. We introduce two novel perceptron-inspired reranking algorithms(More)
This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively retrained on each other's output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser's output(More)
Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text in the target language. In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with(More)
In recent evaluations of machine translation systems, statistical systems have out-performed classical approaches based on interpretation, transfer, and generation. Nonetheless, the output of statistical systems often contains obvious grammatical errors. This can be attributed to the fact that the syntactic well-formedness is only influenced by local n-gram(More)