We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature values were combined in a log-linear model to select the highest scoring candidate translation from an n-best list.
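The selection step described in this abstract can be illustrated with a minimal sketch: each candidate in an n-best list carries a dictionary of feature values, a log-linear model scores it as a weighted sum of those values, and the highest-scoring candidate is returned. The feature names (`log_p_tm`, `log_p_lm`, `parse_score`), the weights, and the sentences are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A candidate is (translation, feature values). All names here are illustrative assumptions.
Candidate = Tuple[str, Dict[str, float]]


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rerank(nbest: List[Candidate], weights: Dict[str, float]) -> str:
    """Return the translation whose log-linear score is highest."""
    return max(nbest, key=lambda cand: loglinear_score(cand[1], weights))[0]


# Hypothetical 3-best list with made-up log-domain feature values.
nbest = [
    ("the house is small", {"log_p_tm": -2.0, "log_p_lm": -3.0, "parse_score": -1.0}),
    ("the house small is", {"log_p_tm": -1.5, "log_p_lm": -6.0, "parse_score": -4.0}),
    ("small the house is", {"log_p_tm": -2.5, "log_p_lm": -5.0, "parse_score": -3.0}),
]
weights = {"log_p_tm": 1.0, "log_p_lm": 0.5, "parse_score": 0.8}

best = rerank(nbest, weights)
```

With only the translation-model feature weighted, the second candidate would win; adding the language-model and syntactic features shifts the choice to the first, which is the kind of effect additional features are meant to have.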
In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering techniques. TroFi uses sentential context instead of selectional constraint violations or paths in semantic hierarchies. It also uses literal and nonliteral seed …
Statistical machine translation (SMT) models need large bilingual corpora for training, which are unavailable for some language pairs. This paper provides the first serious experimental study of active learning for SMT. We use active learning to improve the quality of a phrase-based SMT system, and show significant improvements in translation compared to a …
In recent evaluations of machine translation systems, statistical systems have outperformed classical approaches based on interpretation, transfer, and generation. Nonetheless, the output of statistical systems often contains obvious grammatical errors. This can be attributed to the fact that the syntactic well-formedness is only influenced by local n-gram …
We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers …
This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively re-trained on each other’s output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser’s output …
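The co-training loop mentioned in this abstract (two learners iteratively re-trained on each other's output) can be sketched in miniature. This is a toy illustration under strong assumptions, not the authors' setup: two one-dimensional nearest-centroid classifiers stand in for the two parsers, each seeing a different "view" (coordinate) of an example, and the names `CentroidLearner`, `co_train`, `rounds`, and `per_round` are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[float, float]  # two views of one item (toy stand-in for a sentence)


class CentroidLearner:
    """Toy nearest-centroid classifier on one view; a stand-in for a parser."""

    def __init__(self, view: int) -> None:
        self.view = view
        self.centroids: Dict[str, float] = {}

    def fit(self, data: List[Tuple[Example, str]]) -> None:
        sums: Dict[str, Tuple[float, int]] = {}
        for x, y in data:
            s, n = sums.get(y, (0.0, 0))
            sums[y] = (s + x[self.view], n + 1)
        self.centroids = {y: s / n for y, (s, n) in sums.items()}

    def predict(self, x: Example) -> Tuple[str, float]:
        """Return (label, confidence); confidence is the margin between the two nearest centroids."""
        dists = sorted((abs(x[self.view] - c), y) for y, c in self.centroids.items())
        confidence = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        return dists[0][1], confidence


def co_train(labeled, unlabeled, rounds=3, per_round=2):
    a, b = CentroidLearner(0), CentroidLearner(1)
    train_a, train_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        a.fit(train_a)
        b.fit(train_b)
        if not pool:
            break
        # Each learner labels the pool; its most confident outputs train the OTHER learner.
        for teacher, student_train in ((a, train_b), (b, train_a)):
            ranked = sorted(pool, key=lambda x: teacher.predict(x)[1], reverse=True)
            for x in ranked[:per_round]:
                student_train.append((x, teacher.predict(x)[0]))
                pool.remove(x)
    a.fit(train_a)
    b.fit(train_b)
    return a, b


# Made-up seed data and unlabelled pool.
labeled = [((0.0, 0.1), "A"), ((1.0, 0.9), "B")]
unlabeled = [(0.1, 0.2), (0.9, 1.0), (0.2, 0.0), (0.8, 0.8), (0.3, 0.3), (0.7, 0.6)]
learner_a, learner_b = co_train(labeled, unlabeled)
```

In the corrected co-training variant the abstract describes, a human would review each teacher's output before it is added to the other learner's training set; the sketch omits that step.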
Fei Xia, Department of Computer and Information Science, University of Pennsylvania, 3401 Walnut Street, Suite 400A, Philadelphia, PA 19104, USA (fxia@linc.cis.upenn.edu). Abstract: In this paper, we report our work on extracting lexicalized tree adjoining grammars (LTAGs) from partially bracketed corpora. The algorithm first fully brackets the corpora, then extracts …