Data Set Used
We present a relatively large-scale initiative in high-quality MT based on semantic transfer, reviewing the motivation for this approach, general architecture and components involved, and preliminary experience from a first round of system integration (to be accompanied by a hands-on system demonstration, if appropriate). The translation problem is one… (More)
We present a hybrid MT architecture, combining state-of-the-art linguistic processing with advanced stochastic techniques. Grounded in a theoretical reflection on the division of labor between rule-based and probabilistic elements in the MT task, we summarize per-component approaches to ranking, including empirical results when evaluated in isolation.… (More)
In this paper we present a method for greatly reducing parse times in LFG parsing , while at the same time maintaining parse accuracy. We evaluate the methodology on data from English, German and Norwegian and show that the same patterns hold across languages. We achieve a speedup of 67% on the English data and 49% on the German data. On a small amount of… (More)
We present an overview over the Nor-wegian LOGON initiative, an ongoing project to produce high-quality MT from Norwegian to English based on grammar-based analysis, semantic transfer, and full generation in the target language.
This paper discusses the construction of a parallel treebank currently involving ten languages from six language families. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort. The grammars produce output that is maximally parallelized across languages and… (More)
Parallel grammars and parallel treebanks can be a useful method for studying linguistic diversity and commonality. We use this approach to study how arguments to similar predicates are realized across languages. To that end, we formulate formal principles for aligning at phrase and word levels based on translational correspondences at predicate-argument… (More)
In our paper we present the design and interface of ASK, a language learner corpus of Norwegian as a second language which contains essays collected from language tests on two different proficiency levels as well as personal data from the test takers. In addition, the corpus also contains texts and relevant personal data from native Norwegians as control… (More)