We present a hybrid MT architecture, combining state-of-the-art linguistic processing with advanced stochastic techniques. Grounded in a theoretical reflection on the division of labor between rule-based and probabilistic elements in the MT task, we summarize per-component approaches to ranking, including empirical results when evaluated in isolation.
We present a relatively large-scale initiative in high-quality MT based on semantic transfer, reviewing the motivation for this approach, general architecture and components involved, and preliminary experience from a first round of system integration (to be accompanied by a hands-on system demonstration, if appropriate). The translation problem is one …
In this paper we present a method for greatly reducing parse times in LFG parsing, while at the same time maintaining parse accuracy. We evaluate the methodology on data from English, German and Norwegian and show that the same patterns hold across languages. We achieve a speedup of 67% on the English data and 49% on the German data. On a small amount of …
In our paper we present the design and interface of ASK, a language learner corpus of Norwegian as a second language which contains essays collected from language tests on two different proficiency levels as well as personal data from the test takers. In addition, the corpus also contains texts and relevant personal data from native Norwegians as control …
We extend discriminant-based disambiguation techniques to LFG grammars. We present the design and implementation of lexical, morphological, c-structure and f-structure discriminants for an LFG-based parser. Chief considerations in the computation of discriminants are capturing all distinctions between analyses and relating linguistic properties to words in …
This paper briefly describes the current state of the evolving INESS infrastructure in Norway, which is developing treebanks as well as making treebanks more accessible to the R&D community. Recent work includes the hosting of more treebanks, including parallel treebanks, and increasing the number of parsed and disambiguated sentences in the Norwegian LFG …
This paper discusses the construction of a parallel treebank currently involving ten languages from six language families. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort. The grammars produce output that is maximally parallelized across languages and …
The TREPIL project (Norwegian treebank pilot project 2004-2008) is aimed at developing and testing methods for the construction of a Norwegian parsed corpus. Annotation of c-structures, f-structures and mrs-structures is based on automatic parsing with human validation and disambiguation. Parsing is done with a large LFG grammar and the XLE parser. We …