Discosuite - A parser test suite for German discontinuous structures
A test suite for evaluating dependency and constituency parsers on non-projective dependencies and discontinuous constituents in German, based on the newly released TIGER treebank version 2.2.2. It includes a linguistic analysis of the phenomena that cause discontinuity in the TIGER annotation.
PLCFRS Parsing Revisited: Restricting the Fan-Out to Two
This paper presents a parser for binary PLCFRS of fan-out two, together with a novel monotonous estimate for A* parsing, and conducts experiments on modified versions of the German NeGra treebank and the Discontinuous Penn Treebank in which all trees have block degree two.
Synchronous Linear Context-Free Rewriting Systems for Machine Translation
This work presents a hierarchical aligner in the form of a deduction system and finds that by restricting k to 2 on both sides, 100% of the data can be covered.
Hierarchical Machine Translation With Discontinuous Phrases
A hierarchical statistical machine translation system that supports discontinuous constituents, based on synchronous linear context-free rewriting systems (SLCFRS). It demonstrates the feasibility of training and decoding with more expressive translation models such as SLCFRS and shows a modest improvement over a context-free baseline.
German and English Treebanks and Lexica for Tree-Adjoining Grammars
A treebank and lexicon for German and English, developed for PLTAG parsing, which include the NP annotation by Vadas and Curran as well as the prediction lexicon necessary for PLTAG.
On Complex Word Alignment Configurations
This work investigates instances of complex alignment configurations in data sets of four different language pairs to shed more light on the nature and cause of those configurations, finding that only a small fraction of the complex configurations are due to real annotation errors.
Enriching Phrase-Based Statistical Machine Translation with POS Information
This work presents an extension to phrase-based statistical machine translation models that incorporates linguistic knowledge, namely part-of-speech information. Scores are added to the standard