Sina Zarrieß

Learn More
Based on a study of verb translations in the Europarl corpus, we argue that a wide range of MWE patterns can be identified in translations that exhibit a correspondence between a single lexical item in the source language and a group of lexical items in the target language. We show that these correspondences can be reliably detected on dependency-parsed,(More)
We propose a technique to generate nonprojective word orders in an efficient statistical linearization system. Our approach predicts liftings of edges in an unordered syntactic tree by means of a classifier, and uses a projective algorithm for tree linearization. We obtain statistically significant improvements on six typologically different languages:(More)
A common use of language is to refer to visually present objects. Modelling it in computers requires modelling the link between language and perception. The “words as classifiers” model of grounded semantics views words as classifiers of perceptual contexts, and composes the meaning of a phrase through composition of the denotations of its component words.(More)
We suggest a generation task that integrates discourse-level referring expression generation and sentence-level surface realization. We present a data set of German articles annotated with deep syntax and referents, including some types of implicit referents. Our experiments compare several architectures varying the order of a set of trainable modules. The(More)
In this paper, we report on the design of a part-of-speech-tagset for Wolof and on the creation of a semi-automatically annotated gold standard. The main motivation for this resource is to obtain data for training automatic taggers with machine learning approaches. Hence, we take machine learning considerations into account during tagset design and present(More)
We introduce here a participating system of the CoNLL-2013 Shared Task “Grammatical Error Correction”. We focused on the noun number and article error categories and constructed a supervised learning system for solving these tasks. We carried out feature engineering and we found that (among others) the f-structure of an LFG parser can provide very(More)
We compare the impact of sentenceinternal vs. sentence-external features on word order prediction in two generation settings: starting out from a discriminative surface realisation ranking model for an LFG grammar of German, we enrich the feature set with lexical chain features from the discourse context which can be robustly detected and reflect rough(More)
Large scale deep grammars can achieve high coverage of corpus data, yet cannot produce full-fledged solutions for each sentence. In this paper, we present a dependency-based sentence simplification approach to obtain full parses of simplified sentences that failed to have a complete analysis in their original form. In order to remove the erroneous parts(More)
PentoRef is a corpus of task-oriented dialogues collected in systematically manipulated settings. The corpus is multilingual, with English and German sections, and overall comprises more than 20000 utterances. The dialogues are fully transcribed and annotated with referring expressions mapped to objects in corresponding visual scenes, which makes the corpus(More)