Kilian Evang

Learn More
What would be a good method to provide a large collection of semantically annotated texts with formal, deep semantics rather than shallow? We argue that a bootstrapping approach comprising state-of-the-art NLP tools for parsing and semantic interpretation, in combination with a wiki-like interface for collaborative annotation of experts, and a game with a(More)
Tokenization is widely regarded as a solved problem due to the high accuracy that rulebased tokenizers achieve. But rule-based tokenizers are hard to maintain and their rules language specific. We show that highaccuracy word and sentence segmentation can be achieved by using supervised sequence labeling on the character level combined with unsupervised(More)
Obtaining gold standard data for word sense disambiguation is important but costly. We show how it can be done using a “Game with a Purpose” (GWAP) called Wordrobe. This game consists of a large set of multiple-choice questions on word senses generated from the Groningen Meaning Bank. The players need to answer these questions, scoring points depending on(More)
We use the NLP toolchain that is used to construct the Groningen Meaning Bank to address the task of detecting negation cue and scope, as defined in the shared task “Resolving the Scope and Focus of Negation”. This toolchain applies the C&C tools for parsing, using the formalism of Combinatory Categorial Grammar, and applies Boxer to produce semantic(More)
Data-driven approaches in computational semantics are not common because there are only few semantically annotated resources available. We are building a large corpus of public-domain English texts and annotate them semi-automatically with syntactic structures (derivations in Combinatory Categorial Grammar) and semantic representations (Discourse(More)
This paper provides an overview of the debugging framework Kahina, discussing its architecture as well as its application to debugging in different constraint-based grammar engineering environments. The exposition focuses on and motivates the hybrid nature of the system between source-level debugging by means of a tracer and high-level analysis by means of(More)
We use a Combinatory Categorial Grammar (CCG) parser with a structured perceptron learner to address Shared Task 6 of SemEval-2014, Supervised Semantic Parsing of Robotic Spatial Commands. Our system reaches an accuracy of 79% ignoring spatial context and 87% using the spatial planner, showing that CCG can successfully be applied to the task.
In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG)(More)
Our goal is to provide a web-based platform for the long-term preservation and distribution of a heterogeneous collection of linguistic resources. We discuss the corpus preprocessing and normalisation phase that results in sets of multi-rooted trees. At the same time we transform the original metadata records, just like the corpora annotated using different(More)