We describe an annotation scheme and a tool developed for creating linguistically annotated corpora for non-configurational languages. Since the requirements for such a formalism differ from those posited for configurational languages, several features have been added, influencing the architecture of the scheme. The resulting scheme reflects a …
We describe OpenFst, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs …
In this paper, we report on the development of an annotation scheme and annotation tools for unrestricted German text. Our representation format is based on argument structure, but also permits the extraction of other kinds of representations. We discuss several methodological issues and the analysis of some phenomena. Additional focus is on the tools …
We describe a stochastic approach to partial parsing, i.e., the recognition of syntactic structures of limited depth. The technique utilises Markov Models, but goes beyond usual bracketing approaches, since it is capable of recognising not only the boundaries, but also the internal structure and syntactic category of simple as well as complex NP's, PP's, …
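To make the Markov-model idea concrete, here is a minimal sketch of Viterbi decoding over part-of-speech tags that assigns simple chunk labels. It is a simplified stand-in rather than the paper's method: it recovers only NP boundaries, not internal structure, and the label set (`B-NP`, `I-NP`, `O`), the tag names, and all probabilities are invented for illustration.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable state sequence under a first-order Markov model."""

    def lp(p):
        # Log-probability with a small floor so unseen events do not yield log(0).
        return math.log(max(p, floor))

    # best[i][s]: log-probability of the best path that ends in state s at position i.
    best = [{s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(tokens[0], 0.0))
             for s in states}]
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[i - 1][r] + lp(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[i][s] = score + lp(emit_p[s].get(tokens[i], 0.0))
            back[i][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: states are chunk labels, observations are POS tags.
STATES = ["B-NP", "I-NP", "O"]
START_P = {"B-NP": 0.6, "I-NP": 0.0, "O": 0.4}
TRANS_P = {
    "B-NP": {"B-NP": 0.05, "I-NP": 0.8, "O": 0.15},
    "I-NP": {"B-NP": 0.05, "I-NP": 0.55, "O": 0.4},
    "O": {"B-NP": 0.5, "I-NP": 0.0, "O": 0.5},
}
EMIT_P = {
    "B-NP": {"ART": 0.7, "ADJ": 0.1, "NN": 0.2},
    "I-NP": {"ADJ": 0.4, "NN": 0.6},
    "O": {"VVFIN": 0.8, "ART": 0.05, "NN": 0.05, "ADJ": 0.1},
}

tags = ["ART", "ADJ", "NN", "VVFIN"]
labels = viterbi(tags, STATES, START_P, TRANS_P, EMIT_P)
print(list(zip(tags, labels)))
```

With these toy numbers the article, adjective, and noun are grouped into one NP and the finite verb is left outside it. Recognising internal structure and categories, as the abstract describes, would need a richer state inventory than this sketch uses.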
We report on the syntactic annotation of a German newspaper corpus. The annotation consists of context-free structures, additionally allowing crossing branches, with labeled nodes (phrases) and edges (grammatical functions). Furthermore, we present a new, interactive semi-automatic annotation process that allows efficient and reliable …
This paper describes applications of stochastic and symbolic NLP methods to treebank annotation. In particular we focus on (1) the automation of treebank annotation, (2) the comparison of conflicting annotations for the same sentence and (3) the automatic detection of inconsistencies. These techniques are currently employed for building a German treebank.
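As an illustration of point (2), the sketch below compares two annotations of the same sentence, each represented as a set of labeled spans. The span representation `(label, start, end)`, the function name, and the example sentence are assumptions made for this sketch; the paper's own comparison procedure is not described in the abstract.

```python
def compare_annotations(ann_a, ann_b):
    """Compare two sets of labeled spans (label, start, end) for one sentence."""
    ann_a, ann_b = set(ann_a), set(ann_b)
    only_a = ann_a - ann_b
    only_b = ann_b - ann_a

    # Spans on which both annotators agree about boundaries but not the label.
    # Simplification: assumes at most one label per span within an annotation.
    labels_a = {(start, end): label for label, start, end in only_a}
    label_conflicts = sorted(
        (start, end, labels_a[(start, end)], label)
        for label, start, end in only_b
        if (start, end) in labels_a
    )
    return {
        "agreed": sorted(ann_a & ann_b),
        "only_a": sorted(only_a),
        "only_b": sorted(only_b),
        "label_conflicts": label_conflicts,
    }


# Hypothetical example: "Der Mann sieht den Hund", token offsets 0..5.
annotator_1 = {("S", 0, 5), ("NP", 0, 2), ("NP", 3, 5)}
annotator_2 = {("S", 0, 5), ("NP", 0, 2), ("PP", 3, 5)}

report = compare_annotations(annotator_1, annotator_2)
print(report["label_conflicts"])
```

Here the two annotators agree on the sentence node and the subject NP but label the span covering tokens 3 to 5 differently, which is the kind of disagreement a human adjudicator would then resolve.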
This report describes the progress made towards a declarative, language-independent, high-level encoding format for HPSG grammars. The design of the encoding format ensures efficient implementability yet at the same time renders grammars highly compact, thereby enhancing readability. Particular attention has been paid to the expressivity/efficiency …
Looking at relative clause extraposition in German as a concrete example, the paper demonstrates how linguistic model building, corpus study and psycholinguistic experiments combine into an integrational research programme that aims at an improved understanding and linguistically as well as cognitively adequate modelling of human language performance. …