Wojciech Skut

We describe an annotation scheme and a tool developed for creating linguistically annotated corpora for non-configurational languages. Since the requirements for such a formalism differ from those posited for configurational languages, several features have been added, influencing the architecture of the scheme. The resulting scheme reflects a(More)
We describe OpenFst, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs(More)
In this paper, we report on the development of an annotation scheme and annotation tools for unrestricted German text. Our representation format is based on argument structure, but also permits the extraction of other kinds of representations. We discuss several methodological issues and the analysis of some phenomena. Additional focus is on the tools(More)
Data-oriented and corpus-based methods have become one of the most important areas of applied as well as theoretical NLP. Currently, the methods prevailingly belong to the supervised learning paradigm, i.e., they require as training material large corpora annotated with linguistic information. Since the preparation of such corpora usually involves manual(More)
Looking at relative clause extraposition in German as a concrete example, the paper demonstrates how linguistic model building, corpus study and psycholinguistic experiments combine into an integrational research programme that aims at an improved understanding and linguistically as well as cognitively adequate modelling of human language performance.(More)
Thorsten Brants and Wojciech Skut, Universität des Saarlandes, Computational Linguistics, D-66041 Saarbrücken, Germany, {brants,skut}@coli.uni-sb.de. Abstract: This paper describes applications of stochastic and symbolic NLP methods to treebank annotation. In particular we focus on (1) the automation of treebank annotation, (2) the comparison of conflicting(More)