The contents of the Prague Dependency Treebank (recently released by the Linguistic Data Consortium in its version 1.0) are described, from morphology to surface syntax to the deep (underlying) syntax layers of annotation. For each layer, the basic assumptions are given, followed by a more detailed description of the annotation scheme. Annotation software …
  • Jan Hajič, Jarmila Panevová, Zdeňka Urešová, Alevtina Bémová, Veronika Kolářová, Petr Pajas
  • 2003
Valency theory, as a part of the theory of Functional Generative Description ([16]) of language meaning, has been around for some time ([14]). However, this is the first time that a large-scale corpus (the Prague Dependency Treebank, PDT [4]) has been fully annotated with valency information based on this theory, i.e., with fully referenced valency …
The paper presents the initial release of the Slovene Dependency Treebank, currently containing 2,000 sentences or 30,000 words. Our approach to annotation is based on the Prague Dependency Treebank, which serves as an excellent model due to the similarity of the languages and the existence of a detailed annotation guide and an annotation editor. The initial …
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives …
This paper presents recent advances in an established treebank annotation framework comprising an abstract XML-based data format, a fully customizable editor of tree-based annotations, a toolkit for all kinds of automated data processing with support for cluster computing, and a work-in-progress database-driven search engine with a graphical user interface …
In the first part of this technical report we describe our approach to designing a new data format, based on XML (Extensible Markup Language) and intended to provide a better, unifying alternative to the various legacy data formats used in different areas of corpus linguistics, specifically in the field of structured annotation. We introduce the first version of …