Data Set Used
The contents of the Prague Dependency Treebank (recently released by the Linguistic Data Consortium in its version 1.0) are described, from morphology to surface syntax to the deep (underlying) syntax layers of annotation. For each layer, the basic assumptions are given, followed by a more detailed description of the annotation scheme. Annotation software…
Valency theory, as part of the theory of Functional Generative Description of language meaning, has been around for some time. However, this is the first time that a large-scale corpus, the Prague Dependency Treebank (PDT), has been fully annotated with valency information based on this theory, i.e., with fully referenced valency…
We present a new English→Czech machine translation system combining linguistically motivated layers of language description (as defined in the Prague Dependency Treebank annotation scenario) with statistical NLP approaches.
The paper presents the initial release of the Slovene Dependency Treebank, currently containing 2000 sentences or 30,000 words. Our approach to annotation is based on the Prague Dependency Treebank, which serves as an excellent model due to the similarity of the languages and the existence of a detailed annotation guide and an annotation editor. The initial…
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives…
This paper presents a system for querying treebanks. The system consists of a powerful query language with natural support for cross-layer queries, a client interface with a graphical query builder and visualizer of the results, a command-line client interface, and two substitutable query engines: a very efficient engine using a relational database…
This paper presents recent advances in an established treebank annotation framework comprising an abstract XML-based data format, a fully customizable editor of tree-based annotations, a toolkit for all kinds of automated data processing with support for cluster computing, and a work-in-progress database-driven search engine with a graphical user interface…
To Jarmila Panevová and Prof. Petr Sgall.
In the first part of this technical report we describe our approach to designing a new data format, based on XML (Extensible Markup Language) and intended to provide a better and unifying alternative to the various legacy data formats used in corpus linguistics, specifically in the field of structured annotation. We introduce the first version of…