Jan Štěpánek

For the 11th straight year, the Conference on Computational Natural Language Learning has been accompanied by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2009, the shared task was dedicated to the joint parsing of syntactic and semantic dependencies in multiple languages.
We present the Prague Dependency Treebank 2.5, the newest version of PDT and the first to be released under a free license. We show the benefits of PDT 2.5 in comparison to other state-of-the-art treebanks. We present the new features of the 2.5 release, how they were obtained and how reliably they are annotated. We also show how they can be used in queries...
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives...
We propose HamleDT – HArmonized Multi-LanguagE Dependency Treebank. HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. While the license terms prevent us from directly redistributing the corpora, most of them are easily acquirable for...
Phylogenetic relationships based on the chloroplast genome of Taraxacum were studied. Representative samples of 44 sections or species groups and a number of isolated species were analyzed. On the basis of the sequence variation in psbA–trnH and in trnL–trnF, mutations associated with RFLPs were monitored. Five RFLPs without homoplasy were recognized and...
Vegetative regeneration of individual genotypes of Asian Reynoutria taxa, which are invasive in the Czech Republic, was studied in R. sachalinensis (five genotypes), R. japonica (a single genotype present in the country), and their hybrid R. ×bohemica (nine genotypes). Identity of genotypes was confirmed by isozyme analysis. Ten rhizome segments of each...
In the first part of this technical report, we describe our approach to designing a new data format based on XML (Extensible Markup Language) and intended as a better, unifying alternative to the various legacy data formats used in corpus linguistics, specifically in the field of structured annotation. We introduce the first version of...
This article describes various methods and tools used for the post-annotation checking of Prague Dependency Treebank 2.0 data. The annotation process of the treebank was complicated by several factors: for example, the corpus was divided into several layers that must reflect each other. Moreover, the annotation rules changed and evolved during...
The genus Reynoutria is represented by four taxa in the Czech Republic – R. japonica var. japonica, R. japonica var. compacta, R. sachalinensis and R. ×bohemica. Using isoenzyme analysis, we determined the degree of genotype variability in all taxa and compared clones of R. japonica var. japonica from the Czech Republic with those from Great Britain. While the rarely...