Announcing Prague Czech-English Dependency Treebank 2.0
A high-level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) is given, with some details of its most important features, such as valency annotation, ellipsis reconstruction, and coreference.
TectoMT: Modular NLP Framework
TectoMT allows for fast and efficient development of NLP applications by exploiting a wide range of software modules already integrated in TectoMT, such as tools for sentence segmentation, tokenization, morphological analysis, POS tagging, shallow and deep syntax parsing, named entity recognition, anaphora resolution, tree-to-tree translation, natural language generation, word-level alignment of parallel corpora, and other tasks.
Towards a Slovene Dependency Treebank
The paper presents the initial release of the Slovene Dependency Treebank, currently containing 2000 sentences or 30,000 words. Our approach to annotation is based on the Prague Dependency Treebank, …
TectoMT: Highly Modular MT System with Tectogrammatics Used as Transfer Layer
We present a new English→Czech machine translation system combining linguistically motivated layers of language description (as defined in the Prague Dependency Treebank annotation scenario) with …
Universal Dependencies 2.1
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
Named Entities in Czech: Annotating Data and Developing NE Tagger
A two-level NE classification is introduced for manual annotation of two thousand sentences in Czech, and a software system aimed at automatic detection and classification of NEs in Czech texts is developed.
The Joy of Parallelism with CzEng 1.0
Key properties of the released resource are described, including the distribution of text domains, the corpus data formats, and a toolkit for handling the provided rich annotation, as well as the procedures for the rich annotation (including coreference resolution) and for the automatic filtering.
Czech Named Entity Corpus and SVM-based Recognizer
A recently released corpus of Czech sentences with manually annotated named entities, built with a rich two-level classification scheme, is presented together with an SVM-based recognizer that outperforms the results previously reported for NE recognition in Czech.
Valency Information in VALLEX 2.0: Logical Structure of the Lexicon
The primary goal of the following text is to briefly describe the content of VALLEX 2.0 data from a structural point of view.
KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer
We present KLcpos3, a language similarity measure based on Kullback-Leibler divergence of coarse part-of-speech tag trigram distributions in tagged corpora. It has been designed for multilingual …