Learn More
The Prague Czech-English Dependency Treebank (PCEDT) is a syntactically annotated Czech-English parallel corpus. The Penn Treebank has been translated to Czech, and its annotation automatically transformed into dependency annotation scheme. The dependency annotation of Czech is done from plain text by automatic procedures. A small subset of corresponding(More)
Evaluating Optical Music Recognition (OMR) is notoriously difficult and automated end-to-end OMR evaluation metrics are not available to guide development. In “Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images”, Byrd and Simonsen recently stress that a benchmarking standard is needed in the OMR community, both(More)
Optical Music Recognition (OMR) has long been without an adequate dataset and ground truth for evaluating OMR systems, which has been a major problem for establishing a state of the art in the field. Furthermore, machine learning methods require training data. We analyze how the OMR processing pipeline can be expressed in terms of gradually more complex(More)
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets(More)
This paper describes the key role of a stochastic morphological tagger in an MT system between very closely related languages. The MT system Česílko exploits the close relatedness of both natural languages in question (Czech and Slovak), which allows substantial simplification of the translation method used. It also uses to a great advantage the(More)
This paper presents a process for leveraging structural relationships and reusable phrases when translating large-scale ontologies. Digital libraries are becoming more and more prevalent. An important step in providing universal access to such material is to provide multi-lingual access to the underlying principles of organization via ontologies, thesauri,(More)
We describe results of investigation of a specific type of discontinuous constructions, namely non-projective constructions concerning verbs and their arguments. This topic is especially important for languages with a relatively free word order, such as Czech, which is the language we have primarily worked with. For comparison, we have included some results(More)
In this paper we present UIMA – the Unstructured Information Management Architecture, an architecture and software framework for creating, discovering, composing and deploying a broad range of multi-modal analysis capabilities and integrating them with search technologies. We describe the elementary components of the framework and how they are deployed into(More)