The Prague Czech-English Dependency Treebank (PCEDT) is a syntactically annotated Czech-English parallel corpus. The Penn Treebank has been translated into Czech, and its annotation has been automatically transformed into a dependency annotation scheme. The dependency annotation of Czech is produced from plain text by automatic procedures. A small subset of corresponding …
This paper presents a process for leveraging structural relationships and reusable phrases when translating large-scale ontologies. Digital libraries are becoming increasingly prevalent. An important step in providing universal access to such material is to provide multilingual access to its underlying principles of organization via ontologies, thesauri, …
This paper describes the key role of a stochastic morphological tagger in an MT system for very closely related languages. The MT system Česílko exploits the close relatedness of the two languages in question (Czech and Slovak), which allows a substantial simplification of the translation method. It also uses to great advantage the …
Evaluating Optical Music Recognition (OMR) is notoriously difficult, and no automated end-to-end OMR evaluation metrics are available to guide development. In "Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images", Byrd and Simonsen recently stress that the OMR community needs a benchmarking standard, …
In this paper we present UIMA, the Unstructured Information Management Architecture: an architecture and software framework for creating, discovering, composing and deploying a broad range of multi-modal analysis capabilities and integrating them with search technologies. We describe the elementary components of the framework and how they are deployed into …
Optical Music Recognition (OMR) has long lacked an adequate dataset and ground truth for evaluating OMR systems, which has been a major obstacle to establishing the state of the art in the field. Furthermore, machine learning methods require training data. We analyze how the OMR processing pipeline can be expressed in terms of progressively more complex …