The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous …
This paper introduces InterCorp, a parallel corpus including texts in Czech and 27 other languages, available for online searches via a web interface. After discussing some issues and merits of a multilingual resource, we argue that it has an important role, especially for languages with fewer native speakers, supporting both comparative research and studies …
Using an error-annotated learner corpus as the basis, the goal of this paper is twofold: (i) to evaluate the practicality of the annotation scheme by computing inter-annotator agreement on a non-trivial sample of data, and (ii) to find out whether the application of automated linguistic annotation tools (taggers, spell checkers and grammar checkers) on the …
Building treebanks is a prerequisite for various experiments and research tasks in the area of NLP. Under a recently awarded grant, we are developing (i) a formal definition of a (dependency-based) tree, and (ii) a mid-size treebank based on this definition. The annotated corpus is designed to have three layers: morphosyntactic (linear) tagging, syntactic …
We present an approach to building a learner corpus of Czech, manually corrected and annotated with error tags using a complex grammar-based taxonomy of errors in spelling, morphology, morphosyntax, lexicon and style. This grammar-based annotation is supplemented by a formal classification of errors based on surface alternations. To supply additional …
Dekomprese v popisu jazyka aneb hlubiny i mělčiny deklarativně (Decompression in language description, or depths and shallows, declaratively). Alexandr Rosen, Institute of Theoretical and Computational Linguistics, Charles University in Prague. "A constraint-based approach to dependency syntax applied to some issues of Czech word order"; "Deklarativní formalizace teorie …" (Declarative formalization of the theory …)