The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous ...
Using an error-annotated learner corpus as the basis, the goal of this paper is twofold: (i) to evaluate the practicality of the annotation scheme by computing inter-annotator agreement on a non-trivial sample of data, and (ii) to find out whether the application of automated linguistic annotation tools (taggers, spell checkers and grammar checkers) on the ...
We present an approach to building a learner corpus of Czech, manually corrected and annotated with error tags using a complex grammar-based taxonomy of errors in spelling, morphology, morphosyntax, lexicon and style. This grammar-based annotation is supplemented by a formal classification of errors based on surface alternations. To supply additional ...
Dekomprese v popisu jazyka aneb hlubiny i mělčiny deklarativně (Decompression in the description of language, or depths and shallows, declaratively). Alexandr Rosen, Institute of Theoretical and Computational Linguistics, Charles University in Prague. "A constraint-based approach to dependency syntax applied to some issues of Czech word order"; "Deklarativní formalizace teorie ...
We present Korektor – a flexible and powerful purely statistical text correction tool for Czech that goes beyond a traditional spell checker. We use a combination of several language models and an error model to offer the best ordering of correction proposals and also to find errors that cannot be detected by simple spell checkers, namely spelling errors ...