Corpus ID: 11256837

KoKo: an L1 Learner Corpus for German

@inproceedings{Abel2014KoKoAL,
  title={KoKo: an L1 Learner Corpus for German},
  author={Andreas Abel and Aivars Glaznieks and Lionel Nicolas and Egon W. Stemle},
  booktitle={LREC},
  year={2014}
}
We introduce the KoKo corpus, a collection of German L1 learner texts annotated with learner errors, along with the methods and tools used in its construction and evaluation. The corpus contains both texts and corresponding survey information from 1,319 pupils and amounts to around 716,000 tokens. The evaluation of the transcriptions and annotations shows an accuracy of approximately 80% for orthographic error annotations as well as high accuracies for transcriptions (> 99%), automatic…

An Extended Version of the KoKo German L1 Learner Corpus
TLDR
An extended version of the KoKo corpus is described, a corpus of written German L1 learner texts from three different German-speaking regions in three different countries that is richly annotated with learner language features on different linguistic levels such as errors or other linguistic characteristics that are not deficit-oriented, and is enriched with a wide range of metadata.
Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus
TLDR
A new longitudinal L1 learner corpus for German is presented, which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and is thereby tailored to research and tool development for orthographic issues in primary school.
The SweLL Language Learner Corpus
The article presents a new language learner corpus for Swedish, SweLL, and the methodology from collection and pseudonymisation to protect personal information of learners to annotation adapted to
Toward a Paradigm Shift in Collection of Learner Corpora
TLDR
The first version of the longitudinal Revita Learner Corpus (ReLCo), for Russian, is presented, which is collected and annotated fully automatically, while students perform exercises using the Revita language-learning platform.
The Litkey Corpus: A richly annotated longitudinal corpus of German texts written by primary school children
TLDR
A longitudinal corpus of texts on short picture stories written by German primary school children between grades 2 and 4 and grades 3 and 4 is presented, providing a detailed assessment of the properties of words that tend to increase the likelihood of spelling errors.
Grammar Error Correction in Morphologically Rich Languages: The Case of Russian
TLDR
This work presents a corrected and error-tagged corpus of Russian learner writing and develops models that make use of existing state-of-the-art methods that have been well studied for English to correct writing mistakes in morphologically rich languages such as Russian.
Establishing a Standardised Procedure for Building Learner Corpora
TLDR
This paper presents a generic workflow to build learner corpora while taking into account the needs of the users, and addresses the linguists’ research needs as well as the availability and usability of language technology tools necessary to meet them.
Corpus for Children's Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
TLDR
The corpus consists of the elicitation techniques, an overview of the data collected and the transcriptions of the texts both with and without spelling errors, aligned on a word by word basis, as well as the scanned in texts.
CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence
TLDR
The corpus was built in the framework of an interdisciplinary study jointly carried out by computational linguists and experimental pedagogists and aimed at tracking the development of written language competence over the years and students’ background information.
Building the Arabic learner corpus and a system for Arabic error annotation
TLDR
This thesis aims to introduce a detailed and original methodology for developing a new learner corpus for Arabic based on systematic design criteria which was successfully able to recruit 992 people including language learners, data collectors, evaluators, annotators and collaborators from more than 30 educational institutions in Saudi Arabia and the UK.
