Markus Dickinson

Introduction: Treebanks used as: • "gold standard" training and testing material for computational linguists • data for linguists to search through for theoretically relevant patterns
Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training…
Many evaluation issues for grammatical error detection have previously been overlooked, making it hard to draw meaningful comparisons between different approaches, even when they are evaluated on the same corpus. To begin with, the three-way contingency between a writer’s sentence, the annotator’s correction, and the system’s output makes evaluation more…
While error detection approaches have been developed for various types of corpus annotation, so far only limited attention has been paid to the recall of those methods. We show how the recall of the so-called variation n-gram method can be increased by examining comparable part-of-speech tag sequences instead of the recurring strings themselves. To guide…
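For readers unfamiliar with the variation n-gram idea mentioned in this abstract, the following is a minimal sketch of the baseline intuition only: a recurring word string whose middle token is tagged differently across occurrences is flagged as a possible annotation error. It is a simplification under stated assumptions, not the authors' implementation. The function name `variation_ngrams`, the fixed odd window size, the centered nucleus, and the toy corpus are all hypothetical, and the tag-sequence extension that the abstract describes is not shown.

```python
from collections import defaultdict


def variation_ngrams(tagged_tokens, n=3):
    """Map each recurring word n-gram to the tags of its middle token,
    keeping only n-grams whose middle token has more than one tag.

    Assumption: the nucleus is always the centre of an odd-sized window.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    mid = n // 2
    tags_seen = defaultdict(set)
    for i in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[i:i + n]
        words = tuple(word for word, _ in window)
        # Record the tag assigned to the nucleus in this occurrence.
        tags_seen[words].add(window[mid][1])
    # Identical context with differing nucleus tags is a candidate error.
    return {words: tags for words, tags in tags_seen.items() if len(tags) > 1}


# Toy corpus (invented for illustration): "off" is tagged RP once and IN once
# inside the same trigram "ward off a".
corpus = [
    ("to", "TO"), ("ward", "VB"), ("off", "RP"), ("a", "DT"), ("cold", "NN"),
    ("and", "CC"), ("ward", "VB"), ("off", "IN"), ("a", "DT"), ("flu", "NN"),
]
candidates = variation_ngrams(corpus)
```

Here `candidates` contains only the trigram `("ward", "off", "a")`, because it is the one word string that recurs with two different tags on its middle token. Matching on exact word strings is what limits recall, which is the limitation the abstract sets out to address.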
Annotated corpora are essential for training and testing algorithms in natural language processing (NLP), but even so-called gold-standard corpora contain a significant number of annotation errors (cf. Dickinson 2005, and references therein). For part-of-speech annotation, these errors have been shown to be problematic for both training and evaluation of…
Building on the CHILDES dependency annotation scheme and on interlanguage POS annotation, we describe a syntactic annotation scheme developed for the data of second language learners. We encode subcategorization frames and underlying dependencies, in addition to the usual surface dependencies. The annotation scheme is relatively independent of language…
To speed up the process of categorizing learner errors and obtaining data for languages which lack error-annotated data, we describe a linguistically informed method for generating learner-like morphological errors, focusing on Russian. We outline a procedure to select likely errors, relying on guiding stem and suffix combinations from a segmented lexicon…