Adriane Boyd

Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close both to the data and to the semantic functor-argument structure that is the target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training…
In this paper, we present a machine learning system for identifying non-referential it. Types of non-referential it are examined to determine relevant linguistic patterns. The patterns are incorporated as features in a machine learning system which performs a binary classification of it as referential or non-referential in a POS-tagged corpus. The selection…
Second language acquisition research since the 1990s has emphasized the importance of supporting awareness of language categories and forms, and input enhancement techniques have been proposed to make target language features more salient for the learner. We present an NLP architecture and web-based implementation providing automatic visual input enhancement…
While error detection approaches have been developed for various types of corpus annotation, so far only limited attention has been paid to the recall of those methods. We show how the recall of the so-called variation n-gram method can be increased by examining comparable part-of-speech tag sequences instead of the recurring strings themselves. To guide…
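For readers unfamiliar with the variation n-gram idea, the sketch below shows only its basic form on recurring word strings: a word n-gram that occurs several times in a corpus but carries different tag sequences is flagged as a candidate annotation error. This is a simplified illustration under stated assumptions (a corpus given as lists of `(word, tag)` pairs, a single fixed `n`); it does not implement the recall-oriented extension to comparable POS tag sequences described in the abstract, and the function name is hypothetical.

```python
from collections import defaultdict


def variation_ngrams(sentences, n):
    """Return word n-grams that recur with more than one tag sequence.

    Hypothetical helper: `sentences` is a list of sentences, each a list
    of (word, tag) pairs. This is the basic recurring-string version only.
    """
    tags_by_ngram = defaultdict(set)
    for sent in sentences:
        words = [w for w, _ in sent]
        tags = [t for _, t in sent]
        # Collect every tag sequence observed for each word n-gram.
        for i in range(len(sent) - n + 1):
            tags_by_ngram[tuple(words[i:i + n])].add(tuple(tags[i:i + n]))
    # A word n-gram annotated in more than one way is a candidate error.
    return {ng: seqs for ng, seqs in tags_by_ngram.items() if len(seqs) > 1}


corpus = [
    [("the", "DT"), ("old", "JJ"), ("man", "NN")],
    [("the", "DT"), ("old", "NN"), ("man", "NN")],
    [("a", "DT"), ("dog", "NN")],
]
print(variation_ngrams(corpus, 3))
```

Here the trigram "the old man" is reported because "old" is tagged `JJ` in one occurrence and `NN` in the other; the two-token sentence is too short to contribute a trigram.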
This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and hand-written essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of…
We propose a method for modeling pronunciation variation in the context of spell checking for non-native writers of English. Spell checkers, typically developed for native speakers, fail to address many of the types of spelling errors peculiar to non-native speakers, especially those errors influenced by differences in phonology. Our model of pronunciation…
The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains 2,290 learner texts produced in standardized language certifications covering CEFR levels A1–C1. The MERLIN annotation scheme…
We extend our n-gram-based data-driven prediction approach from the Helping Our Own (HOO) 2011 Shared Task (Boyd and Meurers, 2011) to identify determiner and preposition errors in non-native English essays from the Cambridge Learner Corpus FCE Dataset (Yannakoudakis et al., 2011) as part of the HOO 2012 Shared Task. Our system focuses on three error…
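The general idea of n-gram-based prediction for determiners and prepositions can be sketched as follows: for each slot, every candidate (including "no word") is inserted between the left and right context, the resulting n-gram is looked up in a count table, and the most frequent candidate is taken as the prediction; if it differs from what the writer used, the slot is flagged. This is a minimal sketch under assumptions, not the authors' HOO system, whose details are truncated here. The function names, the candidate list, and the toy count dictionary are all hypothetical; a real system would draw counts from a large n-gram resource.

```python
def predict_filler(left, right, candidates, ngram_counts):
    """Pick the candidate whose filled-in n-gram is most frequent.

    Hypothetical helper. The empty string stands for "no determiner or
    preposition". Ties go to the earliest candidate in `candidates`.
    """
    def score(cand):
        tokens = list(left) + ([cand] if cand else []) + list(right)
        return ngram_counts.get(tuple(tokens), 0)

    return max(candidates, key=score)


def flag_error(left, written, right, candidates, ngram_counts):
    """Return (is_flagged, predicted) for the writer's choice `written`."""
    predicted = predict_filler(left, right, candidates, ngram_counts)
    return predicted != written, predicted


# Toy counts standing in for a large n-gram table (invented for illustration).
counts = {
    ("went", "to", "the", "park"): 120,
    ("went", "to", "a", "park"): 15,
    ("went", "to", "park"): 3,
}
print(flag_error(("went", "to"), "a", ("park",), ["the", "a", "an", ""], counts))
```

With these toy counts, "went to a park" is flagged and "the" is suggested. Treating the empty string as an ordinary candidate lets the same comparison detect omitted and unnecessary words as well as substitutions.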
Recent parsing research has started addressing the questions (a) how parsers trained on different syntactic resources differ in their performance and (b) how to conduct a meaningful evaluation of the parsing results across such a range of syntactic representations. Two German treebanks, Negra and TüBa-D/Z, constitute an interesting testing ground for such…