Domain Adaptation for Parsing
We compare two different methods in domain adaptation applied to constituent parsing: parser combination and cotraining, each used to transfer information from the source domain of news to the target…
The IUCL+ System: Word-Level Language Identification via Extended Markov Models
The IUCL+ system combines character n-gram probabilities, lexical probabilities, word label transition probabilities and existing named entity recognition tools within a Markov model framework that weights these components and assigns a label.
Word-level language identification in The Chymistry of Isaac Newton
- Levi King, Sandra Kübler, W. Hooper
- Linguistics, Computer Science
- Digit. Scholarsh. Humanit.
- 7 July 2014
The task of word-based language identification in multilingual texts, in which every word needs to be classified with regard to its language, is introduced, and a novel method is presented that combines character n-grams with a weighting scheme for modeling the probability of language switches at different points in a sentence.
Shallow Semantic Analysis of Interactive Learner Sentences
This paper collects data from a task which models some aspects of interaction, namely a picture description task (PDT), and uses a decision tree to classify sentences into syntactic types and extract the logical subject, verb, and object.
Leveraging known Semantics for Spelling Correction
This work explores the use of spelling correction tools and language modeling to correct misspellings that often lead to errors in obtaining semantic forms, and shows that such tools can significantly reduce the number of unanalyzable cases.
Shallow Semantic Reasoning from an Incomplete Gold Standard for Learner Language
Different models for representing and scoring non-native speaker responses to a picture are explored, including bags of dependencies, with the relevant parts of an image determined automatically from a set of native speaker (NS) responses.
Annotating picture description task responses for content analysis
By examining the decisions made in this corpus development, this work highlights the questions facing anyone working with learner language properties like variability, acceptability and native-likeness.
IUCL: Combining Information Sources for SemEval Task 5
The Indiana University system for SemEval Task 5, the L2 writing assistant task, is described, incorporating phrase tables extracted from bitexts, an L2 language model, a multilingual dictionary, and dependency-based collocational models derived from large samples of target-language text.
Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data
Recent progress towards developing more inclusive ASR systems is discussed, namely the importance of building new data sets that represent linguistic diversity and of exploring novel training approaches to improve performance for all users.