Domain Adaptation for Parsing
We compare two different methods in domain adaptation applied to constituent parsing: parser combination and cotraining, each used to transfer information from the source domain of news to the target
The IUCL+ System: Word-Level Language Identification via Extended Markov Models
TLDR
The IUCL+ system combines character n-gram probabilities, lexical probabilities, word label transition probabilities and existing named entity recognition tools within a Markov model framework that weights these components and assigns a label.
Adding Context Information to Part Of Speech Tagging for Dialogues
TLDR
This work investigates the performance of Markov model and maximum entropy POS taggers given a small data set of spontaneous dialogues in a collaborative search task, and investigates whether adding information about the speaker or about the dialogue move of the sentence can improve results.
Projecting Farsi POS Data To Tag Pashto
TLDR
This work makes a series of modifications to both tag transition and lexical emission parameter files generated from a hidden Markov model tagger, TnT, trained on the source language (Farsi) to help tag a lower resourced language, Pashto, following Feldman and Hana (2010).
Mirroring the real world in social media: twitter, geolocation, and sentiment analysis
TLDR
This research seeks to characterize the relationship between the language used on Twitter and the results of the 2011 NBA Playoff games. It hypothesizes that the difference in language should have predictive power over the tweet labels, and finds that it does.
Fast Domain Adaptation for Part of Speech Tagging for Dialogues
TLDR
This work investigates a fast method for domain adaptation, which provides additional in-domain training data from an unannotated data set by applying POS taggers with different biases to that data set and then choosing the set of sentences on which the taggers agree.
Parallel Syntactic Annotation in CReST
TLDR
The CReST corpus is the first of its kind, providing parallel syntactic annotation based on three different grammar formalisms for a dialogue corpus, thus providing a high quality resource for linguistic comparisons, but also for parser evaluation across frameworks.