Choosing Transfer Languages for Cross-Lingual Learning
TLDR
This paper frames the automatic selection of optimal transfer languages as a ranking problem, builds models over multiple features to perform this prediction, and demonstrates that the resulting model predicts good transfer languages far better than ad hoc baselines that consider single features in isolation.
Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique
TLDR
This work presents a novel unsupervised word-level language detection technique for code-switched text for an arbitrarily large number of languages, which does not require any manually annotated training data.
Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?
TLDR
The study indicates that Hindi (i.e., the native language) is preferred over English for expression of negative opinion and swearing, and develops classifiers for opinion detection in these languages.
Temporally-Informed Analysis of Named Entity Recognition
TLDR
This work analyzes and proposes methods that make better use of temporally diverse training data, with a focus on named entity recognition, empirically demonstrating the effect of temporal drift on performance and showing how the temporal information of documents can be used to obtain better models than those that disregard it.
Zero-shot Neural Transfer for Cross-lingual Entity Linking
TLDR
This work proposes pivot-based entity linking, which leverages information from a high-resource “pivot” language to train character-level neural entity linking models that are transferred to the low-resource source language in a zero-shot manner.
Soft Gazetteers for Low-Resource Named Entity Recognition
TLDR
This work proposes a method of “soft gazetteers” that incorporates ubiquitously available information from English knowledge bases, such as Wikipedia, into neural named entity recognition models through cross-lingual entity linking.
Towards Zero-resource Cross-lingual Entity Linking
TLDR
This work examines the effect of resource assumptions, quantifies how much the availability of these resources affects the overall quality of existing XEL systems, and proposes three improvements to both entity candidate generation and disambiguation that make better use of the limited resources available in resource-scarce scenarios.
Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages
TLDR
It is found that code-switching is positively associated with Wikipedia editor success, particularly borrowing technical language on pages with topics less directly related to Arabic-speaking regions.
Practical Comparable Data Collection for Low-Resource Languages via Images
We propose a method of curating high-quality comparable training data for low-resource languages with monolingual annotators. Our method involves using a carefully selected set of images as a pivot
Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text
TLDR
This work proposes a framework to synthesize code-mixed text by using a TTS database in a single language, identifying the language that each word came from, normalizing spellings of a language written in a non-standardized script, and mapping the phonetic space of the mixed language to the language in which the TTS database was recorded.