POS Tagging of English-Hindi Code-Mixed Social Media Content
TLDR
The initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums are described, and language identification, back-transliteration, normalization and POS tagging of this data are explored.
The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives
TLDR
An in-depth analysis of COMICS demonstrates that neither text nor image alone can tell a comic book story, so a computer must understand both modalities to keep up with the plot.
"I am borrowing ya mixing ?" An Analysis of English-Hindi Code Mixing in Facebook
TLDR
The classification of code-mixed words based on frequency and linguistic typology underlines the fact that while there are easily identifiable cases of borrowing and mixing at the two ends, a large majority of the words form a continuum in the middle, emphasizing the need to handle these at different levels for automatic processing of the data.
Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System
TLDR
A CRF-based system for word-level language identification of code-mixed text that uses lexical, contextual, character n-gram, and special-character features, and can therefore easily be replicated across languages.
Robust Cross-Lingual Hypernymy Detection Using Dependency Context
TLDR
BiSparse-Dep, a family of unsupervised approaches for cross-lingual hypernymy detection, is proposed; it learns sparse, bilingual word embeddings based on dependency contexts, is robust, and shows promise for low-resource settings.
Sparse Bilingual Word Representations for Cross-lingual Lexical Entailment
TLDR
To address the challenge of comparing contexts across languages, a novel method is proposed for inducing sparse bilingual word representations from monolingual and parallel texts; it significantly outperforms strong baselines based on translation and on existing word representations.
CLIP@UMD at SemEval-2016 Task 8: Parser for Abstract Meaning Representation using Learning to Search
TLDR
A novel technique is presented for parsing English sentences into Abstract Meaning Representation (AMR) using SEARN, a Learning to Search approach, which models concept learning and relation learning in a unified framework and yields an absolute improvement over the state of the art.
Detecting Asymmetric Semantic Relations in Context: A Case-Study on Hypernymy Detection
TLDR
WHiC lets us analyze complementary properties of two approaches to inducing vector representations of word meaning in context, and shows that such contextualized word representations also improve detection of a wider range of semantic relations in context.
The UMD machine translation systems at IWSLT 2015
TLDR
The University of Maryland machine translation systems submitted to the IWSLT 2015 French-English and Vietnamese-English tasks are described. Novel data selection techniques are applied to select relevant information from the large French-English training corpora, and neural language models are tested.
Improving unsupervised query segmentation using parts-of-speech sequence information
We present a generic method for augmenting unsupervised query segmentation by incorporating Parts-of-Speech (POS) sequence information to detect meaningful but rare n-grams. Our initial experiments ...