Yogarshi Vyas

Code-mixing is frequently observed in user-generated content on social media, especially from multilingual users. The linguistic complexity of such content is compounded by the presence of spelling variations, transliteration, and non-adherence to formal grammar. We describe our initial efforts to create a multi-level annotated corpus of Hindi-English …
In this paper we describe our approach to the Abstract Meaning Representation (AMR) parsing shared task as part of SemEval 2016. We develop a novel technique to parse English sentences into AMR using Learning to Search. We decompose the AMR parsing task into three subtasks: predicting the concepts, the relations, and the root. Each of these …
We describe a CRF-based system for word-level language identification of code-mixed text. Our method uses lexical, contextual, character n-gram, and special character features, and can therefore easily be replicated across languages. Its performance is benchmarked against the test sets provided by the shared task on code-mixing (Solorio et al., 2014) for …
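As a rough illustration of the feature design sketched in this abstract, the snippet below builds per-token feature dictionaries (lexical, contextual, character n-gram, and special-character cues) and trains a CRF with the sklearn-crfsuite library. The specific feature names, the one-token context window, and the toy Hindi-English example are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of word-level language ID with a CRF over simple per-token features.
# Feature names and the +/-1 context window are illustrative, not the
# configuration reported in the paper.
import sklearn_crfsuite

def word_features(tokens, i):
    w = tokens[i]
    feats = {
        "lower": w.lower(),                    # lexical feature
        "is_special": not w.isalnum(),         # special-character feature
        "prefix3": w[:3], "suffix3": w[-3:],   # character n-gram features
    }
    if i > 0:                                  # contextual features
        feats["prev_lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    return feats

def sent_features(tokens):
    return [word_features(tokens, i) for i in range(len(tokens))]

# Toy code-mixed example: X is a list of sentences (feature dicts),
# y the corresponding word-level language labels.
X = [sent_features("yaar this movie was bahut accha".split())]
y = [["hi", "en", "en", "en", "hi", "hi"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```

Because the features are purely surface-level (no language-specific lexicons or parsers), the same pipeline can be retrained for other language pairs, which is the replicability point the abstract makes.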
Code-mixing is a frequently observed phenomenon in social media content generated by multilingual users. The processing of such data for linguistic analysis as well as computational modelling is challenging due to the linguistic complexity resulting from the nature of the mixing, as well as the presence of non-standard variations in spellings and grammar, …
We develop a novel technique to parse English sentences into Abstract Meaning Representation (AMR) using SEARN, a Learning to Search approach, by modeling the concept and the relation learning in a unified framework. We evaluate our parser on multiple datasets from varied domains and show an absolute improvement of 2% to 6% over the state-of-the-art.
We present a simple method for representing text that explicitly encodes differences between two corpora in a domain adaptation or data selection scenario. We do this by replacing every word in the corpora with its part-of-speech tag plus a suffix that indicates the relative bias of the word, or how much likelier it is to be in the task corpus versus the …
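A minimal sketch of the word-replacement idea described above: each word becomes its POS tag plus a suffix reflecting how much likelier the word is in the task corpus than in a general corpus. The log-ratio bucketing, the thresholds, and the toy corpora are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch: replace each word with POS tag + a relative-bias suffix.
# The add-one smoothing, log-ratio thresholds, and bucket names are
# illustrative choices, not the paper's binning scheme.
import math
from collections import Counter

def bias_suffix(word, task_counts, general_counts, n_task, n_general):
    # Smoothed relative frequency in the task corpus vs. the general corpus.
    p_task = (task_counts[word] + 1) / (n_task + len(task_counts))
    p_general = (general_counts[word] + 1) / (n_general + len(general_counts))
    ratio = math.log(p_task / p_general)
    if ratio > 0.5:
        return "TASK"
    if ratio < -0.5:
        return "GEN"
    return "NEUTRAL"

def encode(tagged_sentence, task_counts, general_counts, n_task, n_general):
    # tagged_sentence: list of (word, POS) pairs; each word is replaced by
    # its POS tag plus the bias suffix, discarding the surface form.
    return [
        f"{pos}_{bias_suffix(word, task_counts, general_counts, n_task, n_general)}"
        for word, pos in tagged_sentence
    ]

task_counts = Counter("the patient was given an injection".split())
general_counts = Counter("the game was played in the stadium again".split())
sentence = [("patient", "NN"), ("was", "VBD"), ("given", "VBN")]
print(encode(sentence, task_counts, general_counts,
             sum(task_counts.values()), sum(general_counts.values())))
```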
We introduce the task of cross-lingual lexical entailment, which aims to detect whether the meaning of a word in one language can be inferred from the meaning of a word in another language. We construct a gold standard for this task, and propose an unsupervised solution based on distributional word representations. As commonly done in the monolingual …
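One very rough way to picture an unsupervised scorer for this task: embed both words in a shared bilingual vector space and threshold a similarity score. Cosine similarity is only a stand-in here; the toy vectors, the threshold, and the decision rule are assumptions for illustration, not the method proposed in the paper, whose unsupervised scorer is entailment-specific rather than symmetric.

```python
# Illustrative only: decide whether word x (language 1) entails word y
# (language 2) by thresholding a similarity score in a shared bilingual
# embedding space. The vectors and threshold are toy assumptions.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def entails(x_vec, y_vec, threshold=0.6):
    # Hypothetical decision rule: predict "entails" above the threshold.
    return cosine(x_vec, y_vec) >= threshold

# Tiny made-up bilingual space (3-d vectors for readability).
emb = {
    ("hi", "kutta"): np.array([0.9, 0.1, 0.0]),   # "dog"
    ("en", "animal"): np.array([0.8, 0.3, 0.1]),
}
print(entails(emb[("hi", "kutta")], emb[("en", "animal")]))
```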
We describe the University of Maryland machine translation systems submitted to the IWSLT 2015 French-English and Vietnamese-English tasks. We built standard hierarchical phrase-based models, extended in two ways: (1) we applied novel data selection techniques to select relevant information from the large French-English training corpora, and (2) we …
We present a generic method for augmenting unsupervised query segmentation by incorporating Parts-of-Speech (POS) sequence information to detect meaningful but rare n-grams. Our initial experiments with an existing English POS tagger employing two different POS tagsets and an unsupervised POS induction technique specifically adapted for queries show that …
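The sketch below illustrates the general idea of mixing word-level and POS-level evidence when scoring a candidate query segment, so that a rare but well-formed n-gram whose POS pattern is common can still be detected. The interpolation formula and the toy counts are illustrative assumptions, not the paper's scoring model.

```python
# Sketch: combine word n-gram counts with POS n-gram counts when scoring
# a candidate segment. The interpolation weight and counts are toy values.
from collections import Counter

word_ngram_counts = Counter({("new", "york"): 500, ("cheap", "flights"): 5})
pos_ngram_counts = Counter({("NNP", "NNP"): 1200, ("JJ", "NNS"): 900})

def segment_score(words, tags, alpha=0.5):
    w_count = word_ngram_counts[tuple(words)]   # evidence from word forms
    p_count = pos_ngram_counts[tuple(tags)]     # evidence from the POS pattern
    # Interpolate the two sources; a rare word n-gram can still score well
    # if its POS sequence is frequent.
    return alpha * w_count + (1 - alpha) * p_count

# "cheap flights" is rare as a word n-gram but its JJ NNS pattern is common.
print(segment_score(["cheap", "flights"], ["JJ", "NNS"]))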