Code Mixing: A Challenge for Language Identification in the Language of Social Media
TLDR
In social media communication, multilingual speakers often switch between languages, and, in such an environment, automatic language identification becomes both a necessary and challenging task.
  • 195
  • 19
Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text
TLDR
Language identification at the document level has been considered an almost solved problem in some application areas, but language detectors fail in the social media context due to phenomena such as utterance-internal code-switching, lexical borrowings, and phonetic typing, all implying that language identification has to be carried out at the word level.
  • 100
  • 18
Language Independent Named Entity Recognition in Indian Languages
TLDR
This paper reports on the development of a Named Entity Recognition (NER) system for South and South East Asian languages, particularly for Bengali, Hindi, Telugu, Oriya and Urdu, as part of the IJCNLP-08 NER Shared Task.
  • 85
  • 9
Code-Mixing in Social Media Text. The Last Language Identification Frontier?
TLDR
This paper reports an initial study to understand the characteristics of code-mixing in the social media context and presents a system developed to detect language boundaries in code-mixed social media text, here exemplified by Facebook messages in mixed English-Bengali and English-Hindi.
  • 52
  • 8
Comparing the Level of Code-Switching in Corpora
TLDR
We define an objective measure of corpus-level complexity of code-switched texts and apply it to social media corpora.
  • 36
  • 8
Shared Task on Sentiment Analysis in Indian Languages (SAIL) Tweets - An Overview
TLDR
Sentiment analysis in Twitter has been considered a vital task for a decade from various academic and commercial perspectives.
  • 62
  • 4
Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages
TLDR
The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-of-speech tag set.
  • 83
  • 4
Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL_Code-Mixed Shared Task @ICON-2017
TLDR
This paper presents an overview of the shared task on sentiment analysis of code-mixed data pairs of Hindi-English and Bengali-English collected from different social media platforms.
  • 49
  • 4
Poetic Machine: Computational Creativity for Automatic Poetry Generation in Bengali
TLDR
The paper reports an initial study on computational poetry generation for Bengali.
  • 31
  • 3
Topic-Based Bengali Opinion Summarization
TLDR
This paper describes the development of an opinion summarization system that works on a Bengali news corpus.
  • 40
  • 3