Unsupervised Cross-lingual Representation Learning at Scale
TLDR
Pretraining multilingual language models at scale is shown to yield significant performance gains across a wide range of cross-lingual transfer tasks, and multilingual modeling without sacrificing per-language performance is demonstrated to be possible for the first time.
The World Conversation: Web Page Metadata Generation From Social Sources
TLDR
This paper presents a technique called social signatures that, given a link to a web page, pulls the most important keywords from the social chatter around it, yielding a high-level representation of the web page from a social media perspective.
Optical Character Recognition for Handwritten Hindi
Optical Character Recognition (OCR) is the electronic conversion of scanned images of handwritten text into machine-encoded text. In this project various image pre-processing, features extraction…
Kondenzer: Exploration and visualization of archived social media
TLDR
This paper presents Kondenzer, an offline system for condensing, archiving, and visualizing social data. It creates digests of social data using a combination of filtering, duplicate removal, and efficient clustering.
Exploiting entities in social media
TLDR
This paper proposes aggregating tweets by pivoting on entities and mapping them to topics already defined on websites such as Wikipedia and Freebase. It shows that this approach works well and presents encouraging results along with various applications centered on entities.