Publications
Citation Worthiness of Sentences in Scientific Reports
TLDR
In this paper, we introduce the task of citation worthiness for scientific texts at a sentence-level granularity.
100,000 Podcasts: A Spoken English Document Corpus
TLDR
We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts, comprising nearly 60,000 hours of speech.
Evolving Stream Processing
Designing adaptive classifiers for evolving stream data is a challenging task due to the size and changing nature of data streams. Among existing classifiers, the ensemble-based approach is one of the…
Training Effective Neural CLIR by Bridging the Translation Gap
TLDR
We introduce Smart Shuffling, a cross-lingual embedding (CLE) method that draws from statistical word alignment approaches to leverage dictionaries, producing dense representations that are significantly more effective for cross-language information retrieval (CLIR) than prior CLE methods.