Publications
Quantification of protein group coherence and pathway assignment using functional association
TLDR
Two scores that quantify the functional coherence of protein sets are developed. They are shown to accurately distinguish biologically relevant groups of proteins from random ones, and to have good discriminative power for detecting interacting pairs of proteins.
Hope Speech Detection: A Computational Analysis of the Voice of Peace
TLDR
This work argues for the importance of automatically identifying user-generated web content that can defuse hostility. It addresses this prediction task, dubbed hope-speech detection, in the context of heated discussions during a politically tense situation in which two nations are on the brink of a full-fledged war.
Mining Insights from Large-Scale Corpora Using Fine-Tuned Language Models
TLDR
A novel application of fill-in-the-blank cloze statements to BERT, a recent high-performance language model, is presented. The approach can aggregate political sentiment, reveal community perception, and track evolving national priorities and issues of interest.
Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas
TLDR
This work constructs a classifier that detects comments defending the Rohingyas among larger numbers of disparaging and neutral ones. It argues that, beyond the burgeoning field of hate-speech detection, automatic detection of help speech can lend a voice to the voiceless and make the internet safer for marginalized communities.
Kashmir: A Computational Analysis of the Voice of Peace
TLDR
This work argues for the importance of automatically identifying user-generated web content that can defuse hostility. It addresses this prediction task, dubbed hope-speech detection, in the context of heated discussions during a politically tense situation in which two nations are on the brink of a full-fledged war.
Harnessing Code Switching to Transcend the Linguistic Barrier
TLDR
This paper provides a systematic approach to sampling code-mixed documents. It leverages a polyglot-embedding-based method that requires minimal supervision and holds promise for substantially reducing web moderation efforts.
Discovering Bilingual Lexicons in Polyglot Word Embeddings
TLDR
The paper presents a novel finding: a surprisingly simple constrained nearest-neighbor sampling technique in a polyglot word-embedding space can retrieve bilingual lexicons, even from harsh social media data sets that are predominantly written in English and Romanized Hindi and often exhibit code switching.
Query Transformations for Result Merging
TLDR
This work explores how term-dependence models and query-expansion strategies influence result merging, documenting experiments that modify queries to merge results in the federated-search pipeline.
Annotation Efficient Language Identification from Weak Labels
TLDR
A minimally supervised NLP technique is used to obtain weak language labels from a large-scale Indian social media corpus, yielding a robust, annotation-efficient language-identification technique that spans nine Romanized Indian languages.