• Publications
Self-citation is the hallmark of productive authors, of any gender
TLDR
Self-citation is found to be the hallmark of productive authors of any gender, who cite their novel journal publications early and in similar venues, and who more often cross citation barriers such as language and indexing.
Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets
TLDR
This work studies the effectiveness of multi-dataset-multi-task learning in training neural models for four sequence tagging tasks on Twitter data: part-of-speech tagging, chunking, supersense tagging, and named entity recognition.
Quantifying Conceptual Novelty in the Biomedical Literature
TLDR
It is found that individual concept novelty is rare, while combinatorial novelty is the norm across all papers in MEDLINE published since 1985, and these novelty measures exhibit complex correlations with article impact and authors' professional age.
Semi-supervised Named Entity Recognition in noisy-text
TLDR
The models described in this paper are based on linear-chain conditional random fields (CRFs), use the BIEOU encoding scheme, and leverage random feature dropout for up-sampling the training data. Their features include word clusters, pre-trained distributed word representations, updated gazetteer features, and global context predictions.
WikiCSSH: Extracting Computer Science Subject Headings from Wikipedia
TLDR
A human-in-the-loop workflow that first extracts an initial category tree from crowd-sourced Wikipedia data, and then combines community detection, machine learning, and hand-crafted heuristics or rules to prune the initial tree, resulted in WikiCSSH, a large-scale, hierarchically organized subject heading vocabulary for the domain of computer science (CS).
3Idiots at HASOC 2019: Fine-tuning Transformer Neural Networks for Hate Speech Identification in Indo-European Languages
TLDR
Team 3Idiots’s approach to the 2019 shared task on hate speech and offensive content (HASOC) identification in Indo-European languages relies on fine-tuning pre-trained monolingual and multilingual transformer (BERT) based neural network models; the team also investigates an approach based on labels joined from all sub-tasks.
Enthusiasm and support: alternative sentiment classification for social movements on social media
TLDR
It is suggested that enthusiastic and supportive messages are more prevalent in tweets about social causes than in other types of tweets on Twitter.
Capturing Signals of Enthusiasm and Support Towards Social Issues from Twitter
TLDR
This paper analyzes the robustness of a prior framework for tagging tweets across the dimensions of enthusiasm and support, and offers an alternative or supplemental classification schema and prediction model to standard sentiment analysis and stance detection.
Sentiment Analysis with Incremental Human-in-the-Loop Learning and Lexical Resource Customization
TLDR
This work provides a free, open and GUI-based sentiment analysis tool that allows for a) relabeling predictions and/or adding labeled instances to retrain the weights of a given model, and b) customizing lexical resources to account for false positives and false negatives in sentiment lexicons.
Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020
TLDR
The multilingual joint training approach is shown to be the best trade-off between the computational efficiency of model deployment and the model’s evaluation performance, and the work also shows the utility of task label marginalization, joint label classification, and joint training on multilingual datasets.