• Publications
StereoSet: Measuring stereotypical bias in pretrained language models
TLDR
StereoSet, a large-scale natural English dataset for measuring stereotypical biases in four domains (gender, profession, race, and religion), is presented, and it is shown that popular models such as BERT, GPT-2, RoBERTa, and XLNet exhibit strong stereotypical biases.
Identifying Depression on Twitter
TLDR
This work employs a crowdsourced method to compile a list of Twitter users who profess to having been diagnosed with depression, and posits a new methodology for constructing a classifier by treating depression detection on social media platforms as a text-classification problem rather than a behavioral one.
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
TLDR
GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics, is introduced, and the data for the 2021 shared task at the associated GEM Workshop is described.
FAKTA: An Automatic End-to-End Fact Checking System
TLDR
FAKTA integrates the various components of a fact-checking process into an end-to-end system that predicts the factuality of given claims and provides evidence at the document and sentence level to explain its predictions.
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation
TLDR
This work uses the quality-diversity (Q-D) trade-off to investigate three popular sampling methods (top-k, nucleus, and tempered sampling), and designs two sets of new sampling methods that satisfy three key properties: entropy reduction, order preservation, and slope preservation.
Neural Multi-Task Learning for Stance Prediction
TLDR
This work presents a multi-task learning model that leverages a large amount of textual information from existing datasets to improve stance prediction, and obtains state-of-the-art performance on a public benchmark dataset, the Fake News Challenge.
Neural Educational Recommendation Engine (NERE)
TLDR
This paper proposes a novel approach, the Neural Educational Recommendation Engine (NERE), to recommend educational content by leveraging student behaviors rather than ratings, and positions this work as one of the first educational recommender systems for the K-12 space.
Automating Network Error Detection using Long-Short Term Memory Networks
TLDR
It is shown that LSTMs achieve 70% accuracy in classifying network errors, and it is demonstrated that K-Means is able to classify messages but does not necessarily provide meaningful clusters.
Context-Aware Systems for Sequential Item Recommendation