The Risk of Racial Bias in Hate Speech Detection
This work proposes *dialect* and *race priming* as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.
On the Opportunities and Risks of Foundation Models
This report provides a thorough account of the opportunities and risks of foundation models, covering their capabilities, their applications, and the new abilities that arise from their emergent properties.
Show Your Work: Improved Reporting of Experimental Results
- Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
- Computer Science, EMNLP
- 6 September 2019
It is demonstrated that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best, and a novel technique is presented: expected validation performance of the best-found model as a function of computation budget.
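As a rough illustration (not the paper's released code), the estimator described above — the expected validation performance of the best model found after k hyperparameter trials — can be sketched with a standard order-statistics formula: for n observed validation scores sorted in ascending order, the probability that the best of k uniform draws equals the i-th smallest score is (i/n)^k − ((i−1)/n)^k. The score values below are invented for the example.

```python
def expected_max_performance(scores, k):
    """Expected best validation score among k trials sampled uniformly
    (with replacement) from n observed hyperparameter trials.

    Order-statistics estimator: with scores sorted ascending, the max of
    k draws equals the i-th smallest score with probability
    (i/n)^k - ((i-1)/n)^k.
    """
    n = len(scores)
    s = sorted(scores)
    return sum(v * ((i / n) ** k - ((i - 1) / n) ** k)
               for i, v in enumerate(s, start=1))

# Hypothetical validation accuracies from 5 hyperparameter trials.
scores = [0.70, 0.72, 0.75, 0.80, 0.81]

# Expected-best curve as a function of the trial budget k.
curve = [expected_max_performance(scores, k) for k in range(1, 6)]
```

With k = 1 the estimator reduces to the mean score, and as k grows the curve rises monotonically toward the best observed score, which is what makes it useful for reporting performance as a function of computation budget.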
Neural Models for Documents with Metadata
A general neural framework based on topic models is proposed to enable flexible incorporation of metadata and rapid exploration of alternative models; it achieves strong performance with a manageable tradeoff between perplexity, coherence, and sparsity.
The Media Frames Corpus: Annotations of Frames Across Issues
We describe the first version of the Media Frames Corpus: several thousand news articles on three policy issues, annotated in terms of media framing. We motivate framing as a phenomenon of study for…
Tracking the Development of Media Frames within and across Policy Issues
Framing is a central concept in political communication and a powerful political tool. Thus, it is hugely important to understand: a) what frames are used to define specific issues, b) what general…
A Neural Framework for Generalized Topic Models
This paper combines certain motivating ideas behind variations on topic models with modern techniques for variational inference to produce a flexible framework for topic modeling that allows for rapid exploration of different models.
With Little Power Comes Great Responsibility
- Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky
- Computer Science, EMNLP
- 13 October 2020
It is concluded that underpowered experiments are common in the NLP literature; an overview of best practices for power analysis in NLP is given, and a series of notebooks is released to assist with future power analyses.
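To make the idea of statistical power concrete (this is a generic illustration, not the paper's notebooks), power can be estimated by simulation: repeatedly draw test sets under an assumed true accuracy gap between two hypothetical systems and count how often a two-proportion z-test rejects the null. The accuracies and sample sizes below are invented for the example.

```python
import math
import random

def simulated_power(n, acc_a, acc_b, sims=1000, seed=0):
    """Monte Carlo power estimate for detecting an accuracy gap between
    two systems evaluated on n independent examples each, using a
    two-sided two-proportion z-test at the 5% level (z critical = 1.96).
    """
    rng = random.Random(seed)
    z_crit = 1.96
    hits = 0
    for _ in range(sims):
        # Simulate per-example correctness for each system.
        a = sum(rng.random() < acc_a for _ in range(n)) / n
        b = sum(rng.random() < acc_b for _ in range(n)) / n
        pooled = (a + b) / 2
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(a - b) / se > z_crit:
            hits += 1
    return hits / sims
```

A large effect with a large test set yields high power, while a one-point accuracy gap on a few hundred examples is typically detected only rarely, which is the sense in which such an experiment is underpowered.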
Variational Pretraining for Semi-supervised Text Classification
VAMPIRE is introduced, a lightweight pretraining framework for effective text classification when data and computing resources are limited and it is found that fine-tuning to in-domain data is crucial to achieving decent performance from contextual embeddings when working with limited supervision.
The Values Encoded in Machine Learning Research
- Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, Michelle Bao
- Computer Science, FAccT
- 29 June 2021
A method and annotation scheme for studying the values encoded in documents such as research papers are introduced, and systematic textual evidence is found that the most prevalent values are being defined and applied with assumptions and implications that generally support the centralization of power.