Effective Domain Mixing for Neural Machine Translation
This work shows that training NMT systems on naively mixed data can degrade performance relative to models fit to each constituent domain, and proposes three models that avoid this degradation by jointly learning domain discrimination and translation.
Automatically Neutralizing Subjective Bias in Text
- Reid Pryzant, Richard Diehl Martinez, Nathan Dass, S. Kurohashi, Dan Jurafsky, Diyi Yang
- Computer Science, AAAI
- 21 November 2019
Large-scale human evaluation across four domains (encyclopedias, news headlines, books, and political speeches) suggests that these algorithms are a first step towards the automatic identification and reduction of bias.
JESC: Japanese-English Subtitle Corpus
JESC is a large Japanese-English parallel corpus covering the underrepresented domain of conversational dialogue and consists of more than 3.2 million examples, making it the largest freely available dataset of its kind.
Deconfounded Lexicon Induction for Interpretable Social Science
Two deep learning algorithms are introduced that are more predictive and less confound-related than those of standard feature weighting and lexicon induction techniques like regression and log odds and used to induce lexicons that are predictive of timely responses to consumer complaints, enrollment from course descriptions, and sales from product descriptions.
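The log-odds baseline mentioned above can be illustrated with a minimal sketch: score each word by its smoothed log-odds of appearing in one class versus the other. The toy complaint data below is hypothetical, chosen only to show the mechanics of the baseline, not the paper's deconfounded method.

```python
import math
from collections import Counter

def log_odds_lexicon(pos_docs, neg_docs):
    """Score each word by smoothed log-odds of appearing in the
    positive class vs. the negative class (add-one smoothing)."""
    pos_counts = Counter(w for d in pos_docs for w in d.split())
    neg_counts = Counter(w for d in neg_docs for w in d.split())
    n_pos = sum(pos_counts.values())
    n_neg = sum(neg_counts.values())
    vocab = set(pos_counts) | set(neg_counts)
    scores = {}
    for w in vocab:
        p = (pos_counts[w] + 1) / (n_pos + len(vocab))
        q = (neg_counts[w] + 1) / (n_neg + len(vocab))
        scores[w] = math.log(p / q)
    return scores

# Hypothetical toy data: complaints that got timely vs. no response.
timely = ["prompt refund issued", "quick helpful reply"]
ignored = ["no reply ever", "still waiting no refund"]
lex = log_odds_lexicon(timely, ignored)
```

Words frequent in the positive class ("prompt") receive positive scores, while words frequent in the negative class ("no") receive negative scores; the paper's point is that such scores also pick up confound-related words, which its adversarial approach suppresses.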
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
The statistical challenge of estimating causal effects with text is introduced, encompassing settings where text is used as an outcome, treatment, or to address confounding, and potential uses of causal inference are explored to improve the robustness, fairness, and interpretability of NLP models.
Predicting Sales from the Language of Product Descriptions
A novel neural network architecture is proposed that leverages an adversarial objective to control for confounding factors, and attentional scores over its input to automatically elicit textual features as a domain-specific lexicon and shows that these textual features can predict the sales of each product.
Causal Effects of Linguistic Properties
- Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, Dhanya Sridhar
- Computer Science, NAACL
- 24 October 2020
TextCause, an algorithm for estimating causal effects of linguistic properties, is introduced and it is shown that the proposed method outperforms related approaches when estimating the effect of Amazon review sentiment on semi-simulated sales figures.
Interpretable Neural Architectures for Attributing an Ad’s Performance to its Writing Style
It is found that quick, easy, and authoritative language is associated with success, while lackluster embellishment is related to failure; this agrees with the advertising industry's empirical wisdom and automatically reveals insights that previously required manual A/B testing to discover.
Monitoring Ethiopian Wheat Fungus with Satellite Imagery and Deep Feature Learning
- Reid Pryzant, S. Ermon, D. Lobell
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 1 July 2017
This work introduces a scalable, accurate, and inexpensive method for tracking outbreaks with publicly available remote sensing data that outperforms competing techniques, and demonstrates its predictive foresight.
Automatic Rule Induction for Efficient Semi-Supervised Learning
Automatic Rule Induction is proposed, a simple and general-purpose framework for the automatic discovery and integration of symbolic rules into pretrained transformer models that can improve state-of-the-art methods with no manual effort and minimal computational overhead.
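As a rough illustration of the rule-discovery idea (this is a hypothetical keyword-rule sketch, not the paper's ARI framework, which integrates rules into pretrained transformers): mine high-precision "token → label" rules from a small labeled set, then use them to weakly label unlabeled text.

```python
from collections import Counter, defaultdict

def induce_keyword_rules(texts, labels, min_precision=0.9, min_count=2):
    """Keep 'token -> label' rules for tokens that occur at least
    min_count times and predict a single label with high precision."""
    token_label = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for tok in set(text.lower().split()):
            token_label[tok][label] += 1
    rules = {}
    for tok, counts in token_label.items():
        label, n = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_count and n / total >= min_precision:
            rules[tok] = label
    return rules

def apply_rules(rules, text):
    """Weakly label text by majority vote of matching rules; None if none fire."""
    votes = Counter(rules[t] for t in text.lower().split() if t in rules)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy training set.
train = [("great fast service", "pos"), ("great product", "pos"),
         ("terrible slow service", "neg"), ("terrible support", "neg")]
rules = induce_keyword_rules([t for t, _ in train], [l for _, l in train])
```

Here "great" and "terrible" each fire twice with perfect precision and become rules, while "service" (split across labels) is filtered out; the induced rules can then weakly label new text for semi-supervised training.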