Publications
Optimizing Semantic Coherence in Topic Models
TLDR
An automated metric for evaluating topic coherence, and a novel statistical topic model based on this metric, which significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
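As a rough illustration of the kind of document co-occurrence coherence metric the paper builds on, the sketch below scores one topic's top words by how often pairs of them appear in the same documents. The function name and data layout are illustrative assumptions, not the paper's reference implementation.

```python
from math import log

def topic_coherence(top_words, docs):
    """Score one topic's most probable words by document co-occurrence.

    top_words: the topic's top words, ordered from most to least probable.
    docs: iterable of documents, each represented as a set of word types.
    Pairs of top words that rarely appear in the same document pull the
    score down; the +1 smoothing keeps never-co-occurring pairs finite.
    """
    docs = list(docs)
    doc_freq = {w: sum(1 for d in docs if w in d) for w in top_words}
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            co_freq = sum(1 for d in docs if w_m in d and w_l in d)
            # assumes every top word occurs in at least one document
            score += log((co_freq + 1) / doc_freq[w_l])
    return score
```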
A Reductions Approach to Fair Classification
TLDR
The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier with the lowest (empirical) error subject to the desired constraints.
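Each cost-sensitive subproblem in such a reduction can be handed to any off-the-shelf learner that accepts example weights. Below is a minimal sketch of that step, assuming per-example costs for predicting 0 or 1 (in the full algorithm these costs would fold in Lagrange-multiplier terms for the fairness constraints); the outer multiplier updates and the final randomization over classifiers are omitted, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cost_sensitive_oracle(X, cost_pred0, cost_pred1):
    """Solve one cost-sensitive subproblem with an ordinary weighted learner.

    cost_pred0[i] / cost_pred1[i]: cost of predicting 0 / 1 on example i.
    Predicting whichever label is cheaper is optimal, and the gap between
    the two costs says how much the example matters, so the problem maps
    onto weighted binary classification.
    """
    cost_pred0 = np.asarray(cost_pred0, dtype=float)
    cost_pred1 = np.asarray(cost_pred1, dtype=float)
    y = (cost_pred0 > cost_pred1).astype(int)   # cheaper label becomes the target
    w = np.abs(cost_pred0 - cost_pred1)         # cost gap becomes the weight
    clf = LogisticRegression()
    clf.fit(X, y, sample_weight=w)
    return clf
```

The open-source fairlearn library provides a full implementation of this reductions approach.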
Rethinking LDA: Why Priors Matter
TLDR
The prior structure advocated substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language.
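For context, the configuration argued for pairs an asymmetric Dirichlet prior over each document's topic proportions with a symmetric Dirichlet prior over each topic's word distribution. The lines below are a compact sketch with illustrative notation, not the paper's exact formulation.

```latex
% Asymmetric-symmetric prior configuration (notation illustrative):
\theta_d \sim \operatorname{Dirichlet}(\alpha \mathbf{m}),
  \quad \mathbf{m} \text{ a learned, non-uniform base measure over topics},
\qquad
\phi_t \sim \operatorname{Dirichlet}(\beta \mathbf{u}),
  \quad \mathbf{u} = (\tfrac{1}{V},\dots,\tfrac{1}{V}) \text{ uniform over the vocabulary}.
```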
Evaluation methods for topic models
TLDR
It is demonstrated experimentally that commonly-used methods are unlikely to accurately estimate the probability of held-out documents, and two alternative methods that are both accurate and efficient are proposed.
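One family of accurate estimators works through a held-out document left to right, averaging predictive word probabilities over sampled topic assignments. The sketch below is a simplified version of that idea for an already-trained LDA model (topic-word matrix phi, Dirichlet parameter alpha); the full algorithm also resamples earlier topic assignments at each position, which is omitted here, and all names are assumptions.

```python
import numpy as np

def left_to_right_log_prob(doc, phi, alpha, n_particles=20, seed=0):
    """Sequentially estimate log p(doc) under a trained LDA model.

    doc: sequence of word ids; phi: (T, V) topic-word probabilities;
    alpha: length-T Dirichlet parameter over topics.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    n_topics = phi.shape[0]
    particles = [[] for _ in range(n_particles)]  # topic assignments of earlier words
    log_prob = 0.0
    for w in doc:
        p_word = 0.0
        for z in particles:
            counts = np.bincount(np.asarray(z, dtype=int), minlength=n_topics)
            theta = (counts + alpha) / (counts.sum() + alpha.sum())
            joint = theta * phi[:, w]             # p(topic, word | earlier assignments)
            p_word += joint.sum()
            # sample this word's topic and keep it for later positions
            z.append(rng.choice(n_topics, p=joint / joint.sum()))
        log_prob += np.log(p_word / n_particles)
    return log_prob
```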
Topic modeling: beyond bag-of-words
TLDR
This work explores a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables, extending a unigram topic model with properties of a hierarchical Dirichlet bigram language model.
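Roughly, each word depends on both its topic and the preceding word, with hierarchical Dirichlet priors tying the resulting distributions together. A compact sketch of the generative step, with illustrative notation:

```latex
% Each word is drawn from a distribution indexed by (topic, previous word);
% hierarchical Dirichlet priors share strength across these distributions.
z_n \sim \operatorname{Discrete}(\theta_d), \qquad
w_n \sim \operatorname{Discrete}(\phi_{z_n,\, w_{n-1}}), \qquad
\phi_{t, v} \sim \operatorname{Dirichlet}(\beta_t \mathbf{m}_t).
```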
Datasheets for datasets
TLDR
Documentation to facilitate communication between dataset creators and dataset consumers is presented.
Polylingual Topic Models
TLDR
This work introduces a polylingual topic model that discovers topics aligned across multiple languages and demonstrates its usefulness in supporting machine translation and tracking topic trends across languages.
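In outline, each tuple of topically comparable documents (one per language) shares a single distribution over topics, while each topic has a separate word distribution per language. A compact sketch with illustrative notation:

```latex
% Shared topic proportions per document tuple; language-specific topic-word distributions.
\theta_d \sim \operatorname{Dirichlet}(\alpha), \qquad
z^{\ell}_{n} \sim \operatorname{Discrete}(\theta_d), \qquad
w^{\ell}_{n} \sim \operatorname{Discrete}(\phi^{\ell}_{z^{\ell}_{n}}),
\quad \ell = 1,\dots,L.
```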
Language (Technology) is Power: A Critical Survey of “Bias” in NLP
TLDR
A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of “bias” and to center work around the lived experiences of members of communities affected by NLP systems.
Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need?
TLDR
This first systematic investigation of commercial product teams' challenges and needs for support in developing fairer ML systems identifies areas of alignment and disconnect between the challenges faced by teams in practice and the solutions proposed in the fair ML research literature.
Manipulating and Measuring Model Interpretability
TLDR
In a sequence of pre-registered experiments, participants were shown functionally identical models that varied in only two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or a black box).
...