VALUE: Understanding Dialect Disparity in NLU

Caleb Ziems, Jiaao Chen, Camille Harris, Jessica Brooke Anderson, Diyi Yang
English Natural Language Understanding (NLU) systems have achieved strong performance and have even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE), and other dialects have been largely overlooked in the NLP community. The result is biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we…


Learning to Recognize Dialect Features
Evaluation on a test set of 22 dialect features of Indian English demonstrates that these models learn to recognize many features with high accuracy, and that a few minimal pairs can be as effective for training as thousands of labeled examples.
Dialect-Specific Models for Automatic Speech Recognition of African American Vernacular English
This work explores how using AAVE data affects the transcription accuracy of an automatic voice recognition system and highlights the importance of increasing diversity in the field of natural language processing.
Incorporating Dialectal Variability for Socially Equitable Language Identification
This work proposes a new dataset and a character-based sequence-to-sequence model for LID designed to support dialectal and multilingual language varieties and substantially increases the availability of texts written by underrepresented populations, enabling the development of “socially inclusive” NLP tools.
Towards Augmenting Lexical Resources for Slang and African American English
This work uses word embeddings and clustering algorithms to group semantically similar words in three datasets, two of which contain a high incidence of slang and AAE, and proposes the new Cluster Split Score as a metric for machine-generated clusters.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Demographic Dialectal Variation in Social Media: A Case Study of African-American English
A case study of dialectal language in online conversational text that investigates African-American English (AAE) on Twitter, proposes a distantly supervised model to identify AAE-like language from demographics associated with geo-located messages, and verifies that this language follows well-known AAE linguistic phenomena.
Noise-Robust Morphological Disambiguation for Dialectal Arabic
This work presents a neural morphological tagging and disambiguation model for Egyptian Arabic, with various extensions to handle noisy and inconsistent content.
Syntactic Variation and Linguistic Competence: The Case of AAVE Copula Absence
This thesis explores the implications for competence theories of syntax of the data on variation found by sociolinguists working in the Labovian tradition, through a case study of variable copula
It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations
This work perturbs the inflectional morphology of words to craft plausible and semantically similar adversarial examples that expose biases in popular NLP models, and shows that adversarially fine-tuning them for a single epoch significantly improves robustness without sacrificing performance on clean data.
Language and linguistics on trial: Hearing Rachel Jeantel (and other vernacular speakers) in the courtroom and beyond
Abstract: Rachel Jeantel was the leading prosecution witness when George Zimmerman was tried for killing Trayvon Martin, but she spoke in African American Vernacular English (AAVE) and her crucial