Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching

Alissa Ostapenko, Shuly Wintner, Melinda Fricke, Yulia Tsvetkov
Natural language processing (NLP) models trained on people-generated data can be unreliable because, without any constraints, they can learn from spurious correlations that are not relevant to the task. We hypothesize that enriching models with speaker information in a controlled, educated way can guide them to pick up on relevant inductive biases. For the speaker-driven task of predicting code-switching points in English–Spanish bilingual dialogues, we show that adding sociolinguistically… 

Mixed-effects transformers for hierarchical adaptation

This paper introduces the mixed-effects transformer (MET), a novel approach for learning hierarchically-structured prefixes — lightweight modules prepended to the input — to account for structured variation.

KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding

KALM achieves state-of-the-art performance on three long document understanding tasks across six datasets/settings, revealing that the three knowledge-aware contexts are complementary and all contribute to model performance, while the importance and information-exchange patterns of different contexts vary across tasks and datasets.

A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies

A survey of code-switching covering the linguistics literature, with a reflection on the key issues in language technologies and on how massive language models fail to represent diverse code-switching types.

Codeswitching: A Bilingual Toolkit for Opportunistic Speech Planning

Recent empirical studies are reviewed and corpus evidence is provided highlighting how codeswitching serves as an opportunistic strategy for optimizing performance in cooperative communication and offers an alternative means of conveying meaning, with implications for bilingual speech planning and language control more generally.

Unsupervised Cross-lingual Representation Learning at Scale

It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.

Personalizing Dialogue Agents: I have a dog, do you have pets too?

This work collects data and trains models to condition on their given profile information and on information about the person they are talking to, resulting in improved dialogues as measured by next-utterance prediction.

What Code-Switching Strategies are Effective in Dialog Systems?

This work collects and releases COMMONAMIGOS, a corpus of 587 human–computer text conversations between the authors' dialogue system and human users in mixed Spanish and English, and gives recommendations for future effective code-switching dialogue systems, highlighting users' language proficiency and gender as critical considerations.

Topics to Avoid: Demoting Latent Confounds in Text Classification

This work proposes a method that represents latent topical confounds and a model that "unlearns" confounding features by predicting both the label of the input text and the confound; it shows that this model generalizes better and learns features indicative of writing style rather than content.

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

This work proposes a self-training method to repurpose existing pretrained models using a switch-point bias by leveraging unannotated data, and demonstrates that this approach performs well on both sequence labeling tasks considered.

Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

This work proposes influence tuning—a procedure that leverages model interpretations to update the model parameters towards a plausible interpretation (rather than an interpretation that relies on spurious patterns in the data) in addition to learning to predict the task labels.

Finetuned Language Models Are Zero-Shot Learners

It is shown that instruction tuning—finetuning language models on a collection of datasets described via instructions—substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.

A Survey of Race, Racism, and Anti-Racism in NLP

This work surveys 79 papers from the ACL anthology that mention race to reveal various types of race-related bias in all stages of NLP model development, highlighting the need for proactive consideration of how NLP systems can uphold racial hierarchies.