Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour

  title={Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour},
  author={Fangyu Liu and Julian Martin Eisenschlos and Jeremy R. Cole and Nigel Collier},
Language models (LMs) trained on raw texts have no direct access to the physical world. Gordon and Van Durme (2013) point out that LMs can thus suffer from reporting bias: texts rarely report on common facts, instead focusing on the unusual aspects of a situation. If LMs are only trained on text corpora and naively memorise local co-occurrence statistics, they thus naturally would learn a biased view of the physical world. While prior studies have repeatedly verified that LMs of smaller scales… 

Figures and Tables from this paper



The World of an Octopus: How Reporting Bias Influences a Language Model’s Perception of Color

The results show that the distribution of colors that a language model recovers correlates more strongly with the inaccurate distribution found in text than with the ground-truth, supporting the claim that reporting bias negatively impacts and inherently limits text-only training.

Visual Commonsense in Pretrained Unimodal and Multimodal Models

The Visual Commonsense Tests (ViComTe) dataset is created and results indicate that multimodal models better reconstruct attribute distributions, but are still subject to reporting bias, and increasing model size does not enhance performance, suggesting that the key to visual commonsense lies in the data.

Do Neural Language Models Overcome Reporting Bias?

It is found that while pre-trained language models' generalization capacity allows them to better estimate the plausibility of frequent but unspoken of actions, outcomes, and properties, they also tend to overestimate that of the very rare, amplifying the bias that already exists in their training corpus.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

On Reality and the Limits of Language Data

The objective of this work is to explore how far can language data alone enable computers to understand the necessary truth about the physical world using a novel and tightly controlled reasoning test and to highlight what models might learn directly from pure linguistic data.

Meaning without reference in large language models

The widespread success of large language models (LLMs) has been met with skepticism that they possess anything like human concepts or meanings. Contrary to claims that LLMs possess no meaning

Learning Transferable Visual Models From Natural Language Supervision

It is demonstrated that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.

Reporting bias and knowledge acquisition

This paper questions the idea that the frequency with which people write about actions, outcomes, or properties is a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.

PaLM: Scaling Language Modeling with Pathways

A 540-billion parameter, densely activated, Transformer language model, which is called PaLM achieves breakthrough performance, outperforming the state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark.