What Artificial Neural Networks Can Tell Us About Human Language Acquisition

Alex Warstadt and Samuel R. Bowman
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language. However, the learning environments and biases of current artificial learners and humans diverge in ways that weaken the impact of the evidence obtained from learning simulations. For example, today’s most effective neural language models are trained on roughly one thousand times the amount of linguistic data available to a typical child. To increase the… 


Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training

It is shown that models that are trained on developmentally plausible amounts of language data achieve near-maximal performance on human neural and behavioral benchmarks and that although some training is necessary for the models’ ability to predict human responses to language, a developmentally realistic amount of training may suffice.

Dissociating language and thought in large language models: a cognitive perspective

Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their capabilities remain split. Here, we evaluate…

A Solvable Model of Neural Scaling Laws

Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps and then translated into power-law scalings of the test loss, and how the finite extent of the data's spectral power law causes the model's performance to plateau.

Does Vision Accelerate Hierarchical Generalization of Neural Language Learners?

Neural language models (LMs) are arguably less data-efficient than humans; why does this gap occur? In this study, we hypothesize that this gap stems from the learners' accessibility to modalities…

A Discerning Several Thousand Judgments: GPT-3 Rates the Article + Adjective + Numeral + Noun Construction

Knowledge of syntax includes knowledge of rare, idiosyncratic constructions. LLMs must overcome frequency biases in order to master such constructions. In this study, I prompt GPT-3 to give…

Call for Papers - The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

A platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children, and a shared evaluation pipeline that scores models on a variety of benchmarks and tasks, including targeted syntactic evaluations and natural language understanding.

How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech

When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure…

A fine-grained comparison of pragmatic language understanding in humans and language models

Pragmatics is an essential part of communication, but it remains unclear what mechanisms underlie human pragmatic communication and whether NLP systems capture pragmatic language understanding…

One model for the learning of language

It is shown that relatively small amounts of positive evidence can support learning of rich classes of generative computations over structures, and the model provides an idealized learning setup upon which additional cognitive constraints and biases can be formalized.

Infant artificial language learning and language acquisition

Can neural networks acquire a structural bias from raw linguistic data?

This work finds that BERT makes a structural generalization in three of four empirical domains (subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses) but makes a linear generalization when tested on NPI licensing, offering tentative evidence that some linguistic universals can be acquired by learners without innate biases.

Emergent linguistic structure in artificial neural networks trained by self-supervision

Methods for identifying linguistic hierarchical structure emergent in artificial neural networks are developed and it is shown that components in these models focus on syntactic grammatical relationships and anaphoric coreference, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists.

Word Acquisition in Neural Language Models

It is found that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition.

A Targeted Assessment of Incremental Processing in Neural Language Models and Humans

It is shown that models systematically under-predict the difference in magnitude of incremental processing difficulty between grammatical and ungrammatical sentences, which calls into question whether contemporary language models are approaching human-like performance for sensitivity to syntactic violations.

Human few-shot learning of compositional instructions

This work studies the compositional skills of people through language-like instruction learning tasks, showing that people can learn and use novel functional concepts from very few examples, and compose concepts in complex ways that go beyond the provided demonstrations.

BabyBERTa: Learning More Grammar With Small-Scale Child-Directed Language

It is found that a smaller version of RoBERTa-base that never predicts unmasked tokens, termed BabyBERTa, acquires grammatical knowledge comparable to that of pre-trained RoBERTa-base, and does so with approximately 15X fewer parameters and 6,000X fewer words.

Information-Theoretic Probing for Linguistic Structure

An information-theoretic operationalization of probing as mutual information estimation, which contradicts received wisdom: one should always select the highest-performing probe available, even a complex one, since it yields a tighter estimate and thus reveals more of the linguistic information inherent in the representation.