What Artificial Neural Networks Can Tell Us About Human Language Acquisition
@article{Warstadt2022WhatAN,
  title   = {What Artificial Neural Networks Can Tell Us About Human Language Acquisition},
  author  = {Alex Warstadt and Samuel R. Bowman},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2208.07998}
}
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language. However, the learning environments and biases of current artificial learners and humans diverge in ways that weaken the impact of the evidence obtained from learning simulations. For example, today’s most effective neural language models are trained on roughly one thousand times the amount of linguistic data available to a typical child. To increase the…
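To make the scale gap concrete, here is a back-of-the-envelope sketch in Python. The exposure and training-set figures are illustrative assumptions chosen to yield a ratio of the order the abstract describes, not values taken from the paper.

```python
# Back-of-the-envelope comparison of a child's linguistic input with an LLM's
# training data. All figures are illustrative assumptions, not from the paper.

child_words_per_year = 10_000_000        # assumed ~10M words of input per year
years_of_exposure = 10                   # roughly the first decade of life
child_total_words = child_words_per_year * years_of_exposure   # ~1e8 words

lm_training_words = 100_000_000_000      # assumed ~1e11 words for a large LM

ratio = lm_training_words / child_total_words
print(f"Child input:  ~{child_total_words:.0e} words")
print(f"LM training:  ~{lm_training_words:.0e} words")
print(f"Scale gap:    ~{ratio:,.0f}x")   # ~1,000x, as in the abstract
```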
8 Citations
Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training
- Computer Science, Psychology · bioRxiv
- 2022
It is shown that models trained on developmentally plausible amounts of language data achieve near-maximal performance on human neural and behavioral benchmarks, and that although some training is necessary for the models to predict human responses to language, a developmentally realistic amount of training may suffice.
Dissociating language and thought in large language models: a cognitive perspective
- Psychology · ArXiv
- 2023
Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their capabilities remain split. Here, we evaluate…
A Solvable Model of Neural Scaling Laws
- Computer Science · ArXiv
- 2022
Key findings include the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps and then translated into power-law scalings of the test loss, and how the finite extent of the data's spectral power law causes the model's performance to plateau.
Does Vision Accelerate Hierarchical Generalization of Neural Language Learners?
- Linguistics
- 2023
Neural language models (LMs) are arguably less data-efficient than humans; why does this gap occur? In this study, we hypothesize that this gap stems from the learners' access to modalities…
A Discerning Several Thousand Judgments: GPT-3 Rates the Article + Adjective + Numeral + Noun Construction
- Linguistics
- 2023
Knowledge of syntax includes knowledge of rare, idiosyncratic constructions. LLMs must overcome frequency biases in order to master such constructions. In this study, I prompt GPT-3 to give…
Call for Papers - The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
- Computer Science · ArXiv
- 2023
The challenge provides a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children, and a shared evaluation pipeline that scores models on a variety of benchmarks and tasks, including targeted syntactic evaluations and natural language understanding.
How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech
- Psychology · ArXiv
- 2023
When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure…
A fine-grained comparison of pragmatic language understanding in humans and language models
- Psychology · ArXiv
- 2022
Pragmatics is an essential part of communication, but it remains unclear what mechanisms underlie human pragmatic communication and whether NLP systems capture pragmatic language understanding. To…
References
Showing 1-10 of 174 references
One model for the learning of language
- Computer Science · Proceedings of the National Academy of Sciences
- 2022
It is shown that relatively small amounts of positive evidence can support learning of rich classes of generative computations over structures, and the model provides an idealized learning setup upon which additional cognitive constraints and biases can be formalized.
Infant artificial language learning and language acquisition
- Psychology · Trends in Cognitive Sciences
- 2000
Can neural networks acquire a structural bias from raw linguistic data?
- Linguistics · CogSci
- 2020
This work finds that BERT makes a structural generalization in 3 out of 4 empirical domains (subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses) but makes a linear generalization when tested on NPI licensing, providing tentative evidence that some linguistic universals can be acquired by learners without innate biases.
Emergent linguistic structure in artificial neural networks trained by self-supervision
- Computer Science, Linguistics · Proceedings of the National Academy of Sciences
- 2020
Methods for identifying linguistic hierarchical structure emergent in artificial neural networks are developed and it is shown that components in these models focus on syntactic grammatical relationships and anaphoric coreference, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists.
Word Acquisition in Neural Language Models
- Psychology · TACL
- 2022
It is found that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition.
A Targeted Assessment of Incremental Processing in Neural Language Models and Humans
- Psychology · ACL
- 2021
It is shown that models systematically under-predict the difference in magnitude of incremental processing difficulty between grammatical and ungrammatical sentences, which calls into question whether contemporary language models are approaching human-like performance for sensitivity to syntactic violations.
Human few-shot learning of compositional instructions
- Education · CogSci
- 2019
This work studies the compositional skills of people through language-like instruction learning tasks, showing that people can learn and use novel functional concepts from very few examples, and compose concepts in complex ways that go beyond the provided demonstrations.
BabyBERTa: Learning More Grammar With Small-Scale Child-Directed Language
- Linguistics, Computer Science · CoNLL
- 2021
It is found that a smaller version of RoBERTa-base that never predicts unmasked tokens, termed BabyBERTa, acquires grammatical knowledge comparable to that of pre-trained RoBERTa-base, and does so with approximately 15X fewer parameters and 6,000X fewer words.
What infants know about syntax but couldn't have learned: experimental evidence for syntactic structure at 18 months
- Linguistics · Cognition
- 2003
Information-Theoretic Probing for Linguistic Structure
- Computer Science · ACL
- 2020
An information-theoretic operationalization of probing as estimating mutual information that contradicts received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation.