Corpus ID: 229153786

Mapping the Timescale Organization of Neural Language Models

Hsiang-Yun Sherry Chien, Jinhang Zhang, Christopher John Honey
In the human brain, sequences of language input are processed within a distributed and hierarchical architecture, in which higher stages of processing encode contextual information over longer timescales. In contrast, in recurrent neural networks which perform natural language processing, we know little about how the multiple timescales of contextual information are functionally organized. Therefore, we applied tools developed in neuroscience to map the "processing timescales" of individual… 


Constructing and Forgetting Temporal Context in the Human Cerebral Cortex
When two groups of participants heard the same sentence in a narrative, preceded by different contexts, their neural responses initially differed but gradually fell into alignment.
What Limits Our Capacity to Process Nested Long-Range Dependencies in Sentence Comprehension?
Introduces an alternative approach, derived from recent work on artificial neural networks optimized for language modeling, and predicts that the capacity limitation derives from the emergence of sparse and feature-specific syntactic units.
Hierarchical process memory: memory as an integral component of information processing
Incorporating Context into Language Encoding Models for fMRI
The models built here show a significant improvement in encoding performance relative to state-of-the-art embeddings in nearly every brain area and suggest that LSTM language models learn high-level representations that are related to representations in the human brain.
The emergence of number and syntax units in LSTM language models
Long-distance number information is largely managed by two “number units”, whose behaviour is partially controlled by other units independently shown to track syntactic structure, paving the way to a more general understanding of grammatical encoding in LSTMs.
Slow Cortical Dynamics and the Accumulation of Information over Long Timescales
Topographic Mapping of a Hierarchy of Temporal Receptive Windows Using a Narrated Story
The results suggest that the time scale of processing is a functional property that may provide a general organizing principle for the human cerebral cortex.
Colorless green recurrent networks dream hierarchically
The results support the hypothesis that RNNs are not just shallow-pattern extractors but also acquire deeper grammatical competence, making reliable predictions about long-distance agreement and lagging little behind human performance.
Distinct timescales of population coding across cortex
Population codes can be essential to achieving long coding timescales, and coupling is a variable property of cortical populations that affects both the timescale of information coding and the accuracy of behaviour.