Multimodal foundation models are better simulators of the human brain
@article{Lu2022MultimodalFM,
  title   = {Multimodal foundation models are better simulators of the human brain},
  author  = {Haoyu Lu and Qiongyi Zhou and Nanyi Fei and Zhiwu Lu and Mingyu Ding and Jingyuan Wen and Changde Du and Xin Zhao and Haoran Sun and Huiguang He and J. Wen},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2208.08263}
}
Multimodal learning, especially large-scale multimodal pre-training, has developed rapidly over the past few years and led to the greatest advances in artificial intelligence (AI). Despite its effectiveness, understanding the underlying mechanism of multimodal pre-training models remains a grand challenge. Revealing the explainability of such models is likely to enable breakthroughs in novel learning paradigms in the AI field. To this end, given the multimodal nature of the human brain, we…
One Citation
When Abstract Becomes Concrete: Naturalistic Encoding of Concepts in the Brain
- Psychology · bioRxiv
- 2022
Language is acquired and processed in complex and dynamic naturalistic contexts, involving simultaneous processing of connected speech, faces, bodies, objects, etc. How words and their associated…
References
Cortical response to naturalistic stimuli is largely predictable with deep neural networks
- Biology, Psychology · Science Advances
- 2021
This work builds group-level models of neural activity that incorporate several inductive biases about neural information processing, including hierarchical processing, temporal assimilation, and auditory-visual interactions, and illustrates that encoding models learn high-level concepts that generalize to task-bound paradigms.
What can 5.17 billion regression fits tell us about artificial models of the human visual system?
- Psychology, Biology
- 2021
A large-scale benchmarking analysis of 72 modern deep neural network models is performed to characterize with robust statistical power how differences in architecture and training task contribute to the prediction of human fMRI activity across 16 distinct regions of the human visual system.
Towards artificial general intelligence via a multimodal foundation model
- Computer Science · Nature Communications
- 2022
This work develops a foundation model pre-trained on huge multimodal data that can be quickly adapted to various downstream cognitive tasks, and demonstrates that the foundation model now possesses strong imagination ability.
Unsupervised neural network models of the ventral visual stream
- Computer Science, Biology · Proceedings of the National Academy of Sciences
- 2021
Neural network models learned with deep unsupervised contrastive embedding methods achieve neural prediction accuracy in multiple ventral visual cortical areas that equals or exceeds that of models derived using today's best supervised methods, and the mapping of these models' hidden layers onto brain areas is neuroanatomically consistent across the ventral stream.
Using goal-driven deep learning models to understand sensory cortex
- Biology · Nature Neuroscience
- 2016
This work outlines how the goal-driven HCNN approach can be used to delve more deeply into the development and organization of sensory cortical processing.
Limits to visual representational correspondence between convolutional neural networks and the human brain
- Psychology, Computer Science · Nature Communications
- 2021
CNNs are shown not to fully capture the visual representations of real-world objects, nor those of artificial objects, at either lower or higher levels of visual processing, indicating that fundamental differences exist in how the brain and CNNs represent visual information.
Multisensory interactions in primate auditory cortex: fMRI and electrophysiology
- Biology · Hearing Research
- 2009
Multisensory integration: methodological approaches and emerging principles in the human brain
- Psychology · Journal of Physiology-Paris
- 2004
The multisensory function of the human primary visual cortex
- Biology, Psychology · Neuropsychologia
- 2016
Incorporating Context into Language Encoding Models for fMRI
- Computer Science, Psychology · bioRxiv
- 2018
The models built here show a significant improvement in encoding performance relative to state-of-the-art embeddings in nearly every brain area and suggest that LSTM language models learn high-level representations that are related to representations in the human brain.
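Several of the references above (the 5.17-billion-regression-fits benchmark, the naturalistic-stimuli models, and the context-aware language encoding models) rely on the same basic machinery: a voxelwise encoding model that linearly maps stimulus features from an artificial network onto measured brain responses and is scored on held-out data. The sketch below illustrates that setup with synthetic data and ridge regression; the array shapes, feature source, and scoring choice are assumptions for illustration, not the pipeline of any cited paper.

```python
# Illustrative sketch only: a voxelwise encoding model fit with ridge regression.
# All names, shapes, and the synthetic data are assumptions for demonstration,
# not the authors' code.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Assume stimulus features extracted from a pretrained model (one feature vector
# per fMRI time point) and the measured voxel responses they should predict.
n_timepoints, n_features, n_voxels = 600, 256, 1000
features = rng.standard_normal((n_timepoints, n_features))
true_weights = rng.standard_normal((n_features, n_voxels)) * 0.1
voxels = features @ true_weights + rng.standard_normal((n_timepoints, n_voxels))

X_train, X_test, Y_train, Y_test = train_test_split(
    features, voxels, test_size=0.2, random_state=0
)

# One linear map per voxel, with the regularization strength chosen by cross-validation.
model = RidgeCV(alphas=np.logspace(-2, 4, 7))
model.fit(X_train, Y_train)

# Encoding performance is typically reported as the correlation between
# predicted and held-out responses, computed per voxel.
Y_pred = model.predict(X_test)
r = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(n_voxels)]
print(f"median held-out correlation across voxels: {np.median(r):.3f}")
```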