Corpus ID: 237091588

On the Opportunities and Risks of Foundation Models

@article{Bommasani2021OnTO,
  title={On the Opportunities and Risks of Foundation Models},
  author={Rishi Bommasani and Drew A. Hudson and Ehsan Adeli and Russ Altman and Simran Arora and Sydney von Arx and Michael S. Bernstein and Jeannette Bohg and Antoine Bosselut and Emma Brunskill and Erik Brynjolfsson and S. Buch and Dallas Card and Rodrigo Castellon and Niladri S. Chatterji and Annie S. Chen and Kathleen A. Creel and Jared Davis and Dora Demszky and Chris Donahue and Moussa Doumbouya and Esin Durmus and Stefano Ermon and John Etchemendy and Kawin Ethayarajh and Li Fei-Fei and Chelsea Finn and Trevor Gale and Lauren E. Gillespie and Karan Goel and Noah D. Goodman and Shelby Grossman and Neel Guha and Tatsunori Hashimoto and Peter Henderson and John Hewitt and Daniel E. Ho and Jenny Hong and Kyle Hsu and Jing Huang and Thomas F. Icard and Saahil Jain and Dan Jurafsky and Pratyusha Kalluri and Siddharth Karamcheti and Geoff Keeling and Fereshte Khani and O. Khattab and Pang Wei Koh and Mark S. Krass and Ranjay Krishna and Rohith Kuditipudi and Ananya Kumar and Faisal Ladhak and Mina Lee and Tony Lee and Jure Leskovec and Isabelle Levent and Xiang Lisa Li and Xuechen Li and Tengyu Ma and Ali Malik and Christopher D. Manning and Suvir P. Mirchandani and Eric Mitchell and Zanele Munyikwa and Suraj Nair and Avanika Narayan and Deepak Narayanan and Benjamin Newman and Allen Nie and Juan Carlos Niebles and Hamed Nilforoshan and J. F. Nyarko and Giray Ogut and Laurel J. Orr and Isabel Papadimitriou and Joon Sung Park and Chris Piech and Eva Portelance and Christopher Potts and Aditi Raghunathan and Robert Reich and Hongyu Ren and Frieda Rong and Yusuf H. Roohani and Camilo Ruiz and Jack Ryan and Christopher R{\'e} and Dorsa Sadigh and Shiori Sagawa and Keshav Santhanam and Andy Shih and Krishna Parasuram Srinivasan and Alex Tamkin and Rohan Taori and Armin W. Thomas and Florian Tram{\`e}r and Rose E. Wang and William Wang and Bohan Wu and Jiajun Wu and Yuhuai Wu and Sang Michael Xie and Michihiro Yasunaga and Jiaxuan You and Matei A. Zaharia and Michael Zhang and Tianyi Zhang and Xikun Zhang and Yuhui Zhang and Lucia Zheng and Kaitlyn Zhou and Percy Liang},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.07258}
}
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles… 

SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data

Recent success in fine-tuning large models pretrained on broad data at scale on downstream tasks has led to a significant paradigm shift in deep learning, from task-centric model design to task-agnostic representation learning followed by task-specific fine-tuning.

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks

The proposed Language-Interfaced Fine-Tuning (LIFT) makes no changes to the model architecture or loss function; it relies solely on the natural language interface, enabling “no-code machine learning with LMs,” and performs relatively well across a wide range of low-dimensional classification and regression tasks.
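
The core recipe is easy to sketch: serialize each feature vector into a natural-language prompt and let the LM's fine-tuning machinery do the rest. A minimal sketch follows; the template and feature names are illustrative, not LIFT's exact format.

```python
# Minimal sketch of LIFT-style prompt serialization for tabular data.
# The template and feature names are illustrative, not LIFT's exact format.

def serialize_example(features: dict, label=None) -> dict:
    """Turn one tabular row into a (prompt, completion) pair for LM fine-tuning."""
    parts = [f"{name} is {value}" for name, value in features.items()]
    prompt = "Given that " + ", ".join(parts) + ", what is the class?"
    completion = "" if label is None else f" {label}"
    return {"prompt": prompt, "completion": completion}

# Example usage on a toy Iris-like row (hypothetical feature names):
row = {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4}
print(serialize_example(row, label="setosa"))
# {'prompt': 'Given that sepal length is 5.1, ..., what is the class?',
#  'completion': ' setosa'}
```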

What do tokens know about their characters and how do they know it?

The mechanisms through which pretrained language models (PLMs) acquire English-language character information during training are investigated, and it is argued that this knowledge is acquired through multiple phenomena, including a systematic relationship between particular characters and particular parts of speech, as well as natural variability in the tokenization of related strings.
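
A typical way to test claims like this is a probing classifier over token embeddings. The sketch below shows only the setup: the embeddings are random stand-ins for a real PLM's embedding matrix, and a faithful probe would hold out token types to prevent leakage, which this toy version does not.

```python
# Sketch of a character-presence probe over token embeddings.
# Real experiments would use a PLM's embedding matrix; here random
# vectors stand in so the script runs without model downloads.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

vocab = ["cat", "dog", "apple", "tree", "stone", "ant", "moon", "star"] * 50
rng = np.random.default_rng(0)
emb = {tok: rng.normal(size=32) for tok in set(vocab)}

X = np.stack([emb[t] for t in vocab])
y = np.array([int("a" in t) for t in vocab])  # does the token contain 'a'?

# Note: splitting by instance (not token type) leaks tokens across splits;
# a real probe would split the vocabulary instead.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```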

Should attention be all we need? The epistemic and ethical implications of unification in machine learning

It is argued that many of the arguments in favor of unification in the natural sciences fail to transfer to the machine learning case, or transfer only under assumptions that might not hold.

White-box Testing of NLP models with Mask Neuron Coverage

A set of white-box testing methods customized for transformer-based NLP models is proposed, including Mask Neuron Coverage (MNCOVER), which measures how thoroughly the attention layers in models are exercised during testing.
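
A rough rendering of the coverage idea follows, assuming a threshold-based notion of when an attention "neuron" fires; the threshold and head-level granularity here are assumptions, not the paper's exact definition.

```python
# Sketch of an attention-coverage metric in the spirit of MNCOVER:
# the fraction of attention heads that exceed an activation threshold
# at least once over a test suite.
import numpy as np

def attention_coverage(attn_maps, threshold=0.5):
    """attn_maps: list of arrays shaped (heads, seq, seq), one per test input."""
    covered = None
    for attn in attn_maps:
        fired = attn.max(axis=-1) > threshold    # (heads, seq): strong attention?
        fired = fired.any(axis=-1)               # (heads,): fired anywhere in input
        covered = fired if covered is None else (covered | fired)
    return covered.mean()

rng = np.random.default_rng(0)
suite = [rng.dirichlet(np.ones(8), size=(4, 8)) for _ in range(10)]  # 4 heads
print("coverage:", attention_coverage(suite))
```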

Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?

This work explores alternative, more general forms of goal specification that are expected to be easier for humans to specify and use, such as images obtained from the internet, hand sketches that provide a visual description of the desired task, or simple language descriptions.
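
One common realization of image-based goal specification is to reward the agent for embedding similarity between the current observation and the goal image. In the sketch below, the encoder is a placeholder (flatten-and-normalize) standing in for a pretrained visual encoder such as a CLIP-style model; it is not the paper's method.

```python
# Sketch of goal specification by visual similarity: reward is the cosine
# similarity between an encoder's embedding of the current observation
# and of a goal image pulled from the internet or a hand sketch.
import numpy as np

def encode(image: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor; flatten + normalize stands in for a
    pretrained visual encoder."""
    v = image.reshape(-1).astype(float)
    return v / (np.linalg.norm(v) + 1e-8)

def goal_reward(observation, goal_image):
    return float(encode(observation) @ encode(goal_image))

rng = np.random.default_rng(0)
goal = rng.random((32, 32, 3))
obs = goal + 0.1 * rng.random((32, 32, 3))   # observation near the goal
print("reward near goal:", goal_reward(obs, goal))
print("reward far from goal:", goal_reward(rng.random((32, 32, 3)), goal))
```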

Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference

It is shown that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks such as Mini-ImageNet, CIFAR-FS, CDFSL and Meta-Dataset.
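
Stripped to its essentials, such a pipeline takes features from a backbone pretrained on external data and classifies with something simple before (or instead of) fine-tuning. The sketch below uses nearest-centroid classification on random stand-in features; it shows the few-shot evaluation structure, not the paper's exact method.

```python
# Sketch of a simple few-shot evaluation: nearest-centroid classification
# over frozen backbone features (random stand-ins here).
import numpy as np

def nearest_centroid(support_x, support_y, query_x):
    classes = np.unique(support_y)
    centroids = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    dists = ((query_x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
support_x = rng.normal(size=(5 * 5, 64))          # 5-way 5-shot episode
support_y = np.repeat(np.arange(5), 5)
query_x = support_x[::5] + 0.01 * rng.normal(size=(5, 64))  # one query per class
print(nearest_centroid(support_x, support_y, query_x))       # expected: [0 1 2 3 4]
```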

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

This work shows that model diversity is symbiotic and can be leveraged to build AI systems with structured Socratic dialogue, in which new multimodal tasks are formulated as a guided language-based exchange between different pre-existing foundation models, without additional finetuning.
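
The composition pattern is straightforward to sketch: one model turns perception into text, another reasons over that text. Both calls below are placeholders for real pretrained models (e.g., a captioner and a language model); only the wiring is the point.

```python
# Sketch of a Socratic-Models-style composition: a vision-language model
# describes an image in text, and a language model reasons over that text.

def vlm_caption(image) -> str:
    """Placeholder for a pretrained image captioner."""
    return "a person holding an umbrella on a rainy street"

def lm_answer(prompt: str) -> str:
    """Placeholder for a pretrained language model."""
    return "It is probably raining, so they should keep the umbrella open."

def socratic_qa(image, question: str) -> str:
    caption = vlm_caption(image)                        # perception -> text
    prompt = f"Scene: {caption}\nQuestion: {question}\nAnswer:"
    return lm_answer(prompt)                            # text -> reasoning

print(socratic_qa(image=None, question="What should the person do next?"))
```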

Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers

This work provides a language-based interface for human-robot collaboration, which allows a user to reshape existing trajectories for an autonomous agent using multi-modal attention transformers.
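
As a toy rendering of the interface, a command can be mapped to a spatial offset that is blended into the waypoints while the endpoints stay fixed. The keyword lookup below is a hand-written stand-in for the learned multi-modal attention transformer.

```python
# Sketch of language-conditioned trajectory reshaping. The real system
# learns the command-to-deformation mapping; the keyword table here is
# a hand-written stand-in.
import numpy as np

OFFSETS = {"higher": np.array([0.0, 0.0, 0.2]),
           "left":   np.array([0.0, 0.2, 0.0])}

def reshape_trajectory(waypoints: np.ndarray, command: str) -> np.ndarray:
    offset = sum((v for k, v in OFFSETS.items() if k in command), np.zeros(3))
    # Weight the offset by a bump profile so the endpoints stay fixed.
    t = np.linspace(0, 1, len(waypoints))
    bump = np.sin(np.pi * t)[:, None]
    return waypoints + bump * offset

traj = np.linspace([0, 0, 0], [1, 0, 0], num=5)
print(reshape_trajectory(traj, "go a bit higher"))
```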

The Worst of Both Worlds: A Comparative Analysis of Errors in Learning from Data in Psychology and Machine Learning

Risks that arise when sources of error are misdiagnosed, including overreliance on asymptotic theory and non-credible beliefs about real-world data-generating processes, are discussed, along with the need to acknowledge the role of human inductive biases in learning and reform.