Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness

  • Paul Pu Liang
Having a rich multimodal inner language is an important component of human intelligence that enables several necessary core cognitive functions such as multimodal prediction, translation, and generation. Building upon the Conscious Turing Machine (CTM), a machine model for consciousness proposed by Blum and Blum [13], we describe the desiderata of a multimodal language called Brainish, comprising words, images, audio, and sensations combined in representations that the CTM’s processors use… 

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

This paper proposes a taxonomy of six core technical challenges (representation, alignment, reasoning, generation, transference, and quantification) covering historical and recent trends, and defines two key principles, modality heterogeneity and interconnections, that have driven subsequent innovations.

A theory of consciousness from a theoretical computer science perspective: Insights from the Conscious Turing Machine

  • L. Blum, M. Blum
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2022
This paper provides evidence that a theoretical computer science (TCS) perspective can add to the understanding of consciousness, offering a simple framework for employing tools from computational complexity theory and machine learning.



The Consciousness Prior

A new prior is proposed for learning representations of high-level concepts of the kind humans manipulate with language. Inspired by cognitive neuroscience theories of consciousness, the prior makes it natural to map conscious states to natural language utterances or to express classical AI knowledge in a form similar to facts and rules.

A multimodal parallel architecture: A cognitive framework for multimodal interactions

Foundations of Multimodal Co-learning

Gated-Attention Architectures for Task-Oriented Language Grounding

An end-to-end trainable neural architecture for task-oriented language grounding in 3D environments which assumes no prior linguistic or perceptual knowledge and requires only raw pixels from the environment and the natural language instruction as input.
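The title names gated attention as the mechanism that fuses the instruction with the visual input. The following is a minimal pure-Python sketch of that fusion step, assuming the commonly described formulation: the instruction embedding passes through a linear layer and a sigmoid to yield one gate per image feature channel, and each channel's feature map is scaled elementwise by its gate. The function name `gated_attention`, the list-of-lists tensor layout, and the parameters `weights` and `bias` are illustrative assumptions, not the authors' code; in a trained model these parameters would be learned end to end together with the image and instruction encoders.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one H x W channel


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(
    image_features: List[FeatureMap],
    instruction_embedding: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[FeatureMap]:
    """Scale each image feature channel by a gate computed from the instruction.

    image_features: C channels, each an H x W list of lists.
    instruction_embedding: D floats (e.g. a recurrent encoder's final state).
    weights (C x D) and bias (C): hypothetical learned parameters.
    """
    # One gate per channel: sigmoid(W @ e + b), each value in (0, 1).
    gates = [
        sigmoid(sum(w * e for w, e in zip(row, instruction_embedding)) + b)
        for row, b in zip(weights, bias)
    ]
    # Broadcasting a gate over H x W and multiplying elementwise is the
    # same as scaling every cell of that channel by the gate.
    return [
        [[gate * value for value in row] for row in fmap]
        for gate, fmap in zip(gates, image_features)
    ]


# Two 2 x 2 channels and a 2-d instruction embedding (toy values).
maps = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
fused = gated_attention(maps, [1.0, 0.0], [[10.0, 0.0], [-10.0, 0.0]], [0.0, 0.0])
# Channel 0 is almost unchanged; channel 1 is scaled by about 4.5e-5.
```

Because the gate multiplies whole channels, the instruction acts as a soft selector over which visual features the downstream policy sees, rather than as an extra input concatenated alongside them.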

Artificial Intelligence and Consciousness

Consciousness is no longer a threatening notion in the community of AI. In human beings, consciousness corresponds to a collection of different features of human cognition. AI researchers are

Origins of theory of mind, cognition and communication.

  • A. Meltzoff
  • Psychology
    Journal of communication disorders
  • 1999

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

This work introduces the structure-content neural language model, which disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder, and shows that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic.
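The idea of multimodal regularities captured by vector arithmetic can be illustrated with a toy joint image-text space: subtract a word vector from an image vector, add another word vector, and retrieve the nearest image by cosine similarity. This is a minimal sketch under stated assumptions; the three-dimensional embeddings, their axis meanings, and the file names are hypothetical values chosen by hand, whereas the paper's embeddings are learned by its encoders.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(base: Vector, minus: Vector, plus: Vector, candidates: Dict[str, Vector]) -> str:
    """Return the candidate whose embedding is most cosine-similar to base - minus + plus."""
    query = [b - m + p for b, m, p in zip(base, minus, plus)]
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Hypothetical toy joint space; axes loosely mean (car, blue, red).
word_vecs = {"blue": [0.0, 1.0, 0.0], "red": [0.0, 0.0, 1.0]}
image_vecs = {
    "blue_car.jpg": [1.0, 1.0, 0.0],
    "red_car.jpg": [1.0, 0.0, 1.0],
    "red_flower.jpg": [0.0, 0.0, 1.0],
}

# image("blue car") - word("blue") + word("red") should land nearest a red car.
best = analogy(image_vecs["blue_car.jpg"], word_vecs["blue"], word_vecs["red"], image_vecs)
```

The arithmetic only works because images and words share one embedding space; if they lived in separate spaces, subtracting a word vector from an image vector would have no meaning.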

The human imagination: the cognitive neuroscience of visual mental imagery

  • J. Pearson
  • Psychology, Biology
    Nature Reviews Neuroscience
  • 2019
This review discusses recent insights into the neural mechanisms that underlie visual imagery, how imagery can be objectively and reliably measured, and how it affects general cognition.

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

This work proposes a neural-symbolic visual question answering system that first recovers a structural scene representation from the image and a program trace from the question, then executes the program on the scene representation to obtain an answer.
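The final stage, executing a program on a structural scene representation, can be sketched in plain Python. In the actual system both the scene representation and the program trace come from neural parsers; here both are written by hand. The operation names (`filter_color`, `filter_shape`, `count`, `exist`) and the dictionary-based scene format are illustrative assumptions loosely modeled on CLEVR-style question programs, not the paper's exact module set.

```python
from typing import Any, Dict, List, Optional, Tuple

Scene = List[Dict[str, Any]]
Program = List[Tuple[str, Optional[str]]]


def execute(program: Program, scene: Scene) -> Any:
    """Run (operation, argument) steps in order over a structured scene."""
    state: Any = scene
    for op, arg in program:
        if op == "filter_color":
            state = [obj for obj in state if obj["color"] == arg]
        elif op == "filter_shape":
            state = [obj for obj in state if obj["shape"] == arg]
        elif op == "count":
            state = len(state)
        elif op == "exist":
            state = len(state) > 0
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


# Hand-written stand-in for the scene parser's output.
scene: Scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

# Hand-written stand-in for the question parser's output for
# "How many red cubes are there?"
program: Program = [("filter_color", "red"), ("filter_shape", "cube"), ("count", None)]
answer = execute(program, scene)
```

Separating perception (building `scene`) from reasoning (running `program`) is what makes the answer inspectable: each intermediate `state` can be examined to see exactly which objects the model considered.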

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a