Invariant Recognition Shapes Neural Representations of Visual Input.

Andrea Tacchetti, Leyla Isik, Tomaso A. Poggio
Annual Review of Vision Science
Recognizing the people, objects, and actions in the world around us is a crucial aspect of human perception that allows us to plan and act in our environment. Remarkably, our proficiency in recognizing semantic categories from visual input is unhindered by transformations that substantially alter their appearance (e.g., changes in lighting or position). The ability to generalize across these complex transformations is a hallmark of human visual intelligence, which has been the focus of wide… 

Generative Models for Active Vision

An overview of the generative models that the brain must employ to engage in active vision is provided, specifying the processes that explain retinal cell activity and proprioceptive information from oculomotor muscle fibers.

Convolutional neural networks do not develop brain-like transformation tolerant visual representations

Using fMRI pattern analysis, it is shown that high representational consistency across position and size changes exists in human higher visual regions, is lower in early visual areas, and increases as information ascends the ventral visual processing pathway.

Objects seen as scenes: Neural circuitry for attending whole or parts

Feature blindness: A challenge for understanding and modelling visual object recognition

While learning in CNNs is driven by the statistical properties of the environment, humans are highly constrained by their previous biases, which suggests that cognitive constraints play a key role in how humans learn to recognise novel objects.

Ramp-shaped neural tuning supports graded population-level representation of the object-to-scene continuum

The results together suggest that depicted spatial scale is coded parametrically in large-scale population codes across the entire ventral occipito-temporal cortex.

Learning robust visual representations using data augmentation invariance

The results show that the proposed data augmentation invariance approach is a simple, yet effective and efficient (10 % increase in training time) way of increasing the invariance of the models while obtaining similar categorization performance.

Stable readout of observed actions from format-dependent activity of monkey’s anterior intraparietal neurons

It is proposed that, by multiplicatively integrating signals about others’ actions and their visual format, the AIP can provide a stable readout of OMA identity at the population level; no fully invariant OMA-selective neuron is found.

Learning From Brains How to Regularize Machines

This work denoised the notoriously variable neural activity using strong predictive models trained on this large corpus of responses from the mouse visual system, and used the neural representation similarity to regularize CNNs trained on image classification by penalizing intermediate representations that deviated from neural ones.

Position- and scale-invariant object-centered spatial selectivity in monkey frontoparietal cortex dynamically adapts to task demand

It is shown that neurons in the same areas can encode object-centered allocentric spatial information, independent of object location and object size, or egocentric information, depending on dynamically changing task demands.

Trading robust representations for sample complexity through self-supervised visual experience

The results suggest that equivalence sets other than class labels, which are abundant in unlabeled visual experience, can be used for self-supervised learning of semantically relevant image embeddings.



Invariant recognition drives neural representations of action sequences

It is shown that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings.

The dynamics of invariant object recognition in the human visual system.

The human visual system can rapidly recognize objects despite transformations that alter their appearance. The precise timing of when the brain computes neural representations that are invariant to

Untangling invariant object recognition

A feedforward architecture accounts for rapid categorization

It is shown that a specific implementation of a class of feedforward theories of object recognition (that extend the Hubel and Wiesel simple-to-complex cell hierarchy and account for many anatomical and physiological constraints) can predict the level and the pattern of performance achieved by humans on a rapid masked animal vs. non-animal categorization task.

A fast, invariant representation for human action in the visual system.

It is found that within 200 ms action can be read out of magnetoencephalography data and that this representation is invariant to changes in viewpoint, suggesting that the brain quickly integrates complex spatiotemporal features to form invariant action representations.

Metamers of the ventral stream

A population model for mid-ventral processing is developed, in which nonlinear combinations of V1 responses are averaged in receptive fields that grow with eccentricity, providing a quantitative framework for assessing the capabilities and limitations of everyday vision.

Representational dynamics of object vision: the first 1000 ms.

The stationarity of patterns of brain activity that encode object category information is examined, and these patterns are shown to vary over time, suggesting the brain might use flexible time-varying codes to represent visual object categories.

The invariance hypothesis implies domain-specific regions in visual cortex

This work can define an index of transformation-compatibility, computable from videos, that can be combined with information about the statistics of natural vision to yield predictions for which object categories ought to have domain-specific regions.

The fusiform face area: a cortical region specialized for the perception of faces

  • N. Kanwisher, G. Yovel
  • Psychology
    Philosophical Transactions of the Royal Society B: Biological Sciences
  • 2006
It is argued that the FFA is engaged both in detecting faces and in extracting the necessary perceptual information to recognize them, and that the properties of the FFA mirror previously identified behavioural signatures of face-specific processing.