Five points to check when comparing visual perception in humans and machines

Christina M. Funke, Judy Borowski, Karolina Stosio, Wieland Brendel, Thomas S. A. Wallis, Matthias Bethge
Journal of Vision
With the rise of machines to human-level performance in complex recognition tasks, a growing body of work compares information processing in humans and machines. These studies offer an exciting chance to learn about one system by studying the other. Here, we propose ideas on how to design, conduct, and interpret experiments so that they adequately support the investigation of mechanisms when comparing human and machine perception. We demonstrate and apply these ideas…


Deep Neural Network Models of Object Recognition Exhibit Human-Like Limitations when Performing Visual Search Tasks

This work shows that DNNs exhibit a hallmark effect seen when human participants search simplified stimuli and, testing DNN models of object recognition on natural images, finds that DNN accuracy is inversely correlated with a visual search difficulty score.

Can Deep Convolutional Neural Networks Learn Same-Different Relations?

In a series of simulations, it is shown that DCNNs are capable of visual same-different classification, but only when the test images are similar to the training images at the pixel level; even when there are only subtle differences between the test and training images, the performance of DCNNs can drop to chance levels.

Abstraction and analogy‐making in artificial intelligence

  • M. Mitchell
  • Annals of the New York Academy of Sciences
  • 2021
The advantages and limitations of several approaches to forming humanlike abstractions or analogies, including symbolic methods, deep learning, and probabilistic program induction, are reviewed.

Toward Building Science Discovery Machines

It is argued that, in order to build science discovery machines and speed up the scientific discovery process, theoretical and computational frameworks that encapsulate the main principles of scientific discovery should be developed.

FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding

This work presents Few-Shot object detection via Contrastive Proposal Encoding (FSCE), a simple yet effective approach to learning contrastive-aware object proposal encodings that facilitate the classification of detected objects.

Finding Biological Plausibility for Adversarially Robust Features via Metameric Tasks

It is found that the discriminability of robust representation and texture model images decreased to near-chance performance as stimuli were presented farther in the periphery, which supports the idea that localized texture summary statistic representations may drive human invariance to adversarial noise.

Can deep convolutional neural networks support relational reasoning in the same-different task?

In a series of simulations, it is shown that models based on the ResNet-50 architecture are capable of visual same-different classification, but only when the test images are similar to the training images at the pixel level, and that the Relation Network, a deep learning architecture specifically designed to tackle visual relational reasoning problems, suffers the same kinds of limitations as ResNet-50 classifiers.

Visual Representation Learning Does Not Generalize Strongly Within the Same Domain

This paper tests whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets, and observes that all of them struggle to learn the underlying mechanism regardless of supervision signal and architectural bias.

Artificial Psychophysics questions Hue Cancellation Experiments

The results suggest that the opponent curves of the standard hue cancellation experiment are just a by-product of the front-end photoreceptors and of a very specific experimental choice, but they do not inform about the downstream color representation.

Inconsistent illusory motion in predictive coding deep neural networks



Comparing machines and humans on a visual categorization test

This work compares the efficiency of human and machine learning in assigning an image to one of two categories determined by the spatial arrangement of constituent parts, and demonstrates that human subjects grasp the separating principles from a handful of examples, whereas the error rates of computer programs fluctuate wildly and remain far worse than those of humans even after exposure to thousands of examples.

How Deep is the Feature Analysis underlying Rapid Visual Categorization?

It is found that recognition accuracy increases with higher stages of visual processing but that human decisions agree best with predictions from intermediate stages, and that the complexity of visual representations afforded by modern deep network models may exceed those used by human participants during rapid categorization.

Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency

A central problem in cognitive science and behavioural neuroscience as well as in machine learning and artificial intelligence research is to ascertain whether two or more decision makers (e.g.

Do Neural Networks Show Gestalt Phenomena? An Exploration of the Law of Closure

The findings suggest that NNs trained with natural images do exhibit closure, in contrast to networks with randomized weights or networks that have been trained on visually random data.

How intelligent are convolutional neural networks?

An "Aha Challenge" for visual perception is proposed, calling for focused and quantitative research on Gestalt-style machine intelligence and on deep learning algorithms' ability to infer simple visual concepts from limited training examples.

Atoms of recognition in human and computer vision

This work shows by combining a novel method (minimal images) and simulations that the human recognition system uses features and learning processes, which are critical for recognition, but are not used by current models.

Global contour processing in amblyopia

Deep Neural Networks as a Computational Model for Human Shape Sensitivity

It is demonstrated that sensitivity for shape features, characteristic of human and primate vision, emerges in DNNs when trained for generic object recognition from natural photographs, which indicates that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces for shapes.

Convolutional Neural Networks Can Be Deceived by Visual Illusions

It is shown that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with respect to variation in architecture and spatial pattern size.

Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study

This work proposes to address the interpretability problem in modern DNNs using the rich history of problem descriptions, theories, and experimental methods developed by cognitive psychologists to study the human mind. It demonstrates the capability of tools from cognitive psychology for exposing hidden computational properties of DNNs while concurrently providing a computational model for human word learning.