Do Deep Neural Networks Suffer from Crowding?

@inproceedings{Volokitin2017DoDN,
  title={Do Deep Neural Networks Suffer from Crowding?},
  author={Anna Volokitin and Gemma Roig and Tomaso Poggio},
  booktitle={NIPS},
  year={2017}
}
Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks (DNNs) for object recognition. We analyze both deep convolutional neural networks (DCNNs) and an extension of DCNNs that is multi-scale and changes the receptive field size of the convolution filters with their position in the visual field (an eccentricity-dependent model).
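
A minimal sketch, in Python with PyTorch, of the kind of setup the abstract describes: a crowding-style stimulus (a target with flankers at a controlled spacing and eccentricity) and a crude multi-scale encoding in which center crops of growing size are resized to a common resolution, mimicking receptive fields that grow away from the center. All names, sizes, and the random stand-in patches are illustrative assumptions, not the authors' code or stimuli.

import torch
import torch.nn.functional as F

def place(canvas, patch, cx, cy):
    """Paste a (h, w) patch onto the canvas, centered at (cx, cy)."""
    h, w = patch.shape
    canvas[cy - h // 2: cy - h // 2 + h, cx - w // 2: cx - w // 2 + w] += patch
    return canvas

def crowding_stimulus(target, flanker, canvas_size=256, eccentricity=60, spacing=30):
    """Target placed `eccentricity` pixels right of center, flankers at +/- `spacing`."""
    canvas = torch.zeros(canvas_size, canvas_size)
    cx, cy = canvas_size // 2 + eccentricity, canvas_size // 2
    place(canvas, target, cx, cy)
    place(canvas, flanker, cx - spacing, cy)
    place(canvas, flanker, cx + spacing, cy)
    return canvas.clamp(0, 1)

def scale_channels(canvas, crop_sizes=(64, 128, 256), out_size=64):
    """Center crops of growing size, each resized to a common resolution."""
    c = canvas.shape[0] // 2
    channels = []
    for s in crop_sizes:
        crop = canvas[c - s // 2: c + s // 2, c - s // 2: c + s // 2]
        channels.append(F.interpolate(crop[None, None], size=(out_size, out_size),
                                      mode='bilinear', align_corners=False))
    return torch.cat(channels, dim=1)               # (1, n_scales, out_size, out_size)

target, flanker = torch.rand(28, 28), torch.rand(28, 28)   # stand-in patches
x = scale_channels(crowding_stimulus(target, flanker))
print(x.shape)                                      # torch.Size([1, 3, 64, 64])

Varying the spacing and eccentricity while holding the target fixed is the kind of manipulation used to measure crowding in such models.
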
Citations

Crowding in humans is unlike that in convolutional neural networks
TLDR
Data show that DCNNs, while proficient in object recognition, likely achieve this competence through a set of mechanisms that are distinct from those in humans, and caution must be exercised when inferring mechanisms derived from their operation.
Object Recognition in Deep Convolutional Neural Networks is Fundamentally Different to That in Humans
TLDR
DCNNs, while proficient in object recognition, likely achieve this competence through a set of mechanisms that are distinct from those in humans, and are not equivalent models of human or primate object recognition; caution must be exercised when inferring mechanisms derived from their operation.
Crowding Reveals Fundamental Differences in Local vs. Global Processing in Humans and Machines
TLDR
This work uses visual crowding as a well-controlled, specific probe to test the global shape computations of ffCNNs, and provides evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons.
The Notorious Difficulty of Comparing Human and Machine Perception
TLDR
It is shown that, despite their ability to solve closed-contour tasks, the authors' neural networks use different decision-making strategies than humans, and that neural networks do experience a "recognition gap" on minimal recognizable images.
Learning long-range spatial dependencies with horizontal gated-recurrent units
TLDR
This work introduces the horizontal gated-recurrent unit (hGRU) to learn intrinsic horizontal connections, both within and across feature columns, and demonstrates that a single hGRU layer matches or outperforms all tested feedforward hierarchical baselines, including state-of-the-art architectures with orders of magnitude more free parameters.
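
For intuition only, here is a generic convolutional gated-recurrent update in PyTorch that repeatedly mixes activity across spatial positions with shared kernels. It is a simplified stand-in (a single update gate, no reset gate) and not the published hGRU equations, which use a two-stage suppressive/facilitatory scheme; the class and parameter names are assumptions.

import torch
import torch.nn as nn

class ConvGRUStep(nn.Module):
    """One step of a convolutional gated-recurrent "horizontal" update."""
    def __init__(self, channels, ksize=5):
        super().__init__()
        pad = ksize // 2
        self.gate = nn.Conv2d(2 * channels, channels, ksize, padding=pad)
        self.cand = nn.Conv2d(2 * channels, channels, ksize, padding=pad)

    def forward(self, x, h):
        z = torch.sigmoid(self.gate(torch.cat([x, h], dim=1)))      # update gate
        h_new = torch.tanh(self.cand(torch.cat([x, h], dim=1)))     # candidate state
        return (1 - z) * h + z * h_new

x = torch.randn(1, 16, 32, 32)       # feedforward drive from a conv layer
h = torch.zeros_like(x)
step = ConvGRUStep(16)
for _ in range(4):                   # a few "horizontal" timesteps
    h = step(x, h)
print(h.shape)                       # torch.Size([1, 16, 32, 32])
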
Scale and translation-invariance for novel objects in human vision
TLDR
The results suggest that to explain invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale-invariance, by encoding different scale channels as well as eccentricity-dependent representations captured by neurons' receptive field sizes and sampling density that change with eccentricity.
Biologically Inspired Mechanisms for Adversarial Robustness
TLDR
It is demonstrated that the non-uniform sampling performed by the primate retina and the presence of multiple receptive fields with a range of receptive field sizes at each eccentricity improve the robustness of neural networks to small adversarial perturbations.
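
A rough sketch of the fovea-like, non-uniform sampling part of that idea (the multiple receptive field sizes per eccentricity are not shown): output pixels near the image center sample the input densely and peripheral ones sparsely, via a radially warped sampling grid. The function name and the warp exponent are illustrative assumptions.

import torch
import torch.nn.functional as F

def foveate(img, out_size=64, power=2.0):
    """img: (1, C, H, W) tensor; returns a fovea-weighted resampling of it."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, out_size),
                            torch.linspace(-1, 1, out_size), indexing='ij')
    r = torch.sqrt(xs ** 2 + ys ** 2).clamp(min=1e-6)
    warp = r ** power / r                 # radius r -> r**power: center magnified
    grid = torch.stack([xs * warp, ys * warp], dim=-1)[None]    # (1, H, W, 2)
    # coordinates outside [-1, 1] (the corners) are zero-padded by grid_sample
    return F.grid_sample(img, grid, mode='bilinear', align_corners=False)

img = torch.rand(1, 3, 224, 224)
print(foveate(img).shape)                 # torch.Size([1, 3, 64, 64])
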
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents
TLDR
This study leads to the surprising conclusion that UNREAL learns more quickly about larger target stimuli than it does about smaller stimuli, and motivates a specific improvement in the form of a simple model of foveal vision that turns out to significantly boost UNREAL's performance, both on Psychlab tasks and on standard DeepMind Lab tasks.
The Foes of Neural Network's Data Efficiency Among Unnecessary Input Dimensions
TLDR
This letter investigates the impact of unnecessary input dimensions on a central issue of DNNs: their data efficiency, i.e., their ability to learn from limited training data.

References

Showing 1-10 of 28 references
Crowding—An essential bottleneck for object recognition: A mini-review
  • D. Levi
  • Medicine, Psychology
  • Vision Research
  • 2008
TLDR
The goal of this review is to provide a broad, balanced and succinct overview that organizes and summarizes the diverse and scattered studies of crowding, and helps to explain it to the non-specialist.
Recurrent Models of Visual Attention
TLDR
A novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution is presented.
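
A compact sketch of the glimpse-then-update loop behind this model family (not the published architecture, which uses multi-resolution glimpses and reinforcement learning to train the location policy): crop a small patch at the current location, update a recurrent state, and emit the next location. All module names and sizes are assumptions.

import torch
import torch.nn as nn

class TinyGlimpseAgent(nn.Module):
    def __init__(self, glimpse=16, hidden=128, n_classes=10):
        super().__init__()
        self.glimpse = glimpse
        self.encode = nn.Sequential(nn.Flatten(),
                                    nn.Linear(glimpse * glimpse, hidden), nn.ReLU())
        self.rnn = nn.GRUCell(hidden, hidden)
        self.where = nn.Linear(hidden, 2)      # next (x, y) fixation in [-1, 1]
        self.what = nn.Linear(hidden, n_classes)

    def crop(self, img, loc):
        """img: (1, 1, H, W); loc in [-1, 1]^2 selects a glimpse-sized crop."""
        H, g = img.shape[-1], self.glimpse
        x = int((loc[0, 0].item() + 1) / 2 * (H - g))
        y = int((loc[0, 1].item() + 1) / 2 * (H - g))
        return img[..., y:y + g, x:x + g]

    def forward(self, img, steps=4):
        h = torch.zeros(1, self.rnn.hidden_size)
        loc = torch.zeros(1, 2)                # start by looking at the center
        for _ in range(steps):
            h = self.rnn(self.encode(self.crop(img, loc)), h)
            loc = torch.tanh(self.where(h))    # choose the next fixation
        return self.what(h)                    # classify from the final state

print(TinyGlimpseAgent()(torch.rand(1, 1, 64, 64)).shape)   # torch.Size([1, 10])
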
Pooling of continuous features provides a unifying account of crowding
TLDR
The main effects of three studies from the crowding literature are consistent with the predictions of the Texture Tiling Model, suggesting that many of the stimulus-specific curiosities surrounding crowding are the inherent result of the informativeness of a rich set of image statistics for the particular tasks.
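
The pooling idea can be caricatured in a few lines of NumPy: within each pooling window only aggregate statistics of local feature responses are kept, not their spatial arrangement. This is a heavily simplified stand-in (the Texture Tiling Model uses a much richer statistic set and pooling regions that grow with eccentricity); the function and the gradient-energy features are assumptions.

import numpy as np

def pooled_statistics(img, window=32):
    """img: (H, W) array. Per-window mean/std of vertical and horizontal
    gradients; where things sit inside a window is thrown away."""
    gy, gx = np.gradient(img.astype(float))
    feats = []
    for y in range(0, img.shape[0] - window + 1, window):
        for x in range(0, img.shape[1] - window + 1, window):
            wy = gy[y:y + window, x:x + window]
            wx = gx[y:y + window, x:x + window]
            feats.append([wx.mean(), wx.std(), wy.mean(), wy.std()])
    return np.array(feats)                    # (n_windows, 4)

img = np.random.rand(128, 128)
print(pooled_statistics(img).shape)           # (16, 4)
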
Foveation-based Mechanisms Alleviate Adversarial Examples
TLDR
It is shown that adversarial examples, i.e., the visually imperceptible perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be alleviated with a mechanism based on foveations (applying the CNN to different image regions), and it is corroborated that when the neural responses are linear, applying the foveation mechanism to the adversarial example tends to significantly reduce the effect of the perturbation.
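
In the spirit of that mechanism, a hedged sketch of "apply the CNN to a region rather than the whole frame": crop a box around the object, resize the crop to the network's input size, and classify the crop. The stand-in model, the box, and the sizes are assumptions; the paper itself evaluates several crop strategies and the (near-)linearity of the network's responses.

import torch
import torch.nn.functional as F

def foveal_crop_classify(model, img, box, in_size=224):
    """img: (1, 3, H, W); box = (x0, y0, x1, y1) roughly around the object."""
    x0, y0, x1, y1 = box
    crop = img[..., y0:y1, x0:x1]
    crop = F.interpolate(crop, size=(in_size, in_size),
                         mode='bilinear', align_corners=False)
    return model(crop)

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 224 * 224, 10))   # stand-in "CNN"
img = torch.rand(1, 3, 480, 640)
print(foveal_crop_classify(model, img, (100, 50, 500, 450)).shape)   # (1, 10)
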
A summary-statistic representation in peripheral vision explains visual crowding.
TLDR
It is shown that the difficulty of performing an identification task within a single pooling region using this representation of the stimuli is correlated with peripheral identification performance under conditions of crowding, and provides evidence that a unified neuronal mechanism may underlie peripheral vision, ordinary pattern recognition in central vision, and texture perception.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
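
The core idea is compactly expressed as a residual block: the stacked convolutions learn a residual that is added back to the block's input through an identity shortcut, which makes very deep stacks easier to optimize. The sketch below is simplified (no batch normalization or downsampling projections) and its sizes are illustrative.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))   # the learned residual F(x)
        return self.relu(out + x)                    # identity shortcut: F(x) + x

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)                    # torch.Size([1, 64, 56, 56])
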
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
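
The design principle, depth built from stacks of very small (3x3) convolution filters separated by 2x2 max pooling, can be sketched as follows; the stage widths and depths here are illustrative, not the full 16- or 19-layer configurations.

import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch, n_convs):
    """A stack of 3x3 convolutions followed by 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

tiny_vgg = nn.Sequential(vgg_stage(3, 64, 2), vgg_stage(64, 128, 2),
                         vgg_stage(128, 256, 3))
print(tiny_vgg(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 256, 28, 28])
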
Eccentricity Dependent Deep Neural Networks: Modeling Invariance in Human Vision
TLDR
To the authors' knowledge, this work is the first to unify explanations of all three types of invariance, all while leveraging the power and neurological grounding of CNNs.
Visual crowding: a fundamental limit on conscious perception and object recognition
TLDR
The goal of this review is to provide a broad-based synthesis of the most recent findings in crowding, to define what crowding is and is not, and to set the stage for future work that will extend the understanding of crowding well beyond low-level vision.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
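
A minimal sketch of an RPN-style head over shared convolutional features: a 3x3 convolution followed by two 1x1 convolutions that output, for each of k anchors at every spatial location, an objectness score and four box-regression offsets. This is a simplified rendering (the published head uses 2k softmax scores, and anchor decoding/NMS are omitted); channel counts and names are assumptions.

import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_ch=256, k=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch, 3, padding=1)
        self.objectness = nn.Conv2d(in_ch, k, 1)        # one score per anchor
        self.box_deltas = nn.Conv2d(in_ch, 4 * k, 1)    # (dx, dy, dw, dh) per anchor

    def forward(self, features):
        t = torch.relu(self.conv(features))
        return self.objectness(t), self.box_deltas(t)

feat = torch.randn(1, 256, 50, 50)                      # shared backbone features
scores, deltas = RPNHead()(feat)
print(scores.shape, deltas.shape)    # (1, 9, 50, 50) and (1, 36, 50, 50)
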