Corpus ID: 238634169

Trivial or impossible - dichotomous data difficulty masks model differences (on ImageNet and beyond)

Kristof Meding, Luca M. Schulze Buschoff, Robert Geirhos, Felix Wichmann
“The power of a generalization system follows directly from its biases” (Mitchell 1980). Today, CNNs are incredibly powerful generalisation systems—but to what degree have we understood how their inductive bias influences model decisions? We here attempt to disentangle the various aspects that determine how a model decides. In particular, we ask: what makes one model decide differently from another? In a meticulously controlled setting, we find that (1.) irrespective of the network architecture… 
Partial success in closing the gap between human and machine vision
The longstanding distortion robustness gap between humans and CNNs is closing: the best models now exceed human feedforward performance on most of the investigated OOD datasets, and the behavioural difference between human and machine vision is narrowing.
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
A high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain is introduced; it behaves similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts.
On the surprising similarities between supervised and self-supervised models
Surprisingly, current self-supervised CNNs share four key characteristics of their supervised counterparts: relatively poor noise robustness, non-human category-level error patterns, non-human image-level error patterns highly similar to those of supervised models, and a bias towards texture.
Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy
This paper examines ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods, and considers three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology.
Learning Transferable Visual Models From Natural Language Supervision
It is demonstrated that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn state-of-the-art image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
Do ImageNet Classifiers Generalize to ImageNet?
The results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
This work uses human studies to investigate the consequences of employing a noisy data collection pipeline and to examine how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset, including the introduction of biases that state-of-the-art models exploit.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
Surprisingly, it is found that lower capacity models may be practically more useful than higher capacity models in real-world datasets with high proportions of erroneously labeled data.
Individual differences among deep neural network models
Individual differences among DNN instances that arise from varying only the random initialization of the network weights are investigated, demonstrating that this minimal change in initial conditions prior to training leads to substantial differences in intermediate and higher-level network representations, despite achieving indistinguishable network-level classification performance.
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
It is shown that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies.