Unbiased look at dataset bias

@inproceedings{Torralba2011UnbiasedLA,
  title={Unbiased look at dataset bias},
  author={Antonio Torralba and Alexei A. Efros},
  booktitle={CVPR 2011},
  year={2011},
  pages={1521--1528}
}
Datasets are an integral part of contemporary object recognition research. [...] The experimental results, some rather surprising, suggest directions that can improve dataset collection as well as algorithm evaluation protocols. But more broadly, the hope is to stimulate discussion in the community regarding this very important, but largely neglected issue.
Undoing the Damage of Dataset Bias
TLDR
Overall, this work finds that it is beneficial to explicitly account for bias when combining multiple datasets, and proposes a discriminative framework that directly exploits dataset bias during training.
Comparison of Data Set Bias in Object Recognition Benchmarks
TLDR
The results show that all the tested data sets allowed classification accuracy higher than mere chance using only small sub-images, although these sub-images did not contain any visually interpretable information. This shows that the consistency of the images within the different classes of object recognition data sets can allow the images to be classified even by algorithms that do not recognize objects.
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models
TLDR
A highly automated platform is developed that enables gathering datasets with controls at scale, using automated tools throughout machine learning to generate datasets that exercise models in new ways, thus providing valuable feedback to researchers.
Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias
TLDR
This work proposes Unbiased Metric Learning (UML), a metric learning approach that learns a set of less biased candidate distance metrics on training examples from multiple biased datasets, based on structural SVM.
On Zero-Shot Recognition of Generic Objects
TLDR
It is shown that the actual classification accuracy of existing ZSL models is significantly higher than was previously thought once major structural flaws of the current benchmark are accounted for, and the notion of structural bias specific to ZSL datasets is introduced.
A Survey on Bias in Visual Datasets
TLDR
There is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit, and a checklist that can be used to spot different types of bias during visual dataset collection is proposed.
A Deeper Look at Dataset Bias
TLDR
This paper proposes to verify the potential of the DeCAF features when facing the dataset bias problem, and conducts a series of analyses looking at how existing datasets differ from one another and verifying the performance of existing debiasing methods under different representations.
Properties of Datasets Predict the Performance of Classifiers
TLDR
Using data driven models, it is demonstrated that based on a few reference exemplars, these methods are able to detect novelties in ego-motions of people, and changes in the static environments surrounding them.
RESOUND: Towards Action Recognition Without Representation Bias
TLDR
Experimental evaluation confirms the effectiveness of RESOUND in reducing the static biases of current datasets.
The Pascal Visual Object Classes Challenge: A Retrospective
TLDR
A review of the Pascal Visual Object Classes challenge from 2008-2012 and an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.

References

80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition
TLDR
For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
Caltech-256 Object Category Dataset
We introduce a challenging set of 256 object categories containing a total of 30607 images. The original Caltech-101 [1] was collected by choosing a set of object categories, downloading examples [...]
Recognition by association via learning per-exemplar distances
TLDR
This work uses the distance functions to detect and segment objects in novel images by associating the bottom-up segments obtained from multiple image segmentations with the exemplar regions and learns separate distance functions for each exemplar.
Dataset Issues in Object Recognition
TLDR
Current datasets are lacking in several respects, and this paper discusses some of the lessons learned from existing efforts, as well as innovative ways to obtain very large and diverse annotated datasets.
SUN database: Large-scale scene recognition from abbey to zoo
TLDR
This paper proposes the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images and uses 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance.
The Pascal Visual Object Classes (VOC) Challenge
TLDR
The state of the art in evaluated methods for both classification and detection is reviewed, including whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Pedestrian detection: A benchmark
TLDR
This work introduces the Caltech Pedestrian Dataset, which is two orders of magnitude larger than existing datasets, and proposes improved evaluation metrics, demonstrating that commonly used per-window measures are flawed and can fail to predict performance on full images.
LabelMe: A Database and Web-Based Tool for Image Annotation
TLDR
A web-based tool that allows easy image annotation and instant sharing of such annotations is developed and a large dataset that spans many object categories, often containing multiple instances over a wide variety of images is collected.
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories
TLDR
The incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm, as well as to one based on maximum-likelihood, which have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible.
Distortion-invariant recognition via jittered queries
  • M. Burl
  • Computer Science
  • Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662)
  • 2000
TLDR
This paper presents a new approach for achieving distortion-invariant recognition and classification, where instead of querying with a single pattern, a more robust query is constructed, based on the family of patterns formed by distorting the test example.