Generating Diverse Image Datasets with Limited Labeling

Abstract

Image datasets play a pivotal role in advancing multimedia and image analysis research. However, most of these datasets are created with extensive human effort and are extremely expensive to scale up. There is a high chance that a dataset has no instances of some required concepts, or that the available instances do not cover the diversity of real-world scenarios. To address this, several approaches for learning from web images and refining them have been proposed, but these approaches either include a significant number of redundant instances in the dataset or fail to guarantee a set diverse enough to train a robust classifier. In this work, we propose a semi-supervised sparse coding framework to collect a diverse set of images with minimal human effort, which can be used either to create a dataset from scratch or to enrich an existing dataset with diverse examples. To evaluate our method, we constructed an image dataset with our framework, named DivNet. Experiments on this dataset demonstrate that our method not only reduces manual effort but also produces a dataset with excellent accuracy, diversity, and cross-dataset generalization ability.

DOI: 10.1145/2964284.2967285

5 Figures and Tables

Cite this paper

@inproceedings{Mithun2016GeneratingDI,
  title={Generating Diverse Image Datasets with Limited Labeling},
  author={Niluthpol Chowdhury Mithun and Rameswar Panda and Amit K. Roy-Chowdhury},
  booktitle={ACM Multimedia},
  year={2016}
}