Corpus ID: 235731551

Exploring Data Pipelines through the Process Lens: a Reference Model for Computer Vision

Agathe Balayn, Bogdan Kulynych, Seda F. Guerses
Researchers have identified datasets used for training computer vision (CV) models as an important source of hazardous outcomes, and continue to examine popular CV datasets to expose their harms. These works tend to treat datasets as objects, or focus on particular steps in data production pipelines. We argue here that we could further systematize our analysis of harms by examining CV data pipelines through a process-oriented lens that captures the creation, the evolution and use of these…

Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy
This paper examines ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods, and considers three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology.
On the data set’s ruins
The double dismissal of the role played by the workers and the agency of the photographic apparatus in the elaboration of computer vision foregrounds the inherent fragility of the edifice of machine vision and a necessary rethinking of its scale.
The Focus–Aspect–Value model for predicting subjective visual attributes
The Focus–Aspect–Value model breaks down the process of subjective image interpretation into three steps. A dataset following this way of modeling is described, and Tensor Fusion is among the best performing methods across all measures, outperforming the default way of information fusion (concatenation).
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene…
Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
The ways in which benchmark datasets in machine learning operate as infrastructure are described, along with four research questions for these datasets.
Reconfiguring the Imaging Pipeline for Computer Vision
This work examines the role of the image signal processing (ISP) pipeline in computer vision to identify opportunities to reduce computation and save energy, and proposes a new image sensor design that can compensate for these stages.
Transfer learning in computer vision tasks: Remember where you come from
The experimental protocol demonstrates the versatility of a regularizer that is easy to implement and operate, and that is recommended as the new baseline for future approaches to transfer learning relying on fine-tuning.
No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World
It is suggested that examining the geo-diversity of open data sets is critical before adopting a data set for use cases in the developing world; two large, publicly available image data sets appear to exhibit an observable amerocentric and eurocentric representation bias.
Lessons from archives: strategies for collecting sociocultural data in machine learning
It is argued that a new specialization should be formed within ML that is focused on methodologies for data collection and annotation: efforts that require institutional frameworks and procedures for sociocultural data collection.
Data and its (dis)contents: A survey of dataset development and use in machine learning research
The many concerns raised about the way data is collected and used in machine learning are surveyed, and it is advocated that a more cautious and thorough understanding of data is necessary to address several of the practical and ethical issues of the field.