Corpus ID: 53249765

Satyam: Democratizing Groundtruth for Machine Vision

Hang Qiu, Krishna Chintalapudi, Ramesh Govindan
The democratization of machine learning (ML) has led to ML-based machine vision systems for autonomous driving, traffic monitoring, and video surveillance. However, true democratization cannot be achieved without greatly simplifying the process of collecting groundtruth for training and testing these systems. This groundtruth collection is necessary to ensure good performance under varying conditions. In this paper, we present the design and evaluation of Satyam, a first-of-its-kind system that… 

Efficient Pipelines for Vision-Based Context Sensing

This work explores a design space with three dimensions (sensing task, sensor type, and task location), developing efficient and scalable solutions for different points in that space and achieving state-of-the-art accuracy in those applications.

On Localizing a Camera from a Single Image

It is shown that, using a judicious combination of projective geometry, neural networks, and crowd-sourced annotations from human workers, it is possible to position 95% of the images in the authors' test data set to within 12 m.

Sensing the Sensor: Estimating Camera Properties with Minimal Information

It is shown, using a judicious combination of projective geometry, neural networks, and crowd-sourced annotations from human workers, that it is possible to localize 95% of the cameras in the authors' test data set to within 12 m using a single image taken from the camera.

Minimum Cost Active Labeling

This paper considers the problem of minimum-cost labeling: classifying all images in a large data set to a target accuracy bound at minimum dollar cost. The proposed approach has 6X lower overall cost than human labeling and is always cheaper than the cheapest active learning strategy.
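The minimum-cost labeling problem can be stated as a constrained minimum: among plans that meet the accuracy bound, choose the cheapest. The following is a minimal Python sketch under a hypothetical cost model. It summarizes each candidate plan by three quantities: its number of human labels, a fixed machine cost, and an estimated accuracy.

The names `Strategy`, `total_cost`, and `cheapest_feasible` are illustrative, and so are all the numbers. This is not the paper's algorithm. In particular, it skips the hard part, which is estimating a plan's accuracy before paying for it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Strategy:
    """A hypothetical labeling plan for the whole data set."""
    name: str
    human_labels: int     # images labeled by paid human workers
    machine_cost: float   # dollars spent training and running a model (0 for all-human)
    accuracy: float       # estimated labeling accuracy over the whole data set


def total_cost(plan: Strategy, cost_per_label: float) -> float:
    """Dollar cost of a plan: human labeling plus machine cost."""
    return plan.human_labels * cost_per_label + plan.machine_cost


def cheapest_feasible(plans: Iterable[Strategy], cost_per_label: float,
                      target_accuracy: float) -> Optional[Strategy]:
    """Return the lowest-cost plan that meets the accuracy bound, or None if none does."""
    feasible = [p for p in plans if p.accuracy >= target_accuracy]
    if not feasible:
        return None
    return min(feasible, key=lambda p: total_cost(p, cost_per_label))


# Illustrative numbers only; they are not taken from the paper.
plans = [
    Strategy("all-human", human_labels=10_000, machine_cost=0.0, accuracy=1.0),
    Strategy("model-A", human_labels=2_000, machine_cost=150.0, accuracy=0.96),
    Strategy("model-B", human_labels=1_000, machine_cost=300.0, accuracy=0.93),
]
best = cheapest_feasible(plans, cost_per_label=0.04, target_accuracy=0.95)
print(best.name)  # model-A: $230 versus $400 for all-human; model-B misses the bound
```

Treating the accuracy bound as a hard filter, rather than as a penalty term, mirrors the problem statement: any plan below the target is simply not a valid answer, however cheap it is.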

Caesar: cross-camera complex activity recognition

This paper argues that a system for near real-time detection of complex activities spanning multiple (possibly wireless) cameras, a capability applicable to surveillance tasks, must employ a hybrid design, one in which rule-based activity detection complements neural-network-based detection.



Fine-Grained Crowdsourcing for Fine-Grained Recognition

This work includes humans in the loop to help computers select the discriminative features that humans use, and proposes the "Bubble Bank" algorithm, which uses the human-selected bubbles to improve machine recognition performance.

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

DeCAF, an open-source implementation of deep convolutional activation features, is released along with all associated network parameters, enabling vision researchers to experiment with deep representations across a range of visual concept learning paradigms.

Crowdsourcing Annotations for Visual Object Detection

The key observation is that drawing a bounding box is significantly more difficult and time-consuming than answering multiple-choice questions, so quality control through additional verification tasks is more cost-effective than consensus-based algorithms.
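The cost argument for verification over consensus can be made concrete with a little arithmetic. The following is a minimal sketch under a hypothetical cost model, which is not the paper's:

- **Consensus** pays for `k` independently drawn boxes per object.
- **Verification** pays for one drawn box plus `m` cheap yes/no checks, and redraws the box until it passes.

The sketch assumes that attempts pass independently with probability `p_good` and that the checks never err. The function names and prices are illustrative only.

```python
def consensus_cost(k: int, c_draw: float) -> float:
    """Consensus: k workers each draw a box for the same object; the boxes are then merged."""
    return k * c_draw


def verification_cost(p_good: float, m: int, c_draw: float, c_verify: float) -> float:
    """Verification: one worker draws a box, m workers answer a yes/no check on it,
    and the box is redrawn until it passes.

    Assumes each drawing attempt passes independently with probability p_good and
    that the checks themselves are error-free, so the number of attempts is
    geometric with mean 1 / p_good."""
    if not 0.0 < p_good <= 1.0:
        raise ValueError("p_good must be in (0, 1]")
    return (c_draw + m * c_verify) / p_good


# Hypothetical prices: drawing a box costs five times as much as one yes/no check.
print(round(consensus_cost(k=3, c_draw=0.10), 2))                                # 0.3
print(round(verification_cost(p_good=0.8, m=3, c_draw=0.10, c_verify=0.02), 2))  # 0.2
```

Under this model, verification wins whenever `(c_draw + m * c_verify) / p_good < k * c_draw`. That inequality is easier to satisfy the more expensive drawing is relative to checking, which is the asymmetry the summary points to.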

CrowdDQS: Dynamic Question Selection in Crowdsourcing Systems

This work presents CrowdDQS, a system that uses the most recent set of crowdsourced voting evidence to dynamically issue questions to workers on Amazon Mechanical Turk; it can accurately answer questions using up to 6x fewer votes than standard approaches.
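The idea of using the evidence gathered so far to decide whether more votes are needed can be illustrated for a single binary question. The following is a minimal sketch of a simple sequential Bayesian stopping rule, not CrowdDQS's own question-selection algorithm, which decides which questions to issue across many items.

The sketch assumes a uniform prior and workers who are independently correct with a known probability `worker_accuracy`. The function `ask_until_confident` and its `get_vote` callback are hypothetical.

```python
import math
from typing import Callable, Optional, Tuple


def ask_until_confident(get_vote: Callable[[], bool], worker_accuracy: float,
                        confidence: float, max_votes: int) -> Tuple[Optional[bool], int]:
    """Collect binary votes one at a time and stop as soon as the posterior
    probability of the leading answer reaches `confidence`.

    Assumes a uniform prior and that every worker is independently correct
    with probability `worker_accuracy` (> 0.5). Returns (answer, votes_used);
    the answer is None if the budget runs out on a tie."""
    step = math.log(worker_accuracy / (1.0 - worker_accuracy))  # log-likelihood ratio of one vote
    target = math.log(confidence / (1.0 - confidence))          # posterior log-odds needed to stop
    margin = 0  # (#yes votes) - (#no votes); posterior log-odds of "yes" is margin * step
    for n in range(1, max_votes + 1):
        margin += 1 if get_vote() else -1
        if abs(margin) * step >= target:
            return margin > 0, n
    # Budget exhausted: fall back to the majority answer.
    return (None if margin == 0 else margin > 0), max_votes


# With 80%-accurate workers and a 95% target, three more votes on one side than the other suffice.
votes = iter([True, False, True, True, True])
print(ask_until_confident(lambda: next(votes), worker_accuracy=0.8, confidence=0.95, max_votes=10))
```

Under these assumptions, the posterior depends only on the vote margin. An easy question stops after a few agreeing votes, while a contested one keeps drawing votes up to the budget. This is the kind of saving over a fixed number of votes per question that adaptive schemes exploit.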

Embracing Error to Enable Rapid Crowdsourcing

This work presents a technique that produces extremely rapid judgments for binary and categorical labels, and demonstrates that it is possible to rectify errors by randomizing task order and modeling response latency.
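The following is a minimal sketch of how modeling response latency and randomizing task order could rectify errors in a rapid stream. It assumes:

- items are shown at a fixed `interval`;
- a worker presses a key some time after seeing a positive item;
- that delay follows a Gaussian with hypothetical parameters `mean_latency` and `sd_latency`.

Each press is spread over the items that were already on screen. Scores are then summed over runs that used different random orders, so the true positive accumulates while blame for misattributed presses falls on different neighbors. The function names are illustrative, and the paper's actual latency model may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def attribute_keypresses(order: Sequence[str], interval: float, presses: Sequence[float],
                         mean_latency: float, sd_latency: float) -> Dict[str, float]:
    """Spread each keypress from one run over the items that could have caused it.

    order: distinct item ids in display order; item k is shown at time k * interval.
    presses: keypress timestamps in seconds from the start of the stream.
    Each press gives every item already on screen a weight proportional to a
    Gaussian density of its delay (press time - show time); the weights are
    normalized so that each press contributes a total of 1."""
    scores: Dict[str, float] = defaultdict(float)
    for t in presses:
        weights = {}
        for k, item in enumerate(order):
            delay = t - k * interval
            if delay < 0:
                break  # items shown after the press cannot have triggered it
            z = (delay - mean_latency) / sd_latency
            weights[item] = math.exp(-0.5 * z * z)
        total = sum(weights.values())
        if total == 0:
            continue  # every candidate is implausibly far from the press
        for item, w in weights.items():
            scores[item] += w / total
    return dict(scores)


def combine_runs(runs: List[Tuple[Sequence[str], Sequence[float]]], interval: float,
                 mean_latency: float, sd_latency: float) -> Dict[str, float]:
    """Sum per-item scores over runs that showed the items in different random orders."""
    combined: Dict[str, float] = defaultdict(float)
    for order, presses in runs:
        for item, s in attribute_keypresses(order, interval, presses,
                                            mean_latency, sd_latency).items():
            combined[item] += s
    return dict(combined)


# Two runs over the same items in different orders; the worker reacts about 0.4 s after "b" appears.
runs = [(["a", "b", "c", "d"], [0.5]), (["c", "d", "b", "a"], [0.6])]
scores = combine_runs(runs, interval=0.1, mean_latency=0.4, sd_latency=0.1)
print(max(scores, key=scores.get))  # b
```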

Crowdsourcing in Computer Vision

Crowdsourcing in Computer Vision describes the types of annotations computer vision researchers have collected using crowdsourcing, and how they have ensured that this data is of high quality while annotation effort is minimized.

The Pascal Visual Object Classes Challenge: A Retrospective

A review of the Pascal Visual Object Classes challenge from 2008-2012 and an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

This work gives clear empirical evidence that training with residual connections significantly accelerates the training of Inception networks, and presents several new streamlined architectures for both residual and non-residual Inception networks.

Drone-Based Object Counting by Spatially Regularized Regional Proposal Network

This work presents a new large-scale car parking lot dataset (CARPK) that contains nearly 90,000 cars captured from different parking lots. It is the first and largest drone-view dataset that supports object counting and provides bounding-box annotations.

Towards Understanding Action Recognition

This work finds that high-level pose features greatly outperform low- and mid-level features. In particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information.