Click Here: Human-Localized Keypoints as Guidance for Viewpoint Estimation
@article{Szeto2017ClickHH, title={Click Here: Human-Localized Keypoints as Guidance for Viewpoint Estimation}, author={Ryan Szeto and Jason J. Corso}, journal={2017 IEEE International Conference on Computer Vision (ICCV)}, year={2017}, pages={1604-1613} }
We motivate and address a human-in-the-loop variant of the monocular viewpoint estimation task in which the location and class of one semantic object keypoint is available at test time. In order to leverage the keypoint information, we devise a Convolutional Neural Network called Click-Here CNN (CH-CNN) that integrates the keypoint information with activations from the layers that process the image. It transforms the keypoint information into a 2D map that can be used to weigh features from…
19 Citations
StarMap for Category-Agnostic Keypoint and Viewpoint Estimation
- Computer Science, ECCV
- 2018
A category-agnostic keypoint representation, which combines a multi-peak heatmap for all the keypoints and their corresponding features as 3D locations in the canonical viewpoint defined for each instance, which demonstrates competitive performance in keypoint detection and localization compared to category-specific state-of-the-art methods.
Cross-Object Viewpoint Estimation via Domain Adaptation
- Computer Science
- 2018
A framework is proposed that learns an embedding invariant to both the synthesized-or-real domain and the object class; it discourages the learned embedding from encoding domain or class information by reversing the gradient during back-propagation in training.
An Appearance-and-Structure Fusion Network for Object Viewpoint Estimation
- Computer Science, IJCAI
- 2018
A novel Appearance-and-Structure Fusion network, called ASFnet, is proposed that estimates viewpoint by fusing appearance and structure information, and it outperforms state-of-the-art methods on the public PASCAL 3D+ dataset.
Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints From Limited Training Data
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This paper presents an approach that can learn from a small annotated dataset containing a limited range of viewpoints and generalize to detect semantic parts across a much larger range of viewpoints.
C-Reference: Improving 2D to 3D Object Pose Estimation Accuracy via Crowdsourced Joint Object Estimation
- Computer Science, Proc. ACM Hum. Comput. Interact.
- 2020
A crowd-machine hybrid approach that jointly uses crowds' approximate measurements of multiple in-scene objects to estimate the 3D state of a single target object and can reduce errors in the target object's 3D location estimation by over 40%, while requiring only 35% as much human time.
Semantic translation with convolutional encoder-decoder networks for viewpoint estimation
- Computer Science, 2017 11th Asian Control Conference (ASCC)
- 2017
A new pipeline of viewpoint estimation is proposed, introducing semantic translation methods to highlight the structures of interest (SOIs) as foregrounds, and a convolutional encoder-decoder network is applied as the generator of semantic segmentation.
Conservative Wasserstein Training for Pose Estimation
- Computer Science, 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This paper systematically concludes the practical closed-form solution of Wasserstein distance for pose data with either one-hot or conservative target label, especially using convex mapping function for ground metric, conservative label, and closed-form solution.
Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild
- Computer Science, AAAI
- 2020
A deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module to extract RGB and depth features from a single RGB image with the help of synthetic RGB-depth image pairs for object pose estimation.
Adviser Networks: Learning What Question to Ask for Human-In-The-Loop Viewpoint Estimation
- Computer Science, ArXiv
- 2018
This work formulates a solution to the adviser problem using a deep network and applies it to the viewpoint estimation problem, where the question asks for the location of a specific keypoint in the input image, and is able to outperform the previous hybrid-intelligence state-of-the-art.
Ground-truth or DAER: Selective Re-query of Secondary Information
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This work proposes the problem of seed rejection (determining whether to reject a seed based on the expected performance degradation when it is provided in place of a gold-standard seed) and provides a formal definition of this problem.
References
SHOWING 1-10 OF 32 REFERENCES
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
A deep convolutional neural network architecture is presented that localizes semantic parts in the 2D image and in 3D space while inferring their visibility states, given a single RGB image.
Viewpoints and keypoints
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
The problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details is characterized and it is demonstrated that leveraging viewpoint estimates can substantially improve local appearance based keypoint predictions.
Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
- Computer Science, 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
A scalable and overfit-resistant image synthesis pipeline, together with a novel CNN specifically tailored for the viewpoint estimation task, is proposed that can significantly outperform state-of-the-art methods on the PASCAL 3D+ benchmark.
SSD: Single Shot MultiBox Detector
- Computer Science, ECCV
- 2016
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
Single Image 3D Interpreter Network
- Computer Science, ECCV
- 2016
This work proposes 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data, and achieves state-of-the-art performance on both 2D keypoint estimation and 3D structure recovery.
Parsing IKEA Objects: Fine Pose Estimation
- Computer Science, 2013 IEEE International Conference on Computer Vision
- 2013
This work addresses the problem of localizing and estimating the fine-pose of objects in the image with exact 3D models by using local keypoint detectors to find candidate poses and score global alignment of each candidate pose to the image.
Monocular 3D Object Detection for Autonomous Driving
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground-plane, and achieves the best detection performance on the challenging KITTI benchmark, among published monocular competitors.
Best of both worlds: Human-machine collaboration for object annotation
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This paper empirically validates the effectiveness of the human-in-the-loop labeling approach on the ILSVRC2014 object detection dataset and seamlessly integrates multiple computer vision models with multiple sources of human input in a Markov Decision Process.
Click Carving: Segmenting Objects in Video with Point Clicks
- Computer Science, HCOMP
- 2016
A novel form of interactive video object segmentation in which a few clicks by the user help the system produce a full spatio-temporal segmentation of the object of interest; it outperforms all similarly fast methods and is competitive with or better than those requiring 2 to 12 times the effort.
Beyond PASCAL: A benchmark for 3D object detection in the wild
- Computer Science, IEEE Winter Conference on Applications of Computer Vision
- 2014
The PASCAL3D+ dataset is contributed, a novel and challenging dataset for 3D object detection and pose estimation with, on average, more than 3,000 object instances per category.