A large-scale hierarchical multi-view RGB-D object dataset

  • Kevin Lai, Liefeng Bo, Xiaofeng Ren, Dieter Fox
  • Published 9 May 2011
  • Computer Science
  • 2011 IEEE International Conference on Robotics and Automation
Over the last decade, the availability of public image repositories and recognition benchmarks has enabled rapid progress in visual object category and instance detection. Today we are witnessing the birth of a new generation of sensing technologies capable of providing high quality synchronized videos of both color and depth, the RGB-D (Kinect-style) camera. With its advanced sensing capabilities and the potential for mass adoption, this technology represents an opportunity to dramatically… 


RGB-D Object Recognition: Features, Algorithms, and a Large Scale Benchmark

A large-scale, hierarchical, multi-view object dataset collected with an RGB-D camera is introduced, and combining color and depth information is shown to substantially improve the quality of results.

Recurrent Convolutional Fusion for RGB-D Object Recognition

This work introduces a novel end-to-end architecture for RGB-D object recognition called recurrent convolutional fusion (RCFusion), which significantly outperforms state-of-the-art approaches in both the object categorization and instance recognition tasks.

A large-scale multi-pose 3D-RGB object database

We present a new RGB-D database for multi-pose object recognition tasks. With the help of a multi-axis rotation framework, we can capture depth and color data of arbitrary small objects.

Change Their Perception: RGB-D for 3-D Modeling and Recognition

The authors argue that RGB-D perception will take center stage in perception and that, by making robots see much better than before, it will enable a variety of perception-based research and applications.

Multiview RGB-D Dataset for Object Instance Detection

This work presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects (including a subset of objects from the BigBird dataset) in realistic cluttered environments, together with an approach for detection and recognition.

A recurrent multi-scale approach to RGB-D Object Recognition

The project develops RCFusion, a new end-to-end architecture for RGB-D object recognition that generates compact, highly discriminative multi-modal features by combining complementary RGB and depth information representing different levels of abstraction.

Volumetric Object Recognition Using 3-D CNNs on Depth Data

This work proposes two volumetric representations that reveal rich 3-D structural information hidden in depth images, and combines information from multiple views of each object to provide rotational invariance, significantly improving accuracy compared with the single-rotation approach.

Learning hierarchical sparse features for RGB-(D) object recognition

HMP (hierarchical matching pursuit) builds feature hierarchies layer by layer, with increasing receptive field size, to capture abstract representations from raw RGB-D data; the results indicate that the learned features enable superior object recognition with linear support vector machines.

Complex-Valued Representation for RGB-D Object Recognition

This paper proposes a novel method to describe RGB-D images with a complex-valued representation by means of a neural network, and introduces a new CVNN (Complex-Valued Neural Network) with RBF neurons.

Unsupervised Feature Learning for RGB-D Based Object Recognition

HMP uses sparse coding to learn hierarchical feature representations from raw RGB-D data in an unsupervised way and enables superior object recognition results using linear support vector machines.

RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments

This paper presents RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment to achieve globally consistent maps.

3D generic object categorization, localization and pose estimation

This work proposes a novel and robust model to represent and learn generic 3D object categories, together with a framework in which learning requires less supervision than in previous work.

Using stereo for object recognition

  • S. Helmer, D. Lowe
  • Computer Science
  • 2010 IEEE International Conference on Robotics and Automation
  • 2010
This paper proposes a model that uses a chamfer-type silhouette classifier weighted by a prior on scale. The model is robust to missing stereo depth information and is validated on a set of challenging indoor scenes containing mugs and shoes.

LabelMe: A Database and Web-Based Tool for Image Annotation

A web-based tool that allows easy image annotation and instant sharing of those annotations is developed, and with it a large dataset is collected that spans many object categories, often with multiple instances, over a wide variety of images.

An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection

This paper presents a method that improves the adaptive background mixture model by reinvestigating its update equations at different phases, which allows the system to learn faster and more accurately and to adapt effectively to changing environments.

Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces

This work creates a public framework for dividing the work of labeling video data into micro-tasks that can be completed by the huge labor pools available through crowdsourced marketplaces, and leverages more sophisticated interpolation between key frames to maximize performance for a given budget.

Object recognition from local scale-invariant features

  • D. Lowe
  • Computer Science
  • Proceedings of the Seventh IEEE International Conference on Computer Vision
  • 1999
Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

A discriminatively trained, multiscale, deformable part model

This paper describes a discriminatively trained, multiscale, deformable part model for object detection that achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge and outperforms the best results in the 2007 challenge in ten out of twenty categories.

Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons

This work provides a unified model for constructing a vocabulary of prototype tiny surface patches with associated local geometric and photometric properties, represented as a set of linear Gaussian derivative filter outputs, under different lighting and viewing conditions.

ImageNet: A large-scale hierarchical image database

A new database called “ImageNet” is introduced: a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than current image datasets.