A Multi-scale CNN for Affordance Segmentation in RGB Images

  Anirban Roy and Sinisa Todorovic
Given a single RGB image, our goal is to label every pixel with an affordance type. Our approach uses a deep architecture, consisting of a number of multi-scale convolutional neural networks, for extracting mid-level visual cues and combining them toward affordance segmentation. The mid-level cues include a depth map, surface normals, and a segmentation into four surface types, namely floor, structure, furniture, and props. For evaluation, we augmented the NYUv2 dataset with new ground truth…
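The pipeline described above (branch networks producing mid-level cues that are combined into a per-pixel affordance labeling) can be sketched schematically. All function names, shapes, and the random "classifier" below are illustrative stand-ins under assumed dimensions, not the paper's trained networks:

```python
import numpy as np

H, W = 4, 4                       # tiny image for illustration
rng = np.random.default_rng(0)

def predict_depth(img):           # stand-in for the depth-map branch
    return rng.random((H, W, 1))

def predict_normals(img):         # stand-in for the surface-normal branch
    return rng.random((H, W, 3))

def predict_surfaces(img):        # stand-in for the 4-way surface branch:
    return rng.random((H, W, 4))  # floor, structure, furniture, props

def affordance_head(cues, n_affordances=5):
    # placeholder per-pixel linear classifier over concatenated cue channels
    Wc = rng.random((cues.shape[-1], n_affordances))
    return cues @ Wc              # per-pixel affordance scores

img = rng.random((H, W, 3))
cues = np.concatenate([predict_depth(img),
                       predict_normals(img),
                       predict_surfaces(img)], axis=-1)   # (H, W, 8)
labels = affordance_head(cues).argmax(axis=-1)            # one label per pixel
print(labels.shape)
```

The sketch only shows the data flow (independent cue branches, channel-wise fusion, per-pixel classification); the actual branches in the paper are multi-scale CNNs trained end to end.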
Learning to Segment Affordances
This work demonstrates, both quantitatively and qualitatively, that learning a dense predictor of affordances from an object part dataset is indeed possible, and shows that the model outperforms several baselines.
3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding
Comprehensive results on the contributed dataset show the promise of visual affordance understanding as a valuable yet challenging benchmark.
AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection
The experimental results on the public datasets show that AffordanceNet outperforms recent state-of-the-art methods by a fair margin, while its end-to-end architecture allows inference at 150 ms per image.
What can I do here? Leveraging Deep 3D saliency and geometry for fast and scalable multiple affordance detection
This paper develops and evaluates a novel method for detecting affordances in a scalable, multiple-instance manner on visually recovered point clouds, based on highly parallelizable one-shot learning that runs fast on commodity hardware.
An Affordance Keypoint Detection Network for Robot Manipulation
This letter investigates the addition of keypoint detections to a deep network affordance segmentation pipeline. The intent is to better interpret the functionality of object parts from a…
Learning to Label Affordances from Simulated and Real Data
A convolutional neural network is designed that densely predicts affordances from only a single 2D RGB image, using a novel cost function that handles (potentially multiple) affordances of objects and their parts in a pixel-wise manner, even in the case of incomplete data.
Object affordance detection with relationship-aware network
A novel relationship-aware convolutional neural network, which takes the symbiotic relationship between multiple affordances and the combinational relationship between the affordance and objectness into consideration, to predict the most probable affordance label for each pixel in the object.
PartAfford: Part-level Affordance Discovery from 3D Objects
This work presents a new task of part-level affordance discovery (PartAfford), where given only the affordance labels per object, the machine is tasked to decompose 3D shapes into parts and discover how each part of the object corresponds to a certain affordance category.
A New Localization Objective for Accurate Fine-Grained Affordance Segmentation Under High-Scale Variations
This work proposes an instance-segmentation framework that can accurately localize functionality and affordance of individual object parts and proposes a novel Angular Intersection Over Larger (AIOL) measure to address limitations.


Learning Hierarchical Features for Scene Labeling
A method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel, alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information.
Learning human activities and object affordances from RGB-D videos
This work considers the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human, and more importantly, of their interactions with the objects in the form of associated affordances, and formulate the learning problem using a structural support vector machine (SSVM) approach.
Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
This work proposes algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information and shows how this contextual information in turn improves object recognition.
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture
  D. Eigen and R. Fergus, 2015 IEEE International Conference on Computer Vision (ICCV)
In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional…
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
This paper employs two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally, and applies a scale-invariant error to help measure depth relations rather than scale.
Recurrent Convolutional Neural Networks for Scene Parsing
This work proposes an approach consisting of a recurrent convolutional neural network, which allows us to consider a large input context while limiting the capacity of the model and remaining very fast at test time.
Affordance detection of tool parts from geometric features
This work proposes two approaches for learning affordances from local shape and geometry primitives: superpixel-based hierarchical matching pursuit (S-HMP) and structured random forests (SRF). It also introduces a large RGB-Depth dataset where tool parts are labeled with multiple affordances and their relative rankings.
Recovering Surface Layout from an Image
This paper takes the first step towards constructing the surface layout, a labeling of the image into geometric classes, by learning appearance-based models of these classes, which coarsely describe the 3D scene orientation of each image region.
Indoor Semantic Segmentation using depth information
This work addresses multi-class segmentation of indoor scenes with RGB-D inputs by applying a multiscale convolutional network to learn features directly from the images and the depth information.
Indoor Segmentation and Support Inference from RGBD Images
The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.