Corpus ID: 57189184

S4-Net: Geometry-Consistent Semi-Supervised Semantic Segmentation

Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit
We show that it is possible to learn semantic segmentation from very limited amounts of manual annotations, by enforcing geometric 3D constraints between multiple views. More precisely, image locations corresponding to the same physical 3D point should all have the same label. We show that introducing such constraints during learning is very effective, even when no manual label is available for a 3D point, and can be done simply by employing techniques from 'general' semi-supervised learning to…
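
The constraint that "image locations corresponding to the same physical 3D point should all have the same label" can be made concrete with a pinhole camera model. The sketch below assumes known per-pixel depth in view A, shared intrinsics K = (fx, fy, cx, cy), and a known relative pose (R, t) from camera A to camera B; it penalizes the squared L2 distance between the class distributions predicted at corresponding pixels. The squared-L2 penalty, the nearest-pixel lookup, and all function names are illustrative assumptions, not the S4-Net implementation, and occlusions are ignored.

```python
import math

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with known depth into camera coordinates (pinhole model)."""
    fx, fy, cx, cy = K
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def transform(point, R, t):
    """Rigid transform x' = R x + t, mapping camera-A coordinates into camera B."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))

def project(point, K):
    """Pinhole projection of a 3D point (z > 0) to continuous pixel coordinates."""
    fx, fy, cx, cy = K
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def view_consistency_loss(probs_a, probs_b, pixel_a, depth_a, K, R, t):
    """Squared L2 distance between the class distributions predicted in views A and B
    for the 3D point seen at pixel_a in view A.

    probs_a, probs_b: dicts mapping (u, v) -> tuple of class probabilities.
    Occlusions are not checked in this sketch.
    """
    point_b = transform(backproject(*pixel_a, depth_a, K), R, t)
    if point_b[2] <= 0:
        return 0.0  # point lies behind camera B
    u_b, v_b = project(point_b, K)
    pixel_b = (round(u_b), round(v_b))  # nearest-pixel lookup
    if pixel_b not in probs_b:
        return 0.0  # point projects outside view B
    return sum((p - q) ** 2 for p, q in zip(probs_a[pixel_a], probs_b[pixel_b]))
```

Because the loss only compares predictions with each other, it needs no manual label for the 3D point, which matches the abstract's point that the constraint helps even for unlabeled points.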


Semi-supervised semantic segmentation needs strong, varied perturbations

This work finds that adapted variants of the recently proposed CutOut and CutMix augmentation techniques yield state-of-the-art semi-supervised semantic segmentation results on standard datasets.
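
CutOut and CutMix both rely on a rectangular binary mask: CutOut blanks the masked region, while CutMix fills it with the corresponding region of a second image. For segmentation, the same mask can be applied to label maps or predictions, since both are per-pixel grids. The sketch below uses single-channel images stored as lists of rows; the square box covering roughly `area_frac` of the image and the helper names are assumptions for illustration, not the cited paper's exact sampling scheme.

```python
import random

def box_mask(h, w, area_frac, rng):
    """Binary mask (list of rows) with one rectangle of ones covering about area_frac of the image."""
    bh = max(1, round(h * area_frac ** 0.5))
    bw = max(1, round(w * area_frac ** 0.5))
    top = rng.randint(0, h - bh)    # randint is inclusive, so the box stays inside the image
    left = rng.randint(0, w - bw)
    return [[1 if top <= y < top + bh and left <= x < left + bw else 0
             for x in range(w)] for y in range(h)]

def cutout(img, mask):
    """CutOut: zero every pixel inside the box."""
    return [[0 if m else p for p, m in zip(row, mrow)] for row, mrow in zip(img, mask)]

def cutmix(img_a, img_b, mask):
    """CutMix: pixels inside the box come from img_b, the rest from img_a.
    Works equally for images, label maps, or per-pixel predictions."""
    return [[pb if m else pa for pa, pb, m in zip(ra, rb, mr)]
            for ra, rb, mr in zip(img_a, img_b, mask)]
```

Passing an explicit `random.Random` instance keeps the mask reproducible, which is convenient when the same mask must be reused for an image pair and its predictions.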

Consistency regularization and CutMix for semi-supervised semantic segmentation

The recently proposed CutMix regularizer is adapted to semantic segmentation and is found to overcome the difficulties that had limited consistency regularization in this setting, leading to a successful application of consistency regularization to semi-supervised semantic segmentation.
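
Consistency regularization with CutMix asks that the prediction on a mixed image match the mix of the predictions on the two source images. The sketch below assumes a mean-teacher-style setup (a `student` and a `teacher` callable) and a mean-squared-error penalty on per-pixel foreground probabilities; these choices and the two toy models are assumptions for illustration. A purely per-pixel model trivially satisfies the constraint, whereas a model with spatial context does not, which is why the loss carries a training signal.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mix(a, b, mask):
    """Take b inside the mask and a elsewhere (same rule as CutMix)."""
    return [[vb if m else va for va, vb, m in zip(ra, rb, mr)] for ra, rb, mr in zip(a, b, mask)]

def cutmix_consistency(student, teacher, img_a, img_b, mask):
    """MSE between the student's output on the mixed image and the mixed teacher outputs."""
    target = mix(teacher(img_a), teacher(img_b), mask)
    pred = student(mix(img_a, img_b, mask))
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n

def pixelwise_model(img):
    """Toy model: each output depends only on its own pixel."""
    return [[sigmoid(x) for x in row] for row in img]

def context_model(img):
    """Toy model: each output depends on a 1x3 horizontal neighbourhood."""
    out = []
    for row in img:
        preds = []
        for x in range(len(row)):
            window = row[max(0, x - 1):x + 2]
            preds.append(sigmoid(sum(window) / len(window)))
        out.append(preds)
    return out
```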

Robust Semi-Supervised Semantic Segmentation Based on Self-Attention and Spectral Normalization

The present work addresses the issue of long-range dependencies between different image regions by introducing a self-attention mechanism in the generator of the GAN to effectively account for relationships between widely separated spatial regions of the input image with supervision based on pixel-level ground truth data.
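
A self-attention layer lets every spatial position aggregate features from all other positions, which is how it captures long-range dependencies. The sketch below follows a SAGAN-style formulation (query, key, and value projections, a softmax over positions, and a learnable residual scale `gamma`); whether the cited work uses exactly this variant is an assumption. Features are a list of N vectors, one per flattened H x W position, and plain lists stand in for 1x1 convolutions.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x; plays the role of a 1x1 convolution."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def self_attention(x, Wf, Wg, Wh, gamma):
    """x: list of N feature vectors. Wh must map back to the input channel count."""
    f = [matvec(Wf, xi) for xi in x]     # keys
    g = [matvec(Wg, xi) for xi in x]     # queries
    h = [matvec(Wh, xi) for xi in x]     # values
    out = []
    for j in range(len(x)):
        # attention of position j over every position i
        beta = softmax([sum(a * b for a, b in zip(f[i], g[j])) for i in range(len(x))])
        o = [sum(beta[i] * h[i][c] for i in range(len(x))) for c in range(len(h[0]))]
        out.append([gamma * oc + xc for oc, xc in zip(o, x[j])])  # residual connection
    return out
```

With `gamma = 0` the layer is the identity, so it can be inserted into an existing generator and gradually learn how much non-local context to use.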



ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.

Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization

A novel, efficient, and robust method for optimizing the pose is presented: a deep network is trained to predict the best direction in which to improve a pose estimate, given a semantic segmentation of the input image and a rendering of the buildings from that estimate.
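
Predicting "the best direction to improve a pose estimate" naturally yields an iterative refinement loop: query the predictor, take a small step, repeat until it says to stop. In the sketch below, the pose is assumed to be (x, y, yaw), the six discrete moves, step sizes, and `make_oracle` are all hypothetical, and the oracle cheats by knowing the true pose; it only stands in for the trained network, which would instead compare the segmentation with a rendering of the 2.5D map.

```python
MOVES = {
    "x+": (1.0, 0.0, 0.0), "x-": (-1.0, 0.0, 0.0),
    "y+": (0.0, 1.0, 0.0), "y-": (0.0, -1.0, 0.0),
    "yaw+": (0.0, 0.0, 1.0), "yaw-": (0.0, 0.0, -1.0),
}

def refine_pose(pose, predict_direction, step=(0.5, 0.5, 1.0), max_iters=100):
    """Apply the move chosen by predict_direction until it returns 'stop' or iterations run out."""
    for _ in range(max_iters):
        move = predict_direction(pose)
        if move == "stop":
            break
        d = MOVES[move]
        pose = tuple(p + s * di for p, s, di in zip(pose, step, d))
    return pose

def make_oracle(true_pose, step):
    """Hypothetical stand-in for the trained network (it knows the true pose)."""
    names = [("x+", "x-"), ("y+", "y-"), ("yaw+", "yaw-")]

    def predict(pose):
        errs = [t - p for t, p in zip(true_pose, pose)]
        # choose the axis with the largest error, measured in step units
        k = max(range(3), key=lambda i: abs(errs[i]) / step[i])
        if abs(errs[k]) < step[k] / 2:
            return "stop"
        return names[k][0] if errs[k] > 0 else names[k][1]

    return predict
```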

Label Fusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes

This paper develops a pipeline to rapidly generate high quality RGBD data with pixelwise labels and object poses and uses this dataset to answer questions related to how much training data is required, and of what quality the data must be, to achieve high performance from a DNN architecture.

Figure-ground segmentation using a hierarchical conditional random field

This work proposes an approach to the problem of detecting and segmenting generic object classes that combines three "off the shelf" components in a novel way that can handle deformable (non-rigid) objects such as animals, even under severe occlusion and rotation.

Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints

The main contribution is to explicitly consider the inferred 3D geometry of the whole scene and to enforce consistency of the estimated 3D point clouds and ego-motion across consecutive frames; the resulting method outperforms the state of the art for both depth and ego-motion estimation.
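
Enforcing consistency of estimated 3D point clouds across frames can be illustrated by back-projecting each depth map, moving the frame-t cloud with the estimated ego-motion, and measuring how far its points land from the frame-t+1 cloud. The sketch below uses a single brute-force nearest-neighbour correspondence step (essentially one ICP matching step) and a mean distance; the pinhole intrinsics K = (fx, fy, cx, cy), the convention that (R, t) maps frame-t coordinates into frame t+1, and the loss form are assumptions rather than the paper's exact formulation.

```python
import math

def depth_to_points(depth, K):
    """Back-project every pixel with positive depth (list of rows) into camera coordinates."""
    fx, fy, cx, cy = K
    return [((u - cx) * z / fx, (v - cy) * z / fy, z)
            for v, row in enumerate(depth) for u, z in enumerate(row) if z > 0]

def apply_motion(points, R, t):
    """Rigid transform x' = R x + t applied to every point."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)) for p in points]

def nn_residual(src, dst):
    """Mean distance from each src point to its nearest dst point (brute force, O(N*M))."""
    if not src or not dst:
        return 0.0
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def point_cloud_loss(depth_t, depth_t1, K, R, t):
    """3D consistency between frame t (moved by the ego-motion) and frame t+1."""
    moved = apply_motion(depth_to_points(depth_t, K), R, t)
    return nn_residual(moved, depth_to_points(depth_t1, K))
```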

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

An adaptive geometric consistency loss is proposed to increase robustness towards outliers and non-Lambertian regions; it resolves occlusions and texture ambiguities effectively, and the method achieves state-of-the-art results in all three tasks, performing better than previous unsupervised methods and comparably with supervised ones.
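
A common way to detect outliers and occlusions in flow is a forward-backward check: following the forward flow and then the backward flow should return to the starting pixel. Making the tolerance grow with the flow magnitude gives an adaptive threshold. The sketch below marks a pixel as consistent when the round-trip error is below max(alpha, beta * |forward flow|); the defaults alpha = 3.0 and beta = 0.05, the nearest-pixel lookup (bilinear sampling would be typical), and the flow layout as lists of (dx, dy) rows are assumptions for illustration.

```python
import math

def fb_consistent(flow_fw, flow_bw, alpha=3.0, beta=0.05):
    """Mask (list of rows of bool): True where the backward flow undoes the forward flow."""
    h, w = len(flow_fw), len(flow_fw[0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow_fw[y][x]
            tx, ty = round(x + dx), round(y + dy)   # nearest-pixel lookup in the other frame
            if not (0 <= tx < w and 0 <= ty < h):
                row.append(False)                   # flow leaves the image: treat as inconsistent
                continue
            bx, by = flow_bw[ty][tx]
            err = math.hypot(dx + bx, dy + by)      # zero when backward flow exactly cancels forward flow
            row.append(err < max(alpha, beta * math.hypot(dx, dy)))
        mask.append(row)
    return mask
```

Pixels that fail the check can then be down-weighted or excluded from the photometric and geometric losses.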

Indoor Segmentation and Support Inference from RGBD Images

The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.

Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes

This work proposes an alternative paradigm which combines real and synthetic data for learning semantic instance segmentation and object detection models, and introduces a novel dataset of augmented urban driving scenes with 360-degree images that are used as environment maps to create realistic lighting and reflections on rendered objects.
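
One reason augmenting real images with rendered objects is attractive is that the labels come for free: the renderer knows exactly which pixels each inserted object covers. The sketch below alpha-blends a rendered layer over a real single-channel image and writes an instance ID into the label map wherever the object's coverage exceeds a threshold. The 0.5 threshold and the function names are assumptions; realistic lighting from environment maps is not modelled here.

```python
def composite(background, render, alpha):
    """Alpha-blend a rendered object layer over a real image (lists of rows)."""
    return [[a * r + (1 - a) * b for b, r, a in zip(br, rr, ar)]
            for br, rr, ar in zip(background, render, alpha)]

def paste_labels(labels, alpha, instance_id, threshold=0.5):
    """Overwrite the label map with instance_id wherever the rendered object is mostly opaque."""
    return [[instance_id if a >= threshold else l for l, a in zip(lr, ar)]
            for lr, ar in zip(labels, alpha)]
```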

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

This work addresses the task of semantic image segmentation with deep learning and proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales; it also improves the localization of object boundaries by combining DCNNs with probabilistic graphical models.
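
Atrous (dilated) convolution spaces the kernel taps `rate` pixels apart, so a 3x3 kernel covers a (2*rate+1) x (2*rate+1) window with only nine weights. ASPP runs several such filters with different rates in parallel over the same feature map and fuses the branches. The sketch below uses zero padding, cross-correlation (as CNN frameworks do), and fuses branches by summation; the fusion choice and example rates are assumptions for illustration (rates such as 6, 12, 18, 24 are typical for real feature maps).

```python
def atrous_conv2d(img, kernel, rate):
    """'Same'-size 2D atrous cross-correlation with zero padding (lists of rows)."""
    h, w = len(img), len(img[0])
    k = len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for i in range(k):
                for j in range(k):
                    # taps are spaced `rate` pixels apart around (y, x)
                    yy, xx = y + (i - c) * rate, x + (j - c) * rate
                    if 0 <= yy < h and 0 <= xx < w:
                        s += kernel[i][j] * img[yy][xx]
            out[y][x] = s
    return out

def aspp(img, kernels, rates):
    """Atrous spatial pyramid pooling: parallel atrous branches fused by summation."""
    branches = [atrous_conv2d(img, k, r) for k, r in zip(kernels, rates)]
    h, w = len(img), len(img[0])
    return [[sum(b[y][x] for b in branches) for x in range(w)] for y in range(h)]
```

Feeding a single impulse through a branch shows its sampling pattern directly: the nonzero outputs sit on a grid whose spacing equals the rate.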

Digging Into Self-Supervised Monocular Depth Estimation

It is shown that a surprisingly simple model, and associated design choices, lead to superior predictions, and together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
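
Two of the simple design choices this line refers to are, to our understanding, a per-pixel minimum reprojection loss over source frames (robust to occlusions that affect only one source) and auto-masking of pixels whose unwarped source frames already match the target (static scenes or objects moving with the camera). The sketch below implements both on flat lists of intensities and uses a plain L1 photometric error; the full method also blends in SSIM, and the frame warping itself is assumed to be done elsewhere.

```python
def min_reprojection_loss(target, warped_sources, raw_sources):
    """Per-pixel minimum photometric error with auto-masking (flat single-channel lists).

    target:         intensities of the target frame
    warped_sources: source frames warped into the target view using predicted depth and pose
    raw_sources:    the same source frames, unwarped
    """
    total, count = 0.0, 0
    for p, t in enumerate(target):
        reproj = min(abs(t - w[p]) for w in warped_sources)   # best-matching source at this pixel
        identity = min(abs(t - s[p]) for s in raw_sources)    # error if nothing moved at all
        if reproj < identity:   # auto-mask: keep only pixels where warping actually helps
            total += reproj
            count += 1
    return total / count if count else 0.0
```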