Can Ground Truth Label Propagation from Video Help Semantic Segmentation?

@inproceedings{Mustikovela2016CanGT,
  title={Can Ground Truth Label Propagation from Video Help Semantic Segmentation?},
  author={Siva Karthik Mustikovela and Michael Ying Yang and Carsten Rother},
  booktitle={ECCV Workshops},
  year={2016}
}
For the state-of-the-art semantic segmentation task, training convolutional neural networks (CNNs) requires dense pixel-wise ground-truth (GT) labeling, which is expensive and involves extensive human effort. In this work, we study the possibility of using auxiliary ground truth, so-called pseudo ground truth (PGT), to improve the performance. The PGT is obtained by propagating the labels of a GT frame to its subsequent frames in the video using a simple CRF-based cue integration framework. Our main…
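
As a rough illustration of the propagation step, the Python sketch below warps a GT label map to a neighbouring frame using precomputed backward optical flow; nearest-neighbour lookup keeps the labels discrete. The flow input, the ignore label, and the clip-chaining helper are illustrative assumptions, and the paper's CRF-based integration of further cues is omitted.

    import numpy as np

    def propagate_labels(gt_labels, flow, ignore_label=255):
        """Warp a ground-truth label map to the next frame.

        gt_labels: (H, W) int array, labels of the annotated frame.
        flow:      (H, W, 2) float array, backward flow (dx, dy) mapping
                   each pixel of the target frame to the GT frame.
        """
        h, w = gt_labels.shape
        ys, xs = np.mgrid[0:h, 0:w]
        src_x = np.rint(xs + flow[..., 0]).astype(int)
        src_y = np.rint(ys + flow[..., 1]).astype(int)
        valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
        pgt = np.full((h, w), ignore_label, dtype=gt_labels.dtype)
        pgt[valid] = gt_labels[src_y[valid], src_x[valid]]
        return pgt

    def propagate_clip(gt_labels, flows):
        """Chain single-step propagations through a clip; flows[t] is the
        backward flow from frame t+1 to frame t (assumed given, e.g. from
        an off-the-shelf optical flow method)."""
        labels, pgt_frames = gt_labels, []
        for flow in flows:
            labels = propagate_labels(labels, flow)
            pgt_frames.append(labels)
        return pgt_frames

Errors accumulate with each hop, which is one reason PGT is treated as auxiliary data rather than a replacement for GT.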

Improving Semantic Segmentation via Efficient Self-Training

This paper introduces a self-training framework to leverage pseudo labels generated from unlabeled data, and proposes a centroid sampling strategy to uniformly select training samples from every class within each epoch in order to handle the class imbalance problem in semantic segmentation.
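
A minimal sketch of class-uniform sampling in that spirit, assuming a dataset of (image_id, label_map) pairs with numpy label maps; the centroid bookkeeping here is hypothetical, not the paper's implementation:

    import random

    def build_class_index(dataset):
        """Map each class to the images containing it, together with the
        mean pixel location (centroid) of that class in each image."""
        index = {}
        for image_id, label_map in dataset:
            for cls in set(label_map.flatten().tolist()):
                ys, xs = (label_map == cls).nonzero()
                index.setdefault(cls, []).append(
                    (image_id, (int(ys.mean()), int(xs.mean()))))
        return index

    def sample_epoch(index, samples_per_class):
        """Draw the same number of crops per class (with replacement, so
        rare classes are oversampled rather than drowned out by frequent
        ones such as road or sky); crops would later be taken around the
        stored centroids."""
        batch = []
        for cls, occurrences in index.items():
            for image_id, centroid in random.choices(occurrences, k=samples_per_class):
                batch.append((image_id, centroid, cls))
        random.shuffle(batch)
        return batch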

Improving Semantic Segmentation via Video Propagation and Label Relaxation

This paper presents a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks, and introduces a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries.
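
The relaxation replaces the usual one-hot cross-entropy at boundary pixels with the negative log of the summed probability of all classes present in a small boundary neighbourhood, so a prediction on either side of a fuzzy edge is not penalized. A PyTorch sketch, with the construction of the neighbourhood mask left out as an assumption:

    import torch
    import torch.nn.functional as F

    def border_relaxed_loss(logits, border_class_mask):
        """logits:            (N, C) raw scores at boundary pixels.
        border_class_mask: (N, C) 0/1 mask of the classes present in
        each pixel's boundary neighbourhood (e.g. a 3x3 window)."""
        probs = F.softmax(logits, dim=1)
        p_union = (probs * border_class_mask).sum(dim=1).clamp_min(1e-12)
        return -torch.log(p_union).mean()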

Improving Semantic Segmentation via Self-Training

This paper demonstrates the effectiveness of self-training on a challenging cross-domain generalization task, outperforming the conventional fine-tuning method by a large margin, and proposes a fast training schedule to accelerate the training of segmentation models by up to 2x without performance degradation.
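
Self-training of this kind typically keeps only confident predictions on the unlabeled target-domain images as pseudo labels. A generic numpy sketch of that step; the confidence threshold is an assumption, not the paper's exact recipe:

    def pseudo_label(probs, threshold=0.9, ignore=255):
        """probs: (C, H, W) softmax output on an unlabeled image. Pixels
        below the confidence threshold are ignored during retraining."""
        conf = probs.max(axis=0)
        labels = probs.argmax(axis=0)
        labels[conf < threshold] = ignore
        return labels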

Label propagation in RGB-D video

This work proposes a new method that uses camera poses and 3D point clouds to propagate labels to superpixels computed on the unannotated frames of the sequence, and demonstrates an increase in performance when the ground-truth keyframes are combined with the propagated labels during training.
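
The pose-based transfer can be pictured as projecting labelled 3D points into each unannotated frame with a pinhole camera model; the paper's superpixel aggregation is omitted here, and the interface below is a hypothetical simplification:

    import numpy as np

    def project_labels(points_xyz, point_labels, K, T_world_to_cam, h, w, ignore=255):
        """points_xyz: (N, 3) labelled world points; K: 3x3 intrinsics;
        T_world_to_cam: 4x4 extrinsics of the target frame."""
        pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
        cam = (T_world_to_cam @ pts_h.T)[:3]      # (3, N) camera coords
        in_front = cam[2] > 0                      # keep points ahead of the camera
        uv = (K @ cam)[:, in_front]
        uv = np.rint(uv[:2] / uv[2]).astype(int)  # perspective divide
        lbl = point_labels[in_front]
        valid = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
        label_map = np.full((h, w), ignore, dtype=point_labels.dtype)
        label_map[uv[1, valid], uv[0, valid]] = lbl[valid]
        return label_map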

Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency

This work proposes a novel label propagation method, termed Warp-Refine Propagation, that combines semantic cues with geometric cues to efficiently auto-label videos; it improves label propagation by a noteworthy margin and achieves competitive results on three semantic segmentation benchmarks.
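
The cycle-consistency idea can be reduced to a round-trip check: labels propagated forward and then backward should land where they started. A toy version, with the propagation functions as hypothetical stand-ins (the paper's learned refinement network is not modelled):

    def cycle_consistency_mask(labels, propagate_fwd, propagate_bwd):
        """Trust only pixels whose labels survive a forward-then-backward
        round trip between two frames; returns a boolean mask."""
        round_trip = propagate_bwd(propagate_fwd(labels))
        return round_trip == labels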

Semantics through Time: Semi-supervised Segmentation of Aerial Videos with Iterative Label Propagation

This paper introduces SegProp, a novel iterative flow-based method, with a direct connection to spectral clustering in space and time, to propagate the semantic labels to frames that lack human annotations, significantly outperforming other state-of-the-art label propagation methods.
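
When a frame can be reached from annotations on both sides, per-pixel voting over the propagated candidates is the simplest way to combine them. The sketch below is a much-simplified stand-in for SegProp's iterative flow-based voting; the spectral-clustering connection is not modelled:

    import numpy as np

    def vote_labels(candidates, num_classes, ignore_label=255):
        """candidates: list of (H, W) int label maps propagated to the
        same frame (e.g. forward from the previous annotation and
        backward from the next one)."""
        h, w = candidates[0].shape
        votes = np.zeros((num_classes, h, w), dtype=np.int32)
        for labels in candidates:
            valid = labels != ignore_label
            np.add.at(votes, (labels[valid], *np.nonzero(valid)), 1)
        out = votes.argmax(axis=0).astype(candidates[0].dtype)
        out[votes.sum(axis=0) == 0] = ignore_label  # nothing voted here
        return out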

Large Scale Labelled Video Data Augmentation for Semantic Segmentation in Driving Scenarios

This work makes use of an occlusion-aware and uncertainty-enabled label propagation algorithm to generate additional labelled data, increasing the availability of high-resolution labelled frames by a factor of 20 and yielding a 6.8% to 10.
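
A standard way to obtain such an occlusion estimate is a forward-backward flow check: where the forward flow and the warped backward flow do not cancel, the pixel is likely occluded and should not receive a propagated label. A minimal sketch, assuming the backward flow has already been warped into the forward frame; the pixel tolerance is an assumption:

    import numpy as np

    def occlusion_mask(flow_fwd, flow_bwd_warped, tol=1.5):
        """flow_fwd, flow_bwd_warped: (H, W, 2) arrays. Returns a boolean
        (H, W) mask that is True where propagation should be suppressed."""
        err = np.linalg.norm(flow_fwd + flow_bwd_warped, axis=-1)
        return err > tol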

Improving Semantic Image Segmentation via Label Fusion in Semantically Textured Meshes

This work presents a label fusion framework that improves the semantic pixel labels of video sequences in an unsupervised manner, using a 3D mesh representation of the environment and fusing the predictions of different frames into a consistent representation via semantic mesh textures.

STD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling

This work proposes a novel superpixel-based multi-view convolutional neural network for semantic image segmentation that produces a high-quality segmentation of a single image by leveraging information from additional views of the same scene.
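
The basic ingredient is region pooling: averaging CNN features within each superpixel before classification. A numpy sketch of that single step (STD2P additionally pools across views and time, which is omitted):

    import numpy as np

    def superpixel_pool(features, superpixels):
        """features: (C, H, W) CNN features; superpixels: (H, W) int ids.
        Returns (num_superpixels, C) mean-pooled region features."""
        c = features.shape[0]
        n = int(superpixels.max()) + 1
        ids = superpixels.ravel()
        counts = np.bincount(ids, minlength=n)
        pooled = np.zeros((n, c))
        for ch in range(c):
            pooled[:, ch] = np.bincount(ids, weights=features[ch].ravel(),
                                        minlength=n)
        return pooled / np.maximum(counts, 1)[:, None]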

An Unsupervised Temporal Consistency (TC) Loss to Improve the Performance of Semantic Segmentation Networks

This paper proposes a novel unsupervised temporal consistency (TC) loss that penalizes unstable semantic segmentation predictions, and demonstrates that this training strategy improves the temporal consistency of two state-of-the-art semantic segmentation networks on two different road-scene datasets.
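
A TC loss of this flavour warps the previous frame's prediction to the current frame with optical flow and penalizes the disagreement; no labels are involved. A PyTorch sketch, with occlusion masking (used in practice) omitted as a simplification:

    import torch
    import torch.nn.functional as F

    def warp(x, flow):
        """Backward-warp x (B, C, H, W) with per-pixel flow (B, 2, H, W)."""
        b, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().to(x.device)
        coords = grid.unsqueeze(0) + flow
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0   # normalise to [-1, 1]
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        return F.grid_sample(x, torch.stack((gx, gy), dim=-1),
                             align_corners=True)

    def temporal_consistency_loss(probs_t, probs_tm1, flow_t_to_tm1):
        """Penalise disagreement between frame t's prediction and the
        flow-warped prediction from frame t-1."""
        return F.mse_loss(probs_t, warp(probs_tm1, flow_t_to_tm1))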

References

Recurrent Convolutional Neural Networks for Scene Labeling

This work proposes an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model, and yields state-of-the-art performance on both the Stanford Background Dataset and the SIFT Flow Dataset while remaining very fast at test time.

Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

This work develops Expectation-Maximization (EM) methods for training semantic image segmentation models under weakly supervised and semi-supervised settings; extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark while requiring significantly less annotation effort.
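
The E-step of such a scheme can be pictured as restricting each pixel's prediction to the classes named by the image-level tags. A simplified numpy sketch of that idea (not the paper's exact adaptive-bias rule; the M-step is ordinary training on the resulting latent labels):

    import numpy as np

    def e_step(prob_maps, image_tags):
        """prob_maps: (C, H, W) softmax output; image_tags: iterable of
        class ids present in the image. Returns (H, W) latent labels."""
        tags = np.asarray(sorted(image_tags))
        restricted = prob_maps[tags]       # only tagged classes compete
        return tags[restricted.argmax(axis=0)]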

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
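
In code, the insight amounts to replacing fully connected classifier layers with 1x1 convolutions and upsampling the coarse score map. An illustrative PyTorch module; the paper uses VGG backbones with learned deconvolutions and skip connections, none of which appear here:

    import torch.nn as nn

    class TinyFCN(nn.Module):
        """Toy fully convolutional net: conv backbone, 1x1 score layer in
        place of fully connected classification, bilinear upsampling."""
        def __init__(self, num_classes):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.score = nn.Conv2d(64, num_classes, 1)
            self.up = nn.Upsample(scale_factor=4, mode="bilinear",
                                  align_corners=False)

        def forward(self, x):
            # (B, 3, H, W) -> (B, num_classes, H, W) for even H, W
            return self.up(self.score(self.backbone(x)))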

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

This work proposes Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space of a CNN, and demonstrates the generality of this new learning framework.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.

Label propagation in video sequences

This paper proposes a probabilistic graphical model for propagating labels in video sequences, also termed the label propagation problem, and reports studies on a state-of-the-art Random Forest classifier-based video segmentation scheme trained using full ground truth data and data obtained from label propagation.

Supervoxel-Consistent Foreground Propagation in Video

This work proposes a higher order supervoxel label consistency potential for semi-supervised foreground segmentation, leveraging bottom-up supervoxels to guide its estimates towards long-range coherent regions.

Weakly Supervised Learning of Object Segmentations from Web-Scale Video

This work proposes to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos to automatically generate spatiotemporal masks for each object, such as "dog", without employing any pre-trained object detectors.

Part-Based R-CNNs for Fine-Grained Category Detection

This work proposes a model for fine-grained categorization that overcomes limitations by leveraging deep convolutional features computed on bottom-up region proposals, and learns whole-object and part detectors, enforces learned geometric constraints between them, and predicts a fine-grained category from a pose-normalized representation.

Discriminative Segment Annotation in Weakly Labeled Video

This paper presents CRANE, a weakly supervised algorithm that is specifically designed to learn from real-world training data, and shows state-of-the-art pixel-level segmentation results on two datasets, one of which includes a training set of spatiotemporal segments from more than 20,000 videos.