Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos

@article{Xiong2019PixelOL,
  title={Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos},
  author={Bo Xiong and Suyog Dutt Jain and Kristen Grauman},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2019},
  volume={41},
  pages={2677-2692}
}
  • Bo Xiong, S. Jain, K. Grauman
  • Published 11 August 2018
  • Computer Science
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all “object-like” regions—even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep fully convolutional network. When applied to a video, our model further incorporates a motion stream… 
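As a rough illustration of the per-pixel object/background labeling described in the abstract, the sketch below turns a grid of two-class scores into a binary mask. It is not the authors' implementation: the function name `pixel_objectness_mask`, the toy scores, and the 0.5 threshold are assumptions made for this example, and the deep fully convolutional network that would produce the scores is not modeled.

```python
import math

def pixel_objectness_mask(scores, threshold=0.5):
    """Label each pixel as object (1) or background (0).

    scores: H x W nested list of (background_score, object_score) pairs,
            standing in for the output of a segmentation network
            (hypothetical input format chosen for this sketch).
    """
    mask = []
    for row in scores:
        mask_row = []
        for bg, obj in row:
            # A two-way softmax reduces to a sigmoid of the score difference.
            p_obj = 1.0 / (1.0 + math.exp(bg - obj))
            mask_row.append(1 if p_obj > threshold else 0)
        mask.append(mask_row)
    return mask

# Toy 2 x 2 "image": each entry is (background_score, object_score).
toy_scores = [
    [(2.0, -1.0), (0.5, 1.5)],
    [(-0.3, 0.9), (3.0, 0.0)],
]
print(pixel_objectness_mask(toy_scores))
```

Because the decision is made independently of any category label, the same thresholding applies to objects from classes never seen in training, provided the scores themselves generalize.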

VidSeg-GAN: Generative Adversarial Network for Video Object Segmentation Tasks

TLDR
This work shows that the proposed framework, which processes video frames independently using a deep generative adversarial network (GAN), maintains temporal coherency across frames without any explicit trajectory-based information and provides superior results.

Simpler Does It: Generating Semantic Labels with Objectness Guidance

TLDR
This work presents a novel framework that generates pseudo-labels for training images, which are then used to train a segmentation model, and proposes an end-to-end multi-task learning strategy that jointly learns to segment semantics and objectness using the generated pseudo-labels.

Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation

TLDR
UVO (Unidentified Video Objects), a new benchmark for open-world, class-agnostic object segmentation in videos, is presented, and it is demonstrated that UVO can be used for other applications, such as object tracking and super-voxel segmentation.

Objectness-Aware Few-Shot Semantic Segmentation

TLDR
This work demonstrates how to increase overall model capacity, by introducing objectness, which is class-agnostic and so not prone to overfitting, for complementary use with class-specific features in few-shot semantic segmentation models.

Object-centric Video Prediction without Annotation

TLDR
Object-centric Prediction without Annotation is presented: an object-centric video prediction method that takes advantage of priors from powerful computer vision models and shows how to adapt a perception model to an environment through end-to-end video prediction training.

Point-Supervised Segmentation Of Microscopy Images And Volumes Via Objectness Regularization

  • Shijie Li, Neel Dey, G. Gerig
  • Computer Science
  • 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)
  • 2021
TLDR
This work enables the training of semantic segmentation networks on images with only a single point for training per instance, an extreme case of weak supervision which drastically reduces the burden of annotation.

Visual Recognition From Structured Supervision

TLDR
This thesis proposes methods that reduce the need for human supervision by leveraging the structure in the visual world targeting visual recognition in difficult scenarios where annotated data is scarce and the visual concepts are innumerable or ambiguous.

Gaussian Dynamic Convolution for Efficient Single-Image Segmentation

TLDR
This work adopts Gaussian dynamic convolution (GDC) to address typical single-image segmentation tasks and builds Gaussian dynamic pyramid pooling to show its potential and generality in common semantic segmentation.

Class-agnostic Object Detection

TLDR
This work proposes class-agnostic object detection as a new problem that focuses on detecting objects irrespective of their object-classes, and proposes a new adversarial learning framework that forces the model to exclude class-specific information from features used for predictions.

Content-Aware Cubemap Projection for Panoramic Image via Deep Q-Learning

TLDR
A content-aware CMP optimization method via deep Q-learning is proposed to predict an angle for rotating the image in equirectangular projection (ERP); it attempts to keep foreground objects away from the edge of each projection plane after the image is re-projected with CMP.

References

Pixel Objectness

TLDR
This work proposes an end-to-end learning framework for foreground object segmentation that substantially improves the state-of-the-art on foreground segmentation on the ImageNet and MIT Object Discovery datasets and generalizes well to segment object categories unseen in the foreground maps used for training.

FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos

  • S. Jain, Bo Xiong, K. Grauman
  • Computer Science, Physics
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
TLDR
This work designs a two-stream fully convolutional neural network which fuses together motion and appearance in a unified framework for segmenting generic objects in videos and shows how to bootstrap weakly annotated videos together with existing image recognition datasets for training.

Learning Motion Patterns in Videos

TLDR
The core of this approach is a fully convolutional network, which is learned entirely from synthetic video sequences, and their ground-truth optical flow and motion segmentation, which outperforms the top method on the recently released DAVIS benchmark dataset by 5.6%.

Learning object class detectors from weakly annotated video

TLDR
It is shown that training from a combination of weakly annotated videos and fully annotated still images using domain adaptation improves the performance of a detector trained from still images alone.

Key-segments for video object segmentation

TLDR
The method first identifies object-like regions in any frame according to both static and dynamic cues and then computes a series of binary partitions among candidate “key-segments” to discover hypothesis groups with persistent appearance and motion.

Learning Video Object Segmentation with Visual Memory

TLDR
A novel two-stream neural network with an explicit memory module to achieve the task of segmenting moving objects in unconstrained videos and provides an extensive ablative analysis to investigate the influence of each component in the proposed framework.

Fully Connected Object Proposals for Video Segmentation

TLDR
A novel approach to video segmentation using multiple object proposals that combines appearance with long-range point tracks to ensure robustness with respect to fast motion and occlusions over longer video sequences is presented.

Semantic Co-segmentation in Videos

TLDR
This paper proposes to segment objects and understand their visual semantics from a collection of videos that link to each other, which it refers to as semantic co-segmentation, and utilizes a tracking-based approach to generate multiple object-like tracklets across the video.

Learning to Segment Object Candidates

TLDR
A new way to generate object proposals is proposed, introducing an approach based on a discriminative convolutional network that obtains substantially higher object recall using fewer proposals and is able to generalize to categories not seen during training.

One-Shot Video Object Segmentation

TLDR
One-Shot Video Object Segmentation (OSVOS) is based on a fully convolutional neural network architecture that successively transfers generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot).