SOLO: A Simple Framework for Instance Segmentation

Authors: Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei Li
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence
Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the “detect-then-segment” strategy (e.g., Mask R-CNN), or predict embedding vectors first then cluster pixels into individual instances. In this paper, we view the task of instance segmentation from a completely new perspective by introducing… 

FreeSOLO: Learning to Segment Objects without Annotations

This work proposes a fully unsupervised learning method that learns class-agnostic instance segmentation without any annotations, and presents a novel localization-aware pre-training framework, where objects can be discovered from complicated scenes in an unsupervised manner.

Active Pointly-Supervised Instance Segmentation

This work proposes an economic active learning setting, named active pointly-supervised instance segmentation (APIS), which starts with box-level annotations, iteratively samples a point within the box, and asks whether it falls on the object. The results suggest that APIS, integrating the advantages of active learning and point-based supervision, is an effective learning paradigm for label-efficient instance segmentation.

An Improved IRNet for Instance Segmentation Based on Image-level Supervision

A contrastive loss function is introduced to improve the displacement field prediction results and a Siamese network is developed to obtain scale-invariant class boundary maps.

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Initializing the vision tower of a giant CLIP from EVA can greatly stabilize the training and outperform the training-from-scratch counterpart with far fewer samples and less compute, providing a new direction for scaling up and accelerating the costly training of multi-modal foundation models.

Point-Teaching: Weakly Semi-Supervised Object Detection with Point Annotations

This work presents Point-Teaching, a weakly semi-supervised object detection framework to fully exploit the point annotations (WSSOD-P), and proposes a Hungarian-based point matching method to generate pseudo labels for point annotated images.
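
The point-matching idea can be illustrated with a small sketch. It assumes each point annotation should be paired with at most one predicted box, that a point may only be paired with a box containing it, and that higher-scoring boxes are preferred. The cost function, the `INF_COST` penalty, and all function names are hypothetical, and an exhaustive search over assignments stands in for the Hungarian algorithm (it finds the same minimum-cost assignment, but is only practical for a handful of points and boxes).

```python
from itertools import permutations

# Assumed penalty for pairing a point with a box that does not contain it.
INF_COST = 1e6


def pair_cost(point, box, score):
    """Hypothetical cost: low for confident boxes that contain the point."""
    x, y = point
    x1, y1, x2, y2 = box
    inside = x1 <= x <= x2 and y1 <= y <= y2
    return (1.0 - score) if inside else INF_COST


def match_points_to_boxes(points, boxes, scores):
    """Return (point_index, box_index) pairs with minimum total cost.

    Exhaustive search over one-to-one assignments; a stand-in for the
    Hungarian algorithm. Requires len(points) <= len(boxes).
    """
    if len(points) > len(boxes):
        raise ValueError("need at least as many boxes as points")
    best_cost, best_perm = None, ()
    for perm in permutations(range(len(boxes)), len(points)):
        cost = sum(
            pair_cost(points[p], boxes[b], scores[b]) for p, b in enumerate(perm)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    # Drop pairs where the point is not inside the matched box.
    return [
        (p, b)
        for p, b in enumerate(best_perm)
        if pair_cost(points[p], boxes[b], scores[b]) < INF_COST
    ]


points = [(1, 1), (8, 8)]
boxes = [(6, 6, 10, 10), (0, 0, 3, 3), (0, 0, 2, 2)]
scores = [0.9, 0.6, 0.8]
matches = match_points_to_boxes(points, boxes, scores)
```

In this toy input, the first point lies inside two candidate boxes and is paired with the higher-scoring one; the matched boxes would then play the role of pseudo labels for the point-annotated image.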

NeuralBF: Neural Bilateral Filtering for Top-down Instance Segmentation on Point Clouds

This work introduces a method for instance proposal generation for 3D point clouds based on iterative bilateral filtering with learned kernels, which considers both the deep feature embeddings of each point, as well as their locations in the 3D space.

Vision Transformers Are Good Mask Auto-Labelers

Qualitative results indicate that masks produced by MAL are, in some cases, even better than human annotations, and the method reduces the gap between auto-labeling and human annotation regarding mask quality.

A Learning-Based Framework for Depth Perception using Dense Light Fields

A learning-based framework that allows unifying scene depth with visual information obtained from Light Fields is proposed, using a Siamese neural network called EPINET-FAST, which allows for generating depth maps in less than half the time of the original EPINET.

Efficient and Lightweight Framework for Real-Time Ore Image Segmentation Based on Deep Learning

A lightweight backbone is introduced for feature extraction while reducing computational complexity, and a compact pyramid network is proposed to process the data obtained from the backbone to reduce unnecessary semantic information and computation.

Deep Learning in Diverse Intelligent Sensor Based Systems

This survey paper provides a comprehensive summary of deep learning implementation tips and links to tutorials, open-source codes, and pretrained models, which can serve as an excellent self-contained reference for deep learning practitioners and those seeking to innovate deep learning in this space.



SOLO: Segmenting Objects by Locations

A new, embarrassingly simple approach to instance segmentation in images by introducing the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location and size, thus nicely converting instance mask segmentation into a classification-solvable problem.
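
As a rough illustration of the location part of "instance categories", the sketch below assumes the image is divided into an S x S grid and that an instance is assigned to the grid cell containing its mask's center of mass. The function names, the choice of center of mass, and the row-major category index are assumptions made for illustration; the size aspect mentioned above is not modeled.

```python
def center_of_mass(mask):
    """Return the (row, col) center of mass of a binary mask (list of lists)."""
    count = 0
    row_sum = 0
    col_sum = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        raise ValueError("mask contains no foreground pixels")
    return row_sum / count, col_sum / count


def instance_category(mask, grid_size):
    """Map an instance mask to a grid-cell index in [0, grid_size ** 2).

    The cell is chosen by the mask's center of mass, and cells are
    numbered in row-major order (an assumed convention).
    """
    height = len(mask)
    width = len(mask[0])
    cy, cx = center_of_mass(mask)
    i = min(int(cy / height * grid_size), grid_size - 1)  # grid row
    j = min(int(cx / width * grid_size), grid_size - 1)   # grid column
    return i * grid_size + j


# A 4 x 4 image with one instance in the bottom-right corner.
mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
category = instance_category(mask, grid_size=2)
```

With a 2 x 2 grid, the instance above falls in the bottom-right cell, so predicting its "category" amounts to a classification over the four cells.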

Mask Encoding for Single Shot Instance Segmentation

Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact and fixed-dimensional representation vector, which allows the instance segmentation task to be incorporated into one-stage bounding-box detectors and results in a simple yet efficient instance segmentation framework.

Semantic Instance Segmentation with a Discriminative Loss Function

This work proposes an approach of combining an off-the-shelf network with a principled loss function inspired by a metric learning objective that encourages a convolutional network to produce a representation of the image that can easily be clustered into instances with a simple post-processing step.
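
One way such a metric-learning-style loss can be written is as a "pull" term that draws each pixel embedding toward the mean of its instance and a "push" term that keeps instance means apart. The sketch below is a minimal illustration under that assumption; the margin values, the function names, and the omission of any regularization term are choices made here, not details taken from the summary above.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Pull embeddings toward their instance mean and push means apart.

    delta_pull and delta_push are assumed hinge margins.
    """
    groups = {}
    for emb, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(emb)
    if not groups:
        return 0.0
    means = {label: mean_vector(vs) for label, vs in groups.items()}

    # Pull term: penalize embeddings farther than delta_pull from their mean.
    pull = 0.0
    for label, vs in groups.items():
        pull += sum(
            max(0.0, distance(v, means[label]) - delta_pull) ** 2 for v in vs
        ) / len(vs)
    pull /= len(groups)

    # Push term: penalize pairs of means closer than 2 * delta_push.
    ids = sorted(means)
    push, pairs = 0.0, 0
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            gap = distance(means[ids[a]], means[ids[b]])
            push += max(0.0, 2 * delta_push - gap) ** 2
            pairs += 1
    if pairs:
        push /= pairs
    return pull + push


well_separated = discriminative_loss([[0, 0], [0, 0], [10, 0], [10, 0]], [0, 0, 1, 1])
too_close = discriminative_loss([[0, 0], [1, 0]], [0, 1])
```

When the loss is near zero, embeddings of the same instance sit close together and different instances are far apart, which is what lets a simple clustering step recover the instances.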

Simultaneous Detection and Segmentation

This work builds on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN), introducing a novel architecture tailored for SDS, and uses category-specific, top-down figure-ground predictions to refine the bottom-up proposals.

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.

SSAP: Single-Shot Instance Segmentation With Affinity Pyramid

This work proposes a single-shot proposal-free instance segmentation method that requires only one single pass for prediction, based on a pixel-pair affinity pyramid, which computes the probability that two pixels belong to the same instance in a hierarchical manner.

Panoptic Feature Pyramid Networks

This work endows Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone, and shows it is a robust and accurate baseline for both tasks.

Semi-convolutional Operators for Instance Segmentation

It is shown theoretically and empirically that constructing dense pixel embeddings that can separate object instances cannot be easily achieved using convolutional operators, and that simple modifications, which are called semi-convolutional, have a much better chance of succeeding at this task.

LVIS: A Dataset for Large Vocabulary Instance Segmentation

This work introduces LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation, which has a long tail of categories with few training samples due to the Zipfian distribution of categories in natural images.

TensorMask: A Foundation for Dense Object Segmentation

It is demonstrated that the tensor view leads to large gains over baselines that ignore this structure, and leads to results comparable to Mask R-CNN, suggesting that TensorMask can serve as a foundation for novel advances in dense mask prediction and a more complete understanding of the task.