Waymo Open Dataset: Panoramic Video Panoptic Segmentation

Jieru Mei, Alex Zihao Zhu, Xinchen Yan, Han Yan, Siyuan Qiao, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar, Dragomir Anguelov
Panoptic image segmentation is the computer vision task of finding groups of pixels in an image and assigning semantic classes and object instance identifiers to them. Research in image segmentation has become increasingly popular due to its critical applications in robotics and autonomous driving. The research community thereby relies on publicly available benchmark datasets to advance the state-of-the-art in computer vision. Due to the high costs of densely labeling the images, however, there is…
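To make the task definition concrete, a common convention (used by several public panoptic benchmarks, though not necessarily the exact Waymo format) packs each pixel's semantic class and instance identifier into a single integer. A minimal sketch, assuming a label divisor of 1000:

```python
# Minimal sketch of a common panoptic-label encoding (assumed convention;
# not necessarily the Waymo Open Dataset format):
# each pixel stores panoptic_id = semantic_class * LABEL_DIVISOR + instance_id.
LABEL_DIVISOR = 1000

def encode_panoptic(semantic_class: int, instance_id: int) -> int:
    """Pack a pixel's semantic class and instance id into one integer."""
    assert 0 <= instance_id < LABEL_DIVISOR
    return semantic_class * LABEL_DIVISOR + instance_id

def decode_panoptic(panoptic_id: int) -> tuple[int, int]:
    """Recover (semantic_class, instance_id) from a packed panoptic id."""
    return panoptic_id // LABEL_DIVISOR, panoptic_id % LABEL_DIVISOR

# 'Stuff' classes (e.g. road) get instance id 0; 'thing' classes (e.g. car)
# get a unique id per object, which is what the video variant of the task
# must keep consistent across frames.
road  = encode_panoptic(7, 0)    # -> 7000
car_1 = encode_panoptic(11, 1)   # -> 11001
car_2 = encode_panoptic(11, 2)   # -> 11002
assert decode_panoptic(car_2) == (11, 2)
```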


Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning

This work proposes a framework that allows model training on standard pinhole images and transfers the learned features to a different domain in a cost-minimizing way, experimenting with different target models to demonstrate the effectiveness of the proposed approach and to identify the models best suited for panoramic panoptic segmentation.

PanoFlow: Learning 360° Optical Flow for Surrounding Temporal Understanding

This paper proposes a Cyclic Flow Estimation (CFE) method that leverages the cyclicity of spherical images to infer 360° optical flow, converting large displacements into relatively small ones, and achieves state-of-the-art performance on the public OmniFlowNet benchmark and the newly established Flow360 benchmark.

Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation

This paper addresses panoramic semantic segmentation, which provides a full-view, dense-pixel understanding of surroundings in a holistic way, and introduces the upgraded Trans4PASS+ model, featuring DMLPv2 with parallel token mixing to improve flexibility and generalizability in modeling discriminative cues.



Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images

This work proposes a learning approach for panoramic depth map estimation from a single image, thanks to a specifically developed distortion-aware deformable convolution filter, which can be trained by means of conventional perspective images and then used to regress depth for panoramic images, thus bypassing the effort needed to create annotated panoramic training datasets.

The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes

The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25000 high-resolution images annotated into 66 object categories with additional, instance-specific labels for 37 classes, aiming to significantly further the development of state-of-the-art methods for visual road-scene understanding.

Panoptic Segmentation

A novel panoptic quality (PQ) metric is proposed that captures performance for all classes (stuff and things) in an interpretable and unified manner, and a rigorous study of both human and machine performance for PS is performed on three existing datasets, revealing interesting insights about the task.
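The PQ metric itself is compact: matched segment pairs with IoU > 0.5 count as true positives, and PQ is the sum of their IoUs divided by |TP| + ½|FP| + ½|FN|. A minimal sketch, assuming the IoU matching has already been computed:

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = (sum of IoUs over true-positive matches) / (|TP| + 0.5|FP| + 0.5|FN|).

    A predicted/ground-truth segment pair counts as a true positive when
    IoU > 0.5, which makes the matching unique; `matched_ious` is assumed
    to already hold only those IoUs.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq                              # PQ = SQ * RQ

# Two matched segments (IoUs 0.8 and 0.6), one false positive, one false
# negative: PQ = (0.8 + 0.6) / (2 + 0.5 + 0.5) = 1.4 / 3 ≈ 0.467
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
```

The SQ × RQ factorization shown in the code is the paper's own decomposition of PQ into a segmentation term and a recognition term.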

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

In this paper, we present ViP-DeepLab, a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring the point clouds from…

Capturing Omni-Range Context for Omnidirectional Segmentation

This work puts forward and extensively evaluates models on Wild PAnoramic Semantic Segmentation (WildPASS), a dataset designed to capture diverse scenes from all around the globe, and introduces Efficient Concurrent Attention Networks (ECANets), directly capturing the inherent long-range dependencies in omnidirectional imagery.

Panoptic Feature Pyramid Networks

This work endows Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone, and shows it to be a robust and accurate baseline for both tasks.

Single-Shot Panoptic Segmentation

We present a novel end-to-end single-shot method that segments countable object instances (things) as well as background regions (stuff) into a non-overlapping panoptic segmentation at almost video…

UPSNet: A Unified Panoptic Segmentation Network

A parameter-free panoptic head is introduced which solves the panoptic segmentation task via pixel-wise classification; it first leverages the logits from the previous two heads and then innovatively expands the representation to enable prediction of an extra unknown class, which helps better resolve conflicts between semantic and instance segmentation.

STEP: Segmenting and Tracking Every Pixel

This work presents a new benchmark, Segmenting and Tracking Every Pixel (STEP), encompassing two datasets, KITTI-STEP and MOTChallenge-STEP, together with a new evaluation metric, Segmentation and Tracking Quality (STQ), that fairly balances the semantic and tracking aspects of the task and is suitable for evaluating sequences of arbitrary length.
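The balance STQ strikes between the two aspects is a geometric mean: STQ = √(AQ × SQ), where AQ (association quality) scores pixel-level track association over the whole sequence and SQ is class-level mIoU. A minimal sketch of just that combination step (the full AQ and SQ definitions are in the STEP paper):

```python
import math

def stq(association_quality: float, segmentation_quality: float) -> float:
    """STQ = sqrt(AQ * SQ): the geometric mean of association quality
    (pixel-level track association over the sequence) and segmentation
    quality (class-level mIoU). Only the combination step is sketched
    here; computing AQ and SQ themselves is out of scope.
    """
    return math.sqrt(association_quality * segmentation_quality)

# The geometric mean penalizes methods that are strong on one axis but
# weak on the other: stq(0.25, 0.81) = 0.45, below the arithmetic mean 0.53.
score = stq(0.25, 0.81)
```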

WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving

The first extensive fisheye automotive dataset, WoodScape, named after Robert Wood, is released; it comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection, and soiling detection.