Video Polyp Segmentation: A Deep Learning Perspective

Ge-Peng Ji, Guobao Xiao, Yu-Cheng Chou, Deng-Ping Fan, Kai Zhao, Geng Chen, H. Fu, Luc Van Gool
We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, progress in VPS has been slow due to the lack of large-scale, fine-grained segmentation annotations. To address this issue, we first introduce a high-quality, frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 frames from the well-known SUN-database. We provide additional annotations of diverse types, i.e., attribute, object mask…
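A frame-by-frame annotated video dataset like SUN-SEG pairs every video frame with a per-frame mask. As an illustration only (the directory names, extensions, and layout below are assumptions, not SUN-SEG's actual release format), such pairing can be sketched by matching file stems across a frame directory and a mask directory:

```python
from pathlib import Path

def pair_frames_with_masks(frame_dir, mask_dir, frame_ext=".jpg", mask_ext=".png"):
    """Pair each video frame with its per-frame mask by matching stem names.

    Assumes a hypothetical layout (not necessarily SUN-SEG's actual one):
    frame_dir/<case>/<frame_id>.jpg and mask_dir/<case>/<frame_id>.png,
    one mask per frame, i.e. frame-by-frame annotation.
    """
    pairs = []
    for frame in sorted(Path(frame_dir).rglob(f"*{frame_ext}")):
        # Mirror the frame's relative path into the mask directory.
        rel = frame.relative_to(frame_dir).with_suffix(mask_ext)
        mask = Path(mask_dir) / rel
        if mask.exists():
            pairs.append((frame, mask))
    return pairs
```

Frames without a matching mask are skipped, so the function also serves as a quick consistency check on the annotation coverage.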

TBraTS: Trusted Brain Tumor Segmentation
A trusted brain tumor segmentation network is proposed that generates robust segmentation results and reliable uncertainty estimates without excessive computational burden or modification of the backbone network.


Progressively Normalized Self-Attention Network for Video Polyp Segmentation
Proposes the novel PNS-Net (Progressively Normalized Self-attention Network), which efficiently learns representations from polyp videos in real time (∼140 fps) on a single RTX 2080 GPU with no post-processing.
MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation
A novel end-to-end learning neural network, i.e., MATNet, for zero-shot video object segmentation (ZVOS), motivated by the human visual attention behavior, which leverages motion cues as a bottom-up signal to guide the perception of object appearance.
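The idea of using motion cues as a bottom-up signal to guide appearance perception can be illustrated with a minimal gating sketch. This is a toy illustration of the general concept, not MATNet's actual motion-attentive transition architecture:

```python
import numpy as np

def motion_attentive_gate(appearance, motion):
    """Reweight appearance features with a motion-derived attention gate.

    A minimal sketch of motion-guided attention (not MATNet itself):
    a sigmoid over the motion response highlights moving regions, and
    the appearance features are gated accordingly.
    appearance, motion: (H, W, C) feature maps.
    """
    gate = 1.0 / (1.0 + np.exp(-motion))  # sigmoid attention from motion
    return appearance * gate              # moving regions are emphasized
```

With zero motion response the gate is 0.5 everywhere (no preference); a strong motion response pushes the gate toward 1, letting motion select which appearance features survive.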
Full-Duplex Strategy for Video Object Segmentation
A novel framework, termed FSNet (Full-duplex Strategy Network), which designs a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding subspaces and thereby improve model robustness.
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
The proposed model, named Polyp-PVT, effectively suppresses noise in the features, significantly improves their expressive capability, and achieves new state-of-the-art performance.
The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos
This work is the first truly end-to-end zero-shot object segmentation from videos; it not only develops generic objectness for segmentation and tracking, but also outperforms prevalent image-based contrastive learning methods without augmentation engineering.
Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection
Designs a Constrained Self-Attention (CSA) operation that captures motion cues based on the prior that objects always move along a continuous trajectory, and outperforms previous state-of-the-art methods in both accuracy and speed.
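The continuity prior above can be made concrete: if objects move along continuous trajectories, each position only needs to attend to nearby positions rather than the whole frame. Below is a simplified 1-D sketch of window-constrained self-attention, an assumption-laden toy version rather than the paper's actual CSA operation:

```python
import numpy as np

def constrained_self_attention(q, k, v, window=1):
    """Self-attention restricted to a local window of neighboring positions.

    Simplified 1-D sketch of constrained attention: the continuity prior
    says an object cannot jump far between frames, so attention outside a
    local window is masked out.
    q, k, v: (N, C) token features; window: max offset a query may attend to.
    """
    n, c = q.shape
    scores = q @ k.T / np.sqrt(c)
    # Mask out positions farther than `window` (the continuity prior).
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9
    # Numerically stable softmax over the allowed positions.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v
```

Shrinking the window reduces both computation and the chance of attending to spurious far-away matches; with `window=0` each position attends only to itself and the values pass through unchanged.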
Precise Yet Efficient Semantic Calibration and Refinement in ConvNets for Real-time Polyp Segmentation from Colonoscopy Videos
Proposes SCR-Net, a novel convolutional neural network equipped with two key modules for semantic calibration and refinement, for automatic polyp segmentation from colonoscopy videos.
Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation
Proposes Spatial-Temporal Feature Transformation (STFT), a multi-frame collaborative framework that addresses precise polyp localization and inter-frame variation under camera motion through feature alignment with proposal-guided deformable convolutions.
Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection
Proposes a dynamic context-sensitive filtering network (DCFNet) equipped with a dynamic context-sensitive filtering module (DCFM) and an effective bidirectional dynamic fusion strategy; the DCFM sheds new light on dynamic filter generation by extracting location-related affinities between consecutive frames.
PraNet: Parallel Reverse Attention Network for Polyp Segmentation
Quantitative and qualitative evaluations on five challenging datasets across six metrics show that PraNet significantly improves segmentation accuracy and offers advantages in generalizability and real-time segmentation efficiency.
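The reverse-attention idea behind PraNet can be sketched in a few lines: instead of attending to the confidently predicted foreground, the features are weighted by its complement, steering refinement toward missed regions and boundaries. This is a minimal conceptual sketch, not PraNet's actual parallel reverse-attention branch:

```python
import numpy as np

def reverse_attention(features, coarse_pred):
    """Weight features by (1 - sigmoid(coarse prediction)).

    Minimal sketch of reverse attention: the confident foreground is
    "erased", so subsequent refinement focuses on the regions and
    boundaries the coarse prediction currently misses.
    features: (H, W, C) feature map; coarse_pred: (H, W) mask logits.
    """
    fg = 1.0 / (1.0 + np.exp(-coarse_pred))  # sigmoid foreground probability
    rev = (1.0 - fg)[..., None]              # complement, broadcast over C
    return features * rev
```

Where the coarse prediction is strongly foreground the gated features go to zero; where it is strongly background they pass through untouched, which is exactly the complementary focus reverse attention is after.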