Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection

Authors: Yu Tian, Guansong Pang, Fengbei Liu, Yuyuan Liu, Chongjian Wang, Yuanhong Chen, Johan W. Verjans, G. Carneiro
Abstract: Current methods for detecting polyps in colonoscopy videos are trained exclusively on normal (i.e., healthy) images; they therefore i) ignore the temporal information in consecutive video frames and ii) lack any knowledge of what polyps look like. Consequently, they often have high detection errors, especially on challenging polyp cases (e.g., small, flat, or partially visible polyps). In this work, we formulate polyp detection as a weakly-supervised anomaly detection task that uses video-level labelled…
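The weakly-supervised formulation in the abstract treats each video as a bag of frames carrying only a video-level label. A minimal sketch of the usual top-k MIL pooling that makes such a label usable as frame-level supervision (the function name, the choice of k, and the scores below are illustrative, not the paper's implementation):

```python
import numpy as np

def video_level_mil_score(snippet_scores, k=3):
    """Aggregate per-frame anomaly scores into one video-level score
    by averaging the top-k scores (a common MIL pooling choice)."""
    top_k = np.sort(np.asarray(snippet_scores, dtype=float))[-k:]
    return float(top_k.mean())

# A video labelled "abnormal" only says that *some* frames contain a
# polyp; MIL pooling lets a video-level loss supervise frame scores.
normal_video   = [0.05, 0.10, 0.08, 0.02, 0.07]
abnormal_video = [0.05, 0.90, 0.85, 0.10, 0.80]

print(video_level_mil_score(normal_video))    # stays low
print(video_level_mil_score(abnormal_video))  # pulled up by polyp frames
```

Averaging the top-k scores, rather than taking the single maximum, is a common choice because it is less sensitive to one spuriously high frame score.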

Related papers

Few-Shot Anomaly Detection for Polyp Frames from Colonoscopy
A new few-shot anomaly detection method is proposed: an encoder trained to maximise the mutual information between feature embeddings and normal images, followed by a few-shot score-inference network trained with a large set of inliers and a substantially smaller set of outliers.
MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection
A multiple instance self-training framework (MIST) to efficiently refine task-specific discriminative representations with only video-level annotations, which performs comparably to or even better than existing supervised and weakly supervised methods.
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
A novel and theoretically sound method, named Robust Temporal Feature Magnitude learning (RTFM), which trains a feature magnitude learning function to effectively recognise the positive instances, substantially improving the robustness of the MIL approach to the negative instances from abnormal videos.
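RTFM's central idea — separating normal from abnormal videos by the magnitude of their top-k snippet features — can be sketched as follows. This is a simplified numpy reading for illustration only; the actual method trains a temporal feature-magnitude learning network, which is omitted here, and the margin and k values are assumptions:

```python
import numpy as np

def topk_feature_magnitudes(snippet_feats, k=3):
    """Mean l2-norm of the k snippets with the largest feature
    magnitude -- a per-video statistic in the spirit of RTFM."""
    mags = np.linalg.norm(np.asarray(snippet_feats, dtype=float), axis=1)
    return float(np.sort(mags)[-k:].mean())

def rtfm_separability_loss(normal_feats, abnormal_feats, margin=1.0, k=3):
    """Hinge loss pushing the top-k feature magnitudes of abnormal
    videos above those of normal videos by a margin (simplified)."""
    m_normal = topk_feature_magnitudes(normal_feats, k)
    m_abnormal = topk_feature_magnitudes(abnormal_feats, k)
    return max(0.0, margin - m_abnormal + m_normal)
```

Selecting the top-k snippets by feature magnitude, rather than by anomaly score alone, is what gives the method its robustness to the many negative (normal) snippets inside an abnormal video.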
Detecting, Localising and Classifying Polyps from Colonoscopy Videos using Deep Learning
A system that can automatically detect, localise and classify polyps from colonoscopy videos, and study a method to improve the reliability and interpretability of the classification result using uncertainty estimation and classification calibration.
Weakly Supervised Video Anomaly Detection via Center-Guided Discriminative Learning
An anomaly detection framework, called Anomaly Regression Net (ARNet), which requires only video-level labels in the training stage, is proposed; it yields a new state-of-the-art result for video anomaly detection on the ShanghaiTech dataset.
Photoshopping Colonoscopy Video Frames
  Yuyuan Liu, Yu Tian, G. Carneiro — 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020
A new system is introduced that detects frames containing polyps as anomalies relative to a distribution of frames from exams that do not contain any polyps; it comprises a dual GAN with two generators and two discriminators.
Progressively Normalized Self-Attention Network for Video Polyp Segmentation
The novel PNS-Net (Progressively Normalized Self-attention Network) is proposed; it efficiently learns representations from polyp videos at real-time speed (∼140 fps) on a single RTX 2080 GPU with no post-processing.
Real-World Anomaly Detection in Surveillance Videos
The experimental results show that the proposed MIL method achieves a significant improvement in anomaly detection performance compared to state-of-the-art approaches; results for several recent deep-learning baselines on anomalous-activity recognition are also provided.
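The MIL ranking objective introduced by this line of work scores video segments and asks the highest-scoring segment of an abnormal video to outrank the highest-scoring segment of a normal one, with temporal smoothness and sparsity regularisers. A hedged numpy sketch (the margin and λ values are illustrative assumptions, not the paper's exact hyperparameters):

```python
import numpy as np

def mil_ranking_loss(scores_abnormal, scores_normal,
                     margin=1.0, lam_smooth=8e-5, lam_sparse=8e-5):
    """MIL ranking loss in the spirit of Sultani et al.: hinge on the
    max segment scores, plus smoothness (adjacent scores should vary
    slowly) and sparsity (few segments are anomalous) penalties on
    the abnormal video's scores."""
    sa = np.asarray(scores_abnormal, dtype=float)
    sn = np.asarray(scores_normal, dtype=float)
    rank = max(0.0, margin - sa.max() + sn.max())
    smooth = float(np.sum(np.diff(sa) ** 2))   # temporal smoothness
    sparse = float(np.sum(sa))                 # sparsity of anomalies
    return rank + lam_smooth * smooth + lam_sparse * sparse
```

Only the maxima enter the ranking term, so the loss needs no segment-level labels — exactly the weak supervision setting the surveyed polyp-detection work builds on.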
Constrained Contrastive Distribution Learning for Unsupervised Anomaly Detection and Localisation in Medical Images
A novel self-supervised representation learning method, called Constrained Contrastive Distribution learning for anomaly detection (CCD), which learns fine-grained feature representations by simultaneously predicting the distribution of augmented data and image contexts using contrastive learning with pretext constraints.
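The contrastive core that CCD builds on can be illustrated with a single-anchor InfoNCE loss. This is a generic sketch, not CCD's implementation: the pretext constraints and augmentation-distribution prediction described above are omitted, and the temperature value is an assumption:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Single-anchor InfoNCE: pull the positive view of the anchor
    close in cosine similarity, push the negatives away."""
    def cos(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))            # cross-entropy on the positive
```

In the anomaly detection setting, the "views" are typically augmentations of the same normal image, so the learned features cluster normal appearance tightly and leave anomalies as outliers.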
ViViT: A Video Vision Transformer
This work shows how to effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets, and achieves state-of-the-art results on multiple video classification benchmarks.