TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery

@inproceedings{zhao2022trasetr,
  title={TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery},
  author={Zixu Zhao and Yueming Jin and Pheng-Ann Heng},
  booktitle={2022 International Conference on Robotics and Automation (ICRA)},
  year={2022}
}
  • Zixu Zhao, Yueming Jin, Pheng-Ann Heng
  • Published 17 February 2022
Surgical instrument segmentation - in general a pixel classification task - is fundamentally crucial for promoting cognitive intelligence in robot-assisted surgery (RAS). However, previous methods struggle to discriminate instrument types and instances. To address these issues, we explore a mask classification paradigm that produces per-segment predictions. We propose TraSeTR, a novel Track-to-Segment Transformer that wisely exploits tracking cues to assist surgical instrument…
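The mask-classification paradigm mentioned in the abstract can be illustrated with a toy sketch: instead of classifying every pixel independently, each query emits a (class, binary mask) pair, i.e. a per-segment prediction. The data below is hypothetical and hand-written for illustration; in the actual model these pairs are produced by a Transformer decoder.

```python
# Toy per-segment predictions (hypothetical values, not model output).
predictions = [
    {"class": "scissors", "mask": [[1, 1, 0],
                                   [0, 0, 0],
                                   [0, 0, 0]]},
    {"class": "forceps",  "mask": [[0, 0, 0],
                                   [0, 1, 1],
                                   [0, 0, 1]]},
]

def render_instance_map(preds, h=3, w=3):
    """Combine per-segment (class, mask) pairs into one instance-id map.

    0 marks background; each prediction gets its own instance id,
    which is what distinguishes instrument instances of the same type.
    """
    canvas = [[0] * w for _ in range(h)]
    for inst_id, p in enumerate(preds, start=1):
        for y in range(h):
            for x in range(w):
                if p["mask"][y][x]:
                    canvas[y][x] = inst_id
    return canvas

print(render_instance_map(predictions))
# -> [[1, 1, 0], [0, 2, 2], [0, 0, 2]]
```

The instance-id map makes the contrast with plain pixel classification visible: two instruments of the same class would still receive distinct ids.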

Related Papers

Anchor-guided online meta adaptation for fast one-shot instrument segmentation from robotic surgical videos

One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video

MDAL, a meta-learning based dynamic online adaptive learning scheme, uses a two-stage framework to quickly adapt the model parameters on the first frame and on partial subsequent frames while predicting the results, and outperforms other state-of-the-art methods on two datasets (including a real-world RAS dataset).

ISINet: An Instance-Based Approach for Surgical Instrument Segmentation

The proposed Instance-based Surgical Instrument Segmentation Network (ISINet), a method that addresses this task from an instance-based segmentation perspective, includes a temporal consistency module that takes into account the previously overlooked and inherent temporal information of the problem.

Automatic Instrument Segmentation in Robot-Assisted Surgery Using Deep Learning

This paper describes a deep learning-based approach for robotic instrument segmentation that addresses the binary segmentation problem, where every pixel in an image from the surgery video feed is labeled as instrument or background, and also solves a multi-class segmentation problem.

End-to-End Object Detection with Transformers

This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster R-CNN baseline on the challenging COCO object detection dataset.
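The "direct set prediction" idea in DETR hinges on one-to-one bipartite matching between predicted queries and ground-truth objects. The sketch below illustrates that matching step with a brute-force search over assignments (DETR itself uses the Hungarian algorithm, and its real cost combines class probability with box L1/GIoU terms; the cost matrix here is hand-written for illustration):

```python
from itertools import permutations

def match(cost):
    """Match each ground-truth object to a distinct query, minimizing total cost.

    cost[q][g] = matching cost between query q and ground truth g.
    Queries left unmatched are supervised as "no object" in DETR.
    Brute force is exponential; it stands in for the Hungarian algorithm
    only to keep this sketch dependency-free.
    """
    n_queries, n_gt = len(cost), len(cost[0])
    best, best_assign = float("inf"), None
    # Enumerate every way to assign each ground truth a distinct query.
    for perm in permutations(range(n_queries), n_gt):
        total = sum(cost[q][g] for g, q in enumerate(perm))
        if total < best:
            best, best_assign = total, perm
    return [(q, g) for g, q in enumerate(best_assign)]

cost = [[0.9, 0.1],
        [0.2, 0.8],
        [0.5, 0.5]]  # 3 queries, 2 ground-truth objects
print(match(cost))
# -> [(1, 0), (0, 1)]
```

Because each ground truth claims exactly one query, no post-hoc non-maximum suppression is needed, which is what makes the pipeline end-to-end.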

Learning Motion Flows for Semi-supervised Instrument Segmentation from Robotic Surgical Video

This paper proposes a dual motion based method to wisely learn motion flows for segmentation enhancement by leveraging temporal dynamics, and designs a flow predictor to derive the motion for jointly propagating the frame-label pairs given the current labeled frame.

2018 Robotic Scene Segmentation Challenge

The robotic instrument segmentation dataset was introduced with porcine data, which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs; the challenge added complexity by introducing a set of anatomical objects and medical devices to the segmented classes.

Incorporating Temporal Prior from Motion Flow for Instrument Segmentation in Minimally Invasive Surgery Video

This paper proposes a novel framework to leverage instrument motion information by incorporating a derived temporal prior into an attention pyramid network for accurate segmentation, and demonstrates promising potential for reducing annotation cost in clinical practice.

Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer

This paper introduces, for the first time in surgical workflow analysis, a hybrid embedding aggregation Transformer that reconsiders the previously ignored complementary effects of spatial and temporal features for accurate surgical phase recognition.

End-to-End Video Instance Segmentation with Transformers

A new video instance segmentation framework built upon Transformers, termed VisTR, views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem, and achieves the highest speed among all existing VIS models and the best result among methods using a single model on the YouTube-VIS dataset.