Learning Invariant Representation of Tasks for Robust Surgical State Estimation

Yidan Qin, Max Allan, Yisong Yue, Joel W. Burdick, Mahdi Azizian
IEEE Robotics and Automation Letters
Surgical state estimators in robot-assisted surgery (RAS), especially those trained via learning techniques, rely heavily on datasets that capture surgeon actions in laboratory or real-world surgical tasks. Real-world RAS datasets are costly to acquire, are obtained from multiple surgeons who may use different surgical strategies, and are recorded under uncontrolled conditions in highly complex environments. The combination of high diversity and limited data calls for new learning methods that…


PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition?
Video/kinematic-based surgical workflow recognition methods improved significantly over uni-modality methods for all teams; however, the longer test execution time of video-based methods compared with kinematic-based methods must be taken into account.
TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery
This work proposes TraSeTR, a novel Track-to-Segment Transformer that wisely exploits tracking cues to assist surgical instrument segmentation, and introduces a prior query, encoded with previous temporal knowledge, to transfer tracking signals to current instances via identity matching.
The “ PEg TRAnsfer Workflow recognition by different modalities
  • 2022


Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources
This work proposes Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources, including kinematics, vision, and system events. It achieves a frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and the authors' RIOUS dataset.
daVinciNet: Joint Prediction of Motion and Surgical State in Robot-Assisted Surgery
The proposed daVinciNet is an end-to-end dual-task model for robot motion and surgical state prediction that concurrently predicts end-effector trajectories and surgical states using features extracted from multiple data streams, including robot kinematics, endoscopic vision, and system events.
Machine and deep learning for workflow recognition during surgery
  • N. Padoy
  • Computer Science
    Minimally invasive therapy & allied technologies : MITAT : official journal of the Society for Minimally Invasive Therapy
  • 2019
This paper presents how several recent machine and deep learning techniques can be used to analyze the activities taking place during surgery, using videos captured from either endoscopic or ceiling-mounted cameras.
Learning from a tiny dataset of manual annotations: a teacher/student approach for surgical phase recognition
This work confronts the problem of learning surgical phase recognition when annotated data is scarce and proposes a teacher/student approach: a strong predictor (the teacher), trained beforehand on a small dataset of ground truth-annotated videos, generates synthetic annotations for a larger dataset, from which another model (the student) learns.
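The teacher/student idea summarized in this entry can be illustrated with a minimal, hypothetical sketch. It is not the cited authors' implementation: the 1-D nearest-centroid classifier, the helper names `fit_centroids` and `predict`, the labels `phase_a`/`phase_b`, and all data values are assumptions chosen only to keep the example self-contained.

```python
# Toy illustration of teacher/student pseudo-labeling.
# Assumption: a 1-D nearest-centroid classifier stands in for a real
# phase-recognition network; all names and values are hypothetical.
from collections import defaultdict
from statistics import mean


def fit_centroids(features, labels):
    """Return {label: mean feature value} for 1-D features."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {y: mean(xs) for y, xs in grouped.items()}


def predict(centroids, features):
    """Assign each feature the label of its closest centroid."""
    return [min(centroids, key=lambda y: abs(x - centroids[y])) for x in features]


# Small manually annotated set (toy values, not from any dataset).
labeled_x = [0.1, 0.2, 0.9, 1.0]
labeled_y = ["phase_a", "phase_a", "phase_b", "phase_b"]

# Larger set without manual annotations.
unlabeled_x = [0.0, 0.15, 0.3, 0.7, 0.85, 1.1]

# 1. Train the teacher on the manual annotations.
teacher = fit_centroids(labeled_x, labeled_y)

# 2. The teacher generates synthetic annotations for the larger set.
synthetic_y = predict(teacher, unlabeled_x)

# 3. The student learns from the synthetically annotated larger set.
student = fit_centroids(unlabeled_x, synthetic_y)

student_predictions = predict(student, [0.05, 0.95])
```

In this sketch the student never sees the manual labels directly; it only benefits from them through the teacher's synthetic annotations, which is the property the summary above describes.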
Unsupervised Adversarial Invariance
This work presents a novel unsupervised invariance induction framework for neural networks that learns a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, without needing any labeled information about nuisance factors or domain knowledge.
Using 3D Convolutional Neural Networks to Learn Spatiotemporal Features for Automatic Surgical Gesture Recognition in Video
This work proposes to use a 3D Convolutional Neural Network to learn spatiotemporal features from consecutive video frames to achieve high frame-wise surgical gesture recognition accuracies, outperforming comparable models that either extract only spatial features or model spatial and low-level temporal information separately.
Symmetric Dilated Convolution for Surgical Gesture Recognition
This work proposes a novel temporal convolutional architecture that automatically detects and segments surgical gestures, with corresponding boundaries, using only RGB videos. A symmetric dilation structure bridged by a self-attention module encodes and decodes the long-term temporal patterns and establishes the frame-to-frame relationships.
Aggregating Long-Term Context for Learning Surgical Workflows
This work proposes a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics, which are propagated by a sufficient statistics model (SSM), and explores several choices for the propagated statistics.
Controllable Invariance through Adversarial Feature Learning
This paper shows that the proposed framework induces an invariant representation and leads to better generalization, as evidenced by improved performance on three benchmark tasks.
Recognizing Surgical Activities with Recurrent Neural Networks
This work is the first to apply recurrent neural networks to the task of recognizing surgical activities from robot kinematics, using a single model and a single set of hyperparameters.