AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

  • Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie
  • Published 20 October 2021
  • Computer Science
  • 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the UNet model (or its variants), which has shown great success in medical image segmentation, under both 2D and 3D settings. Current 2D based methods either directly replace convolutional layers with pure transformers or consider a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these… 


Transformers in Medical Imaging: A Survey
This survey covers the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks, and develops a taxonomy for each application.
Transformers in Medical Image Analysis: A Review
Transformers have dominated the field of natural language processing, and recently impacted the computer vision area. In the field of medical image analysis, Transformers have also been successfully
SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation
SimCVD is presented, a simple contrastive distillation framework that significantly advances state-of-the-art voxel-wise representation learning; the authors hypothesize that dropout can be viewed as a minimal form of data augmentation and makes the network robust to representation collapse.
Automated segmentation of endometriosis using transfer learning technique
The proposed SSAE approach identifies the affected region using a U-Net architecture and a systematic sampling procedure, and uses a statistical evaluation to demonstrate the similarity between pathologically identified images and the corresponding annotated images.
Transformers Meet Visual Learning Understanding: A Comprehensive Review
This review mainly investigates current research progress on Transformers in image and video applications, and gives a comprehensive overview of Transformers in visual learning understanding.
TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation
A transformer framework for multi-view 3D pose estimation is introduced, aiming to directly improve individual 2D predictors by integrating information from different views, and the concept of an epipolar field is proposed to encode 3D positional information into the transformer model.
A survey on attention mechanisms for medical applications: are we moving towards better algorithms?
This paper concludes with a critical analysis of the claims and potentialities presented in the literature about attention mechanisms and proposes future research lines in medical applications that may benefit from these frameworks.
Open-world active learning for echocardiography view classification
This work develops an open-world active learning approach for echocardiography view classification, in which the network classifies images of known views into their respective classes and identifies images of unknown views through a clustering approach.
SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation
An unsupervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos, which achieves state-of-the-art performance on all datasets and can even outperform some weakly-supervised approaches, demonstrating its effectiveness and generalizability.
Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow
A new model called Neural Diffeomorphic Flow (NDF) is proposed to learn deep implicit shape templates, representing shapes as conditional diffeomorphic deformations of templates, intrinsically preserving shape topologies.


Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation
Under the direct downsampling and up-sampling of the inputs and outputs by 4×, experiments demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms those methods with full-convolution or the combination of transformer and convolution.
UNETR: Transformers for 3D Medical Image Segmentation
This work reformulates the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem and introduces a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information.
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
It is argued that Transformers can serve as strong encoders for medical image segmentation tasks, with the combination of U-Net to enhance finer details by recovering localized spatial information.
UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation
  • Huimin Huang, Lanfen Lin, Jian Wu
  • Computer Science
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
A novel UNet 3+ is proposed, which takes advantage of full-scale skip connections and deep supervisions, and can reduce the network parameters to improve the computation efficiency.
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
This paper presents UNet++, a new, more powerful architecture for medical image segmentation where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways, and argues that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar.
Spatial Context-Aware Self-Attention Model For Multi-Organ Segmentation
A new framework for combining 3D and 2D models is proposed, in which the segmentation is realized through high-resolution 2D convolutions, but guided by spatial contextual information extracted from a low-resolution 3D model.
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
This work proposes an approach to 3D image segmentation based on a volumetric, fully convolutional, neural network, trained end-to-end on MRI volumes depicting prostate, and learns to predict segmentation for the whole volume at once.
Automatic Pulmonary Lobe Segmentation Using Deep Learning
This work proposes pre-processing CT images by cropping the region covered by the convex hull of the lungs in order to mitigate the influence of noise from outside the lungs, and uses a hybrid loss function that combines dice loss, to tackle extreme class imbalance, with focal loss, to force the model to focus on voxels that are hard to discriminate.
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
A new framework for few-shot medical image segmentation based on prototypical networks is proposed, comprising a context relation encoder (CRE) that uses correlation to capture local relation features between foreground and background regions, and a recurrent mask refinement module that repeatedly uses the CRE and a prototypical network to recapture the change of context relationship and refine the segmentation mask iteratively.
Multiple Slice k-space Deep Learning for Magnetic Resonance Imaging Reconstruction
A fully data-driven deep learning algorithm for k-space interpolation is proposed that utilizes the correlation information between the target slice and its neighboring slices, together with a novel network that models the inter-dependencies between different slices.