AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation
@article{Yan2022AFTerUNetAF, title={AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation}, author={Xiangyi Yan and Hao Tang and Shanlin Sun and Haoyu Ma and Deying Kong and Xiaohui Xie}, journal={2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, year={2022}, pages={3270-3280} }
Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the UNet model (or its variants), which has shown great success in medical image segmentation, under both 2D and 3D settings. Current 2D based methods either directly replace convolutional layers with pure transformers or consider a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these…
10 Citations
Transformers in Medical Imaging: A Survey
- Computer Science, ArXiv
- 2022
This survey reviews the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks, and develops a taxonomy for each application.
Transformers in Medical Image Analysis: A Review
- Physics
- 2022
Transformers have dominated the field of natural language processing, and recently impacted the computer vision area. In the field of medical image analysis, Transformers have also been successfully…
SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation
- Computer Science, IEEE Transactions on Medical Imaging
- 2022
SimCVD, a simple contrastive distillation framework that significantly advances state-of-the-art voxel-wise representation learning, is presented; the authors hypothesize that dropout can be viewed as a minimal form of data augmentation that makes the network robust to representation collapse.
Automated segmentation of endometriosis using transfer learning technique
- Computer Science, F1000Research
- 2022
The proposed SSAE approach identifies the affected region using U-Net architecture and systematic sampling procedure and proves the similarity between pathologically identified images and the corresponding annotated images using a statistical evaluation.
Transformers Meet Visual Learning Understanding: A Comprehensive Review
- Computer Science, ArXiv
- 2022
This review mainly investigates the current research progress of Transformers in image and video applications, providing a comprehensive overview of Transformers in visual learning understanding.
TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation
- Computer Science, ArXiv
- 2021
A transformer framework for multi-view 3D pose estimation, aiming at directly improving individual 2D predictors by integrating information from different views is introduced, and the concept of epipolar field to encode 3D positional information into the transformer model is proposed.
A survey on attention mechanisms for medical applications: are we moving towards better algorithms?
- Computer Science, ArXiv
- 2022
This paper concludes with a critical analysis of the claims and potentialities presented in the literature about attention mechanisms and proposes future research lines in medical applications that may benefit from these frameworks.
Open-world active learning for echocardiography view classification
- Computer Science, Medical Imaging 2022: Computer-Aided Diagnosis
- 2022
This work developed an open world active learning approach for echocardiography view classification, where the network classifies images of known views into their respective classes and identifies images of unknown views through a clustering approach.
SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation
- Computer Science, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2022
An unsupervised method that operates on a corpus of unlabeled videos and predicts a likely set of temporal segments across the videos, which achieves state-of-the-art performance on all datasets and can even outperform some weakly-supervised approaches, demonstrating its effectiveness and generalizability.
Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow
- Computer Science, ArXiv
- 2022
A new model called Neural Diffeomorphic Flow (NDF) is proposed to learn deep implicit shape templates, representing shapes as conditional diffeomorphic deformations of templates, intrinsically preserving shape topologies.
References
SHOWING 1-10 OF 56 REFERENCES
Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation
- Computer Science, ArXiv
- 2021
Under the direct downsampling and up-sampling of the inputs and outputs by 4×, experiments demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms those methods with full-convolution or the combination of transformer and convolution.
UNETR: Transformers for 3D Medical Image Segmentation
- Computer Science, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2022
This work reformulates the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem and introduces a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information.
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
- Computer Science, ArXiv
- 2021
It is argued that Transformers can serve as strong encoders for medical image segmentation tasks, with the combination of U-Net to enhance finer details by recovering localized spatial information.
UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation
- Computer Science, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A novel UNet 3+ is proposed, which takes advantage of full-scale skip connections and deep supervisions, and can reduce the network parameters to improve the computation efficiency.
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
- Computer Science, DLMIA/ML-CDS@MICCAI
- 2018
This paper presents UNet++, a new, more powerful architecture for medical image segmentation where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways, and argues that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar.
Spatial Context-Aware Self-Attention Model For Multi-Organ Segmentation
- Computer Science, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2021
A new framework for combining 3D and 2D models is proposed, in which the segmentation is realized through high-resolution 2D convolutions, but guided by spatial contextual information extracted from a low-resolution 3D model.
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
- Computer Science, 2016 Fourth International Conference on 3D Vision (3DV)
- 2016
This work proposes an approach to 3D image segmentation based on a volumetric, fully convolutional neural network that is trained end-to-end on MRI volumes depicting the prostate and learns to predict segmentation for the whole volume at once.
Automatic Pulmonary Lobe Segmentation Using Deep Learning
- Computer Science, 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)
- 2019
This work proposes pre-processing CT images by cropping the region covered by the convex hull of the lungs in order to mitigate the influence of noise from outside the lungs, and uses a hybrid loss function that combines dice loss, to tackle extreme class imbalance, with focal loss, to force the model to focus on voxels that are hard to discriminate.
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
A new framework for few-shot medical image segmentation based on prototypical networks is proposed, built on a context relation encoder (CRE) that uses correlation to capture local relation features between foreground and background regions, and a recurrent mask refinement module that repeatedly uses the CRE and a prototypical network to recapture the change of context relationship and refine the segmentation mask iteratively.
Multiple Slice k-space Deep Learning for Magnetic Resonance Imaging Reconstruction
- Physics, 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
- 2020
A fully data-driven deep learning algorithm for k-space interpolation is proposed that utilizes the correlation information between the target slice and its neighboring slices, together with a novel network that models the inter-dependencies between different slices.