• Corpus ID: 244709538

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers

Authors: John Guibas, Morteza Mardani, Zong-Yi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro
Vision transformers have delivered tremendous success in representation learning, primarily because self-attention mixes tokens effectively. However, self-attention scales quadratically with the number of pixels, which becomes infeasible for high-resolution inputs. To cope with this challenge, we propose the Adaptive Fourier Neural Operator (AFNO), an efficient token mixer that learns to mix in the Fourier domain. AFNO is based on a principled foundation of operator learning which allows us to…
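To make "mixing in the Fourier domain" concrete, here is a minimal sketch of a generic frequency-domain token mixer. It is not the AFNO architecture: the elementwise weights `w` and the function name `fourier_token_mix` are assumptions chosen for illustration only. It shows why the cost is log-linear in the number of tokens, since every output token depends on every input token through a single FFT and inverse FFT.

```python
import numpy as np

def fourier_token_mix(x, w):
    """Mix tokens globally by filtering along the token axis in the Fourier domain.

    x: (n_tokens, d) real array of token embeddings.
    w: (n_tokens // 2 + 1, d) complex per-frequency weights (hypothetical;
       a trained model would learn these, possibly with more structure).
    """
    n = x.shape[0]
    x_hat = np.fft.rfft(x, axis=0)           # FFT over tokens: O(n log n)
    y_hat = x_hat * w                        # elementwise filter per frequency
    return np.fft.irfft(y_hat, n=n, axis=0)  # back to token space

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))             # 16 tokens, 4 channels

# All-ones weights leave the tokens unchanged.
w_identity = np.ones((9, 4), dtype=complex)
y = fourier_token_mix(x, w_identity)

# Keeping only the zero frequency replaces every token by the token mean,
# the simplest possible form of global mixing.
w_dc = np.zeros((9, 4), dtype=complex)
w_dc[0] = 1.0
y_mean = fourier_token_mix(x, w_dc)
```

The two weight settings are only sanity checks: a useful mixer sits somewhere between the identity and plain averaging.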


IAE-Net: Integral Autoencoders for Discretization-Invariant Learning
A novel deep learning framework based on integral autoencoders (IAE-Net) for discretization-invariant learning that achieves state-of-the-art performance in existing applications and enables a wide range of new applications where existing methods fail.
Towards Large-Scale Learned Solvers for Parametric PDEs with Model-Parallel Fourier Neural Operators
This work proposes a model-parallel version of FNOs based on domain-decomposition of both the input data and network weights that is able to predict time-varying PDE solutions of over 3.2 billion variables on Summit using up to 768 GPUs.
FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators
FourCastNet, short for Fourier ForeCasting Neural Network, is a global data-driven weather forecasting model that provides accurate short- to medium-range global predictions at 0.25° resolution.
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
This paper analyzes attention through the lens of convex duality and shows how self-attention networks implicitly cluster the tokens, based on their latent similarity, in the latent feature and token dimensions.


Global Filter Networks for Image Classification
The Global Filter Network is presented, a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity and can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
Axial Attention in Multidimensional Transformers
The Axial Transformer is proposed, a self-attention-based autoregressive model for images and other data organized as high-dimensional tensors that maintains both full expressiveness over joint distributions over data and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation.
Image Transformer
This work generalizes a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood, and significantly increases the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  • Ze Liu, Yutong Lin, B. Guo
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
A hierarchical Transformer whose representation is computed with shifted windows. The design has the flexibility to model at various scales and has linear computational complexity with respect to image size; the hierarchical, shifted-window approach also proves beneficial for all-MLP architectures.
Generating Long Sequences with Sparse Transformers
This paper introduces sparse factorizations of the attention matrix which reduce the quadratic time and memory cost of self-attention to $O(n \sqrt{n})$, generates unconditional samples that demonstrate global coherence and great diversity, and shows it is possible in principle to use self-attention to model sequences of length one million or more.
Long-Short Transformer: Efficient Transformers for Language and Vision
The proposed Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks, aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations.
Pay Attention to MLPs
This work proposes a simple attention-free network architecture, gMLP, based solely on MLPs with gating, and shows that it can perform as well as Transformers in key language and vision applications and can scale as much as Transformers over increased data and compute.
Linformer: Self-Attention with Linear Complexity
This paper demonstrates that the self-attention mechanism of the Transformer can be approximated by a low-rank matrix, and proposes a new self-attention mechanism that reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$ in both time and space.
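The low-rank idea in this summary can be sketched as follows, under stated assumptions: keys and values are projected along the sequence axis from length n down to a fixed length r, so the score matrix is n × r instead of n × n. The names `low_rank_attention`, `e_proj`, and `f_proj` are hypothetical, and random projections stand in for learned ones; this is an illustration, not the paper's implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(q, k, v, e_proj, f_proj):
    """Single-head attention with keys/values compressed along the sequence.

    q, k, v: (n, d) arrays. e_proj, f_proj: (r, n) projection matrices
    (hypothetical names; learned in practice, random here).
    """
    k_small = e_proj @ k                             # (r, d)
    v_small = f_proj @ v                             # (r, d)
    scores = q @ k_small.T / np.sqrt(q.shape[-1])    # (n, r), not (n, n)
    return softmax(scores, axis=-1) @ v_small        # (n, d)

rng = np.random.default_rng(0)
n, d, r = 8, 4, 3
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
e_proj = rng.standard_normal((r, n)) / np.sqrt(n)
f_proj = rng.standard_normal((r, n)) / np.sqrt(n)
out = low_rank_attention(q, k, v, e_proj, f_proj)

# With identity projections (r = n) the sketch reduces to full attention.
full = softmax(q @ k.T / np.sqrt(d), axis=-1) @ v
out_identity = low_rank_attention(q, k, v, np.eye(n), np.eye(n))
```

For fixed r, both the score matrix and the matrix products grow linearly with n, which is the source of the $O(n)$ claim.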
Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel
A new formulation of attention via the lens of the kernel is presented, which models the input as a product of symmetric kernels and achieves performance competitive with the current state-of-the-art model with less computation.
DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators
This work proposes deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset, and demonstrates that DeepONet significantly reduces the generalization error compared with fully connected networks.