Corpus ID: 244117374

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Loddon Yuille
Recently, self-attention operators have shown superior performance as stand-alone building blocks for vision models. However, existing self-attention models are often hand-designed, modified from CNNs, and obtained by stacking only one type of operator. The wider architecture space that combines different self-attention operators with convolution is rarely explored. In this paper, we explore this novel architecture space with weight-sharing Neural Architecture Search (NAS) algorithms. The result… 
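To make the combined operator space concrete, below is a minimal NumPy sketch of the three operator families such a search space mixes: depthwise convolution, local (windowed) self-attention, and global self-attention. This is an illustrative single-head, 1-D sketch, not the paper's implementation; all function names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1d(x, w):
    """Depthwise 1-D convolution over a (length, channels) sequence.
    w has shape (k, channels) with k odd; 'same' zero padding."""
    k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        out[i] = (xp[i:i + k] * w).sum(axis=0)
    return out

def global_attention(x, wq, wk, wv):
    """Standard dot-product self-attention over the full sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def local_attention(x, wq, wk, wv, window=3):
    """Self-attention restricted to a sliding window around each position."""
    q, k, v = x @ wq, x @ wk, x @ wv
    n = x.shape[0]
    half = window // 2
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(k.shape[-1])
        out[i] = softmax(scores) @ v[lo:hi]
    return out
```

A TrioNet-style architecture would then be a per-layer choice among these three operators (e.g. `["conv", "local", "global", ...]`), and the weight-sharing NAS algorithm searches over such sequences.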

Stand-Alone Self-Attention in Vision Models
The results establish that stand-alone self-attention is an important addition to the vision practitioner's toolbox and is especially impactful when used in later layers.
Attention Augmented Convolutional Networks
It is found that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile-constrained network, while keeping the number of parameters similar.
A2-Nets: Double Attention Networks
This work proposes the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
This work presents Block-wisely Self-supervised Neural Architecture Search (BossNAS), an unsupervised NAS method that addresses the problem of inaccurate architecture rating caused by large weight-sharing space and biased supervision in previous methods.
Exploring Self-Attention for Image Recognition
This work considers two forms of self-attention, pairwise and patchwise; pairwise attention generalizes standard dot-product attention and is fundamentally a set operator, while patchwise attention is strictly more powerful than convolution.
AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
This work proposes a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell, and demonstrates strong generalization across different modalities, backbones, and datasets.
CCNet: Criss-Cross Attention for Semantic Segmentation
This work proposes a Criss-Cross Network (CCNet) for obtaining contextual information more effectively and efficiently, achieving mIoU scores of 81.4 on the Cityscapes test set and 45.22 on the ADE20K validation set, which are new state-of-the-art results.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient; its effectiveness is demonstrated by scaling up MobileNets and ResNet.
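The compound coefficient can be sketched with a few lines of arithmetic. The per-dimension bases α=1.2, β=1.1, γ=1.15 and the constraint α·β²·γ² ≈ 2 are the values reported in the EfficientNet paper; the helper function name here is illustrative.

```python
# EfficientNet-style compound scaling: for compound coefficient phi,
# depth, width, and resolution grow as alpha**phi, beta**phi, gamma**phi.
alpha, beta, gamma = 1.2, 1.1, 1.15  # values from the EfficientNet paper

def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale a baseline network's depth, width, and input resolution jointly."""
    depth = base_depth * alpha ** phi
    width = base_width * beta ** phi
    resolution = base_resolution * gamma ** phi
    return depth, width, resolution

# The grid-searched bases satisfy alpha * beta**2 * gamma**2 ≈ 2, so each
# increment of phi roughly doubles the FLOPs of the scaled network.
```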
NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
Neural Architecture Search is adopted to discover a new feature pyramid architecture, named NAS-FPN, in a novel scalable search space covering all cross-scale connections; it achieves a better accuracy and latency tradeoff than state-of-the-art object detection models.
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
This paper presents a network-level search space that includes many popular designs, develops a formulation that allows efficient gradient-based architecture search, and demonstrates the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets.