CARAFE++: Unified Content-Aware ReAssembly of FEatures

  • Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin
  • Published 7 December 2020
  • Computer Science
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
Feature reassembly, i.e. feature downsampling and upsampling, is a key operation in a number of modern convolutional network architectures, e.g., residual networks and feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight, and highly effective operator to fulfill this goal. CARAFE++ has several appealing properties… 
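
The idea of content-aware reassembly can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `reassemble_upsample` and `softmax_kernels` are hypothetical, the zero padding at the borders is an assumption, and the per-location kernels are passed in as an argument, whereas a content-aware operator would predict them from the input features. The sketch only shows the reassembly step, in which each output location is a weighted sum over a k x k neighbourhood of its source location.

```python
import numpy as np


def softmax_kernels(logits):
    """Normalise raw kernel scores of shape (Ho, Wo, k, k) so each kernel sums to 1."""
    flat = logits.reshape(*logits.shape[:2], -1)
    flat = np.exp(flat - flat.max(axis=-1, keepdims=True))
    flat /= flat.sum(axis=-1, keepdims=True)
    return flat.reshape(logits.shape)


def reassemble_upsample(x, kernels, scale):
    """Upsample x of shape (C, H, W) by an integer `scale`.

    `kernels` has shape (H*scale, W*scale, k, k) with odd k: one reassembly
    kernel per output location, shared across channels.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    r = k // 2
    # Zero-pad (an assumption) so every source location has a full k x k neighbourhood.
    padded = np.pad(x, ((0, 0), (r, r), (r, r)))
    out = np.zeros((c, h * scale, w * scale), dtype=float)
    for i in range(h * scale):
        for j in range(w * scale):
            si, sj = i // scale, j // scale          # source location of this output
            patch = padded[:, si:si + k, sj:sj + k]  # (C, k, k) neighbourhood
            out[:, i, j] = (patch * kernels[i, j]).sum(axis=(1, 2))
    return out


if __name__ == "__main__":
    x = np.arange(24, dtype=float).reshape(2, 3, 4)
    # Random scores stand in for the kernels a network would predict.
    logits = np.random.default_rng(0).normal(size=(6, 8, 3, 3))
    y = reassemble_upsample(x, softmax_kernels(logits), scale=2)
    print(y.shape)
```

With a kernel that is 1 at its centre and 0 elsewhere at every location, the function reduces to nearest-neighbour upsampling, which makes fixed-kernel upsampling a special case of this weighted-sum formulation.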

FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling

FADE reveals its effectiveness and task-agnostic characteristic by consistently outperforming recent dynamic upsampling operators in different tasks, and generalizes well across convolutional and transformer architectures with little computational overhead.

SAPA: Similarity-Aware Point Affiliation for Feature Upsampling

A lightweight upsampling operator, termed Similarity-Aware Point Affiliation (SAPA), is instantiated; experiments show the effectiveness of SAPA and also indicate its limitation: it is more suitable for tasks that favor detail delineation.

Dual Complementary Dynamic Convolution for Image Recognition

The DCDC operator overcomes the limitations of vanilla convolution and of most existing dynamic convolutions, which capture only spatial-adaptive features, and thus markedly boosts the representation capacities of CNNs.

CATNet: Context AggregaTion Network for Instance Segmentation in Remote Sensing Images

A novel context aggregation network (CATNet) is proposed to improve the feature extraction process of instance segmentation in remote sensing images; the proposed approach is shown to outperform state-of-the-art methods at similar computational cost.

M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images

A many-to-many reassembly of features (M2MRF) is proposed that reassembles features in a dimension-reduced feature space and simultaneously aggregates multiple features inside a large predefined region into multiple target features.

Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images

A novel context aggregation network (CATNet) is proposed to improve the feature extraction process of instance segmentation in remote sensing images, and it outperforms state-of-the-art methods under similar computational costs.

Seesaw Loss for Long-Tailed Instance Segmentation

This work proposes Seesaw Loss to dynamically re-balance the gradients of positive and negative samples for each category with two complementary factors, i.e., a mitigation factor and a compensation factor. Seesaw Loss obtains significant gains over Cross-Entropy Loss and achieves state-of-the-art performance on the LVIS dataset without bells and whistles.

COVID‐19 lung infection segmentation from chest CT images based on CAPA‐ResUNet

A deep learning model, namely the content‐aware pre‐activated residual UNet (CAPA‐ResUNet), is proposed for segmenting COVID‐19 lesions from CT slices and shows an advantage over other models on multiple metrics.

Seesaw Loss for Long-Tailed Instance Segmentation Supplementary Materials

Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, Dahua Lin. SenseTime-CUHK Joint Lab, The Chinese University of Hong Kong; S-Lab, Nanyang

ELSA: Enhanced Local Self-Attention for Vision Transformer

The devil lies in the generation and application of spatial attention, where relative position embeddings and the neighboring filter application are key factors; enhanced local self-attention (ELSA) with Hadamard attention and the ghost head is proposed.



CARAFE: Content-Aware ReAssembly of FEatures

Comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation and inpainting show that CARAFE achieves consistent and substantial gains across all the tasks and has great potential to serve as a strong building block for future research.

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

This work addresses the task of semantic image segmentation with Deep Learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.

Guided Upsampling Network for Real-Time Semantic Segmentation

A neural network named Guided Upsampling Network is proposed. It consists of a multiresolution architecture that jointly exploits high-resolution and large-context information and can be plugged into any existing encoder-decoder architecture with little modification and low additional computation cost.

Deep Feature Pyramid Reconfiguration for Object Detection

A novel reconfiguration architecture is proposed to combine low-level representations with high-level semantic features in a highly nonlinear yet efficient way to gather task-oriented features across different spatial locations and scales, globally and locally.

M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

A powerful end-to-end one-stage object detector called M2Det is designed and trained by integrating it into the architecture of SSD, and achieves better detection performance than state-of-the-art one-stage detectors.

PSANet: Point-wise Spatial Attention Network for Scene Parsing

The point-wise spatial attention network (PSANet) is proposed to relax the local neighborhood constraint and achieves top performance on various competitive scene parsing datasets, including ADE20K, PASCAL VOC 2012 and Cityscapes, demonstrating its effectiveness and generality.

Context Encoders: Feature Learning by Inpainting

It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.

Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade

A novel deep layer cascade (LC) method is proposed to improve the accuracy and speed of semantic segmentation; it is an end-to-end trainable framework, allowing joint learning of all sub-models.

Feature Pyramid Networks for Object Detection

This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.

Detail-Preserving Pooling in Deep Networks

Inspired by the human visual system, which focuses on local spatial changes, detail-preserving pooling (DPP) is proposed: an adaptive pooling method that magnifies spatial changes, preserves important structural detail, and can be learned jointly with the rest of the network.