R-FCN: Object Detection via Region-based Fully Convolutional Networks
This work presents region-based, fully convolutional networks for accurate and efficient object detection, and proposes position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.
Deformable Convolutional Networks
This work introduces two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling, based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision.
Instance-Aware Semantic Segmentation via Multi-task Network Cascades
  • Jifeng Dai, Kaiming He, Jian Sun
  • Computer Science
  • IEEE Conference on Computer Vision and Pattern…
  • 14 December 2015
This paper presents Multi-task Network Cascades for instance-aware semantic segmentation, which consist of three networks that respectively differentiate instances, estimate masks, and categorize objects, and develops an algorithm for the nontrivial end-to-end training of this causal, cascaded structure.
Flow-Guided Feature Aggregation for Video Object Detection
This work presents flow-guided feature aggregation, an accurate, end-to-end learning framework for video object detection that improves the per-frame features by aggregating nearby features along the motion paths, and thus improves video recognition accuracy.
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
This work introduces Visual-Linguistic BERT (VL-BERT), a new pre-trainable generic representation for visual-linguistic tasks that adopts the simple yet powerful Transformer model as its backbone and extends it to take both visual and linguistic embedded features as input.
Deep Feature Flow for Video Recognition
This work presents deep feature flow, a fast and accurate framework for video recognition that runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field, achieving a significant speedup because flow computation is relatively fast.
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Deformable DETR, whose attention modules attend only to a small set of key sampling points around a reference, can achieve better performance than DETR (especially on small objects) with 10× fewer training epochs.
Fully Convolutional Instance-Aware Semantic Segmentation
This work presents the first fully convolutional end-to-end solution for the instance-aware semantic segmentation task, which achieves state-of-the-art performance in both accuracy and efficiency and won the COCO 2016 segmentation competition by a large margin.
Deformable ConvNets V2: More Deformable, Better Results
This work presents a reformulation of Deformable Convolutional Networks that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training, and guides network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features.
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
This paper proposes using scribbles to annotate images, and develops an algorithm to train convolutional networks for semantic segmentation supervised by scribbles, which shows excellent results on the PASCAL-CONTEXT dataset thanks to extra inexpensive scribble annotations.