Learning Efficient Convolutional Networks through Network Slimming
TLDR
Network slimming takes wide, large networks as input models; insignificant channels are automatically identified during training and pruned afterwards, yielding thin, compact models with comparable accuracy.
DSOD: Learning Deeply Supervised Object Detectors from Scratch
TLDR
This work presents Deeply Supervised Object Detector (DSOD), a framework that learns object detectors from scratch following the single-shot detection (SSD) framework. One key finding is that deep supervision, enabled by dense layer-wise connections, plays a critical role in learning a good detector.
TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection
TLDR
This work provides a simple approach, Tight box mining with Surrounding Segmentation Context (TS2C), for discovering tight object bounding boxes with only image-level supervision. It is built on two key properties of a desirable bounding box: high purity and high completeness.
Multiple Granularity Descriptors for Fine-Grained Categorization
TLDR
This work leverages the fact that a subordinate-level object already has other labels in its ontology tree to train a series of CNN-based classifiers, each specialized at one grain level. The approach outperforms state-of-the-art algorithms, including those requiring strong labels.
ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions
TLDR
This paper generalizes the traditional Sign and PReLU functions to enable explicit learning of the distribution reshape and shift at near-zero extra cost, and shows that the proposed ReActNet outperforms prior state-of-the-art methods by a large margin.
Weakly Supervised Dense Video Captioning
TLDR
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences, and proposes lexical fully convolutional neural networks with weakly supervised multi-instance multi-label learning to weakly link video regions with lexical labels.
SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses
TLDR
This work proposes a gradient-detach-based stacked complementary losses (SCL) method that uses detection losses as the primary objective and adds several auxiliary losses at different network stages, together with gradient detach training, to learn more discriminative representations.
Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification
TLDR
This paper proposes Attentive CutMix, an enhanced augmentation strategy based on CutMix that consistently outperforms the baseline CutMix and other methods by a significant margin.
Towards Instance-Level Image-To-Image Translation
TLDR
This paper presents a simple yet effective instance-aware image-to-image translation approach (INIT), which applies fine-grained local (instance) and global styles to the target image spatially. It also collects a large-scale benchmark for the new instance-level translation task.
MEAL: Multi-Model Ensemble via Adversarial Learning
TLDR
This paper proposes an adversarial-based learning strategy where a block-wise training loss is defined to guide and optimize the predefined student network to recover the knowledge in teacher models, and to promote the discriminator network to distinguish teacher vs. student features simultaneously.