Boosting Adversarial Attacks with Momentum
TLDR
Proposes a broad class of momentum-based iterative algorithms that boost adversarial attacks by integrating a momentum term into the iterative attack process, which stabilizes update directions and helps escape poor local maxima during the iterations, resulting in more transferable adversarial examples.
Learning Efficient Convolutional Networks through Network Slimming
TLDR
Network slimming takes wide, large networks as input models, automatically identifies insignificant channels during training, and prunes them afterwards, yielding thin and compact models with comparable accuracy.
DSOD: Learning Deeply Supervised Object Detectors from Scratch
TLDR
Presents the Deeply Supervised Object Detector (DSOD), a framework that learns object detectors from scratch following the single-shot detection (SSD) framework; a key finding is that deep supervision, enabled by dense layer-wise connections, plays a critical role in learning a good detector.
One step beyond histograms: Image representation using Markov stationary features
TLDR
Markov stationary features (MSF) characterize the spatial co-occurrence of histogram patterns with Markov chain models and yield a compact feature representation through Markov stationary analysis. This goes one step beyond histograms because it captures spatial structure information both within histogram bins and between histograms.
Learning SURF Cascade for Fast and Accurate Object Detection
TLDR
A novel learning framework, derived from the well-known Viola-Jones (VJ) framework, for training boosting-cascade-based object detectors on large-scale datasets; it can train object detectors from billions of negative samples within one hour, even on personal computers.
Weakly Supervised Dense Video Captioning
TLDR
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences. It proposes lexical fully convolutional neural networks with weakly supervised multi-instance multi-label learning to link video regions with lexical labels.
Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages
TLDR
Proposes Tiny-DSOD, built on the deeply supervised object detection (DSOD) framework, which introduces two innovative and ultra-efficient architecture blocks: a depthwise dense block (DDB) based backbone and a depthwise feature-pyramid-network (D-FPN) based front-end.
NuActiv: recognizing unseen new activities using semantic attribute-based learning
TLDR
Presents NuActiv, an activity recognition system that can recognize a human activity even when there are no training data for that activity class, together with a two-layer zero-shot learning algorithm for activity recognition using semantic attribute-based learning.
BodyFusion: Real-Time Capture of Human Motion and Surface Geometry Using a Single Depth Camera
TLDR
This work proposes BodyFusion, a novel real-time geometry fusion method that can track and reconstruct non-rigid surface motion of a human performance using a single consumer-grade depth camera and contributes a skeleton-embedded surface fusion (SSF) method.
Face detection using SURF cascade
TLDR
A novel boosting-cascade-based face detection framework using SURF features that can train face detectors within one hour by scanning billions of negative samples on current personal computers, and that is comparable to the state-of-the-art algorithm.