• Publications
In defense of soft-assignment coding
TLDR
In object recognition, soft-assignment coding enjoys computational efficiency and conceptual simplicity.
  • 444
  • 62
What Value Do Explicit High Level Concepts Have in Vision to Language Problems?
TLDR
We propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering.
  • 298
  • 35
Deep learning features at scale for visual place recognition
TLDR
In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features.
  • 136
  • 20
From Motion Blur to Motion Flow: A Deep Learning Solution for Removing Heterogeneous Motion Blur
TLDR
We propose the first universal end-to-end mapping from the blurred image to the dense motion flow and recover the unblurred image from the estimated motion flow.
  • 135
  • 19
Seeing Deeply and Bidirectionally: A Deep Learning Approach for Single Image Reflection Removal
TLDR
We propose a cascade deep neural network, which estimates both the background image and the reflection.
  • 40
  • 18
Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation
TLDR
In this paper, we propose a novel structure-aware convolutional network for pose estimation, termed Adversarial PoseNet, which trains a multi-task pose generator with two discriminator networks.
  • 168
  • 15
Towards Context-Aware Interaction Recognition for Visual Relationship Detection
TLDR
This paper proposes an alternative, context-aware interaction recognition framework that combines the context and the interaction.
  • 73
  • 15
Less is More: Zero-Shot Learning from Online Textual Documents with Noise Suppression
TLDR
We propose an objective function which can simultaneously suppress the noisy signal in the text and learn a function to match the text document and visual features.
  • 112
  • 14
Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors
TLDR
We propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace.
  • 78
  • 12
Mid-level deep pattern mining
TLDR
In this paper, building on the well-known association rule mining, we propose a pattern mining algorithm, Mid-level Deep Pattern Mining (MDPM), to study the problem of mid-level visual element discovery.
  • 80
  • 12