Non-local Neural Networks
TLDR
In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies with deep neural networks.
  • 1,921
  • 363
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
TLDR
We propose a novel Hollywood in Homes approach to collect a large-scale dataset of boring videos of daily activities.
  • 419
  • 104
Videos as Space-Time Region Graphs
TLDR
In this paper, we propose to represent videos as space-time region graphs, which capture temporal shape dynamics and model functional relationships between humans and objects.
  • 271
  • 47
Unsupervised Learning of Visual Representations Using Videos
  • X. Wang, A. Gupta
  • Computer Science
  • IEEE International Conference on Computer Vision…
  • 4 May 2015
TLDR
We present a simple yet surprisingly powerful approach for unsupervised learning of ConvNets using hundreds of thousands of unlabeled videos from the web to learn visual representations.
  • 339
  • 33
Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs
TLDR
We build upon the recently introduced Graph Convolutional Network and propose an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers.
  • 204
  • 30
Actions ~ Transformations
TLDR
We propose a novel representation for actions by modeling an action as a transformation that changes the state of the environment from before the action happens (precondition) to the state after the action (effect).
  • 173
  • 23
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
TLDR
We propose to learn an adversarial network that generates examples with occlusions and deformations.
  • 296
  • 22
Designing deep networks for surface normal estimation
TLDR
We use CNNs for the task of predicting surface normals from a single image.
  • 248
  • 19
Learning Correspondence From the Cycle-Consistency of Time
TLDR
We introduce a self-supervised method for learning visual correspondence from unlabeled video.
  • 115
  • 14
3D Human Pose Estimation in the Wild by Adversarial Learning
TLDR
In this paper, we propose an adversarial learning framework that distills the 3D human pose structures learned from a fully annotated, constrained 3D pose dataset to in-the-wild images with only 2D pose annotations.
  • 158
  • 13