Non-local Neural Networks
- X. Wang, Ross B. Girshick, A. Gupta, Kaiming He
- Computer Science, IEEE/CVF Conference on Computer Vision and…
- 21 November 2017
This paper presents non-local operations as a generic family of building blocks for capturing long-range dependencies in computer vision, and shows that they improve object detection/segmentation and pose estimation on the COCO suite of tasks.
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
- Gunnar A. Sigurdsson, Gül Varol, X. Wang, Ali Farhadi, I. Laptev, A. Gupta
- Computer Science, European Conference on Computer Vision
- 6 April 2016
This work proposes Hollywood in Homes, a novel approach to data collection, and uses it to build a new dataset, Charades, in which hundreds of people record videos in their own homes while acting out casual everyday activities. It also evaluates and provides baseline results for several tasks, including action recognition and automatic description generation.
Videos as Space-Time Region Graphs
The proposed graph representation achieves state-of-the-art results on the Charades and Something-Something datasets, with a particularly large gain when the model is applied in complex environments.
Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs
This paper builds upon the recently introduced Graph Convolutional Network (GCN) and proposes an approach that uses both semantic embeddings and categorical relationships to predict the classifiers; the approach is shown to be robust to noise in the knowledge graph (KG).
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
- Yu Sun, X. Wang, Zhuang Liu, John Miller, Alexei A. Efros, Moritz Hardt
- Computer Science, International Conference on Machine Learning
- 29 September 2019
This work turns a single unlabeled test sample into a self-supervised learning problem, on which the model parameters are updated before making a prediction. This leads to improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts.
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
- X. Wang, Abhinav Shrivastava, A. Gupta
- Computer Science, Computer Vision and Pattern Recognition
- 11 April 2017
This paper proposes to learn an adversarial network that generates examples with occlusions and deformations. The goal of the adversary is to generate examples that are difficult for the object detector to classify, and the original detector and the adversary are learned jointly.
Learning Correspondence From the Cycle-Consistency of Time
- X. Wang, A. Jabri, Alexei A. Efros
- Computer Science, Computer Vision and Pattern Recognition
- 18 March 2019
This paper introduces a self-supervised method that uses cycle-consistency in time as a free supervisory signal for learning visual representations from scratch, and demonstrates the generalizability of the representation, without finetuning, across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow.
Visual Semantic Navigation using Scene Priors
- Wei Yang, X. Wang, Ali Farhadi, A. Gupta, Roozbeh Mottaghi
- Computer Science, International Conference on Learning…
- 27 September 2018
This work proposes to use Graph Convolutional Networks to incorporate prior knowledge into a deep reinforcement learning framework, and shows that semantic knowledge significantly improves both performance and generalization to unseen scenes and/or objects.
Actions ~ Transformations
- X. Wang, Ali Farhadi, A. Gupta
- Computer Science, Computer Vision and Pattern Recognition
- 2 December 2015
A novel representation for actions is proposed by modeling an action as a transformation that changes the state of the environment from what it was before the action (precondition) to what it is after the action (effect).
Unsupervised Learning of Visual Representations Using Videos
This paper presents a simple yet surprisingly powerful approach for unsupervised learning of CNNs: it uses hundreds of thousands of unlabeled videos from the web to learn visual representations, and designs a Siamese-triplet network with a ranking loss function to train the CNN representation.