Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Ranjay Krishna, Yuke Zhu, Li Fei-Fei
- Computer Science · International Journal of Computer Vision
- 23 February 2016
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question-answer pairs.
Scene Graph Generation by Iterative Message Passing
- Danfei Xu, Yuke Zhu, C. Choy, Li Fei-Fei
- Computer Science · Computer Vision and Pattern Recognition
- 10 January 2017
This work explicitly models objects and their relationships using scene graphs, a visually grounded graphical structure of an image, and proposes a novel end-to-end model that generates such a structured scene representation from an input image.
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
- Chen Wang, Danfei Xu, S. Savarese
- Computer Science · Computer Vision and Pattern Recognition
- 15 January 2019
DenseFusion is a generic framework for estimating the 6D pose of a set of known objects from RGB-D images; it processes the two data sources individually and uses a novel dense fusion network to extract pixel-wise dense feature embeddings, from which the pose is estimated.
AI2-THOR: An Interactive 3D Environment for Visual AI
- Eric Kolve, Roozbeh Mottaghi, Ali Farhadi
- Computer Science · ArXiv
- 14 December 2017
AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the development of visually intelligent models.
Visual7W: Grounded Question Answering in Images
- Yuke Zhu, O. Groth, Michael S. Bernstein, Li Fei-Fei
- Computer Science · Computer Vision and Pattern Recognition
- 11 November 2015
This work establishes a semantic link between textual descriptions and image regions via object-level grounding, which enables a new type of QA with visual answers in addition to the textual answers used in previous work, and proposes a novel LSTM model with spatial attention to tackle the 7W QA tasks.
Target-driven visual navigation in indoor scenes using deep reinforcement learning
- Yuke Zhu, Roozbeh Mottaghi, Ali Farhadi
- Computer Science · IEEE International Conference on Robotics and…
- 16 September 2016
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, allowing better generalization, and introduces the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
- Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín
- Computer Science · Conference on Robot Learning
- 6 August 2021
This study analyzes the most critical challenges when learning from offline human data for manipulation and highlights opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods.
SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark
- Linxi (Jim) Fan, Yuke Zhu, Li Fei-Fei
- Computer Science · Conference on Robot Learning
- 23 October 2018
SURREAL, an open-source, scalable framework that supports state-of-the-art distributed reinforcement learning algorithms, is introduced; experiments demonstrate that SURREAL algorithms outperform existing open-source implementations in both agent performance and learning efficiency.
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
- Michelle A. Lee, Yuke Zhu, J. Bohg
- Computer Science · IEEE International Conference on Robotics and…
- 24 October 2018
This work uses self-supervision to learn a compact, multimodal representation of sensory inputs, which can then be used to improve the sample efficiency of policy learning with deep reinforcement learning algorithms.
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
- Yuke Zhu, Ziyun Wang, N. Heess
- Computer Science · Robotics: Science and Systems
- 26 February 2018
This work proposes a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent and trains end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities.
...