Publications
(CAD)$^2$RL: Real Single-Image Flight without a Single Real Image
TLDR
This paper proposes a learning method called (CAD)$^2$RL that performs collision-free indoor flight in the real world while being trained entirely on 3D CAD models. It shows that the learned policy generalizes to the real world without requiring the simulator to be particularly realistic or high-fidelity.
VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases
TLDR
This work introduces the problem of visual verification of relation phrases and develops a Visual Knowledge Extraction system called VisKE. VisKE has been used both to enrich existing textual knowledge bases by improving their recall and to augment open-domain question-answering reasoning.
Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control
TLDR
This paper trains a deep recurrent controller that can automatically determine which actions move the end-effector of a robotic arm to a desired object and describes how the resulting model can be transferred to a real-world robot by disentangling perception from control and only adapting the visual layers.
Probabilistic Label Trees for Efficient Large Scale Image Classification
TLDR
This paper shows how the parameters of the label tree can be found using maximum likelihood estimation; the resulting label tree achieves significantly improved recognition accuracy.
Latent Pyramidal Regions for Recognizing Scenes
TLDR
The proposed Latent Pyramidal Regions (LPR) representation obtains state-of-the-art results on the evaluated datasets, showing that it can model global and local scene characteristics simultaneously in a single framework and is general enough to be used for both indoor and outdoor scene classification.
Visalogy: Answering Visual Analogy Questions
TLDR
This paper introduces a dataset of visual analogy questions in natural images and presents the first results of their kind on solving analogy questions on natural images.
Learning to Select and Order Vacation Photographs
TLDR
This work proposes a discriminative structured model that encodes simple preferences for the contextual layout of the scene and the ordering between photos, enabling automatic composition of photo albums from unordered and untagged collections of images.
Incorporating Scene Context and Object Layout into Appearance Modeling
TLDR
This paper proposes a method to learn scene structures that can encode three main interlacing components of a scene: the scene category, the context-specific appearance of objects, and their layout.
DIViS: Domain Invariant Visual Servoing for Collision-Free Goal Reaching
TLDR
This paper proposes DIViS, a Domain Invariant policy learning approach for collision-free Visual Servoing. DIViS incorporates high-level semantics from previously collected, static, human-labeled datasets and learns collision-free servoing entirely in simulation, without any real robot data.
MuSHR: A Low-Cost, Open-Source Robotic Racecar for Education and Research
We present MuSHR, the Multi-agent System for non-Holonomic Racing. MuSHR is a low-cost, open-source robotic racecar platform for education and research, developed by the Personal Robotics Lab in the
...