Precise Detection in Densely Packed Scenes
TLDR
This work proposes a novel deep-learning-based method for precise object detection, designed for challenging settings such as packed retail environments, and shows that it outperforms the existing state of the art by substantial margins.
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks
TLDR
This work proposes a novel model that explicitly reasons about the geometric relations between constituent objects and an agent performing an action; it is applicable to activities with prominent object interaction dynamics and to objects that can be tracked using state-of-the-art approaches.
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
TLDR
This paper takes an axiomatic perspective to derive the desired properties and invariances of such a network to certain input permutations, presenting a structural characterization that is provably both necessary and sufficient.
Learning Canonical Representations for Scene Graph to Image Generation
TLDR
This work presents a novel model that addresses semantic equivalence issues in graphs by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes.
Accurate Visual Localization for Automotive Applications
TLDR
This work proposes a hybrid coarse-to-fine approach that leverages visual and GPS location cues and uses a self-supervised approach to learn a compact road image representation that is highly effective in challenging urban environments and reduces localization error by an order of magnitude.
Compositional Video Synthesis with Action Graphs
TLDR
This work introduces a generative model (AG2Vid) based on Action Graphs, a natural and convenient structure that represents the dynamics of actions between objects over time, allowing for more accurate generation of videos.
Spatio-Temporal Action Graph Networks
TLDR
This work proposes a novel inter-object graph representation for activity recognition based on a disentangled graph embedding with direct observation of edge appearance, and offers significantly improved performance compared to baseline approaches without object-graph representations or with previous graph-based models.
Classifying Collisions with Spatio-Temporal Action Graph Networks
TLDR
This work shows that a new model for explicit representation of object interactions significantly improves deep video activity classification for driving collision detection, and proposes a Spatio-Temporal Action Graph (STAG) network, which incorporates spatial and temporal relations of objects.
Differentiable Scene Graphs
TLDR
This work proposes Differentiable Scene Graphs (DSGs), an image representation that is amenable to differentiable end-to-end optimization and requires supervision only from the downstream tasks.
Learning Object Detection from Captions via Textual Scene Attributes
TLDR
This work argues that captions contain much richer information about the image, including attributes of objects and their relations, and presents a method that uses the attributes in this "textual scene graph" to train object detectors.