Anticipating Accidents in Dashcam Videos
TLDR
The DSA-RNN learns to dynamically distribute soft attention over candidate objects to gather subtle cues, and models the temporal dependencies of all cues to robustly anticipate an accident. It achieves the highest mean average precision (74.35%), outperforming baselines without attention or an RNN.
No More Discrimination: Cross City Adaptation of Road Scene Segmenters
TLDR
This work proposes an unsupervised learning approach to adapt road scene segmenters across different cities by advancing a joint global and class-specific domain adversarial learning framework, and shows that this method improves the performance of semantic segmentation in multiple cities across continents.
A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss
TLDR
By training the model end-to-end with the inconsistency loss and the original losses of the extractive and abstractive models, the model achieves state-of-the-art ROUGE scores on the CNN/Daily Mail dataset while producing the most informative and readable summaries in a solid human evaluation.
Learning to Compose with Professional Photographs on the Web
TLDR
This work formulates the photo composition problem as a view finding process that successively examines pairs of views and determines their aesthetic preferences. It exploits the abundant professional photographs on the web to mine unlimited high-quality ranking samples, and demonstrates that an aesthetics-aware deep ranking network can be trained without explicitly modeling any photographic rules.
Conditional regression forests for human pose estimation
TLDR
A conditional regression forest model for human pose estimation that incorporates dependency relationships between output variables through a global latent variable while still maintaining a low computational cost is presented.
Leveraging Video Descriptions to Learn Video Question Answering
TLDR
A scalable approach to learning video-based question answering (QA), that is, answering a free-form natural language question about video content, is proposed together with a self-paced learning procedure that iteratively identifies imperfect candidate QA pairs, and is shown to be effective.
Learning 3-D Scene Structure from a Single Still Image
TLDR
This work considers the problem of estimating detailed 3D structure from a single still image of an unstructured environment and uses a Markov random field (MRF) to infer a set of "plane parameters" that capture both the 3D location and 3D orientation of the patch.
Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories
TLDR
This work proposes a new 3D object class model that is capable of recognizing unseen views by pose estimation and synthesis, and that performs better than or on par with state-of-the-art algorithms in object detection on the Savarese et al. 2007 and PASCAL datasets.
HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation
TLDR
The proposed network, HorizonNet, trained to predict a 1D layout representation, outperforms previous state-of-the-art approaches, and the accompanying Pano Stretch data augmentation can diversify panorama data and be applied to other panorama-related learning tasks.