• Publications
Semantic Image Inpainting with Deep Generative Models
This paper presents a method for semantic image inpainting that generates the missing content by conditioning on the available data. It predicts information in large missing regions and achieves pixel-level photorealism, significantly outperforming state-of-the-art methods.
Efficient Deep Learning for Stereo Matching
This paper proposes a matching network that produces very accurate results in less than a second of GPU computation. It uses a product layer that simply computes the inner product between the two representations of a siamese architecture.
VideoMatch: Matching based Video Object Segmentation
This work develops a matching-based algorithm for video object segmentation that learns to match extracted features to a provided template without memorizing the appearance of the objects.
Convolutional Image Captioning
This paper develops a convolutional image captioning technique that performs on par with the LSTM baseline on the challenging MSCOCO dataset while having a faster training time per number of parameters.
Fully Connected Deep Structured Networks
This work unifies the two-stage process for semantic segmentation into a single joint training algorithm and shows encouraging results on the challenging PASCAL VOC 2012 dataset.
Generative Modeling Using the Sliced Wasserstein Distance
This work considers an alternative formulation for generative modeling based on random projections which, in its simplest form, results in a single objective rather than a saddle-point formulation. The authors find the approach to be significantly more stable than even the improved Wasserstein GAN.
Factor Graph Attention
This work designs a factor-graph-based attention mechanism for visual dialog that operates on any number of data utilities. It demonstrates applicability on the challenging and recently introduced VisDial datasets, outperforming recent state-of-the-art methods.
Creativity: Generating Diverse Questions Using Variational Autoencoders
This paper proposes a creative algorithm for visual question generation that combines the advantages of variational autoencoders with long short-term memory networks. The framework can generate a large set of varying questions given a single input image.
Learning Deep Structured Models
This paper proposes a training algorithm that learns structured models jointly with the deep features that form the MRF potentials. It demonstrates the algorithm's effectiveness on predicting words from noisy images and on tagging Flickr photographs.
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more…