• Publications
Visual saliency estimation by nonlinearly integrating features using region covariances.
TLDR
This work proposes using covariance matrices of simple image features (known as region covariance descriptors in the computer vision community) as meta-features for saliency estimation, and demonstrates that the proposed approach outperforms state-of-the-art models on various tasks, including prediction of human eye fixations, salient object detection, and image retargeting.
Structure-preserving image smoothing via region covariances
TLDR
This study proposes a simple alternative approach to image smoothing that relies on covariance matrices of simple image features, also known as region covariances, and uses second-order statistics as a patch descriptor to implicitly capture local structure and texture information.
Re-evaluating Automatic Metrics for Image Captioning
TLDR
This paper provides an in-depth evaluation of existing image captioning metrics through a series of carefully designed experiments, and explores the use of the recently proposed Word Mover’s Distance document metric for image captioning.
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
TLDR
This survey classifies the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space.
Image Synthesis in Multi-Contrast MRI With Conditional Generative Adversarial Networks
TLDR
The proposed approach preserves intermediate-to-high frequency details via an adversarial loss, and it offers enhanced synthesis performance via pixel-wise and perceptual losses for registered multi-contrast images and a cycle-consistency loss for unregistered images.
Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction
TLDR
This work proposes the use of deep learning for dynamic saliency prediction and introduces the so-called spatio-temporal saliency networks, investigating two-stream network architectures that integrate spatial and temporal information.
RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes
TLDR
This work introduces RecipeQA, a dataset for multimodal comprehension of cooking recipes, with a set of comprehension and reasoning tasks that require joint understanding of images and text, capturing the temporal flow of events, and making sense of procedural knowledge.
Image Matting with KL-Divergence Based Sparse Sampling
TLDR
This paper formulates sampling as a sparse subset selection problem, picking a small set of candidate samples that best explains the unknown pixels, and describes a new distance measure for comparing two samples, based on the KL-divergence between the distributions of features extracted in the vicinity of the samples.
Top down saliency estimation via superpixel-based discriminative dictionaries
TLDR
This work presents a novel method for learning top-down visual saliency that is well suited to locating objects of interest in complex scenes and provides much better saliency maps.
Disconnected Skeleton: Shape at Its Absolute Scale
TLDR
This work presents a new skeletal representation along with a matching framework to address the deformable shape recognition problem and replaces the local coordinate frame with a global Euclidean frame supported by additional mechanisms to handle articulations and local boundary deformations.