nocaps: novel object captioning at scale
TLDR
This work presents ‘nocaps’, the first large-scale benchmark for novel object captioning, consisting of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets, and provides analysis to guide future work.
VirTex: Learning Visual Representations from Textual Annotations
TLDR
VirTex is proposed: a pretraining approach that uses semantically dense captions to learn visual representations that match or exceed those learned on ImageNet, whether supervised or unsupervised, despite using up to ten times fewer images.
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
TLDR
A new class of probabilistic neural-symbolic models is proposed for VQA. These models treat symbolic functional programs as a latent, stochastic variable, which makes them more understandable while requiring fewer teaching examples.
CASTing Your Model: Learning to Localize Improves Self-Supervised Representations
TLDR
Comparative Attention-Supervised Tuning (CAST) is proposed, which uses unsupervised saliency maps to intelligently sample crops and provides grounding supervision via a Grad-CAM attention loss, in order to overcome the limitations of contrastive SSL methods.
Continual Reinforcement Learning in 3D Non-stationary Environments
TLDR
This paper proposes and openly releases CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes, and introduces an end-to-end model-free continual reinforcement learning strategy.
RedCaps: web-curated image-text data created by the people, for the people
TLDR
This work introduces RedCaps, a large-scale dataset of 11.7M image-text pairs collected from Reddit, and shows that captioning models trained on RedCaps produce rich and varied captions preferred by humans and learn visual representations that transfer to many downstream tasks.
Development Of A Graphical User Interface For Control Of A Robotic Manipulator With Sample Acquisition Capability
TLDR
This thesis work creates a bridge between the technical and psychological aspects of interface design by integrating the concepts of compatibility of the GUI with its users, consistency in design, visual hierarchy, and page layout.
Development Of A Graphical User Interface For Control Of A Robotic Manipulator With Sample Acquisition Capability
Design of a graphical user interface (GUI) is a delicate task requiring knowledge of human cognitive behaviour, design strategies and programming skills. In this thesis work, a GUI has been developed…