nocaps: novel object captioning at scale
TLDR
This work presents ‘nocaps’, the first large-scale benchmark for novel object captioning, consisting of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets, and provides analysis to guide future work.
VirTex: Learning Visual Representations from Textual Annotations
TLDR
VirTex is proposed – a pretraining approach using semantically dense captions to learn visual representations that match or exceed those learned on ImageNet – supervised or unsupervised – despite using up to ten times fewer images.
CASTing Your Model: Learning to Localize Improves Self-Supervised Representations
TLDR
Comparative Attention-Supervised Tuning (CAST) is proposed, which uses unsupervised saliency maps to intelligently sample crops, and to provide grounding supervision via a Grad-CAM attention loss to overcome contrastive SSL methods' limitations.
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
TLDR
A new class of probabilistic neural-symbolic models for VQA is proposed. These models treat symbolic functional programs as a latent, stochastic variable, and are more understandable while requiring fewer teaching examples.
Continual Reinforcement Learning in 3D Non-stationary Environments
TLDR
This paper proposes and openly releases CRLMaze, a new benchmark for continual reinforcement learning in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes, and introduces an end-to-end, model-free continual reinforcement learning strategy.
RedCaps: web-curated image-text data created by the people, for the people
TLDR
It is shown that captioning models trained on RedCaps produce rich and varied captions preferred by humans, and learn visual representations that transfer to many downstream tasks.
Automated Multiclass Artifact Detection in Diffusion MRI Volumes via 3D Residual Squeeze-and-Excitation Convolutional Neural Networks
TLDR
A deep learning-based automated multiclass artifact classifier for dMRI volumes is proposed and a proof-of-concept dMRI analysis is conducted exploring the relationship between whole-brain fractional anisotropy (FA) and participant age, to test whether the use of the model improves the brain-age association.
Development Of A Graphical User Interface For Control Of A Robotic Manipulator With Sample Acquisition Capability
TLDR
This thesis work creates a bridge between technical and psychological aspects of interface design by integrating the concepts of compatibility of GUI with users, consistency in design, visual hierarchy and page layout.
Motorized chair
A full sized wheelchair can be too large to move around the house, especially in tight places such as the kitchen. Wheelchairs may also be too low from which to reach objects on shelves and in