Towards General Purpose Vision Systems
@article{Gupta2021TowardsGP,
  title={Towards General Purpose Vision Systems},
  author={Tanmay Gupta and A. Kamath and Aniruddha Kembhavi and Derek Hoiem},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.00743}
}
A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question…
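The interface described in the abstract (one entry point that takes an image and a natural language task description and returns bounding boxes, confidences, and text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Prediction`, `StubVisionSystem`, and `predict`, the box coordinate convention, and the placeholder outputs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Prediction:
    """One output container shared by every task, so no per-task head is needed."""
    boxes: List[Box] = field(default_factory=list)
    confidences: List[float] = field(default_factory=list)
    text: str = ""


class StubVisionSystem:
    """Placeholder standing in for a task-agnostic model (not the paper's model)."""

    def predict(self, image: Sequence[Sequence[int]], task_description: str) -> Prediction:
        # A real system would condition on both inputs. This stub only shows the
        # call shape: it returns one full-image box with a fixed, made-up score.
        height = len(image)
        width = len(image[0]) if height else 0
        return Prediction(
            boxes=[(0.0, 0.0, float(width), float(height))],
            confidences=[0.5],
            text=f"(no answer for: {task_description})",
        )


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(3)]  # toy 3x4 "image"
    system = StubVisionSystem()
    # The same method serves different tasks; only the description changes.
    for task in ("What is this object?", "Locate the dog.", "Describe the image."):
        out = system.predict(image, task)
        print(task, out.boxes, out.confidences, out.text)
```

The point of the sketch is the design choice the abstract contrasts with special purpose systems: a new task is expressed as a new `task_description` string, with no added output head.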