Corpus ID: 233004347

Towards General Purpose Vision Systems

@article{Gupta2021TowardsGP,
  title={Towards General Purpose Vision Systems},
  author={Tanmay Gupta and A. Kamath and Aniruddha Kembhavi and Derek Hoiem},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.00743}
}
A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question…
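
The input/output contract described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `TaskOutput` and `DummyGeneralPurposeSystem`, the box convention, and the fixed placeholder output are all assumptions made for illustration. The sketch only shows that a single call signature (image plus task description in; boxes, confidences, and text out) can serve every task without a task-specific output head.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class TaskOutput:
    """Task-agnostic output: boxes, one confidence per box, and text."""
    boxes: List[Box] = field(default_factory=list)
    confidences: List[float] = field(default_factory=list)
    text: str = ""

    def __post_init__(self) -> None:
        # Each box must carry exactly one confidence score.
        if len(self.boxes) != len(self.confidences):
            raise ValueError("boxes and confidences must have equal length")


class DummyGeneralPurposeSystem:
    """Placeholder stand-in (assumption): no learning, fixed output.

    It exists only to show the interface shape; a real system would
    condition its output on both the image and the task description.
    """

    def __call__(self, image: Sequence[Sequence[float]],
                 task_description: str) -> TaskOutput:
        height = len(image)
        width = len(image[0]) if height else 0
        # Return one box covering the whole image and a dummy string.
        return TaskOutput(
            boxes=[(0.0, 0.0, float(width), float(height))],
            confidences=[0.5],
            text="placeholder",
        )


if __name__ == "__main__":
    system = DummyGeneralPurposeSystem()
    image = [[0.0] * 4 for _ in range(3)]  # 3 rows x 4 columns
    # The same callable handles different tasks; only the text changes.
    for task in ("What is the animal doing?", "Locate the dog."):
        result = system(image, task)
        print(task, result.boxes, result.confidences, result.text)
```

Under this framing, switching from question answering to localization changes only the task description string; which of the three output fields a caller reads depends on the task.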