Towards Task Understanding in Visual Settings
@article{Santy2019TowardsTU, title={Towards Task Understanding in Visual Settings}, author={Sebastin Santy and Wazeer Zulfikar and Rishabh Mehrotra and Emine Yilmaz}, journal={ArXiv}, year={2019}, volume={abs/1811.11833} }
We consider the problem of understanding real world tasks depicted in visual images. While most existing image captioning methods excel in producing natural language descriptions of visual scenes involving human tasks, there is often the need for an understanding of the exact task being undertaken rather than a literal description of the scene. We leverage insights from real world task understanding systems, and propose a framework composed of convolutional neural networks, and an external…
One Citation
Special issue on learning from user interactions
- Computer Science, Inf. Retr. J.
- 2020
Understanding user behavior could allow the system to support users at the various stages of their tasks, and could have implications on many aspects of the system design including user interface.
References
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
- Computer Science, J. Artif. Intell. Res.
- 2016
This survey classifies the existing approaches based on how they conceptualize this problem, viz., models that cast description either as a generation problem or as a retrieval problem over a visual or multimodal representational space.
Show and tell: A neural image caption generator
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Computer Science, AAAI
- 2017
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve…