Towards Task Understanding in Visual Settings

Sebastin Santy, Wazeer Zulfikar, Rishabh Mehrotra, Emine Yilmaz
We consider the problem of understanding real-world tasks depicted in images. While most existing image captioning methods excel at producing natural-language descriptions of visual scenes involving human tasks, what is often needed is an understanding of the exact task being undertaken rather than a literal description of the scene. We leverage insights from real-world task understanding systems and propose a framework composed of convolutional neural networks and an external…
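To make the gap between scene description and task understanding concrete, here is a minimal, purely illustrative sketch: a literal scene description is matched against an external knowledge table that associates tasks with typical scene keywords. All task names, keywords, and the matching rule are assumptions for illustration; the paper's actual framework uses convolutional neural networks and an external knowledge source.

```python
# Hypothetical sketch: map a literal scene description to an underlying task
# by matching it against an external knowledge table of task keywords.
# Task names and keyword sets are illustrative, not taken from the paper.

EXTERNAL_KNOWLEDGE = {
    "cooking dinner": {"kitchen", "stove", "pan", "chopping"},
    "repairing a bicycle": {"bicycle", "wrench", "wheel", "chain"},
    "gardening": {"shovel", "soil", "plants", "watering"},
}

def infer_task(scene_description: str) -> str:
    """Pick the task whose keyword set best overlaps the scene description."""
    tokens = set(scene_description.lower().split())
    best_task, best_score = "unknown", 0
    for task, keywords in EXTERNAL_KNOWLEDGE.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_task, best_score = task, score
    return best_task

print(infer_task("a man with a wrench fixing a bicycle wheel"))
# → "repairing a bicycle"
```

A captioner would only output the literal description ("a man with a wrench…"); the external lookup is what recovers the task actually being undertaken.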
1 Citation


Special issue on learning from user interactions
Understanding user behavior could allow a system to support users at the various stages of their tasks, with implications for many aspects of system design, including the user interface.


References

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
This survey classifies existing approaches based on how they conceptualize the problem, viz., as either a generation problem or a retrieval problem over a visual or multimodal representational space.
Show and tell: A neural image caption generator. In CVPR, 2015.
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
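The encoder-decoder loop that paper describes can be sketched in miniature: a CNN encodes the image into a feature that conditions a recurrent decoder, which emits one word at a time until an end token. The stub encoder and the tiny transition table below stand in for trained network weights and are entirely illustrative.

```python
# Toy sketch of an encoder-decoder captioning loop: a CNN encodes the image
# into a feature that conditions a recurrent decoder; the decoder emits one
# word at a time (greedy decoding) until an end token is produced.
# The lookup tables below stand in for trained weights (illustrative only).

def cnn_encode(image_id: str) -> str:
    """Stand-in for a CNN encoder: maps an image to a scene feature."""
    features = {"img_dog": "dog", "img_beach": "beach"}
    return features.get(image_id, "unknown")

# Stand-in for the recurrent decoder's learned dynamics: given the scene
# feature and the previous word, predict the most likely next word.
TRANSITIONS = {
    ("dog", "<start>"): "a",
    ("dog", "a"): "dog",
    ("dog", "dog"): "running",
    ("dog", "running"): "<end>",
}

def greedy_caption(image_id: str, max_len: int = 10) -> str:
    """Greedily decode a caption, word by word, from the image feature."""
    feature = cnn_encode(image_id)
    words, prev = [], "<start>"
    for _ in range(max_len):
        nxt = TRANSITIONS.get((feature, prev), "<end>")
        if nxt == "<end>":
            break
        words.append(nxt)
        prev = nxt
    return " ".join(words)

print(greedy_caption("img_dog"))  # → "a dog running"
```

In the real model the transition table is replaced by an LSTM over learned word embeddings, and greedy decoding is typically replaced by beam search; the control flow, however, is the same.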
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture, which has been shown to achieve…