Towards Task Understanding in Visual Settings

@article{Santy2019TowardsTU,
  title={Towards Task Understanding in Visual Settings},
  author={Sebastin Santy and Wazeer Zulfikar and Rishabh Mehrotra and Emine Yilmaz},
  journal={ArXiv},
  year={2019},
  volume={abs/1811.11833}
}
We consider the problem of understanding real-world tasks depicted in visual images. While most existing image captioning methods excel at producing natural language descriptions of visual scenes involving human tasks, there is often a need to understand the exact task being undertaken rather than to give a literal description of the scene. We leverage insights from real-world task understanding systems and propose a framework composed of convolutional neural networks and an external… 
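The abstract is truncated, but the cited references (Show and tell; Inception-v4) suggest an encoder-decoder captioning backbone: a convolutional encoder produces image features that condition a recurrent decoder, adapted here to describe the task rather than the literal scene. The following is a minimal PyTorch sketch under that assumption; all module names and dimensions are illustrative and are not the authors' implementation.

```python
# Minimal sketch of a CNN encoder + recurrent decoder, in the spirit of the
# cited "Show and tell" reference. All names and sizes are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Small convolutional encoder standing in for an Inception-style backbone."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, images):                  # (B, 3, H, W)
        x = self.features(images).flatten(1)    # (B, 64)
        return self.proj(x)                     # (B, feat_dim)

class TaskDecoder(nn.Module):
    """LSTM decoder conditioned on image features, emitting a task description."""
    def __init__(self, vocab_size, feat_dim=256, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, tokens):
        # Prepend the image feature as the first input step, as in Show and tell.
        inputs = torch.cat([feats.unsqueeze(1), self.embed(tokens)], dim=1)
        hidden_states, _ = self.lstm(inputs)
        return self.out(hidden_states)           # (B, T+1, vocab_size)

# Usage with dummy data
encoder, decoder = ConvEncoder(), TaskDecoder(vocab_size=1000)
images = torch.randn(2, 3, 128, 128)
tokens = torch.randint(0, 1000, (2, 12))
logits = decoder(encoder(images), tokens)
```

How the external component mentioned in the truncated abstract is integrated is not recoverable from this page, so the sketch stops at the captioning backbone.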
1 Citation

Special issue on learning from user interactions
TLDR
Understanding user behavior could allow the system to support users at the various stages of their tasks, and could have implications for many aspects of system design, including the user interface.

References

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
TLDR
This survey classifies existing approaches based on how they conceptualize the problem, viz., as models that cast description either as a generation problem or as a retrieval problem over a visual or multimodal representational space.
Show and tell: A neural image caption generator
TLDR
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture, which has been shown to achieve very good performance at relatively low computational cost.
Show and tell: A neural image caption generator. In CVPR, 2015.