Inferring spatial relations from textual descriptions of images

@article{Elu2021InferringSR,
  title={Inferring spatial relations from textual descriptions of images},
  author={Aitzol Elu and Gorka Azkune and Oier Lopez de Lacalle and Ignacio Arganda-Carreras and Aitor Soroa Etxabe and Eneko Agirre},
  journal={Pattern Recognit.},
  year={2021},
  volume={113},
  pages={107847}
}