• Computer Science
  • Published in ArXiv 2019

Contextual Grounding of Natural Language Entities in Images

@article{Lai2019ContextualGO,
  title={Contextual Grounding of Natural Language Entities in Images},
  author={Farley Lai and Ning Xie and Derek Doran and Asim Kadav},
  journal={ArXiv},
  year={2019},
  volume={abs/1911.02133}
}
In this paper, we introduce a contextual grounding approach that captures the context in corresponding text entities and image regions to improve the grounding accuracy. Specifically, the proposed architecture accepts pre-trained text token embeddings and image object features from an off-the-shelf object detector as input. Additional encoding to capture the positional and spatial information can be added to enhance the feature quality. There are separate text and image branches facilitating…
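
The inputs named in the abstract (text token embeddings, detector object features, and an optional spatial encoding) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract is cut off before it explains how the text and image branches interact, so everything below is an assumption made for illustration: the function names, the five-value box encoding, the linear projection `project`, and the use of cosine similarity to pick a region for each entity.

```python
import math


def spatial_encoding(box, image_w, image_h):
    """Hypothetical spatial encoding: normalized corners plus relative area."""
    x1, y1, x2, y2 = box
    nx1, ny1 = x1 / image_w, y1 / image_h
    nx2, ny2 = x2 / image_w, y2 / image_h
    return [nx1, ny1, nx2, ny2, (nx2 - nx1) * (ny2 - ny1)]


def project(vector, weights):
    """Assumed linear map (one weight row per output dimension) into a shared space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground(entity_embeddings, region_embeddings):
    """For each text entity, return the index of the most similar image region."""
    return [
        max(range(len(region_embeddings)),
            key=lambda j: cosine(entity, region_embeddings[j]))
        for entity in entity_embeddings
    ]


# Toy data: two detected regions, each a (feature vector, pixel box) pair.
regions = [([1.0, 0.0], (0, 0, 50, 100)), ([0.0, 1.0], (50, 0, 100, 100))]

# Toy projection from 2 feature values + 5 spatial values down to 2 dimensions.
# These hand-picked weights simply ignore the spatial part; a trained model
# would learn them.
W = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

region_embeddings = [
    project(feat + spatial_encoding(box, 100, 100), W) for feat, box in regions
]
entities = [[0.9, 0.1], [0.2, 0.8]]  # stand-ins for pre-trained text embeddings

matches = ground(entities, region_embeddings)
print(matches)  # [0, 1]
```

In this toy setup each entity is matched to the region whose projected feature points in the most similar direction. A real system would replace the hand-written vectors and weights with learned embeddings and parameters.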
