Learning Multimodal Affinities for Textual Editing in Images

@article{Perel2021LearningMA,
  title={Learning Multimodal Affinities for Textual Editing in Images},
  author={Or Perel and Oron Anschel and Omri Ben-Eliezer and Shai Mazor and Hadar Averbuch-Elor},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.10139}
}
Fig. 1. Given a document-image (left), we learn multimodal affinities among its textual entities. The user can then select words to edit; for example, the user can choose to highlight the word “FRIGATEBIRD” (marked in red) or to delete the word “107” (marked in blue) (center). Our method propagates the editing operations onto words that are semantically and visually similar, highlighting all bird names and removing their size information (right). Image courtesy: Société Audubon Haiti.
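To make the affinity-based propagation described in Fig. 1 concrete, the sketch below illustrates one way such an edit could spread to similar words: each word carries a joint visual-textual embedding, and an operation applied to a selected word is repeated on every word whose embedding is sufficiently similar. This is an illustrative assumption, not the paper's implementation; the embeddings, the propagate_edit helper, and the SIMILARITY_THRESHOLD cutoff are all hypothetical.

import numpy as np

SIMILARITY_THRESHOLD = 0.8  # assumed cutoff for treating two words as "affine"


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def propagate_edit(selected_idx, embeddings, words, edit_fn):
    """Apply edit_fn to the selected word and to every other word whose
    multimodal embedding is sufficiently similar to the selection."""
    anchor = embeddings[selected_idx]
    edited = []
    for i, (word, emb) in enumerate(zip(words, embeddings)):
        if i == selected_idx or cosine_similarity(anchor, emb) >= SIMILARITY_THRESHOLD:
            edit_fn(word)
            edited.append(i)
    return edited


if __name__ == "__main__":
    # Toy data: bird names cluster together, size numbers cluster together.
    words = ["FRIGATEBIRD", "107", "PELICAN", "89"]
    embeddings = np.array([
        [0.9, 0.1, 0.0],
        [0.0, 0.2, 0.9],
        [0.8, 0.2, 0.1],
        [0.1, 0.1, 0.8],
    ])
    # Highlighting "FRIGATEBIRD" also highlights "PELICAN"; the numbers are untouched.
    propagate_edit(0, embeddings, words, lambda w: print(f"highlight {w}"))

In this toy run, selecting the first word propagates the highlight only to the other bird name, mirroring the behavior shown in the figure.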
