Learning Multimodal Affinities for Textual Editing in Images

Or Perel, Oron Anschel, Omri Ben-Eliezer, Shai Mazor, Hadar Averbuch-Elor
Fig. 1. Given a document image (left), we learn multimodal affinities among its textual entities. The user can then select words to edit. For example, the user can highlight the word “FRIGATEBIRD”, marked in red, or delete the word “107”, marked in blue (center). Our method propagates the editing operations onto words that are semantically and visually similar, highlighting all bird names and removing their size information (right). Image courtesy: Société Audubon Haiti.


Unsupervised Deep Embedding for Clustering Analysis
Proposes Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective.
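The DEC objective can be summarized in a few lines: soft assignments between embeddings and cluster centroids follow a Student's t kernel, a sharpened target distribution is derived from them, and the model is trained to minimize the KL divergence between the two. A minimal numpy sketch (function names are illustrative, not from the paper's code):

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij between embeddings z and centroids mu."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # squared distances, (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, normalized per sample."""
    p = q ** 2 / q.sum(axis=0)
    return p / p.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q):
    """KL(P || Q): the objective minimized jointly over encoder and centroids."""
    return float((p * np.log(p / q)).sum())
```

In the full method, gradients of this loss flow back into both the centroids and the encoder producing `z`, so the feature space is reshaped as clustering proceeds.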
Exploring Visual Information Flows in Infographics
This work uses a deep neural network to identify visual elements that carry information, agnostic to their varied artistic appearances, and characterizes the visual information flow (VIF) design space with a taxonomy of 12 design patterns.
Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline
An end-to-end approach that automatically extracts an extensible timeline template from a bitmap image. It adopts a deconstruction-and-reconstruction paradigm, and experiments show that it effectively extracts extensible templates from real-world timeline infographics.
Clustering-Driven Deep Embedding With Pairwise Constraints
This paper proposes a new framework, Clustering-driven deep embedding with PAirwise Constraints (CPAC), for nonparametric clustering with a neural network based on a Siamese architecture, and shows that clustering performance increases under this scheme even with a limited number of user queries.
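Pairwise constraints typically enter a Siamese setup through a loss that pulls must-link pairs together and pushes cannot-link pairs beyond a margin. A minimal sketch using the standard contrastive loss (an assumption for illustration; CPAC's exact formulation may differ):

```python
import numpy as np

def contrastive_loss(za, zb, same, margin=1.0):
    """Contrastive loss on one pair of embeddings.

    same=1 (must-link): penalize the squared distance between the pair.
    same=0 (cannot-link): penalize only when the pair is closer than `margin`.
    """
    d = np.linalg.norm(za - zb)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

Summed over user-labeled pairs, this term shapes the embedding so that a downstream nonparametric clustering respects the constraints.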
Contextual String Embeddings for Sequence Labeling
This paper proposes to leverage the internal states of a trained character-level language model to produce a novel type of word embedding, termed contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.
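The construction can be sketched as follows: run a character-level LM forward and backward over the sentence and read off hidden states at word boundaries. The boundary indexing below is one plausible convention, not necessarily the paper's exact one, and the hidden states are assumed precomputed:

```python
import numpy as np

def word_embeddings(text, h_fwd, h_bwd):
    """Contextual string embeddings from character-LM hidden states.

    h_fwd[i]: forward-LM state after reading character i (left to right).
    h_bwd[i]: backward-LM state after reading character i (right to left).
    A word's embedding concatenates the forward state at its last
    character with the backward state at its first character.
    """
    embs = []
    start = 0
    for word in text.split(' '):  # simplistic: assumes single-space tokenization
        end = start + len(word) - 1
        embs.append(np.concatenate([h_fwd[end], h_bwd[start]]))
        start = end + 2  # skip the separating space
    return np.stack(embs)
```

Because both states summarize the characters on either side of the word, the same surface form receives different embeddings in different contexts.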
Deep Continuous Clustering
Presents a clustering algorithm that performs nonlinear dimensionality reduction and clustering jointly, outperforming state-of-the-art clustering schemes, including recent methods that use deep networks.
Deep Video Color Propagation
This work proposes a deep learning framework for color propagation that combines a local strategy, which propagates colors frame by frame to ensure temporal stability, with a global strategy, which uses semantics to propagate colors over a longer range.
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of the plausible answer space for a given question, enabling the model to generalize more robustly across different answer distributions.
Image Inpainting for Irregular Holes Using Partial Convolutions
This work proposes partial convolutions, in which the convolution is masked and renormalized so that it is conditioned only on valid pixels, and outperforms other methods on irregular masks.
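The masking-and-renormalization step can be sketched directly: the kernel is applied only to valid pixels, the response is scaled by sum(1)/sum(M) over each window, and the mask is updated wherever a window saw any valid pixel. A single-channel numpy sketch (the paper's version is a learned multi-channel layer):

```python
import numpy as np

def partial_conv2d(x, mask, w, b=0.0):
    """Single-channel partial convolution (stride 1, no padding).

    The kernel sees only valid pixels (mask == 1); the response is
    renormalized by sum(1) / sum(mask) over the window, and the output
    mask becomes 1 wherever the window covered any valid pixel.
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    new_mask = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                patch = x[i:i + kh, j:j + kw] * m
                out[i, j] = (w * patch).sum() * (kh * kw / valid) + b
                new_mask[i, j] = 1.0
    return out, new_mask
```

Stacking such layers shrinks the hole in the mask at every step, which is what lets the network fill irregular regions from the outside in.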
Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics
This work augments background patches in infographics from the authors' Visually29K dataset with Internet-scraped icons, which serve as training data for an icon-proposal mechanism, and presents a multimodal summarization application.