Editing Text in the Wild

Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai
Proceedings of the 27th ACM International Conference on Multimedia
Published 8 August 2019
In this paper, we are interested in editing text in natural images, which aims to replace or modify a word in the source image with another one while maintaining its realistic look. Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: a text conversion module, a background inpainting module, and a fusion module. The text conversion module changes the text content of the source image into the target text while keeping the original text style…
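The three-module pipeline described above can be sketched schematically. The function names and the placeholder image operations below are hypothetical stand-ins for the paper's learned networks; this is only a minimal illustration of the data flow (convert text style, inpaint the background, fuse the two), not the actual SRNet implementation.

```python
import numpy as np

def text_conversion(source_img, target_text_mask):
    # Render the target text in the source image's style; here the
    # "style" is approximated by the source's mean intensity.
    style_level = source_img.mean()
    return target_text_mask * style_level

def background_inpainting(source_img, text_mask):
    # Erase the original text region and fill it; a real model predicts
    # plausible texture, here we fill with the mean of non-text pixels.
    bg = source_img.copy()
    non_text_mean = source_img[text_mask == 0].mean()
    bg[text_mask == 1] = non_text_mean
    return bg

def fusion(styled_text, background, target_text_mask):
    # Composite the converted text onto the inpainted background.
    return np.where(target_text_mask == 1, styled_text, background)

# Toy example: a 4x4 "image" with a 2x2 text region that moves one
# column to the right in the edited result.
img = np.array([[0.2, 0.2, 0.2, 0.2],
                [0.2, 0.9, 0.9, 0.2],
                [0.2, 0.9, 0.9, 0.2],
                [0.2, 0.2, 0.2, 0.2]])
old_mask = np.zeros((4, 4)); old_mask[1:3, 1:3] = 1
new_mask = np.zeros((4, 4)); new_mask[1:3, 2:4] = 1

out = fusion(text_conversion(img, new_mask),
             background_inpainting(img, old_mask),
             new_mask)
```

In the real SRNet each of these three functions is a trained sub-network, and the whole pipeline is optimized end-to-end rather than composed from fixed heuristics.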


RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image
A novel representation-learning-based STE model, referred to as RewriteNet, is proposed that employs textual as well as visual information to bridge the domain gap between synthetic and real data.
RewriteNet: Reliable Scene Text Editing with Implicit Decomposition of Text Contents and Styles
A novel STE model, referred to as RewriteNet, is proposed that decomposes text images into content and style features and re-writes a text in the original image, achieving better generation performance than competing methods.
SwapText: Image Based Texts Transfer in Scenes
Swapping text in scene images while preserving original fonts, colors, sizes and background textures is a challenging task due to the complex interplay between different factors, so a three-stage framework to transfer texts across scene images is presented.
STRIVE: Scene Text Replacement In Videos
This work proposes replacing scene text in videos using deep style transfer and learned photometric transformations, and introduces new synthetic and real-world datasets with paired text objects, which is the first attempt at deep video text replacement.
EraseNet: End-to-End Text Removal in the Wild
A novel GAN-based model termed EraseNet that can automatically remove text located on natural images, significantly outperforming existing state-of-the-art methods on all metrics with remarkably higher-quality results.
APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation
Style-guided text image generation tries to synthesize a text image by imitating a reference image's appearance while keeping the text content unaltered; the text image appearance includes many aspects.
Deep Learning-Based Forgery Attack on Document Images
The forged-and-recaptured samples created by the proposed text editing attack and recapturing operation have successfully fooled some existing document authentication systems.
Progressive Scene Text Erasing with Self-Supervision
Self-supervision is employed for feature representation on unlabeled real-world scene text images and improves the generalization of the text erasing task and achieves state-of-the-art performance on public benchmarks.
Construction of Scene Tibetan Dataset Based on GAN
This paper focuses on the study of replacing other languages in the scene with Tibetan, while maintaining the style of the original text, and decomposes the problem into three sub-networks: text style transfer network, background inpainting network and fusion network.
TextStyleBrush: Transfer of Text Aesthetics from a Single Example
A novel approach for disentangling the content of a text image from all aspects of its appearance into a non-parametric, fixed-dimensional vector, which can then be applied to new content, for one-shot transfer of the source style to new content.


STEFANN: Scene Text Editor Using Font Adaptive Neural Network
This paper proposes a method to modify text in an image at character-level using two different neural network architectures - FANnet to achieve structural consistency with source font and Colornet to preserve source color.
Context-Aware Unsupervised Text Stylization
This work presents a novel algorithm to stylize text without supervision, which provides a flexible and convenient way to invoke fantastic text expressions and establishes an implicit mapping for them by using abstract imagery of the style image as a bridge.
Scene Text Eraser
A scene text erasing method that properly hides the information via an inpainting convolutional neural network (CNN) model, demonstrating a drastic decrease in precision, recall and F-score compared with the direct text detection approach.
Synthetic Data for Text Localisation in Natural Images
The relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning, is discussed.
EnsNet: Ensconce Text in the Wild
Both qualitative and quantitative experiments on synthetic images and the ICDAR 2013 dataset demonstrate that each component of EnsNet is essential to achieving good performance, and that it significantly outperforms previous state-of-the-art methods on all metrics.
ASTER: An Attentional Scene Text Recognizer with Flexible Rectification
This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.
Multi-oriented Text Detection with Fully Convolutional Networks
A novel approach for text detection in natural images that consistently achieves the state-of-the-art performance on three text detection benchmarks: MSRA-TD500, ICDAR2015 and ICDAR2013.
Image Style Transfer Using Convolutional Neural Networks
A Neural Algorithm of Artistic Style is introduced that can separate and recombine the image content and style of natural images and provide new insights into the deep image representations learned by Convolutional Neural Networks and demonstrate their potential for high level image synthesis and manipulation.
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes
A novel text detector named LOMO is presented, which localizes the text progressively multiple times (or, in other words, LOoks More than Once); state-of-the-art results on several public benchmarks confirm the striking robustness and effectiveness of LOMO.
Multi-content GAN for Few-Shot Font Style Transfer
This work focuses on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface, and proposes an end-to-end stacked conditional GAN model considering content along channels and style along network layers.