SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Fei Wu, Futai Zou

Arbitrary text appearance poses a great challenge in scene text recognition tasks. Existing works mostly address the problem in terms of shape distortion, including perspective distortions, line curvature or other style variations. Rectification (i.e., spatial transformers) as a preprocessing stage is one popular and extensively studied approach. However, chromatic difficulties in complex scenes have received little attention. In this work, we introduce a new learnable…

Text Recognition in Natural Scenes: A Review

This literature review attempts to present the recent picture in the field of scene text recognition and provides a reference for people entering this field and may help inspire future research.

PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition

A Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency is proposed, which adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate.

Research on Text Recognition of Natural Scenes for Complex Situations

The continuous development of scene text detection and recognition algorithms will lay the foundation for research on recognition problems such as multilingual scene text recognition and formula recognition.

DBCAN: Dual-Branch Cross-Attention Network for Scene Text Recognition

Different from the previous methods heavily relying on semantic information, DBCAN can enhance the position clues and learn semantic relations with two separate branches and fuse them by a tailored Cross-Attention Module (CAM).

DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding

DavarOCR is an open-source toolbox for OCR and document understanding tasks that has relatively more complete support for the sub-tasks of the cutting-edge technology of document understanding.

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

This work proposes to learn discrimination and generation by integrating contrastive learning and masked image modeling in a self-supervised text recognition method, and demonstrates that the pre-trained model can be easily applied to other text-related tasks with obvious performance gain.

HRNet Encoder and Dual-Branch Decoder Framework-Based Scene Text Recognition Model

Compared with classical models such as ASTER, TextSR, and SCGAN, the recognition accuracy of the proposed model is improved and better recognition results can be achieved on irregular and blurred datasets such as IC15, SVTP, and CUTE80.

Pushing the Performance Limit of Scene Text Recognizer without Human Annotation

A robust consistency-regularization-based semi-supervised framework is proposed for STR, which can effectively solve the instability issue due to domain inconsistency between synthetic and real images and is believed to be the first consistency-regularization-based framework that applies successfully to STR.

IterVM: Iterative Vision Modeling Module for Scene Text Recognition

This paper proposes an iterative vision modeling module (IterVM) to further improve STR accuracy and a powerful scene text recognizer called IterNet, which achieves new state-of-the-art results on several public benchmarks.

Self-supervised Implicit Glyph Attention for Text Recognition

SIGA delineates the glyph structures of text images through joint self-supervised text segmentation and implicit attention alignment, which serve as supervision to improve attention correctness without extra character-level annotations.



Symmetry-Constrained Rectification Network for Scene Text Recognition

A Symmetry-constrained Rectification Network (ScRN) is proposed that builds on local attributes of text instances, such as center line, scale and orientation, which enables ScRN to generate better rectification results than existing methods and thus leads to higher recognition accuracy.

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

This work proposes an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations, and achieves state-of-the-art performance on both regular and irregular scene text recognition benchmarks.

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

  • Fangneng Zhan, Shijian Lu
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
Extensive experiments show that the proposed ESIR is capable of rectifying scene text distortions accurately, achieving superior recognition performance for both normal scene text images and those suffering from perspective and curvature distortions.

AON: Towards Arbitrarily-Oriented Text Recognition

The arbitrary orientation network (AON) is developed to directly capture the deep features of irregular texts, which are fed into an attention-based decoder to generate the character sequence; the method is comparable to major existing methods on regular datasets.

Efficient Backbone Search for Scene Text Recognition

This work designs a domain-specific search space for STR, which contains both choices on operations and constraints on the downsampling path, and proposes a two-step search algorithm, which decouples operations and the downsampling path, for an efficient search in the given space.

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.

Robust Scene Text Recognition with Automatic Rectification

RARE (Robust text recognizer with Automatic REctification) is a recognition model that is robust to irregular text and end-to-end trainable, requiring only images and associated text labels, which makes it convenient to train and deploy in practical systems.

2D Attentional Irregular Scene Text Recognizer

This paper proposes a framework which transforms irregular text with a 2D layout into a character sequence directly via a 2D attentional scheme, and utilizes a relation attention module to capture the dependencies of feature maps and a parallel attention module to decode all characters in parallel, which makes the method more effective and efficient.

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Extensive experiments on various benchmarks show that the proposed augmentation and the joint learning methods significantly boost the performance of the recognition networks.

Synthetically Supervised Feature Learning for Scene Text Recognition

This work designs a multi-task network with an encoder-discriminator-generator architecture to guide the feature of the original image toward that of the clean image, and significantly outperforms the state-of-the-art methods on standard scene text recognition benchmarks in the lexicon-free category.