What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

@article{Baek2021WhatIW,
  title={What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels},
  author={Jeonghun Baek and Yusuke Matsui and Kiyoharu Aizawa},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={3112-3121}
}
The scene text recognition (STR) task has a common practice: all state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models on only a small number of real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically, and for languages other than English for which we do not always have synthetic data. However, there has been implicit common…

Multimodal Semi-Supervised Learning for Text Recognition

TLDR
This work extends an existing visual representation learning algorithm and proposes the first contrastive-based method for scene text recognition, SemiMTR, that leverages unlabeled data at each modality training phase and maintains a compact three-stage algorithm.

Seq-UPS: Sequential Uncertainty-aware Pseudo-label Selection for Semi-Supervised Text Recognition

TLDR
This paper proposes a pseudo-label generation and an uncertainty-based data selection framework for semi-supervised text recognition, using Beam-Search inference to yield highly probable hypotheses to assign pseudo-labels to the unlabelled examples and adopting an ensemble of models to obtain a robust estimate of the uncertainty associated with the prediction.

SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

TLDR
A novel Semantic GAN and Balanced Attention Network (SGBANet) is proposed to recognize text in scene images, aligning the semantic feature distribution between the support domain and the target domain.

Optical Character Recognition of Electrical Equipment Nameplate with Contrast Enhancement

TLDR
An OCR system is proposed that can reliably handle the reflection effects common in electrical equipment nameplate images captured under challenging ambient light conditions. It applies a robust contrast enhancement method based on a logarithmic mapping function (LMF) to the nameplate images prior to OCR with one of the state-of-the-art OCR networks, PP-OCRv3.

Scene Text Recognition with Permuted Autoregressive Sequence Models

TLDR
This method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling that unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context.

COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

TLDR
This work proposes a novel task that predicts the link between truncated texts and conducts three tasks to detect the onomatopoeia region and capture its intended meaning: text detection, text recognition, and link prediction.

Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition

TLDR
Perceiving Stroke-Semantic Context (PerSec), a new approach to self-supervised representation learning tailored for the Scene Text Recognition (STR) task, shows significant performance improvement when the learned representation is fine-tuned on labeled data.

PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

TLDR
A more robust OCR system, PP-OCRv3, is proposed in this paper, which upgrades the text detection model and text recognition model in 9 aspects based on PP-OCRv2 and shows that the Hmean of PP-OCRv3 outperforms PP-OCRv1 by 5% with comparable inference speed.

Pushing the Performance Limit of Scene Text Recognizer without Human Annotation

TLDR
A robust consistency regularization based semi-supervised framework is proposed for STR, which can effectively solve the instability issue due to domain inconsistency between synthetic and real images and is believed to be the first consistency regularization based framework that applies successfully to STR.

SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization

TLDR
This work proposes a Similarity-Aware Normalization (SimAN) module to identify the different patterns and align the corresponding styles from the guiding patch to gain representation capability for distinguishing complex patterns such as messy strokes and cluttered backgrounds.

References

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

TLDR
A novel neural network architecture is proposed that integrates feature extraction, sequence modeling and transcription into a unified framework; it yields an effective yet much smaller model that is more practical for real-world application scenarios.

Scene Text Recognition using Higher Order Language Priors

TLDR
A framework is presented that uses a higher order prior computed from an English dictionary to recognize a word, which may or may not be a part of the dictionary, and achieves significant improvement in word recognition accuracies without using a restricted word list.

Uber-text: A large-scale dataset for optical character recognition from street-level imagery

  • In Scene Understanding Workshop,
  • 2017

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard

TLDR
A large-scale dataset of 25,000 annotated signboard images, in which all the text lines and characters are annotated with locations and transcriptions, was released, and a multi ground truth (multi-GT) evaluation method was proposed to make evaluation fairer.

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition — RRC-MLT-2019

TLDR
The dataset, tasks and findings of the RRC-MLT-2019 challenge are presented; the challenge has 4 tasks covering various aspects of multi-lingual scene text.

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

TLDR
This work introduces ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network that predicts a character sequence directly from the rectified image.

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)

  • Baoguang Shi, C. Yao, ..., X. Bai
  • Computer Science
    2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
  • 2017
TLDR
This report introduces RCTW, a new competition that focuses on Chinese text reading with a large-scale dataset of over 12,000 annotated images, and calls for more future research on the Chinese text reading problem.

ICDAR 2013 Robust Reading Competition

TLDR
The datasets and ground truth specification are described, the performance evaluation protocols used are detailed, and the final results are presented along with a brief summary of the participating methods.

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis

TLDR
A unified four-stage STR framework is introduced that most existing STR models fit into and allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations.

TextScanner: Reading Characters in Order for Robust Scene Text Recognition

TLDR
TextScanner bears three characteristics: it belongs to the semantic segmentation family, as it generates pixel-wise, multi-channel segmentation maps for character class, position and order, and also adopts RNN for context modeling.
...