Corpus ID: 236428322

Towards the Unseen: Iterative Text Recognition by Distilling from Errors

Authors: A. Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song
Visual text recognition is undoubtedly one of the most extensively researched topics in computer vision. Great progress has been made to date, with the latest models starting to focus on the more practical "in-the-wild" setting. However, a salient problem still hinders practical deployment: prior state-of-the-art models mostly struggle to recognise unseen (or rarely seen) character sequences. In this paper, we put forward a novel framework to specifically tackle this "unseen" problem. Our framework…


Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
It is argued that semantic information plays a role complementary to visual information, and a multi-stage, multi-scale attentional decoder is proposed that performs joint visual-semantic reasoning in a stage-wise manner.
Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
This paper aims for a single model that can compete favourably with two separate state-of-the-art STR and HTR models, and proposes four distillation losses, all specifically designed to cope with the unique characteristics of text recognition.


AON: Towards Arbitrarily-Oriented Text Recognition
The arbitrary orientation network (AON) is developed to directly capture deep features of irregular text; these features are fed into an attention-based decoder that generates the character sequence. AON is also comparable to major existing methods on regular datasets.
On Vocabulary Reliance in Scene Text Recognition
An analytical framework is established, in which different datasets, metrics and module combinations for quantitative comparisons are devised, to conduct an in-depth study on the problem of vocabulary reliance in scene text recognition, and a simple yet effective mutual learning strategy is proposed to allow models of two families to learn collaboratively.
Focusing Attention: Towards Accurate Text Recognition in Natural Images
The Focusing Attention Network (FAN) is proposed, which employs a focusing attention mechanism to automatically draw back drifted attention in scene text images and substantially outperforms existing methods.
Deep Structured Output Learning for Unconstrained Text Recognition
A convolutional neural network-based architecture that incorporates a Conditional Random Field graphical model and takes the whole word image as a single input. It achieves state-of-the-art accuracy in lexicon-constrained scenarios without being specifically modelled for constrained recognition.
SCATTER: Selective Context Attentional Scene Text Recognizer
A novel STR architecture, the Selective Context ATtentional Text Recognizer (SCATTER), is introduced. It uses a stacked block architecture with intermediate supervision during training, which makes it possible to train a deep BiLSTM encoder and thus improves the encoding of contextual dependencies.
RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition
Theoretically, the proposed method, dubbed RobustScanner, decodes individual characters with a dynamic ratio between context and positional clues, relying more on positional clues when decoding sequences with scarce context; this makes it robust and practical.
Reading Text in the Wild with Convolutional Neural Networks
An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval is presented, together with a real-world application that makes thousands of hours of news footage instantly searchable via a text query.
What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images
This paper proposes a novel and efficient optimization-based method that can be naturally integrated into different sequential prediction schemes, i.e., connectionist temporal classification (CTC) and attention mechanisms, and applies it to five state-of-the-art STR models in both targeted and untargeted attack modes.
Symmetry-Constrained Rectification Network for Scene Text Recognition
A Symmetry-constrained Rectification Network (ScRN) based on local attributes of text instances, such as center line, scale and orientation. These attributes enable ScRN to generate better rectification results than existing methods and thus lead to higher recognition accuracy.
Scene Text Recognition using Higher Order Language Priors
A framework is presented that uses a higher-order prior computed from an English dictionary to recognize a word, which may or may not be part of the dictionary, and achieves a significant improvement in word recognition accuracy without using a restricted word list.