Corpus ID: 236429102

Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation

@article{Bhunia2021TextIT,
  title={Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation},
  author={A. Bhunia and Aneeshan Sain and Pinaki Nath Chowdhury and Yi-Zhe Song},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.12087}
}
Text recognition remains a fundamental and extensively researched topic in computer vision, largely owing to its wide array of commercial applications. The challenging nature of the problem, however, dictated a fragmentation of research efforts: Scene Text Recognition (STR) that deals with text in everyday scenes, and Handwriting Text Recognition (HTR) that tackles hand-written text. In this paper, for the first time, we argue for their unification – we aim for a single model that can…

Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
TLDR
The authors argue that semantic information plays a complementary role to visual information, and propose a multi-stage, multi-scale attentional decoder that performs joint visual-semantic reasoning in a stage-wise manner.
