End-to-end text recognition with convolutional neural networks

  • Tao Wang, David J. Wu, Adam Coates, A. Ng
  • Published 1 November 2012
  • Computer Science
  • Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012)
Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to… 

Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks

A unified network that simultaneously localizes and recognizes text with a single forward pass is proposed, avoiding intermediate processes, such as image cropping, feature re-calculation, word separation, and character grouping.

Reading Text in the Wild with Convolutional Neural Networks

An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval is presented, and a real-world application that makes thousands of hours of news footage instantly searchable via a text query is demonstrated.

End-to-End Text Recognition Using Local Ternary Patterns, MSER and Deep Convolutional Nets

The system presented outperforms state-of-the-art methods on the ICDAR 2003 dataset in the text-detection, dictionary-driven cropped-word recognition, and dictionary-driven end-to-end recognition tasks.

End-to-End Text Recognition with Hybrid HMM Maxout Models

This work proposes new solutions to the character and word recognition problems and shows how to combine these solutions in an end-to-end text-recognition system that beats state-of-the-art results on all the sub-problems for both the ICDAR 2003 and SVT benchmark datasets.

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

The recognition module of the Mask TextSpotter method is investigated separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Review network for scene text recognition

An end-to-end trainable deep review neural network for scene text recognition, which is a combination of feature extraction, feature reviewing, feature attention, and sequence recognition, is proposed.

Visual Attention Models for Scene Text Recognition

This paper proposes an approach to lexicon-free recognition of text in scene images using an LSTM-based soft visual attention model learned from convolutional features, and shows that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results.

Attention-Based Deep Neural Network and Its Application to Scene Text Recognition

  • Haizhen He, Jiehan Li
  • Computer Science
  • 2019 IEEE 11th International Conference on Communication Software and Networks (ICCSN)
  • 2019
An attention-based deep neural network architecture for scene text recognition, which integrates feature extraction, feature attention, feature labeling, and transcription into a unified framework, is proposed.

Reading Scene Text in Deep Convolutional Sequences

A deep recurrent model is developed to robustly recognize the generated CNN sequences, departing from most existing approaches, which recognize each character independently; it achieves impressive results on several benchmarks and substantially advances the state of the art.

Enhancing Text Spotting with a Language Model and Visual Context Information

This paper relies on off-the-shelf deep networks, already trained with large amounts of data, that provide a series of text hypotheses per input image; these hypotheses are combined with different priors obtained from both the semantic interpretation of the image and a scene-based language model.



End-to-end scene text recognition

While scene text recognition has generally been treated with highly domain-specific methods, the results demonstrate the suitability of applying generic computer vision methods.

Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning

This paper applies large-scale algorithms for learning the features automatically from unlabeled data to construct highly effective classifiers for both detection and recognition to be used in a high accuracy end-to-end system.

Top-down and bottom-up cues for scene text recognition

This work presents a framework that exploits both bottom-up and top-down cues in the problem of recognizing text extracted from street images, and shows significant improvements in accuracies on two challenging public datasets, namely Street View Text and ICDAR 2003.

Automatic Scene Text Recognition using a Convolutional Neural Network

An automatic recognition method is presented for color text characters extracted from scene images. It is robust to strong distortions, complex backgrounds, low resolution, and non-uniform lighting, and it requires no preprocessing, post-processing, or tunable parameters.

Reading Digits in Natural Images with Unsupervised Feature Learning

A new benchmark dataset for research use is introduced containing over 600,000 labeled digits cropped from Street View images, and variants of two recently proposed unsupervised feature learning methods are employed, finding that they are convincingly superior on benchmarks.

A Method for Text Localization and Recognition in Real-World Images

The paper is the first to report both text detection and recognition results on the standard and rather challenging ICDAR 2003 dataset. The text localization works for a number of alphabets, and the method is easily adapted to recognition of other scripts, e.g. Cyrillic.

Word Spotting in the Wild

It is argued that the appearance of words in the wild spans this range of difficulties and a new word recognition approach based on state-of-the-art methods from generic object recognition is proposed, in which object categories are considered to be the words themselves.

High-Performance Neural Networks for Visual Object Classification

We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a

Character Recognition in Natural Images

It is demonstrated that the performance of the proposed method can be far superior to that of commercial OCR systems, and can benefit from synthetically generated training data obviating the need for expensive data collection and annotation.

A discriminative semi-Markov model for robust scene text recognition

A semi-Markov model for recognizing scene text is presented that integrates character and word segmentation with recognition and performs robustly on low-resolution images of signs containing text in fonts atypical of documents.