FOTS: Fast Oriented Text Spotting with a Unified Network

@article{Liu2018FOTSFO,
  title={FOTS: Fast Oriented Text Spotting with a Unified Network},
  author={Xuebo Liu and Ding Liang and Shipeng Yan and Dagui Chen and Yu Qiao and Junjie Yan},
  journal={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={5676--5685}
}
  • Published 5 January 2018
Incidental scene text spotting is considered one of the most difficult and valuable challenges in the document analysis community. [...] Key method: RoIRotate is introduced to share convolutional features between detection and recognition. Benefiting from this convolution-sharing strategy, FOTS has little computational overhead compared with a baseline text detection network, and the joint training makes the method perform better than two-stage approaches. Experiments on ICDAR 2015, ICDAR 2017 [...]
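The core idea of RoIRotate described above is to transform an oriented text region of the shared feature map into an axis-aligned, fixed-height grid via an affine transform with bilinear sampling, so the recognition branch can reuse the detector's features. A minimal NumPy sketch of that idea (not the authors' implementation; the function name `roi_rotate`, the fixed `out_h`, and the sampling details are illustrative assumptions):

```python
import numpy as np

def roi_rotate(features, center, size, angle, out_h=8):
    """Sample an oriented box from a 2-D feature map into an
    axis-aligned grid of fixed height, preserving aspect ratio.

    features: (H, W) array; center = (cx, cy); size = (w, h) of the
    oriented box; angle in radians. Sketch of the RoIRotate idea.
    """
    w, h = size
    out_w = max(1, int(round(out_h * w / h)))  # keep aspect ratio
    cos, sin = np.cos(angle), np.sin(angle)
    H, W = features.shape
    out = np.zeros((out_h, out_w), dtype=features.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # normalized coords in [-0.5, 0.5] inside the box
            u = (j + 0.5) / out_w - 0.5
            v = (i + 0.5) / out_h - 0.5
            # affine transform: rotate box coords into image space
            x = center[0] + u * w * cos - v * h * sin
            y = center[1] + u * w * sin + v * h * cos
            # bilinear sampling with border clamping
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            dx, dy = x - x0, y - y0
            x0c, x1c = np.clip([x0, x0 + 1], 0, W - 1)
            y0c, y1c = np.clip([y0, y0 + 1], 0, H - 1)
            out[i, j] = (features[y0c, x0c] * (1 - dx) * (1 - dy)
                         + features[y0c, x1c] * dx * (1 - dy)
                         + features[y1c, x0c] * (1 - dx) * dy
                         + features[y1c, x1c] * dx * dy)
    return out
```

In the paper the transform is applied per channel on the shared feature map and is differentiable, which is what allows detection and recognition to be trained jointly.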
Scene text spotting based on end-to-end
A text recognition model based on SAM-BiLSTM (a spatial attention mechanism with BiLSTM) that more effectively extracts the semantic information between characters, and significantly surpasses state-of-the-art methods on a number of text detection and text spotting benchmarks.
Towards End-to-End Text Spotting in Natural Scenes
  • H. Li, P. Wang, Chunhua Shen
  • IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021
This work proposes a unified network that simultaneously localizes and recognizes text with a single forward pass, avoiding intermediate processes such as image cropping and feature re-calculation, word separation, and character grouping.
Towards Accurate Scene Text Detection with Bidirectional Feature Pyramid Network
A Fully Convolutional One-Stage Object Detection (FCOS)-based text detection method that robustly detects multi-oriented and multilingual text in natural scene images via per-pixel prediction, and applies the Bidirectional Feature Pyramid Network (BiFPN) as the backbone, enhancing model learning capacity and enlarging the receptive field.
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
Investigates scene text spotting, which aims at simultaneous text detection and recognition in natural images, and proposes an end-to-end trainable neural network model named Mask TextSpotter, inspired by the then newly published Mask R-CNN.
ASTS: A Unified Framework for Arbitrary Shape Text Spotting
An end-to-end trainable unified framework that perceives and understands text at different levels of semantics (holistic-, pixel-, and sequence-level) and then unifies the recognized semantics for robust text spotting.
Toward Arbitrary-Shaped Text Spotting Based on End-to-End
A novel end-to-end text spotting model that adopts a corner attention mechanism to extract features of long text more effectively and feeds the rectified feature map into an SA-BiLSTM decoder to recognize curved text more effectively.
DGST : Discriminator Guided Scene Text detector
A detector framework based on conditional generative adversarial networks, DGST (Discriminator Guided Scene Text detector), that improves the segmentation quality of scene text detection and achieves an F-measure of 87% on the ICDAR 2015 dataset.
Single Shot TextSpotter with Explicit Alignment and Attention
A novel text-alignment layer that precisely computes convolutional features of a text instance in arbitrary orientation, which is critical for identifying challenging text instances.
Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting
An end-to-end trainable text spotting approach named Text Perceptron that unites text detection and the subsequent recognition into a single framework, helping the whole network achieve global optimization.
Towards Unconstrained End-to-End Text Spotting
An end-to-end trainable network that simultaneously detects and recognizes text of arbitrary shape, showing that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which significantly improves both detection and recognition accuracy.

References

Showing 1-10 of 54 references.
Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks
A unified network that simultaneously localizes and recognizes text with a single forward pass is proposed, avoiding intermediate processes, such as image cropping, feature re-calculation, word separation, and character grouping.
Deep Scene Text Detection with Connected Component Proposals
A novel two-task network integrating bottom and top cues that detects arbitrary-orientation scene text with a finer output boundary, achieving state-of-the-art performance on the ICDAR 2013 text localization task.
Detecting Text in Natural Image with Connectionist Text Proposal Network
A novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image and develops a vertical anchor mechanism that jointly predicts location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy.
Deep Features for Text Spotting
A Convolutional Neural Network classifier for text spotting in natural images, combined with automated data mining of Flickr that generates word- and character-level annotations, forming an end-to-end, state-of-the-art text spotting system.
EAST: An Efficient and Accurate Scene Text Detector
This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes, and significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency.
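EAST's per-pixel geometry (in its RBOX form) predicts, at each location, the distances to the four edges of the enclosing rotated box plus a rotation angle. A simplified sketch of decoding one such prediction into box corners (illustrative only; `decode_east_pixel` and its argument layout are assumptions, not the paper's code):

```python
import numpy as np

def decode_east_pixel(px, py, dists, angle):
    """Decode one pixel's RBOX-style geometry into a rotated
    rectangle. dists = (top, right, bottom, left) distances from
    pixel (px, py) to the box edges in the box's rotated frame;
    angle is the rotation in radians. Returns 4 corners, clockwise
    from top-left.
    """
    t, r, b, l = dists
    # corners in the box's local frame, relative to the pixel
    local = np.array([[-l, -t], [r, -t], [r, b], [-l, b]], dtype=float)
    # rotate into image space, then translate to the pixel position
    cos, sin = np.cos(angle), np.sin(angle)
    rot = np.array([[cos, -sin], [sin, cos]])
    return local @ rot.T + np.array([px, py])
```

In the full pipeline, every pixel above a score threshold produces one such box, and the candidates are merged (the paper uses locality-aware NMS).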
Reading Text in the Wild with Convolutional Neural Networks
An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval; a real-world application that makes thousands of hours of news footage instantly searchable via a text query is demonstrated.
Detecting Oriented Text in Natural Images by Linking Segments
SegLink, an oriented text detection method that decomposes text into two locally detectable elements, segments and links, and achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.
Fused Text Segmentation Networks for Multi-oriented Scene Text Detection
A novel end-to-end framework for multi-oriented scene text detection from an instance-aware semantic segmentation perspective, Fused Text Segmentation Networks, which combines multi-level features during feature extraction, as text instances may rely on finer feature expression than general objects.
Single Shot Text Detector with Regional Attention
A novel single-shot text detector that directly outputs word-level bounding boxes in a natural image and develops a hierarchical inception module which efficiently aggregates multi-scale inception features.
R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
A novel method called Rotational Region CNN (R2CNN) for detecting arbitrary-oriented texts in natural scene images using the Region Proposal Network to generate axis-aligned bounding boxes that enclose the texts with different orientations.