ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT

@inproceedings{Sun2019ICDAR2C,
  title={ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT},
  author={Yipeng Sun and Zihan Ni and Chee-Kheng Chng and Yuliang Liu and Canjie Luo and Chun Chet Ng and Junyu Han and Errui Ding and Jingtuo Liu and Dimosthenis Karatzas and Chee Seng Chan and Lianwen Jin},
  booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},
  year={2019},
  pages={1557--1562}
}
  • Published 1 September 2019
  • Computer Science
Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000…

Citations

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
TLDR
This paper argues that a simple attention mechanism can do the same or even better job without any bells and whistles of multi-modality encoder design, and finds this simple baseline model consistently outperforms state-of-the-art (SOTA) models on two popular benchmarks, TextVQA and all three tasks of ST-VQA.
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
TLDR
A new end-to-end scene text spotting framework termed SwinTextSpotter is proposed, using a transformer encoder with dynamic head as the detector and a novel Recognition Conversion mechanism to explicitly guide text localization through recognition loss.
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
TLDR
This work manually collects Chinese text datasets from publicly available competitions, projects, and papers, then evaluates a series of representative text recognition methods on these datasets with unified evaluation methods to provide experimental results, and surprisingly observes that state-of-the-art baselines for recognizing English text do not perform well in Chinese scenarios.
Towards Open-Set Text Recognition via Label-to-Prototype Learning
TLDR
A label-to-prototype learning framework to handle novel characters without retraining the model, which achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets.
I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection
TLDR
Without bells and whistles, experimental results show that the proposed I3CL sets new state-of-the-art results on three challenging public benchmarks, i.e. an F-measure of 77.5% on ArT, 86.9% on Total-Text, and 86.4% on CTW-1500.
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
TLDR
This work proposes TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images from the TextVQA dataset, and uses a TextOCR-trained OCR model to create the PixelM4C model, which can perform scene-text-based reasoning on an image in an end-to-end fashion.
Improving Scene Text Recognition for Indian Languages with Transfer Learning and Font Diversity
TLDR
This work investigates the significant differences between Indian and Latin Scene Text Recognition (STR) systems and proposes utilizing additional non-Unicode fonts alongside commonly employed Unicode fonts to cover font diversity in synthesizers for Indian languages.
Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting
TLDR
A weakly supervised pre-training method that can acquire effective scene text representations by jointly learning and aligning visual and textual information, and outperforms existing pre-training techniques consistently across multiple public datasets.
Text Recognition in the Wild
TLDR
This literature review attempts to present an entire picture of the field of scene text recognition, which provides a comprehensive reference for people entering this field and could be helpful in inspiring future research.

References

Showing 1-10 of 32 references
ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
  • Baoguang Shi, C. Yao, X. Bai
  • Computer Science
  • 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
  • 2017
TLDR
This report introduces RCTW, a new competition that focuses on Chinese text reading with a large-scale dataset of over 12,000 annotated images, and calls for more future research on the Chinese text reading problem.
ICDAR 2015 competition on Robust Reading
TLDR
A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text and tasks assessing End-to-End system performance have been introduced to all Challenges.
Detecting Oriented Text in Natural Images by Linking Segments
TLDR
SegLink, an oriented text detection method to decompose text into two locally detectable elements, namely segments and links, achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
TLDR
To evaluate its robustness against curved text, DeconvNet is fine-tuned and benchmarked on Total-Text to facilitate a new research direction for the scene text community.
Synthetic Data for Text Localisation in Natural Images
TLDR
The relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning, is discussed.
Detecting Curve Text in the Wild: New Dataset and New Solution
TLDR
A polygon-based curve text detector (CTD) that can directly detect curve text without empirical combination, seamlessly integrating the recurrent Transverse and Longitudinal Offset Connection (TLOC), which allows the CTD to explore context information instead of predicting points independently, resulting in smoother and more accurate detection.
Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection
  • Yuliang Liu, Lianwen Jin
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
TLDR
A new Convolutional Neural Network (CNN) based method, named Deep Matching Prior Network (DMPNet), to detect text with tighter quadrangles, using a proposed loss that has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability.
Chinese Text in the Wild
TLDR
A newly created dataset of Chinese text with about 1 million Chinese characters annotated by experts in over 30 thousand street view images, suitable for training robust neural networks for various tasks, particularly detection and recognition.
ICDAR 2013 Robust Reading Competition
TLDR
The datasets and ground truth specification are described, the performance evaluation protocols used are detailed, and the final results are presented along with a brief summary of the participating methods.
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
TLDR
The COCO-Text dataset is described, which contains over 173k text annotations in over 63k images and presents an analysis of three leading state-of-the-art photo Optical Character Recognition (OCR) approaches on the dataset.