Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Jingyang Lin, Yingwei Pan, Rongfeng Lai, Xuehang Yang, Hongyang Chao, Ting Yao
2021 IEEE International Conference on Multimedia and Expo (ICME)
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision. Nevertheless, owing to the extremely varied aspect ratios and scales of text instances in real scenes, most conventional text detectors suffer from the sub-text problem, in which only fragments of a text instance (i.e., sub-texts) are localized. In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, the COntrastive RElation (CORE) module, to mitigate that…
1 Citation

Transferrable Contrastive Learning for Visual Domain Adaptation
This work presents a self-supervised learning (SSL) paradigm tailored for domain adaptation, Transferrable Contrastive Learning (TCL), which links SSL congruently to the desired cross-domain transferability, and finds contrastive learning to be an intrinsically suitable candidate for domain adaptation.


Pyramid Mask Text Detector
A new Mask R-CNN based framework, Pyramid Mask Text Detector (PMTD), for scene text detection that performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance.
Detecting Text in Natural Image with Connectionist Text Proposal Network
A novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images, with a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy.
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
To evaluate its robustness against curved text, DeconvNet is fine-tuned and benchmarked on Total-Text to facilitate a new research direction for the scene text community.
Shape Robust Text Detection With Progressive Scale Expansion Network
A novel Progressive Scale Expansion Network (PSENet) is proposed, which can precisely detect text instances with arbitrary shapes and effectively separates closely adjacent text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
This paper proposes a novel unified relational reasoning graph network for arbitrary shape text detection, in which an innovative local graph bridges a text proposal model based on a Convolutional Neural Network and a deep relational reasoning network based on a Graph Convolutional Network, making the network end-to-end trainable.
Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping
A novel bootstrapping technique is designed that samples multiple ‘subsections’ of a word or text line, which relieves the constraint of limited training data and improves the consistency of the predicted text feature maps; this consistency is critical for predicting a single complete box instead of multiple broken boxes for long words or text lines.
Detecting Oriented Text in Natural Images by Linking Segments
SegLink, an oriented text detection method to decompose text into two locally detectable elements, namely segments and links, achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.
Detecting Curve Text in the Wild: New Dataset and New Solution
A polygon-based curve text detector (CTD) that directly detects curve text without empirical combination by seamlessly integrating the recurrent transverse and longitudinal offset connection (TLOC), which allows the CTD to explore context information instead of predicting points independently, resulting in smoother and more accurate detection.
Exploring Object Relation in Mean Teacher for Cross-Domain Detection
This work presents Mean Teacher with Object Relations (MTOR) that novelly remolds Mean Teacher under the backbone of Faster R-CNN by integrating the object relations into the measure of consistency cost between teacher and student modules.
ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT
Nibal Nayef, Fei Yin, J. Ogier
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017
This paper presents the dataset, the tasks and the findings of this RRC-MLT challenge, which aims at assessing the ability of state-of-the-art methods to detect Multi-Lingual Text in scene images, such as in contents gathered from the Internet media and in modern cities where multiple cultures live and communicate together.