DSRN: A Deep Scale Relationship Network for Scene Text Detection

Yuxin Wang, Hongtao Xie, Zilong Fu, Yongdong Zhang
International Joint Conference on Artificial Intelligence
Nowadays, scene text detection has become increasingly important and popular. However, the large variance of text scale remains the main challenge and limits the detection performance in most previous methods. To address this problem, we propose an end-to-end architecture called Deep Scale Relationship Network (DSRN) to map multi-scale convolution features onto a scale invariant space to obtain uniform activation of multi-size text instances. Firstly, we develop a Scale-transfer module to… 

R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection

The proposed relationship network (R-Net) is a novel bi-directional convolutional framework that maps multi-scale convolutional features to a scale-invariant space to obtain consistent activation of multi-size text instances.

BDFPN: Bi-Direction Feature Pyramid Network for Scene Text Detection

A novel Bi-Direction Feature Pyramid Network (BDFPN) is proposed, which draws inspiration from the two-way visual information processing mechanism of human beings, together with a novel fusion strategy named Attention Fusion Module (AFM).

FDTA: Fully Convolutional Scene Text Detection With Text Attention

The method first optimizes the regression branch by designing a diagonal adjustment factor to make the position regression more accurate, and adds an attention module to the model, which improves the accuracy of detecting small text regions and increases F-score by 1.2.

Shape awareness and structure-preserving network for arbitrary shape text detection

Benefiting from two modules, the proposed GCM and TFM method effectively separates the text instances which are close to each other, while preserving detailed text structure.

CRNet: A Center-aware Representation for Detecting Text of Arbitrary Shapes

This work proposes CRNet, an anchor-free scene text detector that leverages a Center-aware Representation to achieve accurate arbitrary-shaped scene text detection, and a center-aware location algorithm that explicitly learns center regions and center points of text instances, which makes it possible to separate adjacent text instances effectively.

Self-Training for Domain Adaptive Scene Text Detection

A self-training framework that automatically mines hard examples with pseudo-labels from unannotated videos or images, reduces the noise of those hard examples, and achieves results comparable or even superior to state-of-the-art methods.

Deep learning approaches to scene text detection: a comprehensive review

This paper presents a comprehensive review of deep learning approaches towards scene text detection, suitable deep frameworks for this task followed by critical analysis, and a categorical study of publicly available scene image datasets and applicable standard evaluation protocols with their pros and cons.

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

ContourNet is proposed, which takes a further step toward accurate arbitrary-shaped text detection and suppresses false positives by only outputting predictions with a high response value in both orthogonal directions.

A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks

A novel ProgrEssively Region-based scene Text eraser (PERT), which introduces a region-based modification strategy to progressively erase pixels only in the text region, ensuring the integrity of text-free areas.



R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection

A Rotational Region CNN (R2CNN) is designed, which includes a Text Region Proposal Network (Text-RPN) to estimate approximate text regions and a multitask refinement network to get the precise inclined box.

Deep Direct Regression for Multi-oriented Scene Text Detection

A deep direct regression based method for multi-oriented scene text detection that achieves an F-measure of 81%, which is a new state-of-the-art and significantly outperforms previous approaches.

Single Shot Text Detector with Regional Attention

A novel single-shot text detector that directly outputs word-level bounding boxes in a natural image and develops a hierarchical inception module which efficiently aggregates multi-scale inception features.

Geometry-Aware Scene Text Detection with Instance Transformation Network

In this paper, a novel Instance Transformation Network is presented to learn the geometry-aware representation encoding the unique geometric configurations of scene text instances with in-network transformation embedding, resulting in a robust and elegant framework to detect words or text lines at one pass.

EAST: An Efficient and Accurate Scene Text Detector

This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes, and significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency.

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

A more flexible representation for scene text is proposed, termed as TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms and outperforms the baseline on Total-Text by more than 40% in F-measure.

Rotation-Sensitive Regression for Oriented Scene Text Detection

The proposed method named Rotation-sensitive Regression Detector (RRD) achieves state-of-the-art performance on several oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17, and COCO-Text, and achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.

Feature Pyramid Networks for Object Detection

This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.

SAN: Learning Relationship between Convolutional Features for Multi-Scale Object Detection

A Scale Aware Network (SAN) is proposed that maps the convolutional features from different scales onto a scale-invariant subspace to make CNN-based detection methods more robust to scale variation. A unique learning method is also constructed, which considers purely the relationship between channels, without the spatial information, for efficient learning of SAN.

PixelLink: Detecting Scene Text via Instance Segmentation

Most state-of-the-art scene text detection algorithms are deep learning based methods that depend on bounding box regression and perform at least two kinds of predictions: text/non-text