Deep Direct Regression for Multi-oriented Scene Text Detection

Wenhao He, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu
2017 IEEE International Conference on Computer Vision (ICCV)
In this paper, we first provide a new perspective to divide existing high performance object detection methods into direct and indirect regressions. [...] Our detection framework is simple and effective, with a fully convolutional network and one-step post-processing.
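
To make the direct/indirect distinction concrete, here is a minimal sketch in plain Python. It assumes that direct regression predicts each quadrilateral vertex as an offset from a reference pixel, while indirect regression predicts it as an offset from a corner of a pre-defined anchor box. The function names, the anchor layout, and the unnormalised offsets are illustrative assumptions, not the paper's exact parameterisation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def decode_direct(point: Point, offsets: List[Point]) -> List[Point]:
    """Direct regression (assumed form): vertex = reference pixel + offset."""
    px, py = point
    return [(px + dx, py + dy) for dx, dy in offsets]


def decode_indirect(anchor: Box, deltas: List[Point]) -> List[Point]:
    """Indirect regression (simplified): vertex = anchor corner + delta.

    Real anchor-based detectors usually normalise deltas by the anchor
    size; that step is omitted here to keep the contrast visible.
    """
    x0, y0, x1, y1 = anchor
    # Corners in clockwise order starting from the top-left.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return [(cx + dx, cy + dy) for (cx, cy), (dx, dy) in zip(corners, deltas)]


# The same slightly skewed quadrilateral, encoded in both ways.
quad_direct = decode_direct(
    (50.0, 20.0),
    [(-30.0, -10.0), (30.0, -6.0), (30.0, 10.0), (-30.0, 6.0)],
)
quad_indirect = decode_indirect(
    (20.0, 10.0, 80.0, 30.0),
    [(0.0, 0.0), (0.0, 4.0), (0.0, 0.0), (0.0, -4.0)],
)
```

In this toy setup the direct form needs no anchor at all, whereas the indirect form depends on how well the anchor box already fits the text.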

Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression
This paper proposes a scene text detection framework based on a fully convolutional network with a bi-task prediction module, in which one task is pixel-wise classification between text and non-text and the other is pixel-wise regression to determine the vertex coordinates of quadrilateral text boundaries.
A Novel Method for Text Detection in Arbitrary Scenes Based on Multi-scale Segmentation Networks
A pixel-based method based on an efficient segmentation network with multi-scale features that requires less training time and eliminates the trouble of setting anchors is proposed.
PixelLink: Detecting Scene Text via Instance Segmentation
Most state-of-the-art scene text detection algorithms are deep learning based methods that depend on bounding box regression and perform at least two kinds of predictions: text/non-text
Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation
This paper proposes to detect scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions and achieves better or comparable results in both accuracy and efficiency.
A Multi-Oriented Scene Text Detection Method Based on Location-Sensitive Segmentation
A multi-oriented scene text detection method based on location-sensitive segmentation, which divides the detection of a whole text instance into the detection of three sub-text instances (left part, middle part, and right part) and filters false positives.
Irregular scene text detection via attention guided border labeling
A novel method is proposed to detect irregular scene text through instance-aware segmentation, using an attention-guided semantic segmentation model to precisely label the weighted borders of text regions.
Rotation-Sensitive Regression for Oriented Scene Text Detection
The proposed method named Rotation-sensitive Regression Detector (RRD) achieves state-of-the-art performance on several oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17, and COCO-Text, and achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.
Sliding Line Point Regression for Shape Robust Scene Text Detection
  • Yixing Zhu, Jun Du
  • Computer Science
    2018 24th International Conference on Pattern Recognition (ICPR)
  • 2018
This study proposes a novel method named sliding line point regression (SLPR) to detect arbitrary-shape text in natural scenes, and achieves competitive results on the traditional ICDAR2015 Incidental Scene Text benchmark and the curved text detection dataset CTW1500.


Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection
  • Yuliang Liu, Lianwen Jin
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
A new convolutional neural network (CNN) based method, named Deep Matching Prior Network (DMPNet), to detect text with tighter quadrangles, together with a smooth Ln loss that has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability.
Synthetic Data for Text Localisation in Natural Images
The relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning, is discussed.
Detecting Text in Natural Image with Connectionist Text Proposal Network
A novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images and develops a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy.
Text-Attentional Convolutional Neural Network for Scene Text Detection
A new system for scene text detection is developed, built on a novel text-attentional convolutional neural network (Text-CNN) that focuses on extracting text-related regions and features from image components, together with a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs).
A Hybrid Approach to Detect and Localize Texts in Natural Scene Images
A hybrid approach to robustly detect and localize texts in natural scene images using a text region detector, a conditional random field model, and a learning-based energy minimization method is presented.
Multi-oriented Text Detection with Fully Convolutional Networks
A novel approach for text detection in natural images that consistently achieves state-of-the-art performance on three text detection benchmarks: MSRA-TD500, ICDAR2015 and ICDAR2013.
Text Flow: A Unified Text Detection System in Natural Scene Images
A unified scene text detection system, namely Text Flow, is proposed; it utilizes the minimum cost (min-cost) flow network model and outperforms the state-of-the-art methods on all three datasets with much higher recall and F-score.
DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images
A novel unified framework called DeepText is developed for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN), with a set of text characteristic prior bounding boxes to achieve high word recall with only hundred-level candidate proposals.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
SSD: Single Shot MultiBox Detector
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
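
The default-box discretization mentioned in the SSD summary can be sketched as follows. This is a minimal illustration that assumes the linearly spaced scales and the width/height rule (w = s·sqrt(a), h = s/sqrt(a)) described in the SSD paper. The function names and the reduced set of aspect ratios are illustrative choices, and coordinates are taken to be normalised to [0, 1].

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def default_box_scale(k: int, m: int, s_min: float = 0.2, s_max: float = 0.9) -> float:
    """Scale for the k-th of m feature maps (k is 1-based), linearly spaced."""
    if m == 1:
        return s_min
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)


def default_boxes_at(
    cx: float,
    cy: float,
    scale: float,
    aspect_ratios: Tuple[float, ...] = (1.0, 2.0, 0.5),
) -> List[Box]:
    """One default box per aspect ratio, all centred on (cx, cy).

    Each box keeps area scale**2 while its width/height ratio equals
    the requested aspect ratio.
    """
    return [
        (cx, cy, scale * math.sqrt(a), scale / math.sqrt(a))
        for a in aspect_ratios
    ]


# Default boxes for the centre of the image on the first of six feature maps.
first_scale = default_box_scale(1, 6)
boxes = default_boxes_at(0.5, 0.5, first_scale)
```

Wide aspect ratios are the ones most relevant to text lines, which is one reason later text detectors adapted this scheme with longer boxes.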