Learning to Predict More Accurate Text Instances for Scene Text Detection

Xiaoqian Li, Jie Liu, Shuwu Zhang, Guixuan Zhang
At present, multi-oriented text detection methods based on deep neural networks have achieved promising performance on various benchmarks. Nevertheless, arbitrary-shape text detection remains difficult, especially in finding a simple and proper representation for arbitrary-shape text instances. In this paper, a pixel-based text detector is proposed to facilitate the representation and prediction of arbitrary-shape text instances in a simple manner. Firstly, to alleviate the… 
2 Citations
Mining text from natural scene and video images: A survey
This article comprehensively reviews both non-spotting and spotting-based text mining techniques, identifies the limitations of existing methods, and suggests new applications and future research directions.

References
Shape Robust Text Detection With Progressive Scale Expansion Network
A novel Progressive Scale Expansion Network (PSENet) is proposed, which can precisely detect text instances with arbitrary shapes and effectively separate text instances that lie close to each other, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.
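The progressive expansion idea can be sketched as follows. This is a minimal illustration, not the PSENet reference implementation: it assumes the network outputs a list of nested binary kernel maps ordered from smallest to largest, and the function names are hypothetical. Instances are seeded from connected components of the smallest kernel and then grown breadth-first into each larger kernel, with contested pixels going to whichever instance reaches them first.

```python
from collections import deque
from typing import List

Grid = List[List[int]]
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def label_components(mask: Grid) -> Grid:
    """Label 4-connected foreground regions of a binary mask (0 = background)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


def progressive_scale_expansion(kernels: List[Grid]) -> Grid:
    """Grow instance labels from the smallest kernel (kernels[0]) outward.

    Assumes kernels are ordered smallest to largest and nested. A pixel reachable
    from two instances goes to whichever reaches it first in the breadth-first
    search; foreground pixels never reached from any seed stay 0.
    """
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Seed the search with every pixel that already carries a label.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                if 0 <= ny < h and 0 <= nx < w and kernel[ny][nx] and not labels[ny][nx]:
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


smallest = [[1, 0, 0, 0, 1]]
full_mask = [[1, 1, 1, 1, 1]]
print(label_components(full_mask))                         # [[1, 1, 1, 1, 1]]
print(progressive_scale_expansion([smallest, full_mask]))  # [[1, 1, 1, 2, 2]]
```

Labeling the full mask directly merges the two touching words into one instance, while expanding from the two separate small kernels keeps them apart. This is why the summary credits the method with splitting close text instances.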
TextField: Learning a Deep Direction Field for Irregular Scene Text Detection
TextField, a novel text detector, outperforms the state-of-the-art methods by a large margin on two curved text datasets, Total-Text and SCUT-CTW1500; it also achieves very competitive performance on the multi-oriented datasets ICDAR 2015 and MSRA-TD500.
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
TextSnake, a more flexible representation for scene text, can effectively represent text instances in horizontal, oriented and curved forms; the resulting method outperforms the baseline on Total-Text by more than 40% in F-measure.
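The F-measure quoted in these summaries is the harmonic mean of precision and recall. The sketch below shows only that arithmetic. How a detection is matched to a ground-truth instance (for example, an overlap threshold) is defined by each benchmark's own protocol and is not modeled here; the function names are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_measure_from_counts(tp: int, fp: int, fn: int) -> float:
    """F-measure from matched detections (tp), false alarms (fp) and missed texts (fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_measure(precision, recall)


print(round(f_measure(0.9, 0.75), 4))  # 0.8182
```

Because it is a harmonic mean, the F-measure is pulled toward the weaker of the two values. A detector cannot score well by maximizing recall with many false alarms, or precision by reporting only easy instances.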
Character Region Awareness for Text Detection
A new scene text detection method that effectively detects text areas by exploring each character and the affinity between characters. It exploits both the character-level annotations given for synthetic images and the character-level ground truths estimated for real images by the learned interim model.
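One way to turn per-pixel character-region and affinity scores into word-level groups is sketched below. It is a simplified, hypothetical post-processing step, not the authors' code. A pixel counts as foreground if its region score or its affinity score passes a threshold, so characters bridged by high affinity merge into one connected component. Components whose strongest region score is weak are dropped. The threshold values are illustrative assumptions, and the sketch returns axis-aligned boxes rather than rotated rectangles or polygons.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def group_by_region_and_affinity(
    region: List[List[float]],
    affinity: List[List[float]],
    low_text: float = 0.4,
    link_threshold: float = 0.4,
    text_threshold: float = 0.7,
) -> List[Box]:
    """Merge character regions linked by high affinity into word-level boxes."""
    h, w = len(region), len(region[0])
    # Foreground if the pixel looks like a character OR like a link between characters.
    fg = [[region[y][x] > low_text or affinity[y][x] > link_threshold for x in range(w)]
          for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not fg[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one 4-connected component.
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop groups whose strongest character score is too weak.
            if max(region[py][px] for py, px in pixels) < text_threshold:
                continue
            ys = [py for py, _ in pixels]
            xs = [px for _, px in pixels]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


region = [[0.9, 0.1, 0.1, 0.1, 0.9]]  # two character centers
linked = [[0.0, 0.8, 0.8, 0.8, 0.0]]  # strong affinity between them
print(group_by_region_and_affinity(region, linked))         # [(0, 0, 4, 0)]
print(group_by_region_and_affinity(region, [[0.0] * 5]))    # [(0, 0, 0, 0), (4, 0, 4, 0)]
```

The same two characters form one word when the affinity map links them and two separate detections when it does not. This is the role the summary assigns to the affinity between characters.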
Deep Direct Regression for Multi-oriented Scene Text Detection
A deep direct regression based method for multi-oriented scene text detection that achieves an F-measure of 81%, a new state of the art that significantly outperforms previous approaches.
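In direct regression, a pixel predicts the offsets from its own location to the vertices of the text quadrilateral, rather than adjusting a predefined anchor box. The decoding step can be sketched as below under stated assumptions. The offsets are taken to be in image pixels and ordered as four (dx, dy) pairs. The `stride` parameter, which maps output-map coordinates back to the image, is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def decode_quadrilateral(px: float, py: float, offsets: Sequence[float],
                         stride: float = 1.0) -> List[Point]:
    """Turn one pixel's 8 regressed offsets into the 4 vertices of a quadrilateral.

    offsets = (dx1, dy1, dx2, dy2, dx3, dy3, dx4, dy4), measured in image pixels
    from the pixel's image location (px * stride, py * stride).
    """
    if len(offsets) != 8:
        raise ValueError("expected 8 offsets (4 vertices x 2 coordinates)")
    cx, cy = px * stride, py * stride
    return [(cx + offsets[2 * i], cy + offsets[2 * i + 1]) for i in range(4)]


print(decode_quadrilateral(10, 20, [-5, -3, 5, -3, 5, 3, -5, 3]))
# [(5.0, 17.0), (15.0, 17.0), (15.0, 23.0), (5.0, 23.0)]
```

Every text pixel proposes its own quadrilateral in this way. A practical detector therefore still needs a step, such as non-maximum suppression, to merge the many overlapping proposals into one box per text instance.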
Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation
An adaptive text region representation based on a recurrent neural network is proposed for text region refinement. At each time step, a pair of boundary points is predicted until no new points are found, so text regions of arbitrary shapes are detected and represented with an adaptive number of boundary points.
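Once the recurrent model has emitted its sequence of point pairs, the pairs still have to be assembled into a closed boundary. A minimal sketch of that assembly step follows. It assumes each pair holds one point from each long side of the text region and that the pairs are ordered along the text. The pairing convention and function names are assumptions for illustration, not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
PointPair = Tuple[Point, Point]


def pairs_to_polygon(pairs: List[PointPair]) -> List[Point]:
    """Assemble a closed boundary from point pairs predicted step by step.

    Walks one side of the text forward and the other side back, so the number
    of polygon vertices (2 * len(pairs)) adapts to how many pairs were predicted.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two point pairs to enclose a region")
    side_a = [a for a, _ in pairs]
    side_b = [b for _, b in pairs]
    return side_a + side_b[::-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula for the area enclosed by a simple polygon."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


pairs = [((0, 0), (0, 2)), ((2, 0), (2, 2)), ((4, 0), (4, 2))]
poly = pairs_to_polygon(pairs)
print(poly)                # [(0, 0), (2, 0), (4, 0), (4, 2), (2, 2), (0, 2)]
print(polygon_area(poly))  # 8.0
```

A straight word needs only two pairs (a quadrilateral), while a curved one simply yields more pairs. This is the sense in which the representation uses an adaptive number of boundary points.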
EAST: An Efficient and Accurate Scene Text Detector
This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes, and significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency.
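EAST's geometry output can take a rotated-box (RBOX) form. Each text pixel predicts its distances to the top, right, bottom and left edges of its box, plus a rotation angle. The sketch below recovers the four corners from one such prediction. The corner order and the sign convention for the angle are assumptions (implementations differ), and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def decode_rbox(px: float, py: float, top: float, right: float,
                bottom: float, left: float, angle: float) -> List[Point]:
    """Recover the 4 corners of a rotated box from one pixel's RBOX geometry.

    top/right/bottom/left are the pixel's distances to the box edges, measured in
    the box's own (unrotated) frame; angle is in radians. Image y points down, and
    the sign convention for angle is an assumption that differs across codebases.
    """
    # Corners relative to the pixel before rotation: top-left, top-right,
    # bottom-right, bottom-left.
    local = [(-left, -top), (right, -top), (right, bottom), (-left, bottom)]
    c, s = math.cos(angle), math.sin(angle)
    # Rotate each offset about the pixel, then translate to image coordinates.
    return [(px + dx * c - dy * s, py + dx * s + dy * c) for dx, dy in local]


print(decode_rbox(10, 10, top=2, right=3, bottom=4, left=5, angle=0.0))
# [(5.0, 8.0), (13.0, 8.0), (13.0, 14.0), (5.0, 14.0)]
```

With a zero angle this reduces to an axis-aligned box around the pixel. A nonzero angle rotates the same box about the predicting pixel, which keeps the pixel inside its own box whatever the orientation.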
Curved scene text detection via transverse and longitudinal sequence connection
This work constructs a curved text dataset called CTW1500 and uses it to develop a polygon-based curved text detector that seamlessly integrates recurrent transverse and longitudinal offset connections, so that curved text can be detected without an empirical combination.
PixelLink: Detecting Scene Text via Instance Segmentation
Most state-of-the-art scene text detection algorithms are deep learning based methods that depend on bounding box regression and perform at least two kinds of predictions: text/non-text…
Elite Loss for scene text detection
The proposed Elite Loss is intended to down-weight the contributions of in-box, not-on-stroke pixels while paying more attention to on-stroke pixels, and a segmentation-based method is designed to validate the effectiveness of the proposed Elite Loss.
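The core idea of down-weighting in-box, not-on-stroke pixels can be illustrated with a per-pixel weighted binary cross-entropy. This is a hypothetical sketch, not the Elite Loss formula, which the summary does not give. The three pixel categories, the `in_box_weight` value, and the choice to treat every in-box pixel as a positive target are all assumptions made for illustration.

```python
import math
from typing import List

BACKGROUND, IN_BOX, ON_STROKE = 0, 1, 2  # hypothetical pixel categories


def stroke_weighted_bce(probs: List[List[float]], categories: List[List[int]],
                        in_box_weight: float = 0.3, eps: float = 1e-7) -> float:
    """Mean per-pixel binary cross-entropy with in-box, off-stroke pixels down-weighted.

    Pixels inside a box annotation are treated as positives (target 1). Those not on
    a character stroke get weight `in_box_weight` (< 1); all other pixels get weight 1.
    """
    total, count = 0.0, 0
    for p_row, c_row in zip(probs, categories):
        for p, cat in zip(p_row, c_row):
            target = 0.0 if cat == BACKGROUND else 1.0
            weight = in_box_weight if cat == IN_BOX else 1.0
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
            total += weight * bce
            count += 1
    return total / count if count else 0.0


probs = [[0.9, 0.4, 0.1]]
cats = [[ON_STROKE, IN_BOX, BACKGROUND]]
print(stroke_weighted_bce(probs, cats))
```

An error on a background-like pixel that merely falls inside the box annotation now costs less than the same error on a stroke pixel. Training is therefore steered toward getting the strokes right rather than toward filling the whole box.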