Fourier Contour Embedding for Arbitrary-Shaped Text Detection

@article{Zhu2021FourierCE,
  title={Fourier Contour Embedding for Arbitrary-Shaped Text Detection},
  author={Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhuanghui Kuang and Lianwen Jin and Wayne Zhang},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={3122-3130}
}
One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances. Most existing methods model text instances in the image spatial domain via masks or contour point sequences in the Cartesian or the polar coordinate system. However, the mask representation might lead to expensive post-processing, while the point sequence representation may have limited capability to model texts with highly-curved…

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection
TLDR
Without bells and whistles, experimental results show that the proposed I3CL sets new state-of-the-art results on three challenging public benchmarks, i.e. an F-measure of 77.5% on ArT, 86.9% on Total-Text, and 86.4% on CTW-1500.
TPSNet: Thin-Plate-Spline Representation for Arbitrary Shape Scene Text Detection
TLDR
To solve the supervision problem of TPS training without key point annotations, two novel losses including the boundary set loss and the shape alignment loss are proposed.
On Exploring and Improving Robustness of Scene Text Detection Models
TLDR
A simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground, which can significantly increase the robustness of different text detection networks.
What's Wrong with the Bottom-up Methods in Arbitrary-shape Scene Text Detection
TLDR
This paper revitalises the classic text detection frameworks by aggregating the visual-relational features of text with two effective false positive/negative suppression mechanisms, and develops a novel multiple-text-map-aware contour-approximation strategy.
Arbitrary Shape Text Detection via Boundary Transformer
TLDR
An arbitrary shape text detector with a boundary transformer, which can accurately and directly locate text boundaries without any post-processing, is proposed, and a novel boundary energy loss is proposed which introduces an energy minimization constraint and an energy monotonically decreasing constraint for every boundary optimization step.
Unitail: Detecting, Reading, and Matching in Retail Scene
take advantages of the Unitail and provide comprehensive benchmark experiments on various state-of-the-art methods.
MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding
We present MMOCR, an open-source toolbox which provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key
Scene Uyghur Text Detection Based on Fine-Grained Feature Representation
TLDR
A multi-directional scene Uyghur text detection model based on fine-grained feature representation and spatial feature fusion is proposed, and feature extraction and feature fusion are improved to enhance the network’s ability to represent multi-scale features.

References

TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection
TLDR
This work proposes an arbitrary-shaped text detection method, namely TextRay, which conducts top-down contour-based geometric modeling and geometric parameter learning within a single-shot anchor-free framework, and designs a central-weighted training strategy and a content loss which builds propagation paths between geometric encodings and visual content.
ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network
TLDR
For the first time, a novel BezierAlign layer is designed for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods and introducing negligible computation overhead.
TextField: Learning a Deep Direction Field for Irregular Scene Text Detection
TLDR
A novel text detector named TextField, which outperforms the state-of-the-art methods by a large margin on two curved text datasets: Total-Text and SCUT-CTW1500, respectively; TextField also achieves very competitive performance on multi-oriented datasets: ICDAR 2015 and MSRA-TD500.
Edge and Curve Detection for Visual Scene Analysis
TLDR
Simple sets of parallel operations are described which can be used to detect texture edges, "spots," and "streaks" in digitized pictures and it is shown that a composite output is constructed in which edges between differently textured regions are detected, and isolated objects are also detected, but the objects composing the textures are ignored.
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
TLDR
This paper proposes a novel unified relational reasoning graph network for arbitrary shape text detection through an innovative local graph that bridges a text proposal model via Convolutional Neural Network and a deep relational reasoning network via Graph Convolutional Network, making the network end-to-end trainable.
Learning Shape-Aware Embedding for Scene Text Detection
TLDR
This work treats text detection as instance segmentation and proposes a segmentation-based framework, which extracts each text instance as an independent connected component and maps pixels onto an embedding space to distinguish different text instances.
Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes
TLDR
A novel text detector, namely LOMO, is presented, which localizes the text progressively multiple times (or in other words, LOok More than Once), and the state-of-the-art results on several public benchmarks confirm the striking robustness and effectiveness of LOMO.
Deformable ConvNets V2: More Deformable, Better Results
TLDR
This work presents a reformulation of Deformable Convolutional Networks that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training, and guides network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features.
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
TLDR
A more flexible representation for scene text is proposed, termed as TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms and outperforms the baseline on Total-Text by more than 40% in F-measure.