• Publications
Detecting Text in Natural Image with Connectionist Text Proposal Network
TLDR
We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images.
  • 450
  • 74
Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees
TLDR
In this paper, we propose a novel framework for robust scene text detection that leverages the high capability of convolutional neural networks (CNNs).
  • 297
  • 21
Single Shot Text Detector with Regional Attention
TLDR
We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image.
  • 170
  • 15
An End-to-End TextSpotter with Explicit Alignment and Attention
TLDR
(1) We propose a novel text-alignment layer that precisely computes convolutional features of a text instance in arbitrary orientation, which is key to boosting performance; (2) a character attention mechanism is introduced by using character spatial information as explicit supervision, leading to large improvements in recognition; (3) the two technologies, together with a new RNN branch for word recognition, are integrated seamlessly into a single model.
  • 92
  • 15
Text-Attentional Convolutional Neural Network for Scene Text Detection
TLDR
In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network that particularly focuses on extracting text-related regions and features from the image components.
  • 202
  • 12
Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
TLDR
In this paper, we present a new approach for text localization in natural images that discriminates text from non-text regions at three levels: pixel, component, and text line.
  • 226
  • 12
Deep Metric Learning with Hierarchical Triplet Loss
TLDR
We present a novel hierarchical triplet loss (HTL) capable of automatically collecting informative training samples (triplets) via a defined hierarchical tree that encodes global context information.
  • 131
  • 12
Places205-VGGNet Models for Scene Recognition
TLDR
This report describes how we trained VGGNets on the large-scale Places205 dataset with a multi-GPU extension of Caffe.
  • 84
  • 12
CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images
TLDR
We present a simple yet efficient approach for training deep neural networks on large-scale, weakly supervised web images that are crawled raw from the Internet using text queries, without any human annotation.
  • 93
  • 10
Reading Scene Text in Deep Convolutional Sequences
TLDR
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem.
  • 166
  • 8