Learn More
Maximally Stable Extremal Regions (MSERs) have achieved great success in scene text detection. However, this low-level pixel operation inherently limits its capability for handling complex text information efficiently (e. g. connections between text or background components), leading to the difficulty in distinguishing texts from background components. In(More)
We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts location and text/non-text score of each fixed-width proposal,(More)
In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color(More)
VGGNets have turned out to be effective for object recognition in still images. However, it is unable to yield good performance by directly adapting the VGGNet models trained on the ImageNet dataset for scene recognition. This report describes our implementation of training the VGGNets on the large-scale Places205 dataset. Specifically , we train three(More)
In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit: a b s t r a c t Dimensionality reduction has long been associated with retinotopic(More)
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convo-lutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term(More)
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative(More)
Principal component analysis (PCA) has long been a simple, efficient technique for dimensionality reduction. However, many nonlinear methods such as local linear embedding and curvilinear component analysis have been proposed for increasingly complex nonlinear data recently. In this paper, we investigate and compare linear PCA and various nonlinear methods(More)
Local binary pattern (LBP) has recently been proposed for texture analysis and local feature description and has also been applied to face recognition with promising results. However, besides the descriptors, a suitable similarity measure that can efficiently learn to distinguish facial features is also important. In this paper, a novel framework for robust(More)