Weilin Huang

Learn More
Maximally Stable Extremal Regions (MSERs) have achieved great success in scene text detection. However, this low-level pixel operation inherently limits its capability for handling complex text information efficiently (e. g. connections between text or background components), leading to the difficulty in distinguishing texts from background components. In(More)
In this paper, we present a new approach for text lo-calization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text-line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color(More)
In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit: a b s t r a c t Dimensionality reduction has long been associated with retinotopic(More)
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convo-lutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term(More)
Local binary pattern (LBP) has recently been proposed for texture analysis and local feature description and has also been applied to face recognition with promising results. However, besides the de-scriptors, a suitable similarity measure that can efficiently learn to distinguish facial features is also important. In this paper, a novel framework for(More)
a r t i c l e i n f o The curse of dimensionality has prompted intensive research in effective methods of mapping high dimensional data. Dimensionality reduction and subspace learning have been studied extensively and widely applied to feature extraction and pattern representation in image and vision applications. Although PCA has long been regarded as a(More)
—Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative(More)