Reading Digits in Natural Images with Unsupervised Feature Learning

Abstract

Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex scenes like photographs, however, is far more difficult: the best existing methods lag well behind human performance on the same tasks. In this paper we attack the problem of recognizing digits in a real application using unsupervised feature learning methods: reading house numbers from street level photos. To this end, we introduce a new benchmark dataset for research use containing over 600,000 labeled digits cropped from Street View images. We then demonstrate the difficulty of recognizing these digits when the problem is approached with hand-designed features. Finally, we employ variants of two recently proposed unsupervised feature learning methods and find that they are convincingly superior on our benchmarks.

7 Figures and Tables

[Chart: Citations per Year, 2012 to 2017]

486 Citations

Semantic Scholar estimates that this publication has 486 citations based on the available data.


Cite this paper

@inproceedings{Netzer2011ReadingDI,
  title={Reading Digits in Natural Images with Unsupervised Feature Learning},
  author={Yuval Netzer and Tao Wang and Adam Coates and Alessandro Bissacco and Bo Wu and Andrew Y. Ng},
  year={2011}
}