Rosetta: Large Scale System for Text Detection and Recognition in Images

Authors: Fedor Borisyuk, Albert Gordo, Viswanath Sivakumar
Published in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
In this paper we present a deployed, scalable optical character recognition (OCR) system, which we call Rosetta, designed to process images uploaded daily at Facebook scale. We present modeling techniques for efficient detection and recognition of text in images and describe Rosetta's system architecture. We perform extensive evaluation of presented technologies, explain useful practical approaches to build an OCR system at scale, and provide insightful intuitions as to why and how certain…

Improving Rotated Text Detection with Rotation Region Proposal Networks

This work extends Rosetta, the scene-text extraction system at Facebook, to efficiently handle text in various orientations. It incorporates Rotation Region Proposal Networks (RRPN) into the text extraction pipeline and offers practical suggestions for building and deploying a model that efficiently detects and recognizes text in arbitrary orientations.

STRIDE: Scene Text Recognition In-Device

This work develops an efficient, lightweight scene text recognition (STR) system that has only 0.88M parameters and performs real-time text recognition. It also introduces a novel orientation classifier module to support the simultaneous recognition of both horizontal and vertical text.

VisRel: Media Search at Scale

VisRel, a deployed large-scale media search system that leverages text understanding, media understanding, and multimodal technologies to deliver a modern multimedia search experience, is presented.

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

This work presents FUNSD, a new dataset for form understanding (FoUn) in noisy scanned documents, which aims at extracting and structuring the textual content of forms. It is the first publicly available dataset with comprehensive annotations to address the FoUn task.

2D Positional Embedding-based Transformer for Scene Text Recognition

This paper uses a Transformer-based architecture for recognizing both regular and irregular text-in-the-wild images and demonstrates that the proposed scene text recognition method outperforms the state of the art in most cases, especially on irregular-text recognition.

PreSTU: Pre-Training for Scene-Text Understanding

PreSTU introduces OCR-aware pre-training objectives that encourage the model to recognize text from an image and to connect what is recognized to the rest of the image content.

Multi-granularity Deep Local Representations for Irregular Scene Text Recognition

A hierarchical attention network is proposed to capture multi-granularity deep local representations for recognizing irregular scene text. It achieves state-of-the-art performance on several benchmark datasets, including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR, with shorter training time.

A Novel Joint Character Categorization and Localization Approach for Character-Level Scene Text Recognition

This paper proposes a novel character-level scene text recognition framework for simultaneously categorizing and localizing characters, and presents an effective joint learning strategy that helps the approach learn from both character-level and word-level annotations.

An Approach for Detecting Image Spam in OSNs

This paper proposes an adversary-aware model for detecting spam images in OSNs. The model adopts EAST and CRNN for the text detection and recognition tasks, and is adaptable and robust against adversarial text attacks.

Scalable Document Image Information Extraction with Application to Domain-Specific Analysis

This paper provides an efficient text recognition approach that makes a trade-off between performance and running speed for document images and a novel information extraction method with both visual and semantic information.



COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

The COCO-Text dataset, which contains over 173k text annotations in over 63k images, is described, and an analysis of three leading state-of-the-art photo Optical Character Recognition (OCR) approaches on the dataset is presented.

Reading Text in the Wild with Convolutional Neural Networks

An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval is presented, and a real-world application that makes thousands of hours of news footage instantly searchable via a text query is demonstrated.

Detecting Oriented Text in Natural Images by Linking Segments

SegLink, an oriented text detection method to decompose text into two locally detectable elements, namely segments and links, achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin.

Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild

A novel text detection algorithm composed of two cascaded steps is proposed. It can accurately localize words or text lines in arbitrary orientations, including curved text lines that many other frameworks cannot handle.

Synthetic Data for Text Localisation in Natural Images

The relation of FCRN to the recently introduced YOLO detector, as well as to other end-to-end object detection systems based on deep learning, is discussed.

Word Spotting in the Wild

It is argued that the appearance of words in the wild spans this range of difficulties and a new word recognition approach based on state-of-the-art methods from generic object recognition is proposed, in which object categories are considered to be the words themselves.

ImageNet: A large-scale hierarchical image database

A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Scene Text Recognition with Sliding Convolutional Character Models

The proposed scene text recognition method, which uses character models on convolutional feature maps, has a number of appealing properties: it is based on character models trained free of a lexicon, and it can recognize unknown words.

ICDAR2017 Robust Reading Challenge on COCO-Text

The datasets and the ground truth are described, the performance evaluation protocols used are detailed and the final results are presented along with a brief summary of the participating methods.

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

In this work we present a framework for the recognition of natural scene text. Our framework does not require any human-labelled data, and performs word recognition on the whole image holistically,