A Real-Time Scene Text to Speech System


The system is based on an efficient end-to-end method for real-time scene text localization and recognition [1,2,3]. Individual characters are detected as Class-Specific Extremal Regions (CSERs) [4]. An efficient sequential classifier selects only those ERs with locally maximal probability p(region|character), with complexity linear in the number of image pixels. The stability requirement of MSERs [5] is dropped; the detector has a lower memory footprint and copes better with blurred, noisy and low-contrast text. A novel sequential classifier exploits more computationally expensive features without a negative impact on performance. Recognized text from subsequent frames is aggregated and sent to a speech synthesizer. A first evaluation uses the ICDAR 2011 dataset.
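
The selection of ERs with locally maximal probability can be illustrated with a minimal sketch. It assumes the classifier has already produced one probability per threshold step for a single chain of nested regions; the function name `select_local_maxima`, the `p_min` cut-off and the sample values are hypothetical and are not taken from the cited method.

```python
from typing import List


def select_local_maxima(probs: List[float], p_min: float = 0.2) -> List[int]:
    """Return indices of strict local maxima whose probability is at least p_min.

    probs[i] is an assumed classifier output for the region obtained at
    threshold step i along one chain of nested extremal regions.
    Plateaus (equal neighbouring values) are ignored in this sketch.
    """
    selected = []
    for i, p in enumerate(probs):
        # Treat positions outside the sequence as minus infinity so that
        # the first and last entries can still qualify as maxima.
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("-inf")
        if p >= p_min and p > left and p > right:
            selected.append(i)
    return selected


# Hypothetical probabilities along one chain of nested regions.
chain = [0.1, 0.4, 0.3, 0.2, 0.7, 0.9, 0.6, 0.15, 0.1]
kept = select_local_maxima(chain)
```

The sketch makes a single pass over the sequence, so only the regions at the selected threshold steps would be handed on to the more expensive later stage.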

DOI: 10.1007/978-3-642-33885-4_66


