Multi-dimensional long short-term memory networks for artificial Arabic text recognition in news video

Oussama Zayene, Sameh Masmoudi Touj, Jean Hennebert, Rolf Ingold, Najoua Essoukri Ben Amara
IET Comput. Vis.
This study presents a novel approach to Arabic video text recognition based on recurrent neural networks. Text embedded in videos is a rich source of information for indexing and automatically annotating multimedia documents. However, video text recognition is a non-trivial task because of challenges such as the variability of text patterns and the complexity of backgrounds. In the case of Arabic, the presence of diacritic marks, the cursive nature of the script and the non…

A Review of Arabic Text Recognition Dataset
This paper profiles existing Arabic text recognition datasets, which helps determine the nature of the OCR systems developed on them, and identifies research directions for building Arabic text recognition datasets.
Impact of Pre-Processing on Recognition of Cursive Video Text
An experimental study on a dataset of 12,000 text lines in cursive Urdu reveals that appropriately pre-processing the text line images significantly improves recognition rates.
Quranic Optical Text Recognition Using Deep Learning Models
This work introduces a Quranic optical character recognition (OCR) system based on a convolutional neural network (CNN) followed by a recurrent neural network (RNN), and the experiments show improved word recognition rate (WRR) and character recognition rate (CRR).
Cursive Character Recognition in Natural Scene Images Using a Multilevel Convolutional Neural Network Fusion
A network architecture combining multi-scale feature aggregation (MSFA) and multi-level feature fusion (MLFF) is proposed to recognize isolated Urdu characters in natural images. Experimental results show that aggregating and fusing multi-scale and multi-level features is more effective and outperforms other methods on the Urdu character image and Chars74K datasets.
Persian OCR with Cascaded Convolutional Neural Networks Supported by Language Model
A cascaded convolutional neural network is used to convert sub-word images into text, and a comparison with the Tesseract OCR engine shows the superiority of the algorithm.
Detection and recognition of cursive text from video frames
This paper presents a comprehensive framework for detection and recognition of textual content in video frames, and proposes UrduNet, a combination of CNNs and long short-term memory (LSTM) networks.
Sub-word based Persian OCR Using Auto-Encoder Features and Cascade Classifier
With the help of a complete sub-word image dictionary, a new approach for Persian OCR is presented that uses two cascaded SVM classifiers trained on features extracted by an auto-encoder.
Text recognition and classification of english teaching content based on SVM
Huiyan Li, J. Intell. Fuzzy Syst., 2020
This study applies a convolutional neural network algorithm to text recognition of English teaching content and effectively recognizes text features. The approach can be applied to the recognition and classification of English teaching content and can provide a reference direction for related research.
Design and Implementation of Natural Scene Text Recognition System Based on Deep Learning
The design and implementation of a natural scene text recognition system based on deep learning is studied, each functional module of the system is designed in detail, and finally in SVT.


Urdu Nasta’liq text recognition system based on multi-dimensional recurrent neural network and statistical features
A robust feature extraction approach based on a right-to-left sliding window that significantly reduces the label error for Urdu Nasta’liq text lines and outperforms state-of-the-art results.
Deep learning and recurrent connectionist-based approaches for Arabic text recognition in videos
This paper focuses on recognizing Arabic embedded text in videos using Convolutional Neural Networks and Deep Auto-Encoders and a connectionist recurrent approach to predict correct transcriptions of the input image from the associated sequence of features.
Segmentation-free handwritten Chinese text recognition with LSTM-RNN
We present initial results on the use of Multi-Dimensional Long Short-Term Memory Recurrent Neural Networks (MDLSTM-RNN) in recognizing lines of handwritten Chinese text without explicit segmentation.
Chinese Image Text Recognition with BLSTM-CTC: A Segmentation-Free Method
A novel scheme to tackle the Chinese image text recognition problem. Unlike traditional methods that perform recognition at the single-character level, BLSTM-CTC takes as input an image of a line of characters and outputs the recognized text sequence, so recognition is carried out at the level of the whole text image.
Accurate Scene Text Recognition Based on Recurrent Neural Network
This paper presents a novel approach to recognize text in scene images that outperforms the state-of-the-art techniques significantly and is able to recognize the whole word images without character-level segmentation and recognition.
NF-SAVO: Neuro-Fuzzy system for Arabic Video OCR
A robust approach for text extraction and recognition from video clips, called Neuro-Fuzzy system for Arabic Video OCR, is proposed.
Text Recognition in Videos Using a Recurrent Connectionist Approach
This paper proposes a novel approach that avoids any explicit character segmentation in OCR systems developed to recognize texts embedded in multimedia documents. It uses a multi-scale scanning scheme and achieves very high recognition rates.
Recognition and transition frame detection of Arabic news captions for video retrieval
A dedicated OCR for recognizing low-resolution news captions in video images, and a technique based on inter-frame text difference to detect the transition frames of still news captions, which is necessary for efficient video retrieval and playback.
A dataset for Arabic text detection, tracking and recognition in news videos- AcTiV
A region-based text detection approach is presented in addition to a set of evaluation protocols on which the performance of different systems can be measured.
Convolutional recurrent neural networks with hidden Markov model bootstrap for scene text recognition
A system that directly transcribes scene text images to text without character segmentation is developed. It achieves performance competitive with the state of the art on several public scene text datasets, including both lexicon-based and lexicon-free ones.