Corpus ID: 232013716

Page Layout Analysis System for Unconstrained Historic Documents

O. Kodym, Michal Hradiš

Extraction of text regions and individual text lines from historic documents is necessary for automatic transcription. We propose extending a CNN-based text baseline detection system by adding line height and text block boundary predictions to the model output, allowing the system to extract more comprehensive layout information. We also show that pixel-wise text orientation prediction can be used for processing documents with multiple text orientations. We demonstrate that the proposed method…

TS-Net: OCR Trained to Switch Between Text Transcription Styles


Multi-Task Handwritten Document Layout Analysis
Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization
Complete System for Text Line Extraction Using Convolutional Neural Networks and Watershed Transform
Dense prediction for text line segmentation in handwritten document images
  • Q. Vo, Gueesang Lee
  • Computer Science
  • 2016 IEEE International Conference on Image Processing (ICIP)
  • 2016
Textline detection in degraded historical document images
Labeling, Cutting, Grouping: An Efficient Text Line Segmentation Method for Medieval Manuscripts
Text Line Segmentation in Historical Document Images Using an Adaptive U-Net Architecture
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks
docExtractor: An off-the-shelf historical document element extraction
  • Tom Monnier, Mathieu Aubry
  • Computer Science
  • 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
  • 2020
Fast and Lightweight Text Line Detection on Historical Documents