dhSegment: A Generic Deep-Learning Approach for Document Segmentation

@inproceedings{Oliveira2018dhSegmentAG,
  title={dhSegment: A Generic Deep-Learning Approach for Document Segmentation},
  author={Sofia Ares Oliveira and Benoit Seguin and F. Kaplan},
  booktitle={2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR)},
  year={2018},
  pages={7--12}
}
In recent years there have been multiple successful attempts tackling document processing problems separately by designing task-specific hand-tuned strategies. [...] We propose an open-source implementation of a CNN-based pixel-wise predictor coupled with task-dependent post-processing blocks. We show that a single CNN architecture can be used across tasks with competitive results. Moreover, most of the task-specific post-processing steps can be decomposed into a small number of simple and standard…
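The abstract describes a pixel-wise predictor followed by task-dependent post-processing built from simple, standard operations. The sketch below illustrates what such a post-processing block might look like, under stated assumptions: the abstract is truncated and does not name the operations, so thresholding and connected-component extraction are illustrative choices rather than the authors' confirmed pipeline. The function names `binarize` and `connected_boxes`, the `threshold` and `min_area` parameters, and the toy probability map are all hypothetical.

```python
from collections import deque


def binarize(prob_map, threshold=0.5):
    """Turn a 2D grid of per-pixel probabilities into a boolean mask."""
    return [[p >= threshold for p in row] for row in prob_map]


def connected_boxes(mask, min_area=1):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    foreground regions, skipping regions smaller than min_area pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes


# Toy stand-in for a network output: probabilities for one class.
prob_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.6],
    [0.0, 0.1, 0.1, 0.9],
]
boxes = connected_boxes(binarize(prob_map))
print(boxes)
```

In this reading, the network stays the same across tasks and only the small block after it changes, for example by adjusting `min_area` to discard spurious detections.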
docExtractor: An off-the-shelf historical document element extraction
  • Tom Monnier, Mathieu Aubry
  • Computer Science
  • 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
  • 2020
TLDR
It is argued that the performance obtained without fine-tuning on a specific dataset is critical for applications, in particular in digital humanities, and that the line-level page segmentation is the most relevant for a general purpose element extraction engine.
An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers
TLDR
A systematic evaluation of 11 different published backbone architectures and 9 different tiling and scaling configurations for separating text, tables or table column lines shows that (depending on the task) Inception-ResNet-v2 and EfficientNet backbones work best, vertical tiling is generally preferable to other tiling approaches, and training data that comprises 30 to 40 pages will be sufficient most of the time.
End-to-End Information Extraction by Character-Level Embedding and Multi-Stage Attentional U-Net
TLDR
A novel deep learning architecture for end-to-end information extraction on the 2D character-grid embedding of the document, namely the Multi-Stage Attentional U-Net, which leverages a specialized multi-stage encoder-decoder design in conjunction with efficient use of the self-attention mechanism and the box convolution.
Synthesis in Style: Semantic Segmentation of Historical Documents using Synthetic Data
TLDR
This paper proposes a novel method for synthesizing training data for semantic segmentation of document images, utilizing clusters found in intermediate features of a StyleGAN generator to synthesize RGB and label images at the same time.
Multi-scale Gated Fully Convolutional DenseNets for semantic labeling of historical newspaper images
TLDR
This work proposes a fully convolutional neural network architecture (FCN) that outputs a pixel-labeling of the various semantic entities that occur in historical newspaper images and demonstrates that this proposition outperforms standard FCN architectures.
A Large Dataset of Historical Japanese Documents with Complex Layouts
TLDR
This work presents HJDataset, a large-scale dataset of historical Japanese documents with complex layouts that contains over 250,000 layout element annotations of seven types, and demonstrates the usefulness of the dataset on real-world document digitization tasks.
Multimodal deep networks for text and image-based document classification
TLDR
A multimodal neural network is designed that learns both from word embeddings, computed on text extracted by OCR, and from the image; it boosts pure image accuracy by 3% on Tobacco3482 and RVL-CDIP augmented by the new QS-OCR text dataset, even without clean text information.
SPAN: a Simple Predict & Align Network for Handwritten Paragraph Recognition
TLDR
The Simple Predict & Align Network is proposed: an end-to-end recurrence-free Fully Convolutional Network performing OCR at paragraph level without any prior segmentation stage and without any loss of accuracy.
Ancient Document Layout Analysis: Autoencoders meet Sparse Coding
TLDR
Experimental results on the DIVA-HisDB dataset demonstrate that the proposed method outperforms previous approaches based on unsupervised representation learning while achieving performances comparable to the state-of-the-art fully supervised methods.
Boosting Offline Handwritten Text Recognition in Historical Documents With Few Labeled Lines
TLDR
This paper addresses the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of them contain errors, and proposes an algorithm to mitigate the effects of incorrect labeling in the training set.

References

Robust, Simple Page Segmentation Using Hybrid Convolutional MDLSTM Networks
  • T. Breuel
  • Computer Science
  • 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
  • 2017
TLDR
It is demonstrated that relatively simple networks are capable of fast, reliable text line segmentation and document layout analysis even on complex and noisy inputs, without manual parameter tuning or heuristics.
PageNet: Page Boundary Extraction in Historical Handwritten Documents
TLDR
A deep learning system, PageNet, is presented that identifies the main page region in an image in order to segment content from both textual and non-textual border noise, and that can segment documents overlaid on top of other documents.
Convolutional Neural Networks for Page Segmentation of Historical Document Images
  • Kai Chen, Mathias Seuret
  • Computer Science, Mathematics
  • 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
  • 2017
TLDR
A simple CNN with only one convolution layer is trained to learn features from raw image pixels for page segmentation of handwritten historical document images.
DIVA-HisDB: A Precisely Annotated Large Dataset of Challenging Medieval Manuscripts
TLDR
A publicly available historical manuscript database, DIVA-HisDB, is introduced for the evaluation of several Document Image Analysis (DIA) tasks, and a layout analysis ground truth that has been iterated on, reviewed, and refined by an expert in medieval studies is provided.
Page Segmentation for Historical Handwritten Documents Using Fully Convolutional Networks
TLDR
Experimental results on the public dataset DIVA-HisDB containing challenging medieval manuscripts demonstrate the effectiveness and superiority of the proposed pixel-wise segmentation method, which yields pixel-level accuracy of above 99%.
READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents
TLDR
This paper collects and annotates 2036 archival document images from different locations and time periods and proposes a new evaluation scheme based on baselines, which needs no binarization and can handle skewed as well as rotated text lines.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts
TLDR
A new challenging dataset and state-of-the-art benchmark results for pixel-labelling and text line segmentation are presented; a combination of the best layout analysis method with an adapted seam-carving based method achieves better results than the best contestant.
U-Net: Convolutional Networks for Biomedical Image Segmentation
TLDR
It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
cBAD: ICDAR2017 Competition on Baseline Detection
TLDR
The cBAD competition aims at benchmarking state-of-the-art baseline detection algorithms and presents a new one that introduces baselines to measure performance.