Unsupervised learning of text line segmentation by differentiating coarse patterns

Berat Kurar Barakat, Ahmad Droby, Raid Saabni, Jihad El-Sana
Despite recent advances in supervised deep learning for text line segmentation, unsupervised deep learning solutions are beginning to gain popularity. In this paper, we present an unsupervised deep learning method that embeds document image patches into a compact Euclidean space where distances correspond to coarse text line pattern similarity. Once this space has been produced, text line segmentation can be easily implemented using standard techniques with the embedded feature…
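The truncated abstract does not state the training objective. One common way to learn an embedding in which Euclidean distance reflects pattern similarity is a siamese network trained with a margin-based contrastive loss. The minimal sketch below shows only that loss, applied to embedding vectors that have already been computed. The function names and the `margin` parameter are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     similar: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for a pair of patch embeddings.

    Assumption for illustration: the paper's actual objective is not given
    in the abstract. Similar pairs are penalized by their squared distance
    (pulling them together). Dissimilar pairs are penalized only while they
    are closer than `margin` (pushing them apart).
    """
    d = euclidean(a, b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Two hypothetical patch embeddings at distance 5.
p, q = [0.0, 0.0], [3.0, 4.0]
loss_same = contrastive_loss(p, q, similar=True)             # pulled together
loss_diff = contrastive_loss(p, q, similar=False, margin=10.0)  # still inside margin
```

Minimizing this loss over many patch pairs shapes the space so that nearby embeddings share a coarse text line pattern. A standard distance-based technique can then operate on the embedded features.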



Unsupervised deep learning for text line segmentation

An unsupervised embedding of document image patches that requires no annotations is presented, together with a challenging Arabic handwritten text line segmentation dataset, VML-AHTE. The method achieves superior performance over the supervised methods.

Learning-Free Text Line Segmentation for Historical Handwritten Documents

We present a learning-free method for text line segmentation of historical handwritten document images. This method relies on automatic scale selection together with the second derivative of anisotropic…

Text Line Segmentation in Historical Document Images Using an Adaptive U-Net Architecture

A novel deep learning-based method for text line segmentation of historical documents, built on an adaptive U-Net architecture, is presented.

Text Line Segmentation for Challenging Handwritten Document Images using Fully Convolutional Network

The results are evaluated with a new metric that is sensitive to both over-segmentation and under-segmentation. On a publicly available, challenging handwritten dataset, they are comparable to those of previous work on the same dataset.
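The summary does not define the new metric. The hypothetical `line_scores` function below illustrates one way a metric can be sensitive to both failure modes. It matches predicted lines to ground-truth lines one-to-one by pixel IoU, then reports three scores in the style of common handwriting segmentation contest metrics: a detection rate, a recognition accuracy, and their F-measure. A line that is split apart (over-segmentation) or merged with a neighbor (under-segmentation) falls below the IoU threshold and loses its match. This is an assumed metric, not the one from the cited work.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def line_scores(gt: List[Set[Pixel]], pred: List[Set[Pixel]],
                thr: float = 0.9) -> Tuple[float, float, float]:
    """Hypothetical one-to-one line matching metric.

    A ground-truth line and a predicted line match when IoU >= thr.
    With thr > 0.5 each line can have at most one such partner,
    so greedy matching is sufficient.
    """
    used = set()
    o2o = 0  # number of one-to-one matches
    for g in gt:
        for j, p in enumerate(pred):
            if j not in used and iou(g, p) >= thr:
                used.add(j)
                o2o += 1
                break
    dr = o2o / len(gt) if gt else 0.0      # fraction of ground-truth lines matched
    ra = o2o / len(pred) if pred else 0.0  # fraction of predicted lines matched
    fm = 2 * dr * ra / (dr + ra) if dr + ra else 0.0
    return dr, ra, fm
```

For example, suppose one of two ground-truth lines is split into two halves. Only the intact line is matched, so the detection rate drops to 1/2 and the recognition accuracy to 1/3. If the two lines are merged instead, neither matches and all three scores are 0.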

Labeling, Cutting, Grouping: An Efficient Text Line Segmentation Method for Medieval Manuscripts

This work proposes a novel method that uses pixel-level semantic segmentation as an intermediate task, followed by a text-line extraction step. It demonstrates that semantic pixel segmentation can serve as a strong denoising pre-processing step before text line extraction.
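A pixel-level text-line mask still has to be turned into separate line instances. The simplest grouping step is connected-component labeling. The hypothetical `label_components` function below is a baseline sketch of that idea only. It is not the labeling, cutting, and grouping pipeline of the cited method.

```python
from collections import deque
from typing import List


def label_components(mask: List[List[int]]) -> List[List[int]]:
    """Label 4-connected foreground regions of a binary mask.

    Returns a grid of the same shape. In that grid, 0 is background and
    1, 2, ... identify separate components (candidate text lines).
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                # New unlabeled foreground pixel: flood-fill its component.
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels
```

Plain connected components merge any lines whose predicted pixels touch. This is why a clean, denoised segmentation mask matters before extraction.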

Text line extraction using fully convolutional network and energy minimization

A fully convolutional network is proposed that identifies the pixels of text lines with varying heights and interline proximity, independent of their orientation. It can finely split touching and overlapping text lines without assuming an orientation.

Fully convolutional network with dilated convolutions for handwritten text line segmentation

A learning-based method for handwritten text line segmentation in document images uses a variant of deep fully convolutional networks (FCNs) with dilated convolutions. On a public dataset, it outperforms the most popular FCN variants, which are based on deconvolution or unpooling layers.
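Dilated convolutions enlarge the receptive field without adding parameters or downsampling. For a stack of stride-1 convolutions with kernel size k and dilation rates d_i, the receptive field is 1 + sum((k - 1) * d_i). The sketch below computes this quantity and shows a 1D dilated convolution. The dilation rates used are illustrative assumptions, not the configuration of the cited network.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """'Valid' 1D dilated convolution (cross-correlation, as in DL libraries).

    Kernel taps are spaced `dilation` samples apart, so a 3-tap kernel
    with dilation 2 covers 5 input samples.
    """
    span = (len(kernel) - 1) * dilation
    return [sum(kernel[i] * x[t + i * dilation] for i in range(len(kernel)))
            for t in range(len(x) - span)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Illustrative comparison: three 3-tap layers, plain vs. exponentially dilated.
plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
```

With the same three layers and the same number of weights, doubling the dilation at each layer grows the receptive field from 7 to 15. This larger context is useful for seeing across the vertical gap between neighboring text lines.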

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
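In a fully convolutional network, the output size is a function of the input size rather than a fixed constant. The sketch below applies the standard output-size formulas for convolution and transposed convolution to a toy encoder-decoder. These are the same formulas documented for PyTorch's `Conv2d` and `ConvTranspose2d`. The `toy_fcn_out` layer configuration is a hypothetical example, not the architecture from the paper.

```python
def conv_out(n: int, k: int, s: int = 1, p: int = 0, d: int = 1) -> int:
    """Spatial output size of a convolution (floor division)."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


def deconv_out(n: int, k: int, s: int = 1, p: int = 0, d: int = 1,
               output_padding: int = 0) -> int:
    """Spatial output size of a transposed convolution."""
    return (n - 1) * s - 2 * p + d * (k - 1) + output_padding + 1


def toy_fcn_out(n: int) -> int:
    """Hypothetical toy FCN: every layer is convolutional, so any n is accepted."""
    n = conv_out(n, k=3, p=1)       # 3x3 'same' conv: size preserved
    n = conv_out(n, k=2, s=2)       # 2x2 stride-2 downsampling: roughly halves n
    n = conv_out(n, k=1)            # 1x1 conv acting as a per-pixel classifier
    return deconv_out(n, k=2, s=2)  # 2x2 stride-2 upsampling: doubles n
```

For even input sizes the output matches the input exactly. For odd sizes the stride-2 path loses a pixel (`toy_fcn_out(7)` gives 6). This is why FCN implementations commonly pad or crop feature maps to realign them with the input.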

The Pinkas Dataset

Meta-features of the Pinkas dataset are presented, and recent word spotting algorithms are applied to it. The goal is to analyze the room for performance improvement and to identify the strengths and weaknesses of available processing algorithms.

A two-stage method for text line detection in historical documents

The developed method is capable of handling complex layouts as well as curved and arbitrarily oriented text lines and substantially outperforms current state-of-the-art approaches.