Corpus ID: 236169716

Post-OCR Paragraph Recognition by Graph Convolutional Networks

@inproceedings{Wang2021PostOCRPR,
  title={Post-OCR Paragraph Recognition by Graph Convolutional Networks},
  author={Renshen Wang and Yasuhisa Fujii and Ashok Popat},
  year={2021}
}
We propose a new approach for paragraph recognition in document images by spatial graph convolutional networks (GCN) applied on OCR text boxes. Two steps, namely line splitting and line clustering, are performed to extract paragraphs from the lines in OCR results. Each step uses a β-skeleton graph constructed from bounding boxes, where the graph edges provide efficient support for graph convolution operations. With pure layout input features, the GCN model size is 3∼4 orders of magnitude smaller…
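To make the graph construction mentioned in the abstract more concrete, here is a minimal sketch of a β-skeleton over text boxes. It rests on assumptions that are not taken from the paper: each box is reduced to its center point, β is fixed to 1 (the Gabriel graph, where two points are linked when no third point lies strictly inside the circle whose diameter is the segment between them), and the helper names `box_center` and `gabriel_edges` as well as the sample boxes are hypothetical.

```python
from itertools import combinations


def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def gabriel_edges(points):
    """Edges of the beta-skeleton with beta = 1 (Gabriel graph).

    Brute force, O(n^3): a pair (i, j) is kept unless some third point
    lies strictly inside the circle whose diameter is the segment i-j.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (px, py), (qx, qy) = points[i], points[j]
        mx, my = (px + qx) / 2.0, (py + qy) / 2.0       # circle center
        r2 = ((px - qx) ** 2 + (py - qy) ** 2) / 4.0    # squared radius
        blocked = any(
            (rx - mx) ** 2 + (ry - my) ** 2 < r2
            for k, (rx, ry) in enumerate(points)
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Hypothetical OCR line boxes: three stacked lines in a left column
# and one line in a right column.
boxes = [(0, 0, 100, 10), (0, 20, 100, 30), (0, 40, 100, 50), (150, 25, 250, 35)]
edges = gabriel_edges([box_center(b) for b in boxes])
print(edges)
```

In this toy layout the first and third lines are not linked directly, because the second line sits between them; edges of this kind are what a graph convolution would pass messages along. A practical implementation would likely use more than one point per box and a faster construction than this cubic loop.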
ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction
TLDR
This work proposes Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to capture the sequential presentation of words in documents. ROPE consistently improves existing GCNs, with a margin of up to 8.4% F1-score.

References

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks
TLDR
An end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images using a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text.
Table Detection in Invoice Documents by Graph Neural Networks
TLDR
This work proposes a graph-based approach for detecting tables in document images that makes use of Graph Neural Networks (GNNs) in order to describe the local repetitive structural information of tables in invoice documents.
Page Segmentation using a Convolutional Neural Network with Trainable Co-Occurrence Features
TLDR
A method for page segmentation using a CNN with trainable multiplication layers (TMLs) specialized for extracting co-occurrences from feature maps, thereby supporting the detection of objects with similar textures and periodicities is proposed.
R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection
  • Yingying Jiang, Xiangyu Zhu, +5 authors Zhenbo Luo
  • Computer Science
  • 2018 24th International Conference on Pattern Recognition (ICPR)
  • 2018
TLDR
A Rotational Region CNN (R2CNN) is designed, which includes a Text Region Proposal Network (Text-RPN) to estimate approximate text regions and a multitask refinement network to get the precise inclined box.
A Machine Learning Approach for Graph-Based Page Segmentation
TLDR
This work proposes a new approach for segmenting a document image into its page components, based on simple machine learning models and graph-based techniques, which is easily adapted to the segmentation of a variety of document types.
DeepLayout: A Semantic Segmentation Approach to Page Layout Analysis
TLDR
This paper introduces a semantic segmentation approach: an end-to-end trainable deep neural network that takes only the document image as input and predicts per-pixel saliency maps. It also successfully brings RLSA into the post-processing procedure to specify the boundaries.
Deep Visual Template-Free Form Parsing
TLDR
This work presents a learned, template-free solution to detecting pre-printed text and input text/handwriting and predicting pair-wise relationships between them, and shows that the proposed pairing method outperforms heuristic rules and that visual features are critical to obtaining high accuracy.
PubLayNet: Largest Dataset Ever for Document Layout Analysis
TLDR
The PubLayNet dataset for document layout analysis is developed by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central, and it is demonstrated that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles.
Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning
TLDR
It is shown that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers.
ICDAR2019 Competition on Recognition of Documents with Complex Layouts - RDCL2019
TLDR
An objective comparative evaluation of page segmentation and region classification methods for documents with complex layouts indicates that an innovative approach has a clear advantage but there is still a considerable need to develop robust methods that deal with layout challenges, especially with the non-textual content.