Post-OCR Paragraph Recognition by Graph Convolutional Networks
@article{Wang2021PostOCRPR, title={Post-OCR Paragraph Recognition by Graph Convolutional Networks}, author={Renshen Wang and Yasuhisa Fujii and Ashok Popat}, journal={2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}, year={2021}, pages={2533-2542} }
We propose a new approach for paragraph recognition in document images by spatial graph convolutional networks (GCN) applied on OCR text boxes. Two steps, namely line splitting and line clustering, are performed to extract paragraphs from the lines in OCR results. Each step uses a β-skeleton graph constructed from bounding boxes, where the graph edges provide efficient support for graph convolution operations. With pure layout input features, the GCN model size is 3~4 orders of magnitude…
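The β-skeleton graph mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the text above does not confirm: β is fixed at 1 (where the β-skeleton coincides with the Gabriel graph), each box is reduced to its center point, and boxes are given as `(x0, y0, x1, y1)` tuples. The helper names `box_center`, `sq_dist`, and `gabriel_edges` are hypothetical, and this is not the authors' implementation.

```python
from itertools import combinations


def box_center(box):
    """Center of an axis-aligned box given as (x0, y0, x1, y1) (assumed format)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def gabriel_edges(points):
    """Edges of the beta-skeleton for beta = 1 (the Gabriel graph).

    Points i and j are connected when no third point lies strictly inside
    the circle whose diameter is the segment between them. A point k is
    inside that circle exactly when |ki|^2 + |kj|^2 < |ij|^2.
    Brute force, O(n^3); meant only as an illustration.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            sq_dist(points[k], points[i]) + sq_dist(points[k], points[j]) < d_ij
            for k in range(len(points))
            if k != i and k != j
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Three stacked text-line boxes (made-up coordinates for illustration).
boxes = [(0, 0, 10, 2), (0, 3, 10, 5), (0, 6, 10, 8)]
centers = [box_center(b) for b in boxes]
edges = gabriel_edges(centers)
```

In this toy layout only vertically adjacent lines are connected; the middle line blocks a direct edge between the top and bottom lines. That sparsity is what keeps neighborhood aggregation over such a graph cheap.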
One Citation
Unified Line and Paragraph Detection by Graph Convolutional Networks
- Computer Science, DAS
- 2022
A graph convolutional network is used to predict the relations between text detection boxes and then build both levels of clusters from these predictions, demonstrating that the unified approach can be highly efficient while still achieving state-of-the-art quality for detecting paragraphs in public benchmarks and real-world images.
ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction
- Computer Science, ACL
- 2021
This work proposes Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique that is designed to capture the sequential presentation of words in documents and consistently improves existing GCNs by a margin of up to 8.4% F1-score.
Towards End-to-End Unified Scene Text Detection and Layout Analysis
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
A novel method is proposed that is able to simultaneously detect scene text and form text clusters in a unified way and achieves state-of-the-art results on multiple scene text detection datasets without the need of complex post-processing.
CaptchaGG: A linear graphical CAPTCHA recognition model based on CNN and RNN
- Computer Science, 2022 9th International Conference on Digital Home (ICDH)
- 2022
CaptchaGG is a model for recognizing linear graphical CAPTCHAs with a simple architecture: a convolutional neural network extracts features, a recurrent neural network models the sequence, and a final stage performs classification and recognition. It achieves a recognition accuracy of 96% or more at lower complexity.
References
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
An end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images using a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text.
Table Detection in Invoice Documents by Graph Neural Networks
- Computer Science, 2019 International Conference on Document Analysis and Recognition (ICDAR)
- 2019
This work proposes a graph-based approach for detecting tables in document images that makes use of Graph Neural Networks (GNNs) in order to describe the local repetitive structural information of tables in invoice documents.
A Machine Learning Approach for Graph-Based Page Segmentation
- Computer Science, 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)
- 2018
This work proposes a new approach for segmenting a document image into its page components, based on simple machine learning models and graph-based techniques, which is easily adapted to the segmentation of a variety of document types.
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper proposes a novel unified relational reasoning graph network for arbitrary shape text detection, in which an innovative local graph bridges a text proposal model based on a Convolutional Neural Network and a deep relational reasoning network based on a Graph Convolutional Network, making the network end-to-end trainable.
DeepLayout: A Semantic Segmentation Approach to Page Layout Analysis
- Computer Science, ICIC
- 2018
This paper applies semantic segmentation to page layout analysis using an end-to-end trainable deep neural network that takes only the document image as input and predicts per-pixel saliency maps; it also brings RLSA into the post-processing procedure to specify the boundaries.
R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection
- Computer Science, 2018 24th International Conference on Pattern Recognition (ICPR)
- 2018
A Rotational Region CNN (R2CNN) is designed, which includes a Text Region Proposal Network (Text-RPN) to estimate approximate text regions and a multitask refinement network to get the precise inclined box.
Deep Visual Template-Free Form Parsing
- Computer Science, 2019 International Conference on Document Analysis and Recognition (ICDAR)
- 2019
This work presents a learned, template-free solution to detecting pre-printed text and input text/handwriting and predicting pair-wise relationships between them, and shows that the proposed pairing method outperforms heuristic rules and that visual features are critical to obtaining high accuracy.
PubLayNet: Largest Dataset Ever for Document Layout Analysis
- Computer Science, 2019 International Conference on Document Analysis and Recognition (ICDAR)
- 2019
The PubLayNet dataset for document layout analysis is developed by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central, and it is demonstrated that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles.
Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning
- Computer Science, AAAI
- 2018
It is shown that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers.
Graph Attention Networks
- Computer Science, ICLR
- 2018
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior…