LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

@inproceedings{Shen2021LayoutParserAU,
  title={LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis},
  author={Zejiang Shen and Ruochen Zhang and Melissa Dell and B. Lee and Jacob Carlson and Weining Li},
  booktitle={ICDAR},
  year={2021}
}
Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been on-going efforts to improve reusability and simplify deep learning (DL) model… 
TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets
TLDR
This paper devises TableParser, a system capable of parsing tables in both native PDFs and scanned images with high precision, and conducts extensive experiments to show the efficacy of domain adaptation in developing such a tool.
Hierarchical Visual Interface for Lecture Video Retrieval and Summarization
TLDR
A hierarchical visual interface for retrieving and summarizing lecture videos that can achieve high retrieval accuracy and good user experience is proposed.
Infrastructure for Rapid Open Knowledge Network Development
TLDR
A National Science Foundation Convergence Accelerator project is described that builds a set of Knowledge Network Programming Infrastructure systems to address the frustratingly slow process of building, using, and scaling large knowledge networks.
OCR Synthetic Benchmark Dataset for Indic Languages
We present the largest publicly available synthetic OCR benchmark dataset for Indic languages. The collection contains a total of 90k images and their ground truth for 23 Indic languages. OCR model
Perks and Pitfalls of City Directories as a Micro-Geographic Data Source
Historical city directories are rich sources of micro-geographic data. They provide information on the location of households and firms and their occupations and industries, respectively. We develop
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups
TLDR
New methods that explicitly model VIsual LAyout (VILA) groups, that is, text lines or text blocks, are introduced to further improve performance, and it is shown that simply inserting special tokens denoting layout group boundaries into model inputs can lead to a 1.9% Macro F1 improvement in token classification.
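The token-insertion idea in the VILA summary can be illustrated with a minimal sketch. It assumes each layout group (a text line or text block) is already available as a list of word tokens, and it uses a hypothetical boundary token "[BLK]" placed only between consecutive groups; the token string and exact placement used by VILA may differ.

```python
from typing import List

# Hypothetical boundary token; the string used by VILA itself may differ.
BLK_TOKEN = "[BLK]"


def insert_group_tokens(groups: List[List[str]], boundary: str = BLK_TOKEN) -> List[str]:
    """Flatten layout groups (e.g. text lines or text blocks) into a single
    token sequence, inserting `boundary` between consecutive groups."""
    tokens: List[str] = []
    for i, group in enumerate(groups):
        if i > 0:
            tokens.append(boundary)  # marks the start of a new layout group
        tokens.extend(group)
    return tokens


if __name__ == "__main__":
    lines = [["Deep", "Learning"], ["for", "Documents"], ["Abstract"]]
    print(insert_group_tokens(lines))
    # → ['Deep', 'Learning', '[BLK]', 'for', 'Documents', '[BLK]', 'Abstract']
```

The resulting sequence can then be fed to an ordinary token classifier; the model only has to learn what the extra token means, which is why no architectural change is needed.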
A Hybrid Information Extraction Approach using Transfer Learning on Richly-Structured Documents
TLDR
A hybrid information extraction approach for documents with complex structures is proposed, featuring a pipeline that uses OCR for plain-text extraction and transfer learning for table detection in such richly structured documents.
Designing Gender Equity: Evidence from Hiring Practices and Committees*
This paper analyzes how different screening practices affect gender equity in hiring. I transform tens of millions of high-dimensional, unstructured records from Brazil's public sector into selection
Incorporating Visual Layout Structures for Scientific Text Classification
TLDR
This work introduces new methods for incorporating VIsual LAyout (VILA) structures, e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance and designs a hierarchical model, H-VILA, that encodes the text based on layout structures.

References

SHOWING 1-10 OF 40 REFERENCES
dhSegment: A Generic Deep-Learning Approach for Document Segmentation
TLDR
This paper proposes an open-source implementation of a CNN-based pixel-wise predictor coupled with task dependent post-processing blocks and shows that a single CNN-architecture can be used across tasks with competitive results.
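As an illustration of what a task-dependent post-processing block might do with a pixel-wise prediction, the sketch below thresholds a probability map and returns one bounding box per 4-connected foreground region. This is a generic thresholding plus connected-components step written for illustration, not dhSegment's own code; the function name, the 0.5 threshold, and the `min_size` filter are assumptions.

```python
from collections import deque
from typing import List, Tuple

# (row_min, col_min, row_max, col_max), all inclusive
Box = Tuple[int, int, int, int]


def probability_map_to_boxes(prob: List[List[float]], threshold: float = 0.5,
                             min_size: int = 1) -> List[Box]:
    """Threshold a per-pixel probability map and return the bounding box of
    each 4-connected foreground region with at least `min_size` pixels."""
    rows = len(prob)
    cols = len(prob[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob[r][c] < threshold:
                continue
            # Breadth-first search over the region containing (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and prob[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:  # drop tiny, likely spurious regions
                boxes.append((r0, c0, r1, c1))
    return boxes


if __name__ == "__main__":
    prob = [
        [0.9, 0.8, 0.1, 0.0],
        [0.7, 0.6, 0.2, 0.9],
        [0.0, 0.1, 0.3, 0.95],
    ]
    print(probability_map_to_boxes(prob))  # → [(0, 0, 1, 1), (1, 3, 2, 3)]
```

Keeping the network generic and moving task-specific logic into a step like this is what lets one architecture serve several segmentation tasks.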
A Large Dataset of Historical Japanese Documents with Complex Layouts
TLDR
This work presents HJDataset, a large-scale dataset of historical Japanese documents with complex layouts that contains over 250,000 layout element annotations of seven types, and demonstrates its usefulness on real-world document digitization tasks.
PubLayNet: Largest Dataset Ever for Document Layout Analysis
TLDR
The PubLayNet dataset for document layout analysis is developed by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central, and deep neural networks trained on PubLayNet are shown to accurately recognize the layout of scientific articles.
LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding
TLDR
LayoutLMv2 is an architecture with new pre-training tasks that model the interaction among text, layout, and image in a single multi-modal framework; it achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks.
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
TLDR
A comprehensive empirical survey of the effect of ImageNet pre-training on diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval, finds a clear trend across different network architectures: ImageNet pre-training has a positive effect on classification as well as content-based retrieval.
DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images
TLDR
In contrast to most existing table detection and structure recognition methods, which are applicable only to PDFs, DeepDeSRT processes document images, which makes it equally suitable for born-digital PDFs and for harder inputs such as scanned documents.
Rethinking Table Recognition using Graph Neural Networks
TLDR
This paper proposes an architecture based on graph networks as a better alternative to standard neural networks for table recognition, argues that graph networks are a more natural choice for these problems, and explores two gradient-based graph neural networks.
Evaluation of deep convolutional nets for document image classification and retrieval
TLDR
This work establishes a new state of the art for document image classification and retrieval using features learned by deep convolutional neural networks (CNNs), and makes available a new labelled subset of the IIT-CDIP collection containing 400,000 document images across 16 categories.
ImageNet: A large-scale hierarchical image database
TLDR
A new database called “ImageNet” is introduced: a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than existing image datasets.
AllenNLP: A Deep Semantic Natural Language Processing Platform
TLDR
AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily and provides a flexible data API that handles intelligent batching and padding, and a modular and extensible experiment framework that makes doing good science easy.
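The "intelligent batching and padding" mentioned for AllenNLP can be illustrated in plain Python: sorting sequences by length before batching keeps similar-length sequences together, so each batch needs less padding. This sketch does not use AllenNLP's own API; the function name and the padding index 0 are assumptions for illustration.

```python
from typing import List, Sequence

PAD_ID = 0  # hypothetical padding index


def bucketed_batches(seqs: Sequence[List[int]], batch_size: int) -> List[List[List[int]]]:
    """Group token-id sequences of similar length into batches and pad each
    batch only to the length of its own longest sequence."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Sort indices by sequence length so neighbours have similar lengths.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    batches: List[List[List[int]]] = []
    for start in range(0, len(order), batch_size):
        batch = [seqs[i] for i in order[start:start + batch_size]]
        max_len = max(len(s) for s in batch)
        # Right-pad every sequence in this batch to max_len (inputs are not mutated).
        batches.append([s + [PAD_ID] * (max_len - len(s)) for s in batch])
    return batches


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [5], [6, 7], [8, 9, 10]]
    print(bucketed_batches(data, batch_size=2))
    # → [[[5, 0], [6, 7]], [[8, 9, 10, 0], [1, 2, 3, 4]]]
```

Without the sort, the first batch here would pair the length-4 and length-1 sequences and waste three padding slots; bucketing by length reduces that overhead.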