LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis
TLDR
The core LayoutParser library provides a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks, and it incorporates a community platform for sharing both pre-trained models and full document digitization pipelines.
A Large Dataset of Historical Japanese Documents with Complex Layouts
TLDR
This work presents HJDataset, a large-scale dataset of historical Japanese documents with complex layouts that contains over 250,000 layout element annotations of seven types, and demonstrates the usefulness of the dataset on real-world document digitization tasks.
Information Extraction from Text Regions with Complex Tabular Structure
TLDR
This paper presents a new dataset with complex tabular structure, and proposes new methods to robustly retrieve information from the complex text region.
Deep Learning based Framework for Automatic Damage Detection in Aircraft Engine Borescope Inspection
TLDR
A deep learning based framework is proposed that uses Fully Convolutional Networks (FCN), a state-of-the-art algorithm, to identify and locate two major types of damage, namely cracks and burns, in borescope images.
OLALA: Object-Level Active Learning Based Layout Annotation
TLDR
This work introduces an Object-Level Active Learning based Layout Annotation framework, OLALA, which includes an object scoring method and a prediction correction algorithm that selects only the most ambiguous object prediction regions within an image for annotators to label, optimizing the use of the annotation budget.
Incorporating Visual Layout Structures for Scientific Text Classification
TLDR
This work introduces new methods for incorporating VIsual LAyout (VILA) structures, e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance, and designs a hierarchical model, H-VILA, that encodes the text based on layout structures.
OLALA: Object-Level Active Learning for Efficient Document Layout Annotation
TLDR
This work presents OLALA, an Object-Level Active Learning framework for efficient document layout annotation, in which only the regions with the most ambiguous object predictions within an image are selected for annotators to label, optimizing the use of the annotation budget.
PAWLS: PDF Annotation With Labels and Structure
TLDR
This paper presents PDF Annotation with Labels and Structure (PAWLS), a new annotation tool designed specifically for the PDF document format, particularly suited for mixed-mode annotation and scenarios in which annotators require extended context to annotate accurately.
Improving Unpaired Object Translation for Unaligned Domains
Generative Adversarial Networks have shown promise in unpaired image translation. However, translating unpaired objects from unaligned domains is an unsolved problem. Existing methods are restricted
Generating Object Stamps
TLDR
An algorithm to generate diverse foreground objects and composite them into background images using a GAN architecture that allows for improved overall quality and diversity compared to state-of-the-art object insertion approaches.