Corpus ID: 88518818

CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor

@article{Zhao2019CUTIELT,
  title={CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor},
  author={Xiaohui Zhao and Zhuo Wu and Xiaoguang Wang},
  journal={ArXiv},
  year={2019},
  volume={abs/1903.12363}
}
Extracting key information from documents such as receipts or invoices, and converting the texts of interest into structured data, is crucial in document-intensive, streamlined office-automation processes in areas including, but not limited to, accounting, finance, and taxation. […] Key Method: Specifically, our proposed model, Convolutional Universal Text Information Extractor (CUTIE), applies convolutional neural networks to gridded texts, where texts are embedded as features with semantical…
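The grid-based formulation can be pictured with a minimal sketch, assuming a hypothetical vocabulary size, grid resolution, and number of field classes (none of these values come from the paper): each OCR'd token is embedded and placed into a 2D grid cell according to its position on the page, and a small convolutional network with dilated kernels then predicts a field class for every cell.

```python
import torch
import torch.nn as nn

class GridTextCNN(nn.Module):
    """Minimal sketch of a CUTIE-style model: token embeddings arranged on a
    2D grid, followed by (dilated) convolutions and per-cell classification.
    Vocabulary size, embedding size, and class count are illustrative only."""

    def __init__(self, vocab_size=20000, embed_dim=128, num_classes=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Sequential(
            nn.Conv2d(embed_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2),  # atrous conv widens context
            nn.ReLU(),
            nn.Conv2d(256, num_classes, kernel_size=1),                 # per-cell class logits
        )

    def forward(self, token_grid):
        # token_grid: (batch, H, W) integer token ids, 0 = empty cell
        x = self.embed(token_grid)          # (batch, H, W, embed_dim)
        x = x.permute(0, 3, 1, 2)           # (batch, embed_dim, H, W) for Conv2d
        return self.conv(x)                 # (batch, num_classes, H, W)

# Example: a 64x64 grid of token ids built from OCR bounding boxes
grid = torch.randint(0, 20000, (1, 64, 64))
logits = GridTextCNN()(grid)
print(logits.shape)  # torch.Size([1, 9, 64, 64])
```

Treating the page as a grid is what lets the network exploit both token semantics and spatial layout, which is the paper's central idea.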

Citations

Key Information Extraction From Documents: Evaluation And Generator

TLDR
A template-based document generator was created, and the results show that NLP-based pre-processing is beneficial for model performance; however, the use of a bounding-box regression decoder improves model performance only for fields that do not follow a rectangular shape.

TRIE: End-to-End Text Reading and Information Extraction for Document Understanding

TLDR
This paper proposes a unified end-to-end text reading and information extraction network in which the two tasks reinforce each other: the multimodal visual and textual features of text reading are fused for information extraction, and in turn the semantics from information extraction contribute to the optimization of text reading.

TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents

TLDR
An end-to-end information extraction framework for visually rich documents is proposed, in which text reading and information extraction reinforce each other via a well-designed multi-modal context block.

Data-Efficient Information Extraction from Documents with Pre-trained Language Models

TLDR
LayoutLM, a pre-trained model recently proposed for encoding 2D documents, reveals a high sample-efficiency when fine-tuned on public and real-world Information Extraction (IE) datasets, thus indicating valuable knowledge transfer abilities.

Information Extraction from Invoices

TLDR
A system is presented that achieves competitive results using a small amount of data, compared with state-of-the-art systems that need to be trained on large datasets, which are costly and impractical to produce in real-world applications.

Extracting Zero-shot Structured Information from Form-like Documents: Pretraining with Keys and Triggers

TLDR
Experiments applying the fine-tuning step to two applications show that the proposed model achieves more than 70% accuracy for the extraction of zero-shot keys, where previous methods all fail.

Attention-Based Graph Neural Network with Global Context Awareness for Document Understanding

TLDR
This work proposes an attention-based graph neural network to combine textual and visual information from document images and shows that this method outperforms baseline methods by significant margins.
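As a rough illustration of how textual and visual node features can be combined with graph attention, here is a hedged sketch of a single attention-based message-passing step; it is a generic layer with illustrative dimensions, not the architecture proposed in that paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextVisualGraphAttention(nn.Module):
    """Sketch of one attention-based message-passing step over document nodes
    whose features concatenate a text embedding and a visual (layout) embedding."""

    def __init__(self, text_dim=128, visual_dim=32, hidden_dim=128):
        super().__init__()
        self.proj = nn.Linear(text_dim + visual_dim, hidden_dim)
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, text_feat, visual_feat, adj):
        # text_feat: (N, text_dim), visual_feat: (N, visual_dim), adj: (N, N) 0/1 mask
        h = self.proj(torch.cat([text_feat, visual_feat], dim=-1))     # (N, hidden)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)   # (N, N, 2*hidden)
        scores = self.attn(pairs).squeeze(-1)                          # (N, N)
        scores = scores.masked_fill(adj == 0, float('-inf'))
        alpha = F.softmax(scores, dim=-1)                              # attention over neighbours
        return alpha @ h                                               # aggregated node features

# Toy usage: 5 text boxes, fully connected graph (self-loops included)
text, visual = torch.randn(5, 128), torch.randn(5, 32)
adj = torch.ones(5, 5)
out = TextVisualGraphAttention()(text, visual, adj)
print(out.shape)  # torch.Size([5, 128])
```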

Deep Learning based Automatic Extraction of Student Performance from Gazette Assessment Data

TLDR
An approach is presented for extracting data from an exam result gazette document and storing it in a CSV file; results indicate that the system can efficiently extract the required fields from a given exam result gazette document.

Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout

TLDR
A new task (named Kleister) is introduced with two new datasets to encourage progress on deeper and more complex Information Extraction (IE), and a pipeline method is proposed as a text-only baseline with different Named Entity Recognition architectures (Flair, BERT, RoBERTa).

References


DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

TLDR
This work addresses the task of semantic image segmentation with deep learning and proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, improving the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
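CUTIE builds on this family of dilated-convolution models; a minimal sketch of an ASPP-style block follows, with dilation rates and channel counts that are illustrative rather than the exact DeepLab configuration.

```python
import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Atrous Spatial Pyramid Pooling, roughly: parallel dilated convolutions
    sample the same feature map at several rates, then their outputs are fused."""

    def __init__(self, in_ch=256, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch sees a different effective receptive field; fuse them with a 1x1 conv.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feat = torch.randn(1, 256, 32, 32)
print(ASPPSketch()(feat).shape)  # torch.Size([1, 256, 32, 32])
```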

Intellix -- End-User Trained Information Extraction for Document Archiving

TLDR
This work presents an approach to information extraction that builds purely on end-user-provided training examples and intentionally omits efficient known extraction techniques, such as rule-based extraction, that require intense training and/or information-extraction expertise.

Rethinking Atrous Convolution for Semantic Image Segmentation

TLDR
The proposed "DeepLabv3" system significantly improves over previous DeepLab versions without DenseCRF post-processing and attains performance comparable with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

TLDR
This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries and applies the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
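The depthwise separable convolution referenced here factorizes a standard convolution into a per-channel spatial convolution followed by a 1x1 pointwise convolution; a brief sketch with illustrative channel counts:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise convolution (groups == in_channels) followed by a
    1x1 pointwise convolution, the factorization applied in DeepLabv3+'s
    ASPP and decoder modules to reduce computation."""

    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 32, 32)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```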

CloudScan - A Configuration-Free Invoice Analysis System Using Recurrent Neural Networks

TLDR
A recurrent neural network model that can capture long-range context is described and compared with a baseline logistic regression model corresponding to the current CloudScan production system.
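The recurrent model described here can be pictured as a bidirectional LSTM over the document's token sequence with a per-token field classifier; the sketch below uses illustrative dimensions and is not the CloudScan production architecture.

```python
import torch
import torch.nn as nn

class TokenFieldLSTM(nn.Module):
    """Bidirectional LSTM tagging each token of an invoice with a field class,
    so that context far away in the sequence can influence the prediction."""

    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=128, num_classes=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        out, _ = self.lstm(self.embed(token_ids))   # (batch, seq_len, 2*hidden)
        return self.classify(out)                   # per-token field logits

tokens = torch.randint(0, 20000, (1, 200))
print(TokenFieldLSTM()(tokens).shape)  # torch.Size([1, 200, 9])
```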

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

TLDR
This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF).

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
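The "one additional output layer" fine-tuning recipe can be sketched as follows, assuming the Hugging Face transformers package and the bert-base-uncased checkpoint; this is a generic sequence-classification setup, not tied to this paper's experiments.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Pretrained bidirectional encoder plus a randomly initialized classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Total amount due: 42.00 EUR", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # (1, num_labels)
print(logits.shape)

# Fine-tuning then minimizes cross-entropy on labeled examples,
# updating both the new head and the pretrained encoder weights.
```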

Automatic indexing of scanned documents: a layout-based approach

TLDR
This work presents a novel approach to automatic indexing of documents via generic positional extraction of index terms, using document templates stored in a common full-text search index to find index positions that were successfully extracted in the past.

Deep High-Resolution Representation Learning for Human Pose Estimation

TLDR
This paper proposes a network that maintains high-resolution representations through the whole process of human pose estimation and empirically demonstrates the effectiveness of the network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
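The core idea, keeping a high-resolution branch alive while repeatedly exchanging information with lower-resolution branches, can be sketched as a two-branch toy module; this is greatly simplified relative to the actual HRNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchExchange(nn.Module):
    """Toy version of HRNet-style multi-resolution fusion: a high-resolution
    and a half-resolution branch run in parallel and exchange features."""

    def __init__(self, ch=64):
        super().__init__()
        self.high = nn.Conv2d(ch, ch, 3, padding=1)
        self.low = nn.Conv2d(ch, ch, 3, padding=1)
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)   # high -> low resolution
        self.up = nn.Conv2d(ch, ch, 1)                          # low -> high (then upsample)

    def forward(self, x_high, x_low):
        h, l = self.high(x_high), self.low(x_low)
        # Cross-resolution exchange: each branch receives the other's features.
        h_out = h + F.interpolate(self.up(l), size=h.shape[-2:],
                                  mode="bilinear", align_corners=False)
        l_out = l + self.down(h)
        return h_out, l_out

x_high, x_low = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 32, 32)
h, l = TwoBranchExchange()(x_high, x_low)
print(h.shape, l.shape)  # torch.Size([1, 64, 64, 64]) torch.Size([1, 64, 32, 32])
```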

smartFIX: A Requirements-Driven System for Document Analysis and Understanding

TLDR
The smartFIX system, a document analysis and understanding system developed by the DFKI spin-off INSIDERS, permits the processing of documents ranging from fixed-format forms to unstructured letters of any format.