Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents

Authors: Nguyen Hong Son, Hieu M. Vu, Tuan-Anh Dang Nguyen, Minh Le Nguyen
Venue: 2022 International Joint Conference on Neural Networks (IJCNN)
This paper introduces a new information extraction model for business documents. Unlike prior studies, which rely on either span extraction or sequence labeling alone, the model takes advantage of both. The combination allows the model to deal with long documents containing sparse information (that is, only a small amount of information to extract). The model is trained end-to-end to jointly optimize the two tasks in a unified manner. Experimental results on four… 
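
To make the idea of jointly optimizing the two tasks concrete, here is a minimal sketch of one way such a combined objective could look. The abstract does not give the paper's actual loss, so the function names, the `weight` parameter, and the simple weighted sum are all assumptions for illustration, not the authors' formulation.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability list."""
    return -math.log(probs[target])


def joint_loss(start_probs, end_probs, span, tag_probs, tags, weight=1.0):
    """Hypothetical joint objective: span loss plus weighted tagging loss.

    start_probs / end_probs: per-token probabilities of being the span start / end.
    span: (start_index, end_index) of the gold answer span.
    tag_probs: one probability list over labels per token (sequence labeling head).
    tags: gold label index per token.
    weight: assumed trade-off between the two tasks (not taken from the paper).
    """
    start, end = span
    # Span extraction term: cross-entropy on the start and end positions.
    span_loss = cross_entropy(start_probs, start) + cross_entropy(end_probs, end)
    # Sequence labeling term: mean per-token cross-entropy.
    tag_loss = sum(cross_entropy(p, t) for p, t in zip(tag_probs, tags)) / len(tags)
    return span_loss + weight * tag_loss
```

In a setup like this, both heads would share one encoder, so minimizing the summed loss updates the shared parameters for both tasks at once; setting `weight=0.0` reduces the sketch to span extraction alone.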

Improving Document Image Understanding with Reinforcement Finetuning

This paper proposes a novel finetuning method that treats the Information Extraction model as a policy network and uses policy gradient training to update the model to maximize combined reward functions that complement the traditional cross-entropy losses.



Contextual String Embeddings for Sequence Labeling

This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding, referred to as contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.

A Unified MRC Framework for Named Entity Recognition

This paper proposes to formulate the task of NER as a machine reading comprehension (MRC) task, and naturally tackles the entity overlapping issue in nested NER: the extraction of two overlapping entities with different categories requires answering two independent questions.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity

Information Extraction of Domain-Specific Business Documents with Limited Data

A model that employs pre-trained language models with a customized CNN layer for domain adaptation is introduced; it achieves promising results that are applicable to actual business scenarios.

A Span Extraction Approach for Information Extraction on Visually-Rich Documents

A new query-based IE model that employs span extraction instead of using the common sequence labeling approach is introduced and a new training task focusing on modelling the relationships among semantic entities within a document is proposed.

A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition

Experimental results on multiple benchmark datasets show that the proposed span-based model, which can essentially be regarded as a relation extraction paradigm, is highly competitive for overlapped and discontinuous NER.

Transformers-based information extraction with limited data for domain-specific business documents

Automated Concatenation of Embeddings for Structured Prediction

This paper proposes Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks, based on a formulation inspired by recent progress on neural architecture search.

Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents

The adaptation of BERT to two types of business documents (regulatory filings and property lease agreements) is demonstrated, and it is found that modest amounts of annotated data (fewer than 100 documents) are sufficient to achieve reasonable accuracy.