End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

@article{Ma2016EndtoendSL,
  title={End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF},
  author={Xuezhe Ma and Eduard H. Hovy},
  journal={ArXiv},
  year={2016},
  volume={abs/1603.01354}
}
State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing. […] Key Result: We obtain state-of-the-art performance on both datasets: 97.55% accuracy for POS tagging and 91.21% F1 for NER.
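The CRF layer on top of the BiLSTM is what makes the labeling decision global rather than per-token: at inference time it selects the highest-scoring label sequence via Viterbi decoding. A minimal sketch of that decoding step, with invented emission and transition scores (not values from the paper):

```python
def viterbi_decode(emissions, transitions):
    """emissions: one dict {label: score} per token (e.g. BiLSTM outputs);
    transitions: dict {(prev_label, label): score}.
    Returns the highest-scoring label sequence."""
    labels = list(emissions[0])
    # best score of any path ending in each label at the first token
    best = {l: emissions[0][l] for l in labels}
    backptrs = []
    for em in emissions[1:]:
        new_best, ptrs = {}, {}
        for l in labels:
            # pick the best previous label to transition from
            prev = max(labels, key=lambda p: best[p] + transitions[(p, l)])
            new_best[l] = best[prev] + transitions[(prev, l)] + em[l]
            ptrs[l] = prev
        best = new_best
        backptrs.append(ptrs)
    # follow back-pointers from the best final label
    last = max(labels, key=best.get)
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]

emissions = [{"B": 2.0, "I": 0.1, "O": 0.5},
             {"B": 0.2, "I": 1.5, "O": 0.3},
             {"B": 0.1, "I": 0.2, "O": 1.0}]
transitions = {("B", "B"): -1.0, ("B", "I"): 1.0, ("B", "O"): 0.0,
               ("I", "B"): -0.5, ("I", "I"): 0.5, ("I", "O"): 0.0,
               ("O", "B"): 0.5, ("O", "I"): -2.0, ("O", "O"): 0.5}
print(viterbi_decode(emissions, transitions))  # ['B', 'I', 'O']
```

Note how the transition scores let the decoder prefer `B → I` even when the per-token emission alone would not; this is the dependency between adjacent labels that the CRF captures.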


A Two-Stage Deep Neural Network for Sequence Labeling

TLDR
A two-stage deep neural network architecture is proposed for sequence labeling, which enables the higher layer to make use of the coarse-grained labeling information from the lower level of the model.

Bidirectional LSTM-CNNs-CRF Models for POS Tagging

TLDR
A discriminative word-embedding, character-embedding, and byte pair encoding (BPE) hybrid neural network architecture is presented that implements a true end-to-end system for part-of-speech (POS) tagging without feature engineering or data pre-processing.

Learning Task-specific Representation for Novel Words in Sequence Labeling

TLDR
This work proposes a novel method to predict representations for OOV words from their surface forms and contexts, and shows that the proposed method achieves better or competitive performance on the OOV problem compared with existing state-of-the-art methods.

Neural Joint Model for Part-of-Speech Tagging and Entity Extraction

TLDR
A neural joint model is proposed, based on a bidirectional long short-term memory (BiLSTM) network and adversarial transfer learning, that incorporates syntactic information from the two tasks by using task-shared information.

A Survey on Recent Advances in Sequence Labeling from Deep Learning Models

TLDR
This paper presents a comprehensive review of existing deep learning-based sequence labeling models across three related tasks, i.e., part-of-speech tagging, named entity recognition, and text chunking, and systematically organizes the existing approaches into a scientific taxonomy.

Bidirectional LSTM-CRF for Named Entity Recognition

TLDR
This work is the first to experiment with a bidirectional CRF (BI-CRF) in neural architectures for sequence labeling, and it shows that the CRF can be extended to capture dependencies between labels in both the left and right directions of the sequence.

Learning Context Using Segment-Level LSTM for Neural Sequence Labeling

TLDR
The proposed model improves performance on tasks that assign labels to multi-token segments by employing an additional segment-level long short-term memory (LSTM) that learns features from the adjacent context within a segment.

Improved Named Entity Recognition for Noisy Call Center Transcripts

TLDR
This work proposes a set of models that utilize state-of-the-art Transformer language models (RoBERTa) to build a high-accuracy NER system trained on a custom-annotated set of call center transcripts, and proposes a new general annotation scheme for NER in the call-center environment.

Shallow learning for MTL in end-to-end RNN for basic sequence tagging

TLDR
A novel pipeline architecture has been proposed that effectively combines two RNN based sub-networks for sequence tagging tasks with minimal training overhead and truly acts as an end-to-end system.
...

References

Showing 1–10 of 65 references

Boosting Named Entity Recognition with Neural Character Embeddings

TLDR
This work proposes a language-independent NER system that uses only automatically learned features, and demonstrates that the same neural network that has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independent NER, using the same hyperparameters and without any handcrafted features.

Named Entity Recognition with Bidirectional LSTM-CNNs

TLDR
A novel neural network architecture is presented that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering.

Learning Character-level Representations for Part-of-Speech Tagging

TLDR
A deep neural network is proposed that learns character-level representations of words and associates them with the usual word representations to perform POS tagging, producing state-of-the-art POS taggers for two languages.
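Character-level representations of the kind described above (and the CNN variant used in the LSTM-CNNs-CRF model) boil down to three steps: embed each character, convolve a filter over character windows, and max-pool each filter across the word to get a fixed-size vector. A toy sketch, where the embeddings, filter weights, and word are all invented for illustration:

```python
def char_features(word, char_emb, filters, width=3):
    """char_emb: dict char -> embedding (list of floats);
    filters: list of weight lists, each of length width * emb_dim.
    Returns one max-pooled activation per filter."""
    embs = [char_emb[c] for c in word]
    # concatenate each window of `width` consecutive character embeddings
    windows = [sum(embs[i:i + width], []) for i in range(len(embs) - width + 1)]
    # convolution: dot product of each filter with each window, then max-pool
    return [max(sum(w * x for w, x in zip(f, win)) for win in windows)
            for f in filters]

# toy 2-dimensional character embeddings and one width-3 filter
char_emb = {"c": [1.0, 0.0], "a": [0.0, 1.0], "t": [1.0, 1.0]}
filters = [[1.0, 0.0, 0.0, 1.0, 1.0, 0.0]]
print(char_features("tact", char_emb, filters))  # [3.0]
```

Max-pooling is what makes the output size independent of word length, so the resulting vector can simply be concatenated with a word embedding before the BiLSTM.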

Bidirectional LSTM-CRF Models for Sequence Tagging

TLDR
This work is the first to apply a bidirectional LSTM-CRF model to NLP benchmark sequence tagging data sets, and it shows that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to its bidirectional LSTM component.

Lexicon Infused Phrase Embeddings for Named Entity Resolution

TLDR
A new form of learning word embeddings that can leverage information from relevant lexicons to improve the representations is presented, along with the first system to use neural word embeddings to achieve state-of-the-art results on named-entity recognition in both the CoNLL and OntoNotes NER datasets.

Multi-Task Cross-Lingual Sequence Tagging from Scratch

TLDR
A deep hierarchical recurrent neural network for sequence tagging that employs deep gated recurrent units on both character and word levels to encode morphology and context information, and applies a conditional random field layer to predict the tags.

Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning

TLDR
It is shown that new state-of-the-art word segmentation systems use neural models to learn representations for predicting word boundaries, and these same representations, jointly trained with an NER system, yield significant improvements in NER for Chinese social media.

Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network

TLDR
A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional log-linear models.

Non-lexical neural architecture for fine-grained POS Tagging

TLDR
Experimental results show that the convolutional network can infer meaningful word representations, while for the prediction stage a well-designed and structured strategy allows the model to outperform state-of-the-art results without any feature engineering.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity
...