Semi-supervised sequence tagging with bidirectional language models

Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power

Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. [...] We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
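The paper's core idea (TagLM) is to augment a supervised sequence tagger by concatenating hidden states from a pre-trained bidirectional language model with the tagger's own token embeddings. A minimal sketch of that augmentation step, using numpy with hypothetical dimensions (the actual model sizes differ):

```python
import numpy as np

# Hypothetical dimensions, for illustration only.
seq_len, word_dim, lm_dim = 5, 100, 512

rng = np.random.default_rng(0)

# Token embeddings from the supervised tagger's embedding layer.
word_emb = rng.standard_normal((seq_len, word_dim))

# Hidden states from pre-trained forward and backward language models
# (stand-ins here; in practice these come from the frozen LMs).
fwd_lm = rng.standard_normal((seq_len, lm_dim))
bwd_lm = rng.standard_normal((seq_len, lm_dim))

# TagLM-style augmentation: concatenate the bidirectional LM representation
# with the token embedding at each position, then feed the result to the
# tagging layers (BiLSTM + CRF in the paper).
lm_emb = np.concatenate([fwd_lm, bwd_lm], axis=-1)       # (seq_len, 2*lm_dim)
augmented = np.concatenate([word_emb, lm_emb], axis=-1)  # (seq_len, word_dim + 2*lm_dim)

print(augmented.shape)  # (5, 1124)
```

The LM parameters stay fixed during tagger training; only where the LM representation is injected (and how it is combined) varies across the paper's ablations.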


Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks

This work focuses on extracting representations from multiple pre-trained supervised models, which enriches word embeddings with task and domain specific knowledge.

A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings

A comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part of speech (POS) tagging and proposes a new method for creating multilingual contextualized word embeddings that allows for better knowledge sharing across languages in a joint training setting.

Improved Dependency Parsing using Implicit Word Connections Learned from Unlabeled Data

Experiments show that these implicit word connections do improve the parsing model, and that combining them with a pre-trained language model yields state-of-the-art performance on the English PTB dataset.

Deep contextualized word embeddings from character language models for neural sequence labeling

This thesis evaluates the performance of different embedding setups (context-sensitive, context-insensitive word, as well as task-specific word, character, lemma, and PoS) on the three abovementioned sequence labeling tasks using a deep learning model (BiLSTM) and Portuguese datasets.

Named Entity Recognition in Russian with Word Representation Learned by a Bidirectional Language Model

This paper presents a semi-supervised approach for adding deep contextualized word representations that model both complex characteristics of word usage and how these usages vary across linguistic contexts, and evaluates the model on the FactRuEval-2016 dataset for named entity recognition in Russian, achieving state-of-the-art results.

Exploiting Redundancy in Pre-trained Language Models for Efficient Transfer Learning

A novel feature selection method is proposed that takes advantage of the fact that contextual representations have both intrinsic and task-specific redundancies, reducing the size of the pre-trained features.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Empower Sequence Labeling with Task-Aware Neural Language Model

A novel neural framework is developed that extracts the abundant knowledge hidden in raw text to empower sequence labeling, leveraging character-level knowledge from the self-contained order information of the training sequences.

context2vec: Learning Generic Context Embedding with Bidirectional LSTM

This work presents a neural model for efficiently learning a generic context embedding function from large corpora using a bidirectional LSTM, and suggests the resulting representations could be useful in a wide variety of NLP tasks.

Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data

Evidence is provided that the use of more unlabeled data in semi-supervised learning can improve the performance of natural language processing tasks such as part-of-speech tagging, syntactic chunking, and named entity recognition.

Semi-Supervised Sequence Modeling with Syntactic Topic Models

This paper presents an approach that leverages recent work in manifold-learning on sequences to discover word clusters from language data, including both syntactic classes and semantic topics, with statistically-significant improvements over a related semi-supervised sequence tagging method.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage.

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

A novel neural network architecture is introduced that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN, and CRF, making it applicable to a wide range of sequence labeling tasks.
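The CRF layer in architectures like the one above decodes the best tag sequence with the Viterbi algorithm over per-position emission scores and tag-to-tag transition scores. A toy decoder with hypothetical scores (the scores themselves would come from the BiLSTM and learned transition matrix):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   (seq_len, n_tags) per-position tag scores (e.g. BiLSTM output)
    transitions: (n_tags, n_tags) score of moving from tag i to tag j
    """
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()                    # best score ending in each tag
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # total[i, j] = best score ending in tag i at t-1, then moving to tag j.
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Follow back-pointers from the best final tag.
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# Toy example: 3 positions, 3 tags, diagonal emissions, zero transitions.
em = np.array([[2.0, 0.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 2.0]])
tr = np.zeros((3, 3))
print(viterbi(em, tr))  # [0, 1, 2]
```

With zero transition scores the decoder simply follows the per-position argmax; a learned transition matrix is what lets the CRF forbid invalid tag bigrams (e.g. I-PER directly after B-LOC in BIO tagging).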

Learning Distributed Representations of Sentences from Unlabelled Data

A systematic comparison of models that learn distributed phrase or sentence representations from unlabelled data finds that the optimal approach depends critically on the intended application.

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

A joint many-task model is presented, together with a strategy for successively growing its depth to solve increasingly complex tasks; a simple regularization term allows optimizing all model weights to improve one task's loss without catastrophic interference with the other tasks.

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

The effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages are examined, and it is shown that significant improvement can often be obtained.

A Bidirectional Recurrent Neural Language Model for Machine Translation

Results show that the neural model trained with the selected data outperforms the results obtained by an n-gram language model.