Semi-supervised sequence tagging with bidirectional language models

@inproceedings{Peters2017SemisupervisedST,
  title={Semi-supervised sequence tagging with bidirectional language models},
  author={Matthew E. Peters and Waleed Ammar and Chandra Bhagavatula and Russell Power},
  booktitle={ACL},
  year={2017}
}
Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. [...] We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
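
The core idea described above — augmenting a supervised tagger's token representations with embeddings from a bidirectional language model pre-trained on unlabeled text — can be sketched roughly as follows. This is a minimal, illustrative PyTorch sketch, not the authors' implementation: the CRF output layer, character-level inputs, and the actual language-model pre-training are omitted, and every class name and dimension here is an assumption.

```python
# Minimal sketch (assumptions, not the paper's code): concatenate contextual
# embeddings from a frozen bidirectional LM with ordinary word embeddings
# before the sequence-tagging encoder.
import torch
import torch.nn as nn


class BiLMEmbedder(nn.Module):
    """Stand-in for a pre-trained bidirectional LM; in the paper it is trained
    on unlabeled text and kept fixed while the tagger is trained."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # A single bidirectional LSTM stands in for the forward + backward LMs.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, token_ids):
        with torch.no_grad():  # LM parameters stay frozen
            out, _ = self.lstm(self.embed(token_ids))
        return out  # (batch, seq_len, 2 * hidden_dim)


class SequenceTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, bilm, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.bilm = bilm
        self.embed = nn.Embedding(vocab_size, emb_dim)
        lm_dim = 2 * bilm.lstm.hidden_size
        # The tagger's BiLSTM sees word embeddings concatenated with LM embeddings.
        self.encoder = nn.LSTM(emb_dim + lm_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)  # CRF layer omitted

    def forward(self, token_ids):
        word = self.embed(token_ids)
        lm = self.bilm(token_ids)
        enc, _ = self.encoder(torch.cat([word, lm], dim=-1))
        return self.proj(enc)  # per-token tag scores


if __name__ == "__main__":
    bilm = BiLMEmbedder(vocab_size=1000)
    tagger = SequenceTagger(vocab_size=1000, num_tags=9, bilm=bilm)
    scores = tagger(torch.randint(0, 1000, (2, 7)))
    print(scores.shape)  # torch.Size([2, 7, 9])
```

In the paper the language-model parameters stay fixed while only the tagger is trained on labeled data, which is what the torch.no_grad() call in the sketch stands in for.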
Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks
This work focuses on extracting representations from multiple pre-trained supervised models, which enriches word embeddings with task- and domain-specific knowledge.
TRANSFER LEARNING IN NATURAL LANGUAGE PRO-
Pre-trained word embeddings are the primary method for transfer learning in several Natural Language Processing (NLP) tasks. Recent works have focused on using unsupervised techniques such as [...]
A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings
A comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part-of-speech (POS) tagging is performed, and a new method for creating multilingual contextualized word embeddings is proposed.
Improved Dependency Parsing using Implicit Word Connections Learned from Unlabeled Data
Experiments show that these implicit word connections do improve the parsing model, and that combining it with a pre-trained language model yields state-of-the-art performance on the English PTB dataset.
Deep contextualized word embeddings from character language models for neural sequence labeling
This thesis evaluates the performance of different embedding setups (context-sensitive, context-insensitive word, as well as task-specific word, character, lemma, and PoS) on three sequence labeling tasks, using a deep learning model (BiLSTM) and Portuguese datasets.
Named Entity Recognition in Russian with Word Representation Learned by a Bidirectional Language Model
This paper presents a semi-supervised approach for adding deep contextualized word representations that model both complex characteristics of word usage and how these usages vary across linguistic contexts; the model is evaluated on the FactRuEval-2016 dataset for named entity recognition in Russian and achieves state-of-the-art results.
Semi-Supervised Sequence Modeling with Cross-View Training
Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data, is proposed and evaluated, achieving state-of-the-art results.
Exploiting Redundancy in Pre-trained Language Models for Efficient Transfer Learning
A novel feature selection method is proposed, which takes advantage of the fact that contextual representations have both intrinsic and task-specific redundancies in order to reduce the size of the pre-trained features.
Deep Contextualized Word Representations
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
Empower Sequence Labeling with Task-Aware Neural Language Model
A novel neural framework is developed that extracts abundant knowledge hidden in raw texts to empower the sequence labeling task, leveraging character-level knowledge from the self-contained order information of training sequences.

References

Showing 1-10 of 49 references
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
This work presents a neural model for efficiently learning a generic context embedding function from large corpora using a bidirectional LSTM, and suggests that the resulting embeddings could be useful in a wide variety of NLP tasks.
Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data
Evidence is provided that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing tasks such as part-of-speech tagging, syntactic chunking, and named entity recognition.
Semi-Supervised Sequence Modeling with Syntactic Topic Models
This paper presents an approach that leverages recent work in manifold learning on sequences to discover word clusters from language data, including both syntactic classes and semantic topics, with statistically significant improvements over a related semi-supervised sequence tagging method.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity [...]
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the [...]
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
A novel neural network architecture is introduced that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN and CRF, making it applicable to a wide range of sequence labeling tasks.
Learning Distributed Representations of Sentences from Unlabelled Data
A systematic comparison of models that learn distributed phrase or sentence representations from unlabelled data finds that the optimal approach depends critically on the intended application.
A Bidirectional Recurrent Neural Language Model for Machine Translation
Results show that the neural model trained with the selected data outperforms an n-gram language model.
Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
The effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages are examined, and it is shown that significant improvement can often be obtained.
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
A joint many-task model is presented, together with a strategy for successively growing its depth to solve increasingly complex tasks; a simple regularization term allows all model weights to be optimized to improve one task's loss without catastrophic interference with the other tasks.