Natural Language Processing (Almost) from Scratch

@article{collobert2011natural,
  title={Natural Language Processing (Almost) from Scratch},
  author={Ronan Collobert and Jason Weston and L{\'e}on Bottou and Michael Karlen and Koray Kavukcuoglu and Pavel P. Kuksa},
  journal={J. Mach. Learn. Res.},
  year={2011}
}
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis… 
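One way to make "learning internal representations instead of man-made features" concrete is a window-approach neural tagger in the spirit of this line of work. Below is a minimal sketch, assuming PyTorch; the vocabulary size, window width, hidden size, and tag-set size are illustrative placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowTagger(nn.Module):
    """Window-approach tagger: learned word vectors instead of hand-crafted features."""

    def __init__(self, vocab_size=100_000, emb_dim=50, window=5,
                 hidden_dim=300, num_tags=45):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)        # learned lookup table
        self.hidden = nn.Linear(window * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, window_ids):
        # window_ids: (batch, window) word ids of a token and its neighbours
        x = self.embed(window_ids).flatten(1)   # concatenate the window's vectors
        x = F.hardtanh(self.hidden(x))          # hard-tanh non-linearity
        return self.out(x)                      # one score per candidate tag

tagger = WindowTagger()
scores = tagger(torch.randint(0, 100_000, (8, 5)))   # 8 windows of 5 word ids
print(scores.shape)                                   # torch.Size([8, 45])
```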


Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
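As a rough illustration of that training set-up, the sketch below (assuming PyTorch) pairs a shared BiLSTM-with-max-pooling sentence encoder, one of the encoder variants typically used in this line of work, with a classifier over the standard [u, v, |u − v|, u * v] pair features; the sizes and the three-way NLI label set are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """BiLSTM sentence encoder with max pooling over time."""

    def __init__(self, vocab_size=50_000, emb_dim=300, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, ids):
        h, _ = self.lstm(self.embed(ids))     # (batch, seq, 2*hidden)
        return h.max(dim=1).values            # max-pool into a fixed-size sentence vector

encoder = SentenceEncoder()
classifier = nn.Linear(4 * 2 * 512, 3)        # entailment / neutral / contradiction

premise = torch.randint(0, 50_000, (4, 20))   # toy batch of token ids
hypothesis = torch.randint(0, 50_000, (4, 15))
u, v = encoder(premise), encoder(hypothesis)
pair = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
print(classifier(pair).shape)                 # torch.Size([4, 3])
```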

Semi-supervised sequence tagging with bidirectional language models

A general semi-supervised approach for adding pretrained context embeddings from bidirectional language models to NLP systems is presented and applied to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
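A minimal sketch of the general idea, assuming PyTorch: context vectors from a pretrained (and here frozen) bidirectional language model are concatenated with ordinary word embeddings before the sequence-labeling layers. The language model itself is left abstract, and lm_dim and the other sizes are illustrative assumptions rather than the paper's values.

```python
import torch
import torch.nn as nn

class LMAugmentedTagger(nn.Module):
    """Sequence tagger whose inputs are word embeddings plus pretrained bi-LM states."""

    def __init__(self, vocab_size=50_000, emb_dim=100, lm_dim=512,
                 hidden=200, num_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + lm_dim, hidden,
                            bidirectional=True, batch_first=True)
        self.tags = nn.Linear(2 * hidden, num_tags)

    def forward(self, word_ids, lm_states):
        # lm_states: (batch, seq, lm_dim) vectors from a pretrained bidirectional
        # LM, computed ahead of time and not updated while training the tagger.
        x = torch.cat([self.embed(word_ids), lm_states], dim=-1)
        h, _ = self.lstm(x)
        return self.tags(h)                    # (batch, seq, num_tags)

tagger = LMAugmentedTagger()
ids = torch.randint(0, 50_000, (2, 10))
lm = torch.randn(2, 10, 512)                   # stand-in for precomputed LM states
print(tagger(ids, lm).shape)                   # torch.Size([2, 10, 9])
```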

Bidirectional Recursive Neural Networks for Token-Level Labeling with Structure

This work proposes a novel architecture that aims to capture the structural information around an input and to use it to label instances, and applies it to the task of opinion expression extraction.

Training neural word embeddings for transfer learning and translation

This dissertation hypothesises that neural word embeddings, i.e. representations that use continuous values to represent words in a learned vector space of meaning, are a suitable and efficient approach for learning representations of natural languages that are useful for predicting various aspects related to their meaning. It presents several contributions which make inducing word representations faster and applicable to monolingual and various cross-lingual prediction tasks.

Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

A sequence labeling framework that solely utilizes sparse indicator features derived from dense distributed word representations, obtaining (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition across a variety of languages.

Phrase Representations for Multiword Expressions

A model that takes advantage of dense word representations to perform phrase tagging by directly identifying and classifying phrases is introduced, and it is shown that the model outperforms the state-of-the-art model for this task.

Neural Networks Architecture for Amazigh POS Tagging

Instead of extracting from the sentence a rich set of hand-crafted features which are then fed to a standard classification algorithm, this work draws its inspiration from recent papers on the automatic extraction of word embeddings from large unlabelled data sets to improve the performance of the Amazigh POS tagging system.

Semi-supervised Multitask Learning for Sequence Labeling

A sequence labeling framework with a secondary training objective, learning to predict surrounding words for every word in the dataset, which incentivises the system to learn general-purpose patterns of semantic and syntactic composition, useful for improving accuracy on different sequence labeling tasks.
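To make the secondary objective concrete, here is a hedged sketch in PyTorch: a BiLSTM tagger whose forward and backward states also feed auxiliary heads predicting the next and previous word, with the auxiliary losses added to the tag loss. The 0.1 loss weights, layer sizes, and names are illustrative assumptions, not the paper's settings. Because the auxiliary targets are just the surrounding words, the extra objective needs no additional annotation.

```python
import torch
import torch.nn as nn

class MultitaskTagger(nn.Module):
    """BiLSTM tagger with auxiliary next-word and previous-word prediction heads."""

    def __init__(self, vocab_size=50_000, emb_dim=100, hidden=200, num_tags=17):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.tag_head = nn.Linear(2 * hidden, num_tags)   # primary labeling task
        self.next_word = nn.Linear(hidden, vocab_size)    # forward LM head
        self.prev_word = nn.Linear(hidden, vocab_size)    # backward LM head

    def forward(self, word_ids):
        h, _ = self.lstm(self.embed(word_ids))    # (batch, seq, 2*hidden)
        fwd, bwd = h.chunk(2, dim=-1)             # split forward / backward states
        return self.tag_head(h), self.next_word(fwd), self.prev_word(bwd)

model = MultitaskTagger()
words = torch.randint(0, 50_000, (4, 12))
tags = torch.randint(0, 17, (4, 12))
tag_scores, nxt, prv = model(words)
ce = nn.CrossEntropyLoss()
# Primary tag loss plus weighted auxiliary losses for the surrounding words.
loss = ce(tag_scores.transpose(1, 2), tags) \
     + 0.1 * ce(nxt[:, :-1].transpose(1, 2), words[:, 1:]) \
     + 0.1 * ce(prv[:, 1:].transpose(1, 2), words[:, :-1])
loss.backward()
```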

A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding

This work proposes to use a BLSTM-RNN as a unified tagging solution that can be applied to various tagging tasks, including part-of-speech tagging, chunking, and named entity recognition, requiring no task-specific knowledge or sophisticated feature engineering.

Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons

This work has shown that conditionally-trained models, such as conditional maximum entropy models, handle inter-dependent features of greedy sequence modeling in NLP well.

Semi-Supervised Learning for Natural Language

This thesis focuses on two segmentation tasks, named-entity recognition and Chinese word segmentation, and shows that features derived from unlabeled data substantially improve performance, both in terms of reducing the amount of labeled data needed to achieve a certain performance level and in terms of reducing the error for a fixed amount of labeled data.

Deep Learning for Efficient Discriminative Parsing

We propose a new fast purely discriminative algorithm for natural language parsing, based on a “deep” recurrent convolutional graph transformer network (GTN). Assuming a decomposition of a parse tree…

Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data

Evidence is provided that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing tasks such as part-of-speech tagging, syntactic chunking, and named entity recognition.

Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling

It is demonstrated that distributional representations of word types, trained on unannotated text, can be used to improve performance on rare words and to reduce the sample complexity of sequence labeling.

Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network

A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.

Word Representations: A Simple and General Method for Semi-Supervised Learning

This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of near-state-of-the-art supervised baselines.

Joint Parsing and Semantic Role Labeling

This paper jointly performs parsing and semantic role labeling, using a probabilistic SRL system to rerank the results of a probabilistic parser, because a locally-trained SRL model can return inaccurate probability estimates.

Shallow Semantic Parsing using Support Vector Machines

A machine learning algorithm for shallow semantic parsing based on Support Vector Machines, showing performance improvements through a number of new features and the system's ability to generalize to a new test set drawn from the AQUAINT corpus.

Simple Semi-supervised Dependency Parsing

This work focuses on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus, and shows that the cluster-based features yield substantial gains in performance across a wide range of conditions.