A unified architecture for natural language processing: deep neural networks with multitask learning

@inproceedings{Collobert2008AUA,
  title={A unified architecture for natural language processing: deep neural networks with multitask learning},
  author={Ronan Collobert and Jason Weston},
  booktitle={ICML '08},
  year={2008}
}
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language… 
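
The core idea in the abstract, a shared representation-learning trunk (word lookup table plus a convolutional layer) whose weights are reused by several small task-specific output layers, can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch, not the authors' implementation: the class name SharedTrunkTagger, the task set, the label counts, and all layer sizes are assumptions, and the paper's actual window/sentence-level architecture and training regime differ in detail.

import torch
import torch.nn as nn

class SharedTrunkTagger(nn.Module):
    def __init__(self, vocab_size=50000, emb_dim=50, hidden_dim=100,
                 task_label_counts=None):
        super().__init__()
        if task_label_counts is None:
            # Illustrative label-set sizes, not taken from the paper.
            task_label_counts = {"pos": 45, "chunk": 23, "ner": 9, "srl": 67}
        # Shared word lookup table, reused by every task.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared convolution over the sentence.
        self.conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size=3, padding=1)
        # One small linear head per task; only these weights are task-specific.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_dim, n) for task, n in task_label_counts.items()})

    def forward(self, token_ids, task):
        x = self.embed(token_ids).transpose(1, 2)      # (batch, emb_dim, seq)
        h = torch.relu(self.conv(x)).transpose(1, 2)   # (batch, seq, hidden_dim)
        return self.heads[task](h)                     # per-token scores for the given task

# Joint training alternates minibatches across tasks, so gradients from every
# task update the shared embedding and convolution weights (weight sharing).
model = SharedTrunkTagger()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 50000, (8, 20))    # a fake batch of 8 padded sentences
pos_tags = torch.randint(0, 45, (8, 20))     # fake POS labels for that batch
logits = model(tokens, task="pos")
loss = loss_fn(logits.reshape(-1, logits.size(-1)), pos_tags.reshape(-1))
loss.backward()
optimizer.step()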

Citations

A Convolutional Neural Network for Modelling Sentences
TLDR
A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described that is adopted for the semantic modelling of sentences and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations.
A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding
TLDR
This work proposes to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition, requiring no task specific knowledge or sophisticated feature engineering.
Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding
TLDR
The proposed multi-task model delivers better performance with less data by leveraging patterns that it learns from the other tasks, and supports an open vocabulary, which allows the models to generalize to unseen words.
Deep Multitask Learning for Semantic Dependency Parsing
We present a deep neural architecture that parses sentences into three semantic dependency graph formalisms. By using efficient, nearly arc-factored inference and a bidirectional-LSTM composed with a…
DRWS: A Model for Learning Distributed Representations for Words and Sentences
TLDR
A new model called DRWS is introduced which can learn distributed representations for words and variable-length sentences based on their probability of co-occurrence between words and sentences using a neural network.
ONENET: Joint domain, intent, slot prediction for spoken language understanding
TLDR
This work presents a unified neural network that jointly performs domain, intent, and slot predictions in spoken language understanding systems and adopts a principled architecture for multitask learning to fold in the state-of-the-art models for each task.
Recurrent Neural Network for Text Classification with Multi-Task Learning
TLDR
This paper uses the multi-task learning framework to jointly learn across multiple related tasks with recurrent neural networks, proposing three different mechanisms for sharing information to model text with task-specific and shared layers.
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
TLDR
It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
TLDR
It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.

References

Showing 1-10 of 27 references
Fast Semantic Extraction Using a Novel Neural Network Architecture
TLDR
A novel neural network architecture for the problem of semantic role labeling that learns a direct mapping from source sentence to semantic tags for a given predicate without the aid of a parser or a chunker.
A Neural Probabilistic Language Model
TLDR
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Composition of Conditional Random Fields for Transfer Learning
TLDR
Joint decoding of separately-trained sequence models is performed, preserving uncertainty between the tasks and allowing information from the new task to affect predictions on the old task.
Connectionist language modeling for large vocabulary continuous speech recognition
  • Holger Schwenk, J. Gauvain
  • Computer Science
    2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 2002
TLDR
The connectionist language model is evaluated on the DARPA HUB5 conversational telephone speech recognition task, and preliminary results show consistent improvements in both perplexity and word error rate.
A discriminative language model with pseudo-negative samples
TLDR
Experimental results show that pseudo-negative examples can be treated as real negative examples and the proposed discriminative language model can classify these sentences correctly.
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
TLDR
This paper presents a general framework in which the structural learning problem can be formulated and analyzed theoretically, relates it to learning with unlabeled data, proposes algorithms for structural learning, and investigates the associated computational issues.
Transductive learning for statistical machine translation
TLDR
This paper explores transductive semi-supervised methods for making effective use of monolingual data from the source language to improve translation quality, and proposes several algorithms to that end.
Multitask Learning
  • R. Caruana
  • Computer Science
    Encyclopedia of Machine Learning and Data Mining
  • 1998
TLDR
Prior work on MTL is reviewed, new evidence that MTL in backprop nets discovers task relatedness without the need for supervisory signals is presented, and new results for MTL with k-nearest neighbor and kernel regression are presented.
Joint Parsing and Semantic Role Labeling
TLDR
This paper jointly performs parsing and semantic role labeling, using a probabilistic SRL system to rerank the results of a probabilistic parser, because a locally-trained SRL model can return inaccurate probability estimates.
Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data
TLDR
On a natural-language chunking task, it is shown that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.