A unified architecture for natural language processing: deep neural networks with multitask learning

@inproceedings{Collobert2008AUA,
  title={A unified architecture for natural language processing: deep neural networks with multitask learning},
  author={Ronan Collobert and Jason Weston},
  booktitle={ICML '08},
  year={2008}
}
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks.
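
The shared architecture is easy to picture in code: one trunk (word lookup table plus convolution) computes a representation used by every task, and each task adds only a small output layer on top. Below is a minimal PyTorch sketch of this hard weight-sharing idea, not the authors' exact network; the layer sizes, task names, and label counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    """Embedding lookup + 1-D convolution shared across all tasks."""
    def __init__(self, vocab_size=10_000, embed_dim=50, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, hidden_dim, kernel_size=3, padding=1)

    def forward(self, token_ids):                        # (batch, seq)
        x = self.embed(token_ids).transpose(1, 2)        # (batch, embed, seq)
        return torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq, hidden)

class MultiTaskTagger(nn.Module):
    """One linear head per task on top of the shared trunk."""
    def __init__(self, label_counts):                    # e.g. {"pos": 45, "ner": 9}
        super().__init__()
        self.trunk = SharedTrunk()
        self.heads = nn.ModuleDict(
            {task: nn.Linear(100, n)                     # 100 = trunk hidden size
             for task, n in label_counts.items()})

    def forward(self, token_ids, task):
        return self.heads[task](self.trunk(token_ids))   # per-token logits

model = MultiTaskTagger({"pos": 45, "chunk": 23, "ner": 9})
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Joint training alternates minibatches between tasks; every step updates
# the shared trunk, so the tasks regularize one another (multitask learning).
def train_step(task, tokens, labels):
    logits = model(tokens, task)                         # (batch, seq, n_labels)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training interleaves examples from the different tasks, so each update touches the shared trunk and exactly one task head.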
Citations

A Convolutional Neural Network for Modelling Sentences
TLDR: A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described that is adopted for the semantic modelling of sentences and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations. (The k-max pooling operation behind this is sketched after this list.)
A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding
TLDR: This work proposes to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition, requiring no task-specific knowledge or sophisticated feature engineering.
Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding
TLDR: The proposed multi-task model delivers better performance with less data by leveraging patterns that it learns from the other tasks, and supports an open vocabulary, which allows the models to generalize to unseen words.
DRWS: A Model for Learning Distributed Representations for Words and Sentences
TLDR: A new model called DRWS is introduced which can learn distributed representations for words and variable-length sentences based on the probability of co-occurrence between words and sentences, using a neural network.
ONENET: Joint domain, intent, slot prediction for spoken language understanding
TLDR: This work presents a unified neural network that jointly performs domain, intent, and slot predictions in spoken language understanding systems and adopts a principled architecture for multitask learning to fold in the state-of-the-art models for each task.
Deep Multitask Learning for Semantic Dependency Parsing
We present a deep neural architecture that parses sentences into three semantic dependency graph formalisms. By using efficient, nearly arc-factored inference and a bidirectional-LSTM composed with a…
Recurrent Neural Network for Text Classification with Multi-Task Learning
TLDR: This paper uses the multi-task learning framework to jointly learn across multiple related tasks, proposing three recurrent-network mechanisms for sharing information that model text with task-specific and shared layers. (One such mechanism is sketched after this list.)
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
TLDR: It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
Weakly supervised neural networks for Part-Of-Speech tagging
S. Chopra, S. Bangalore · 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We introduce a simple and novel method for the weakly supervised problem of Part-Of-Speech tagging with a dictionary. Our method involves training a connectionist network that simultaneously learns a…
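
A note on the DCNN entry above: its capacity to capture long-range relations rests on (dynamic) k-max pooling, which keeps the k largest activations of each feature map while preserving their left-to-right order. Below is a minimal sketch of the basic operation (the dynamic variant merely computes k from sentence length and layer depth); the function name is my own.

```python
import torch

def k_max_pool(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest values along the sequence axis, in their
    original left-to-right order. x has shape (batch, channels, seq)."""
    idx = x.topk(k, dim=-1).indices.sort(dim=-1).values  # positions of the top k
    return x.gather(-1, idx)                             # order-preserving pick
```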
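And for the multi-task RNN entry above: one of its sharing mechanisms gives each task a private recurrent layer whose input is augmented with the output of a single LSTM shared across all tasks. The sketch below illustrates that wiring for sentence classification; the sizes, task names, and exact composition are assumptions, not the paper's precise models.

```python
import torch
import torch.nn as nn

class SharedLayerTextModel(nn.Module):
    """Shared LSTM feeding per-task private LSTMs (a hedged sketch)."""
    def __init__(self, tasks, vocab_size=10_000, embed_dim=64, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.shared = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.private = nn.ModuleDict(
            {t: nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
             for t in tasks})
        self.classify = nn.ModuleDict(
            {t: nn.Linear(hidden_dim, 2) for t in tasks})  # binary labels assumed

    def forward(self, token_ids, task):
        x = self.embed(token_ids)                   # (batch, seq, embed)
        shared_out, _ = self.shared(x)              # representation shared by tasks
        h, _ = self.private[task](torch.cat([x, shared_out], dim=-1))
        return self.classify[task](h[:, -1])        # logits from last time step
```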

References

Showing 1-10 of 27 references
A Neural Probabilistic Language Model
TLDR: This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. (This architecture is sketched at the end of this list.)
Composition of Conditional Random Fields for Transfer Learning
TLDR: Joint decoding of separately-trained sequence models is performed, preserving uncertainty between the tasks and allowing information from the new task to affect predictions on the old task.
Connectionist language modeling for large vocabulary continuous speech recognition
Holger Schwenk, J. Gauvain · 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
TLDR: The connectionist language model is being evaluated on the DARPA HUB5 conversational telephone speech recognition task and preliminary results show consistent improvements in both perplexity and word error rate.
A discriminative language model with pseudo-negative samples
TLDR: Experimental results show that pseudo-negative examples can be treated as real negative examples and the proposed discriminative language model can classify these sentences correctly.
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
TLDR: This paper presents a general framework in which the structural learning problem can be formulated and analyzed theoretically and related to learning with unlabeled data; algorithms for structural learning are proposed and computational issues investigated.
Transductive learning for statistical machine translation
TLDR: This paper explores the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality and proposes several algorithms with this aim.
Multitask Learning
R. Caruana · Encyclopedia of Machine Learning and Data Mining · 1998
TLDR: Suggestions for how to get the most out of multitask learning in artificial neural nets are presented, an algorithm for multitask learning with case-based methods like k-nearest neighbor and kernel regression is presented, and an algorithm for multitask learning in decision trees is sketched.
Joint Parsing and Semantic Role Labeling
TLDR: This paper jointly performs parsing and semantic role labeling, using a probabilistic SRL system to rerank the results of a probabilistic parser, because a locally-trained SRL model can return inaccurate probability estimates.
Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data
TLDR: On a natural-language chunking task, it is shown that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
The Proposition Bank: An Annotated Corpus of Semantic Roles
TLDR: An automatic system for semantic role tagging trained on the corpus is described and the effect on its performance of various types of information is discussed, including a comparison of full syntactic parsing with a flat representation and the contribution of the empty trace categories of the treebank.
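
The architecture behind the first reference above is compact enough to sketch: each context word indexes a shared table of learned feature vectors, the vectors are concatenated, and a small network scores every candidate next word. This is a minimal sketch with illustrative sizes, not the paper's configuration.

```python
import torch
import torch.nn as nn

class NeuralProbabilisticLM(nn.Module):
    """Bengio-style next-word predictor over a fixed-size context (a sketch)."""
    def __init__(self, vocab_size=10_000, embed_dim=60, context=4, hidden=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # shared word features
        self.hidden = nn.Linear(context * embed_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):                  # (batch, context)
        x = self.embed(context_ids).flatten(1)       # concatenate context vectors
        return self.out(torch.tanh(self.hidden(x)))  # logits over the vocabulary
```

Because words with similar learned vectors yield similar predictions, each training sentence informs the model about exponentially many "neighboring" sentences, which is the point the TLDR makes.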