A unified architecture for natural language processing: deep neural networks with multitask learning
@inproceedings{Collobert2008AUA,
  title     = {A unified architecture for natural language processing: deep neural networks with multitask learning},
  author    = {Ronan Collobert and Jason Weston},
  booktitle = {ICML '08},
  year      = {2008}
}
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language…
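The core mechanism the abstract describes, one shared encoder with task-specific output layers trained jointly, is what is now called hard parameter sharing. Below is a minimal PyTorch sketch of that idea, not the paper's exact architecture: the dimensions, the convolutional encoder details, and the three tagging heads are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Word embedding + 1-D convolution shared by every task."""
    def __init__(self, vocab_size=50_000, emb_dim=50, hidden=256, kernel=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        return torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq_len, hidden)

class MultitaskTagger(nn.Module):
    """One shared encoder, one linear head per tagging task."""
    def __init__(self, task_sizes):                # e.g. {"pos": 45, "ner": 9}
        super().__init__()
        self.encoder = SharedEncoder()
        self.heads = nn.ModuleDict(
            {task: nn.Linear(256, n_tags) for task, n_tags in task_sizes.items()})

    def forward(self, token_ids, task):
        return self.heads[task](self.encoder(token_ids))  # per-token logits

model = MultitaskTagger({"pos": 45, "chunk": 23, "ner": 9})
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Joint training: pick a task, take one gradient step on a batch of its
# data; the gradients flowing through the shared encoder realize the
# weight sharing. A toy random batch stands in for real annotated data.
tokens = torch.randint(0, 50_000, (8, 20))         # 8 sentences, 20 tokens
tags = torch.randint(0, 45, (8, 20))               # POS labels
logits = model(tokens, "pos")
loss = loss_fn(logits.reshape(-1, 45), tags.reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()
```

Because every task backpropagates through the same encoder, regularities learned for one task transfer to the others, which is the multitask-learning effect the abstract refers to.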
5,046 Citations
A Convolutional Neural Network for Modelling Sentences
- Computer Science, ACL
- 2014
A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described for the semantic modelling of sentences; it induces a feature graph over the sentence that is capable of explicitly capturing short- and long-range relations.
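The DCNN summarized above builds its feature graph with dynamic k-max pooling, which keeps the k strongest activations per feature while preserving their original order. A minimal sketch of the (non-dynamic) k-max pooling operation, with k fixed for brevity, might look like this:

```python
import torch

def k_max_pooling(x: torch.Tensor, k: int, dim: int = -1) -> torch.Tensor:
    """Keep the k largest values along `dim`, in their original order.

    Unlike ordinary max pooling, this preserves up to k activations per
    feature, so the relative positions of strong detections survive.
    """
    # topk finds the k largest values; sorting their indices restores
    # the original left-to-right order within the sequence.
    idx = x.topk(k, dim=dim).indices.sort(dim=dim).values
    return x.gather(dim, idx)

feats = torch.tensor([[1.0, 5.0, 2.0, 4.0, 3.0]])  # one feature row, 5 positions
print(k_max_pooling(feats, k=3))                   # tensor([[5., 4., 3.]])
```

In the DCNN itself, k is a function of sentence length and network depth, which is what makes the pooling "dynamic".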
A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding
- Computer Science, ArXiv
- 2015
This work proposes to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition, requiring no task-specific knowledge or sophisticated feature engineering.
Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding
- Computer Science, INTERSPEECH
- 2016
The proposed multi-task model delivers better performance with less data by leveraging patterns that it learns from the other tasks, and supports an open vocabulary, which allows the model to generalize to unseen words.
Deep Multitask Learning for Semantic Dependency Parsing
- Computer Science, ACL
- 2017
We present a deep neural architecture that parses sentences into three semantic dependency graph formalisms. By using efficient, nearly arc-factored inference and a bidirectional-LSTM composed with a…
DRWS: A Model for Learning Distributed Representations for Words and Sentences
- Computer Science, PRICAI
- 2014
A new model called DRWS is introduced which uses a neural network to learn distributed representations for words and variable-length sentences based on the probability of co-occurrence between words and sentences.
ONENET: Joint domain, intent, slot prediction for spoken language understanding
- Computer Science, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2017
This work presents a unified neural network that jointly performs domain, intent, and slot predictions in spoken language understanding systems and adopts a principled architecture for multitask learning to fold in the state-of-the-art models for each task.
Recurrent Neural Network for Text Classification with Multi-Task Learning
- Computer Science, IJCAI
- 2016
This paper uses the multi-task learning framework to jointly learn across multiple related tasks with recurrent neural networks, proposing three different mechanisms for sharing information that model text with task-specific and shared layers.
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
- Computer Science, EMNLP
- 2017
It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
- Computer Science, EAMT
- 2020
It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.
References
Showing 1–10 of 27 references
Fast Semantic Extraction Using a Novel Neural Network Architecture
- Computer Science, ACL
- 2007
A novel neural network architecture is presented for semantic role labeling that learns a direct mapping from source sentence to semantic tags for a given predicate, without the aid of a parser or a chunker.
A Neural Probabilistic Language Model
- Computer Science, J. Mach. Learn. Res.
- 2003
This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
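The distributed-representation idea in this reference is easy to state in code: each word maps to a dense vector, a fixed window of context vectors is concatenated, and a small network scores every candidate next word, so training sentences inform the model about semantically neighboring ones through the shared embeddings. A minimal sketch under assumed toy sizes (not the paper's configuration):

```python
import torch
import torch.nn as nn

class NeuralLM(nn.Module):
    """Feed-forward language model over a fixed context window."""
    def __init__(self, vocab=10_000, emb=64, context=4, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)    # distributed word features
        self.mlp = nn.Sequential(
            nn.Linear(context * emb, hidden), nn.Tanh(),
            nn.Linear(hidden, vocab))            # scores for the next word

    def forward(self, context_ids):              # (batch, context)
        e = self.embed(context_ids).flatten(1)   # concatenate context embeddings
        return self.mlp(e)                       # (batch, vocab) logits

lm = NeuralLM()
logits = lm(torch.randint(0, 10_000, (2, 4)))
next_word_logprobs = torch.log_softmax(logits, dim=-1)
```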
Composition of Conditional Random Fields for Transfer Learning
- Computer Science, HLT
- 2005
Joint decoding of separately-trained sequence models is performed, preserving uncertainty between the tasks and allowing information from the new task to affect predictions on the old task.
Connectionist language modeling for large vocabulary continuous speech recognition
- Computer Science, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
- 2002
The connectionist language model is evaluated on the DARPA HUB5 conversational telephone speech recognition task, and preliminary results show consistent improvements in both perplexity and word error rate.
A discriminative language model with pseudo-negative samples
- Computer Science, ACL
- 2007
Experimental results show that pseudo-negative examples can be treated as real negative examples and the proposed discriminative language model can classify these sentences correctly.
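The recipe this reference summarizes can be sketched generically: draw "pseudo-negative" sentences from a generative language model, label real corpus sentences as positive, and train a binary classifier to separate the two. The sampler stub and the crude sentence encoder below are illustrative stand-ins, not the original paper's models; only the training recipe is the point.

```python
import torch
import torch.nn as nn

def sample_pseudo_negatives(generative_lm, n, max_len=20):
    """Draw sentences from a generative LM to serve as negatives.
    Hypothetical helper: the sampling API depends on the LM used."""
    raise NotImplementedError

encoder = nn.EmbeddingBag(10_000, 64)  # mean-of-embeddings sentence encoder
clf = nn.Linear(64, 1)                 # score: real corpus vs. pseudo-negative
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(list(encoder.parameters()) + list(clf.parameters()), lr=0.1)

def train_step(real_ids, fake_ids):
    """real_ids / fake_ids: (batch, seq_len) token-id tensors."""
    x = torch.cat([real_ids, fake_ids])
    y = torch.cat([torch.ones(len(real_ids)),    # 1 = real sentence
                   torch.zeros(len(fake_ids))])  # 0 = pseudo-negative
    logits = clf(encoder(x)).squeeze(-1)
    loss = loss_fn(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```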
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
- Computer Science, J. Mach. Learn. Res.
- 2005
This paper presents a general framework in which the structural learning problem can be formulated, analyzed theoretically, and related to learning with unlabeled data; algorithms for structural learning are proposed and computational issues are investigated.
Transductive learning for statistical machine translation
- Computer Science, ACL
- 2007
This paper explores transductive semi-supervised methods for effectively using monolingual data from the source language to improve translation quality, and proposes several algorithms to this end.
Multitask Learning
- Computer Science, Machine Learning
- 1997
Prior work on MTL is reviewed, new evidence is presented that MTL in backprop nets discovers task relatedness without the need for supervisory signals, and new results are given for MTL with k-nearest neighbor and kernel regression.
Joint Parsing and Semantic Role Labeling
- Computer Science, CoNLL
- 2005
This paper jointly performs parsing and semantic role labeling, using a probabilistic SRL system to rerank the results of a probabilistic parser, because a locally-trained SRL model can return inaccurate probability estimates.
Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data
- Computer Science, ICML
- 2004
On a natural-language chunking task, it is shown that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.