Corpus ID: 629094

Word Representations: A Simple and General Method for Semi-Supervised Learning

@inproceedings{turian-etal-2010-word,
  title={Word Representations: A Simple and General Method for Semi-Supervised Learning},
  author={Joseph P. Turian and Lev-Arie Ratinov and Yoshua Bengio},
  booktitle={Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2010}
}
If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining… 
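The recipe the abstract describes, adding word-representation features to an existing supervised system, can be sketched as a feature-extraction function. This is a minimal illustration, not the paper's exact setup: the toy `embeddings` dict, the `brown_paths` bit strings, and the feature names are all assumed for the example.

```python
# Sketch: augment a supervised sequence labeler's per-token features with
# features derived from unsupervised word representations. The concrete
# values and feature names below are illustrative assumptions.

def word_features(tokens, i, embeddings, brown_paths, prefix_lengths=(4, 6)):
    """Baseline lexical features for token i, plus word-representation features."""
    word = tokens[i]
    feats = {
        "word=" + word.lower(): 1.0,            # baseline lexical feature
        "is_cap": float(word[:1].isupper()),    # baseline shape feature
    }
    # Dense embedding features: one real-valued feature per dimension.
    for d, v in enumerate(embeddings.get(word.lower(), ())):
        feats[f"emb_{d}"] = v
    # Brown clusters: binary features on prefixes of the cluster bit-string
    # path, giving cluster memberships at several granularities.
    path = brown_paths.get(word.lower())
    if path is not None:
        for p in prefix_lengths:
            feats[f"brown_{p}=" + path[:p]] = 1.0
    return feats

# Toy representations (illustrative values only).
embeddings = {"paris": (0.21, -0.40), "london": (0.19, -0.37)}
brown_paths = {"paris": "110100", "london": "110101"}

feats = word_features(["visit", "Paris"], 1, embeddings, brown_paths)
```

The resulting feature dicts can be fed to any linear sequence model alongside the baseline features, which is what makes the method simple and general: the supervised system is unchanged except for the extra features.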

Citations of this paper

Improving Word Representations via Global Context and Multiple Word Prototypes
A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.
Substitute Based SCODE Word Embeddings in Supervised NLP Tasks
The proposed word embedding method achieves state-of-the-art results in multilingual dependency parsing; word embeddings, including more recent representations, are compared on Named Entity Recognition, Chunking, and Dependency Parsing.
Polyglot: Distributed Word Representations for Multilingual NLP
This work quantitatively demonstrates the utility of word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of the languages it covers, and investigates the semantic features captured through the proximity of word groupings.
Tailoring Continuous Word Representations for Dependency Parsing
It is found that all embeddings yield significant parsing gains, including some recent ones that can be trained in a fraction of the time of others, and that their gains are complementary.
Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing
We explore new methods of improving Curriculum Vitae (CV) parsing for German documents by applying recent research on the application of word embeddings in Natural Language Processing (NLP).
Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings
Delta-training, a novel and simple method for semi-supervised text classification, outperforms the self-training and co-training frameworks on 4 different text classification datasets, showing robustness against error accumulation.
Multi-View Learning of Word Embeddings via CCA
Low Rank Multi-View Learning (LR-MVL) is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.
Re-embedding words
This work proposes a method that takes as input an existing embedding and some labeled data, and produces an embedding in the same space but with better predictive performance on the supervised task.
Learning Word Embeddings for Aspect-Based Sentiment Analysis
This paper proposes a new model using a combination of unsupervised and supervised techniques to capture three kinds of information from labeled and unlabeled data: the general distributed semantic representation, the aspect category, and the aspect sentiment.
Lexicon Infused Phrase Embeddings for Named Entity Resolution
A new form of learning word embeddings that can leverage information from relevant lexicons to improve the representations is presented, along with the first system to use neural word embeddings to achieve state-of-the-art results on named-entity recognition in both CoNLL and OntoNotes NER.

References from this paper

A preliminary evaluation of word representations for named-entity recognition
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words for named-entity recognition with a linear model, finding that all three representations improve accuracy on NER.
Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data
Evidence is provided that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing tasks such as part-of-speech tagging, syntactic chunking, and named entity recognition.
Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling
It is demonstrated that distributional representations of word types, trained on unannotated text, can be used to improve performance on rare words and to reduce the sample complexity of sequence labeling.
Semi-supervised Semantic Role Labeling Using the Latent Words Language Model
The Latent Words Language Model is presented, a language model that learns word similarities from unlabeled texts; these similarities are used in different semi-supervised SRL methods, either as additional features or to automatically expand a small training set.
Simple Semi-supervised Dependency Parsing
This work focuses on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus, and shows that the cluster-based features yield substantial gains in performance across a wide range of conditions.
A High-Performance Semi-Supervised Learning Method for Text Chunking
A novel semi-supervised method is presented that employs a learning paradigm which finds "what good classifiers are like" by learning from thousands of automatically generated auxiliary classification problems on unlabeled data, producing performance higher than the previous best results.
Semi-Supervised Sequence Modeling with Syntactic Topic Models
This paper presents an approach that leverages recent work in manifold-learning on sequences to discover word clusters from language data, including both syntactic classes and semantic topics, with statistically-significant improvements over a related semi-supervised sequence tagging method.
A unified architecture for natural language processing: deep neural networks with multitask learning
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, and semantic roles.
Semi-Supervised Learning for Natural Language
This thesis focuses on two segmentation tasks, named-entity recognition and Chinese word segmentation, and shows that features derived from unlabeled data substantially improve performance, both by reducing the amount of labeled data needed to achieve a certain performance level and by reducing the error with a fixed amount of labeled data.
Improving generative statistical parsing with semi-supervised word clustering
A semi-supervised method to improve statistical parsing performance is presented, combining lexicon-aided morphological clustering that preserves tagging ambiguity with unsupervised word clustering trained on a large unannotated corpus.