
Effective Word Representation for Named Entity Recognition

Jun-Ting Hsieh
Recently, various machine learning models built on word-level embeddings have achieved substantial improvements in NER prediction accuracy. However, most NER models take only words as input and ignore character-level information. In this paper, we propose an effective word representation that efficiently captures both word-level and character-level information by averaging a word's character n-gram embeddings. Our best performing model uses a bidirectional LSTM with word and character n…
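A minimal sketch of the proposed representation (plain NumPy, not the authors' released code; the n-gram orders, the hashed n-gram vocabulary, and the embedding size are illustrative assumptions):

```python
import numpy as np

# Illustrative hyperparameters (assumptions, not the paper's exact settings).
NGRAM_DIM = 50         # dimensionality of each character n-gram embedding
VOCAB_BUCKETS = 10000  # size of the hashed n-gram embedding table
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(VOCAB_BUCKETS, NGRAM_DIM))

def char_ngrams(word, n_values=(2, 3)):
    """Enumerate character n-grams of the boundary-padded word,
    e.g. 'cat' -> '<c', 'ca', 'at', 't>', '<ca', 'cat', 'at>'."""
    padded = f"<{word}>"
    for n in n_values:
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]

def word_representation(word):
    """Average the embeddings of the word's character n-grams,
    hashing each n-gram into a fixed-size table."""
    ids = [hash(g) % VOCAB_BUCKETS for g in char_ngrams(word)]
    return ngram_table[ids].mean(axis=0)

vec = word_representation("recognition")
print(vec.shape)  # (50,)
```

In the full model, a character-level vector like this would be combined with the word-level embedding before being fed to the bidirectional LSTM tagger.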

References

Lexicon Infused Phrase Embeddings for Named Entity Resolution
A new form of learning word embeddings that can leverage information from relevant lexicons to improve the representations is presented, along with the first system to use neural word embeddings to achieve state-of-the-art results on named-entity recognition on both the CoNLL and OntoNotes benchmarks.
Boosting Named Entity Recognition with Neural Character Embeddings
This work proposes a language-independent NER system that uses only automatically learned features, and demonstrates that the same neural network that has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independent NER, using the same hyperparameters and without any handcrafted features.
Named Entity Recognition with Bidirectional LSTM-CNNs
A novel neural network is presented that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering.
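A hedged PyTorch sketch of such a hybrid BiLSTM-CNN tagger (layer sizes, the max-pooled character CNN, and the linear output head are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class BLSTMCNN(nn.Module):
    """A char-level CNN builds per-word features; these are concatenated
    with word embeddings and fed to a bidirectional LSTM that scores tags."""
    def __init__(self, n_words=5000, n_chars=100, n_tags=9,
                 word_dim=100, char_dim=25, char_filters=30, hidden=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(word_dim + char_filters, hidden,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, words, chars):
        # words: (batch, seq)    chars: (batch, seq, max_word_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values  # max-pool over characters
        x = torch.cat([self.word_emb(words), c.view(b, s, -1)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # per-token tag scores

model = BLSTMCNN()
scores = model(torch.zeros(2, 7, dtype=torch.long),
               torch.zeros(2, 7, 12, dtype=torch.long))
print(scores.shape)  # torch.Size([2, 7, 9])
```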
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model is presented that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
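For reference, GloVe's training objective is a weighted least-squares regression of word-vector dot products onto log co-occurrence counts:

\[
J = \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^2,
\qquad
f(x) = \begin{cases} (x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
\]

where \(X_{ij}\) counts co-occurrences of words \(i\) and \(j\); the paper uses \(x_{\max} = 100\) and \(\alpha = 3/4\).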
Multilingual Language Processing From Bytes
An LSTM-based model is described that reads text as bytes and outputs span annotations of the form [start, length, label], where start positions, lengths, and labels are separate entries in the model's vocabulary.
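As a toy illustration of that span format (the sentence and labels are invented for this example; offsets index into the UTF-8 byte sequence):

```python
text = "John lives in Berlin"
# [start, length, label] annotations over the byte sequence
spans = [(0, 4, "PER"), (14, 6, "LOC")]
for start, length, label in spans:
    print(label, text.encode("utf-8")[start:start + length].decode())
# PER John
# LOC Berlin
```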
Distributed Representations of Sentences and Documents
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents; its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
Distributed Representations of Words and Phrases and their Compositionality
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
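Concretely, negative sampling trains a logistic classifier to separate an observed (input word, context word) pair \((w_I, w_O)\) from \(k\) words drawn from a noise distribution \(P_n(w)\), maximizing

\[
\log \sigma\bigl(v'^{\top}_{w_O} v_{w_I}\bigr) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\Bigl[\log \sigma\bigl(-v'^{\top}_{w_i} v_{w_I}\bigr)\Bigr]
\]

where \(\sigma\) is the logistic function; the paper finds the unigram distribution raised to the 3/4 power works well as \(P_n(w)\).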
Character-level Convolutional Networks for Text Classification
This work constructs several large-scale datasets to show that character-level convolutional networks can achieve state-of-the-art or competitive results in text classification.
Factored Neural Language Models
A new type of neural probabilistic language model is presented that learns a mapping from both words and explicit word features into a continuous space, which is then used for word prediction, significantly reducing perplexity on sparse-data tasks.
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
A joint many-task model is presented, together with a strategy for successively growing its depth to solve increasingly complex tasks; a simple regularization term allows optimizing all model weights to improve one task's loss without catastrophic interference with the other tasks.