Improving Word Embeddings for Low Frequency Words by Pseudo Contexts

  title={Improving Word Embeddings for Low Frequency Words by Pseudo Contexts},
  author={Fang Li and Xiaojie Wang},
This paper investigates relations between word semantic density and word frequency. A distributed representations based word average similarity is defined as the measure of word semantic density. We find that the average similarities of low frequency words are always bigger than that of high frequency words, when the frequency approaches to 400 around, the average similarity tends to stable. The finding keeps correct with changes of the size of training corpus, dimension of distributed… Expand
1 Citations
The expansion of isms, 1820-1917: Data-driven analysis of political language in digitized newspaper collections
This paper studies isms in a historical record of digitized newspapers from 1820 to 1917 published in Finland to find out how the language of isms developed historically and shows how they became more common and entered more and more domains. Expand


Joint Learning of Character and Word Embeddings
A character-enhanced word embedding model (CWE) is presented to address the issues of character ambiguity and non-compositional words, and the effectiveness of CWE on word relatedness computation and analogical reasoning is evaluated. Expand
Enriching Word Vectors with Subword Information
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks. Expand
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
It is suggested that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods. Expand
GloVe: Global Vectors for Word Representation
A new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure. Expand
Efficient Estimation of Word Representations in Vector Space
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities. Expand
Neural Word Embedding as Implicit Matrix Factorization
It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and conjecture that this stems from the weighted nature of SGNS's factorization. Expand
Word Representations: A Simple and General Method for Semi-Supervised Learning
This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeds of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines. Expand
Better Word Representations with Recursive Neural Networks for Morphology
This paper combines recursive neural networks, where each morpheme is a basic unit, with neural language models to consider contextual information in learning morphologicallyaware word representations and proposes a novel model capable of building representations for morphologically complex words from their morphemes. Expand
Linguistic Regularities in Continuous Space Word Representations
The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset. Expand
Overview of the NLPCC-ICCPOL 2016 Shared Task: Chinese Word Similarity Measurement
This task provides a benchmark dataset of Chinese word similarity (PKU-500 dataset), including 500 word pairs with their similarity scores, and describes clearly the data preparation and word similarity annotation. Expand