Towards Lexical Chains for Knowledge-Graph-based Word Embeddings

Kiril Ivanov Simov, Svetla Boytcheva, Petya N. Osenova
Word vectors of varying dimensionality, produced by different algorithms, have been used extensively in NLP. The corpora on which the algorithms are trained can contain either natural language text (e.g. Wikipedia or newswire articles) or, because natural data are sparse, artificially generated pseudo-corpora. We exploit Lexical Chain based templates over a Knowledge Graph for generating pseudo-corpora with controlled linguistic value. These corpora are then used for learning word embeddings. A…
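The idea of generating a pseudo-corpus from a knowledge graph can be illustrated with a minimal sketch. The abstract does not specify how the Lexical Chain templates are defined, so the code below assumes, purely for illustration, that a template is an ordered sequence of relation types to follow from a start node. The toy graph, the relation names, and the helpers `generate_chain` and `generate_pseudo_corpus` are all hypothetical and are not taken from the paper.

```python
import random

# Toy knowledge graph (hypothetical): node -> relation -> list of neighbours.
GRAPH = {
    "dog": {"hypernym": ["canine"], "meronym": ["tail"]},
    "canine": {"hypernym": ["mammal"], "hyponym": ["dog", "wolf"]},
    "wolf": {"hypernym": ["canine"]},
    "mammal": {"hyponym": ["canine"]},
    "tail": {"holonym": ["dog"]},
}


def generate_chain(graph, start, template, rng):
    """Follow the relations in `template` in order, starting at `start`.

    Returns the list of visited nodes, or None if some relation in the
    template has no outgoing edge at the current node.
    """
    chain = [start]
    node = start
    for relation in template:
        neighbours = graph.get(node, {}).get(relation, [])
        if not neighbours:
            return None
        node = rng.choice(neighbours)
        chain.append(node)
    return chain


def generate_pseudo_corpus(graph, templates, walks_per_node, seed=0):
    """Build pseudo-sentences by applying every template at every node."""
    rng = random.Random(seed)
    corpus = []
    for start in sorted(graph):
        for template in templates:
            for _ in range(walks_per_node):
                chain = generate_chain(graph, start, template, rng)
                if chain is not None:
                    corpus.append(" ".join(chain))
    return corpus


# Assumed templates: each one is a fixed sequence of relation types.
TEMPLATES = [("hypernym", "hypernym"), ("hypernym", "hyponym")]
corpus = generate_pseudo_corpus(GRAPH, TEMPLATES, walks_per_node=2)
```

Each string in `corpus` is a space-separated pseudo-sentence that could be passed, after tokenization, to any word-embedding trainer. Restricting walks to fixed relation sequences is one simple way to control which kinds of relatedness the resulting corpus encodes.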


Synthetic, yet natural: Properties of WordNet random walk corpora and the impact of rare words on embedding performance
The distributions in the pseudo-corpora are found to exhibit properties of natural corpora, such as Zipf’s and Heaps’ laws, and the proportion of rare words in a pseudo-corpus is observed to affect the performance of its embeddings on word similarity.
English WordNet Random Walk Pseudo-Corpora
It is found that different combinations of parameters result in varying statistical properties of the generated pseudo-corpora, which can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
English WordNet Taxonomic Random Walk Pseudo-Corpora
It is shown that different combinations of the walk’s hyperparameters result in varying statistical properties of the generated pseudo-corpora, which can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
Semantic Feature Extraction Using Multi-Sense Embeddings and Lexical Chains
The relationship between words in a sentence often tells us more about the underlying semantic content of a document than its actual words individually. Natural language understanding has seen an
Trends in Media Coverage and Information Diffusion Over Time: The Case of the American Earth Systems Research Centre Biosphere 2
This study examined research centre Biosphere 2 (B2) coverage by US newspapers between 1984 (as stories of conception before construction emerged) and 2019 (at the time this research was conducted)
Text Representations and Word Embeddings
R. Egger. Applied Data Science in Tourism, 2022.
Comparison of Word Embeddings from Different Knowledge Graphs
The results from the performed experiments show that the addition of more relations generally improves performance along both dimensions – similarity and relatedness.
Random Walks and Neural Network Language Models on Knowledge Bases
A novel algorithm is presented which encodes the structure of a knowledge base in a continuous vector space, combining random walks and neural net language models in order to produce novel word representations, improving the state of the art in the similarity dataset.
Random Walks for Knowledge-Based Word Sense Disambiguation
This article presents a WSD algorithm based on random walks over large Lexical Knowledge Bases (LKB) that performs better than other graph-based methods when run on a graph built from WordNet and eXtended WordNet.
Using Context Information for Knowledge-Based Word Sense Disambiguation
This paper presents a strategy for the enrichment of WSD knowledge bases with data-driven relations from a gold standard corpus (annotated with word senses, syntactic analyses, etc.), focusing on English as use case, but the approach is scalable to other languages.
The Role of the WordNet Relations in the Knowledge-based Word Sense Disambiguation Task
An analysis of different semantic relations extracted from WordNet, Extended WordNet and SemCor, with respect to their role in the task of knowledge-based word sense disambiguation shows that different sets of relations have different impact on the results: positive or negative.
Single or Multiple? Combining Word Representations Independently Learned from Text and WordNet
This paper learns word representations from text and WordNet independently, and then explores simple and sophisticated methods to combine them, showing that, in the case of WordNet, learning word representations separately is preferable to learning one single representation space or adding WordNet information directly.
PageRank on Semantic Networks, with Application to Word Sense Disambiguation
A new open text word sense disambiguation method that combines the use of logical inferences with PageRank-style algorithms applied on graphs extracted from natural language documents is presented.
Efficient Estimation of Word Representations in Vector Space
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
RDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec is presented, an approach that adapts language-modeling techniques for unsupervised feature extraction from word sequences to RDF graphs; it is shown that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.
WordNet: an electronic lexical database
Topics presented include the lexical database (nouns in WordNet, a semantic network of English verbs) and applications of WordNet (building semantic concordances).