RDF2Vec: RDF Graph Embeddings for Data Mining

@inproceedings{Ristoski2016RDF2VecRG,
  title={RDF2Vec: RDF Graph Embeddings for Data Mining},
  author={Petar Ristoski and Heiko Paulheim},
  booktitle={SEMWEB},
  year={2016}
}
Linked Open Data has been recognized as a valuable source for background information in data mining. [...] We generate sequences by leveraging local information from graph sub-structures, harvested by Weisfeiler-Lehman Subtree RDF Graph Kernels and graph walks, and learn latent numerical representations of entities in RDF graphs. Our evaluation shows that such vector representations outperform existing techniques for the propositionalization of RDF graphs on a variety of different predictive machine…
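
The graph-walk part of the sequence generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the `ex:` prefixes, the function names `build_adjacency` and `random_walks`, and the depth and walk-count settings are all assumptions chosen for illustration, and the Weisfeiler-Lehman subtree variant is not covered.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Index (subject, predicate, object) triples by subject."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walks(adjacency, start, depth, walks_per_entity, rng):
    """Generate uniform random walks of at most `depth` hops from `start`.

    Each walk alternates entities and predicates:
    [entity, predicate, entity, predicate, entity, ...]
    """
    walks = []
    for _ in range(walks_per_entity):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = adjacency.get(node)
            if not edges:
                break  # dead end: the node has no outgoing edges
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks


# Hypothetical toy graph, for illustration only.
triples = [
    ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    ("ex:Germany", "ex:partOf", "ex:Europe"),
    ("ex:Berlin", "ex:locatedIn", "ex:Europe"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("ex:France", "ex:partOf", "ex:Europe"),
]

adjacency = build_adjacency(triples)
rng = random.Random(0)  # fixed seed so repeated runs produce the same walks

sequences = []
for entity in sorted(adjacency):
    sequences.extend(
        random_walks(adjacency, entity, depth=2, walks_per_entity=3, rng=rng)
    )

for walk in sequences:
    print(" -> ".join(walk))
```

Each resulting sequence plays the role of a sentence whose tokens are entities and predicates. Under the approach summarized above, such sequences would then be passed to a neural language model to learn the entity vectors; that training step is omitted here.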

RDF2Vec: RDF graph embeddings and their applications

TLDR
RDF2Vec is presented, an approach that adapts language modeling techniques for unsupervised feature extraction from sequences of words to RDF graphs. The evaluation shows that the proposed entity embeddings outperform existing techniques, and that pre-computed feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.

RDF2Vec: RDF Graph Embeddings and Their Applications

TLDR
RDF2Vec is presented, an approach that adapts language modeling techniques for unsupervised feature extraction from sequences of words to RDF graphs. The evaluation shows that the proposed entity embeddings outperform existing techniques, and that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.

Biased graph walks for RDF graph embeddings

TLDR
The RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities, is extended: sequences are generated by exploiting local information from graph substructures, harvested by graph walks, and are used to learn latent numerical representations of entities in RDF graphs.

A Biaswalk Based RDF Entity Embeddings

TLDR
A new concept of similar entities, which trades off between the label of the outgoing edge and the outgoing nodes, is provided, along with a structural similarity that calculates the similarity of two entities in each case of the current sequence.

RDF Graph Embeddings for Content-based Recommender Systems

TLDR
This paper presents an approach that adapts language modeling techniques for unsupervised feature extraction from sequences of words to RDF graphs, and uses it to build content-based recommender systems.

Graph Embeddings for Linked Data Clustering

TLDR
An approach is presented that uses neural language models for RDF data clustering: it generates sequences of entities extracted from several graph substructures, uses Doc2vec and Word2vec combined with TF-IDF, and applies K-Means to cluster the generated vectors.

Literal2Feature: An Automatic Scalable RDF Graph Feature Extractor

TLDR
This paper introduces a generic, distributed, and scalable software framework that is capable of transforming large RDF data into an explainable feature matrix that can be exploited in many standard machine learning algorithms.

Graph Embeddings for Content-based Recommender Systems

TLDR
This paper presents an approach that adapts language modeling techniques for unsupervised feature extraction from sequences of words to RDF graphs, and uses it to build content-based recommender systems.

Extracting entity-specific substructures for RDF graph embeddings

TLDR
This work proposes specificity as an accurate measure for identifying the most relevant, entity-specific nodes and edges in knowledge graphs, and develops a scalable method based on bidirectional random walks to compute specificity.

RDF Data Clustering based on Resource and Predicate Embeddings

TLDR
An unsupervised feature extraction technique named Walks and two language modeling approaches, namely Word2vec and Doc2vec, are presented, and it is shown that applying the Doc2vec approach to a set of walks gives better results on two different datasets.
...

References

Graph Kernels for RDF Data

TLDR
This paper introduces two versatile families of graph kernels specifically suited for RDF, based on intersection graphs and intersection trees, and shows that the novel RDF graph kernels, used with SVMs, achieve competitive predictive performance when compared to specialized techniques for both tasks.

A scalable approach for statistical learning in semantic graphs

TLDR
This paper applies machine learning to semantic graph data and argues that scalability and robustness can be achieved via an urn-based statistical sampling scheme and applies the urn model to the SUNS framework which is based on multivariate prediction.

Substructure counting graph kernels for machine learning from RDF data

Kernel Methods for Mining Instance Data in Ontologies

TLDR
This work investigates how machine learning algorithms can be made amenable for directly taking advantage of the rich knowledge expressed in ontologies and associated instance data through decomposing the kernel computation into specialized kernels for selected characteristics of an ontology which can be flexibly assembled and tuned.

Scalable Learning of Entity and Predicate Embeddings for Knowledge Graph Completion

TLDR
This work proposes a principled method for sensibly reducing the learning time, while converging to more accurate link prediction models, and employs the proposed method for training and evaluating a set of novel and scalable models.

A Fast and Simple Graph Kernel for RDF

TLDR
A graph kernel for RDF, based on constructing a tree for each instance and counting the number of paths in that tree, is studied; it is a factor of 10 faster to compute than the previously introduced intersection subtree kernel.

A Review of Relational Machine Learning for Knowledge Graphs

TLDR
This paper provides a review of how statistical models can be “trained” on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph) and how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web.

A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data

TLDR
An approximation of the Weisfeiler-Lehman graph kernel algorithm aimed at improving the computation time of the kernel when applied to Resource Description Framework (RDF) data is introduced and the performance of this kernel is compared to graph kernels designed for RDF described in [1].

Unsupervised generation of data mining features from linked open data

TLDR
The results show that features generated from publicly available information may allow data mining in problems where features are not available at all, as well as help improving the results for tasks where some features are already available.

A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data

TLDR
This paper compares different strategies for creating propositional features from Linked Open Data (a process called propositionalization), and presents experiments on different tasks, i.e., classification, regression, and outlier detection.