How Linked Data can Aid Machine Learning-Based Tasks


Discovering useful data for a given problem is of primary importance, since data scientists usually spend a lot of time discovering, collecting and preparing data before using them, e.g., for applying or testing machine learning algorithms. In this paper we propose a general method for easily discovering, creating and selecting valuable features that describe a set of entities, so that they can be leveraged in a machine learning context. We demonstrate the feasibility of this approach by introducing a tool (research prototype) called LODsyndesisML, based on Linked Data technologies, that a) automatically discovers datasets in which the entities of interest occur, b) shows the user a large number of useful features for these entities, and c) automatically creates the selected features by sending SPARQL queries. We evaluate this approach by exploiting data from several sources, including the British National Library, to create datasets for predicting whether a book or a movie is popular or non-popular. Our evaluation uses 5-fold cross-validation, and we report comparative results for a number of different features and models. The evaluation showed that the additional features improved prediction accuracy.
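As an illustration of step c), the following minimal Python sketch shows one way a set of user-selected features could be turned into a single SPARQL query and its results pivoted into feature rows. It is not the LODsyndesisML implementation: the function names `build_feature_query` and `bindings_to_rows`, the `example.org` URIs, and the choice of "number of values per property" as the feature value are all assumptions made for this example, and no endpoint is contacted.

```python
from collections import defaultdict


def build_feature_query(entities, properties):
    """Build a SPARQL SELECT query for the chosen properties of the entities.

    Hypothetical helper: entities and properties are lists of full URIs.
    """
    entity_values = " ".join(f"<{uri}>" for uri in entities)
    property_values = " ".join(f"<{uri}>" for uri in properties)
    return (
        "SELECT ?entity ?property ?value WHERE {\n"
        f"  VALUES ?entity {{ {entity_values} }}\n"
        f"  VALUES ?property {{ {property_values} }}\n"
        "  ?entity ?property ?value .\n"
        "}"
    )


def bindings_to_rows(bindings, properties):
    """Pivot (entity, property, value) tuples into one feature row per entity.

    Assumption for this sketch: each feature is the number of values the
    entity has for that property (0 when the property is absent).
    """
    values = defaultdict(lambda: defaultdict(list))
    for entity, prop, value in bindings:
        values[entity][prop].append(value)
    return {
        entity: [len(by_prop.get(prop, [])) for prop in properties]
        for entity, by_prop in values.items()
    }


# Made-up URIs and results, standing in for a real endpoint response.
entities = ["http://example.org/book/1", "http://example.org/book/2"]
properties = ["http://example.org/author", "http://example.org/genre"]
query = build_feature_query(entities, properties)
bindings = [
    ("http://example.org/book/1", "http://example.org/author", "A"),
    ("http://example.org/book/1", "http://example.org/genre", "G1"),
    ("http://example.org/book/1", "http://example.org/genre", "G2"),
    ("http://example.org/book/2", "http://example.org/author", "B"),
]
rows = bindings_to_rows(bindings, properties)
```

In a real setting the query string would be sent to a SPARQL endpoint and the bindings parsed from its response; the resulting rows could then be joined with the popularity labels before cross-validation.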

DOI: 10.1007/978-3-319-67008-9_13

Cite this paper

@inproceedings{Mountantonakis2017HowLD, title={How Linked Data can Aid Machine Learning-Based Tasks}, author={Michalis Mountantonakis and Yannis Tzitzikas}, booktitle={TPDL}, year={2017} }