Emir Muñoz

The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot be queried. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data …
Keys are fundamental for database management, independently of the particular data model used. In particular, several notions of XML keys have been proposed over the last decade, and their expressiveness and computational properties have been analyzed in theory. In practice, however, expressive notions of XML keys with good reasoning capabilities have been …
In this paper, we describe our contribution to the 2015 Linked Data Mining Challenge. The proposed task is to predict whether the reviews of a movie are "good" or "bad", as the Metacritic website does based on critics' reviews. First, we describe the sources used to build the training data. Although several sources provide data about movies on the …
This paper describes µRaptor, a DOM-based method to extract hCard microformats from HTML pages stripped of microformat markup. µRaptor extracts DOM sub-trees, converts them into rules, and uses them to extract hCard microformats. In addition, we use co-occurring CSS classes to improve the overall precision. Results on training data show 0.96 precision and 0.83 F1 …
Linked Data (LD) datasets (e.g., DBpedia, Freebase) are used in many knowledge extraction tasks due to the wide variety of domains they cover. Unfortunately, many of these datasets do not provide a description for their properties and classes, limiting users' ability to understand, reuse or enrich them. This work attempts to fill part of this gap by …
Tables are widely used in Wikipedia articles to display relational information – they are inherently concise and information rich. However, aside from infoboxes, there are no automatic methods to exploit the integrated content of these tables. We thus present DRETa: a tool that uses DBpedia as a reference knowledge-base to extract RDF triples from …
This paper describes our entry for the Linked Data Mining Challenge 2016, which poses the problem of classifying music albums as 'good' or 'bad' by mining Linked Data. The original labels are assigned according to aggregated critic scores published by the Metacritic website. To this end, the challenge provides datasets that contain the DBpedia reference for …
The eXtensible Markup Language (XML) is the de facto industry standard for exchanging data on the Web and elsewhere. While the relational model of data enjoys a well-accepted definition of a key, several competing notions of keys exist in XML. These have complementary properties and therefore serve different application domains. In a nutshell, XML keys …