Emir Muñoz

We are currently investigating methods to triplify the content of Wikipedia’s tables. We propose that existing knowledge-bases can be leveraged to semi-automatically extract high-quality facts (in the form of RDF triples) from tables embedded in Wikipedia articles (henceforth called “Wikitables”). We present a survey of Wikitables and their content in a …
Keys are fundamental for database management, independently of the particular data model used. In particular, several notions of XML keys have been proposed over the last decade, and their expressiveness and computational properties have been analyzed in theory. In practice, however, expressive notions of XML keys with good reasoning capabilities have been …
The mechanisms that regulate the expression of genes encoding extracellular matrix proteins in fibroblasts and other mesenchymal cells have remained elusive. Studies from several laboratories have indicated that Tax, a trans-regulatory protein from the human T cell leukemia virus type I, not only augments viral gene expression but also triggers the …
We present LODPeas: a system for browsing entities that are found to share many things in common in an RDF dataset. The system first offers standard keyword search to locate a focus entity. Once a focus entity has been found, other entities that share a lot in common with it are displayed in a graph-based visualisation. The degree to which two entities have …
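The notion of entities "sharing a lot in common" can be made concrete by comparing their RDF descriptions. The following is an illustrative sketch (not the actual LODPeas ranking) that counts identical (predicate, object) pairs between two entities; the `ex:` data is invented for the example.

```python
# Illustrative sketch: measure how much two entities "share in common"
# by counting identical (predicate, object) pairs in their RDF triples.
# All entity and predicate names below are made up for the example.

triples = [
    ("ex:Guinness", "ex:type", "ex:Stout"),
    ("ex:Guinness", "ex:country", "ex:Ireland"),
    ("ex:Murphys", "ex:type", "ex:Stout"),
    ("ex:Murphys", "ex:country", "ex:Ireland"),
    ("ex:Heineken", "ex:type", "ex:Lager"),
]

def profile(entity, triples):
    """The set of (predicate, object) pairs describing an entity."""
    return {(p, o) for (s, p, o) in triples if s == entity}

def shared(a, b, triples):
    """Number of (predicate, object) pairs the two entities have in common."""
    return len(profile(a, triples) & profile(b, triples))

print(shared("ex:Guinness", "ex:Murphys", triples))   # prints 2
print(shared("ex:Guinness", "ex:Heineken", triples))  # prints 0
```

A real system would normalise this count (e.g., by the sizes of the two profiles) and weight rare predicate–object pairs more heavily, but the raw overlap already induces a usable ranking of neighbours around a focus entity.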
In this paper, we describe our contribution to the 2015 Linked Data Mining Challenge. The proposed task concerns predicting whether a movie's review is “good” or “bad”, as done by the Metacritic website based on critics’ reviews. First, we describe the sources used to build the training data. Although several sources provide data about movies on the Web in …
Tables are widely used in Wikipedia articles to display relational information – they are inherently concise and information rich. However, aside from info-boxes, there are no automatic methods to exploit the integrated content of these tables. We thus present DRETa: a tool that uses DBpedia as a reference knowledge-base to extract RDF triples from generic …
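The core idea of using a reference knowledge-base to triplify a table can be sketched as follows. This is a simplified, hypothetical illustration of the general approach, not DRETa's actual algorithm: for each row, a column pair is linked with a predicate only if the knowledge base already relates those two cell values.

```python
# Hypothetical sketch: emit triples from table rows by checking a toy
# "reference knowledge base" of known (subject, predicate, object) facts.
# The KB contents and predicate names are invented for the example.
KB = {
    ("Dublin", "country", "Ireland"),
    ("Paris", "country", "France"),
}

def candidate_predicates(subj, obj, kb):
    """Predicates that already relate subj to obj in the knowledge base."""
    return {p for (s, p, o) in kb if s == subj and o == obj}

def triplify(rows, kb):
    """Emit a triple for each cell pair whose values the KB already links."""
    triples = []
    for row in rows:
        for i, subj in enumerate(row):
            for j, obj in enumerate(row):
                if i == j:
                    continue
                for pred in candidate_predicates(subj, obj, kb):
                    triples.append((subj, pred, obj))
    return triples

table_rows = [["Dublin", "Ireland"], ["Madrid", "Spain"]]
print(triplify(table_rows, KB))
# Only Dublin/Ireland has KB evidence; Madrid/Spain yields nothing.
```

In practice the evidence gathered per column pair across all rows would be aggregated, so that a predicate supported by many rows can also be applied to rows the knowledge base knows nothing about – which is how new facts, rather than only known ones, get extracted.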
Linked Data (LD) datasets (e.g., DBpedia, Freebase) are used in many knowledge extraction tasks due to the high variety of domains they cover. Unfortunately, many of these datasets do not provide a description for their properties and classes, reducing the users’ ability to understand, reuse or enrich them. This work attempts to fill part of this gap by …
This paper describes μRaptor, a DOM-based method to extract hCard microformats from HTML pages stripped of microformat markup. μRaptor extracts DOM sub-trees, converts them into rules, and uses them to extract hCard microformats. In addition, we use co-occurring CSS classes to improve the overall precision. Results on train data show 0.96 precision and 0.83 F1 …
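The extract-rules-then-apply pipeline can be sketched with Python's standard-library HTML parser. This is a hedged, minimal illustration of the idea, not μRaptor itself: it assumes a rule has already been learned (here reduced to a single CSS class, `person-box`, which is invented for the example) and applies it to a page without microformat markup.

```python
# Hedged sketch of a DOM-rule extractor: collect text inside elements whose
# CSS class matches a previously learned rule. The rule ("person-box") and
# the sample page are illustrative, not real μRaptor rules.

from html.parser import HTMLParser

class RuleExtractor(HTMLParser):
    """Collects text found inside elements matching a learned class rule."""

    def __init__(self, rule_class):
        super().__init__()
        self.rule_class = rule_class
        self.depth = 0          # >0 while inside a matching sub-tree
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        # Enter a matching sub-tree, or descend further inside one.
        if self.depth or self.rule_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.matches.append(data.strip())

html_page = '<div class="person-box"><span>Emir Muñoz</span></div>'
extractor = RuleExtractor("person-box")
extractor.feed(html_page)
print(extractor.matches)  # prints ['Emir Muñoz']
```

A full pipeline would first mine candidate rules from pages that still carry hCard markup (DOM sub-trees around `vcard` elements), then rank them by how often their CSS classes co-occur with true hCards before applying them to stripped pages.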