We present the architecture of an end-to-end semantic search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of …
There has been a recent, tangible growth in RDF published on the Web in accordance with the Linked Data principles and best practices, the result of which has been dubbed the “Web of Data”. Linked Data guidelines are designed to facilitate ad hoc re-use and integration of conformant structured data, across the Web, by consumer applications; however, thus far, …
In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data – loosely also …
Over a decade after RDF was published as a W3C recommendation, publishing open and machine-readable content on the Web has recently received a lot more attention, including from corporate and governmental bodies; notably thanks to the Linked Open Data community, there now exists a rich vein of heterogeneous RDF data published on the Web (the so-called …
In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs to monitor, retrieving the documents, and further crawling part of …
Hundreds of public SPARQL endpoints have been deployed on the Web, forming a novel decentralised infrastructure for querying billions of structured facts from a variety of sources on a plethora of topics. But is this infrastructure mature enough to support applications? For 427 public SPARQL endpoints registered on the DataHub, we conduct various …
In this article, the authors discuss the challenges of performing reasoning on large-scale RDF datasets from the Web. Using ter Horst’s pD* fragment of OWL as a base, the authors compose a rule-based framework for application to web data: they argue their decisions using observations of undesirable examples taken directly from the Web. The authors further …
With respect to large-scale, static, Linked Data corpora, in this paper we discuss scalable and distributed methods for entity consolidation (a.k.a. smushing, entity resolution, object consolidation, etc.) to locate and process names that signify the same entity. We investigate (i) a baseline approach, which uses explicit owl:sameAs relations to perform …
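As a rough illustration of what consolidation over explicit owl:sameAs relations can look like, the following is a minimal single-machine sketch using a union-find structure. It is not the method of the paper above, whose details are not given in this excerpt; the function names, the prefixed-name strings, and the choice of the lexicographically smallest identifier as the canonical one are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"  # assumed prefixed name for the sameAs predicate


def find(parent: Dict[str, str], x: str) -> str:
    """Return the representative of x's equivalence class."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent: Dict[str, str], a: str, b: str) -> None:
    """Merge the classes of a and b, keeping the smaller identifier as root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        lo, hi = sorted((ra, rb))
        parent[hi] = lo


def consolidate(triples: Iterable[Triple]) -> List[Triple]:
    """Rewrite identifiers to one canonical name per owl:sameAs class.

    Hypothetical helper: sameAs triples are consumed to build the classes
    and dropped from the output; other triples are rewritten.
    """
    triples = list(triples)
    parent: Dict[str, str] = {}
    for s, p, o in triples:
        if p == SAME_AS:
            union(parent, s, o)

    def canon(term: str) -> str:
        # Terms never seen in a sameAs triple (e.g. literals) are left as-is.
        return find(parent, term) if term in parent else term

    seen = set()
    result: List[Triple] = []
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        t = (canon(s), p, canon(o))
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


if __name__ == "__main__":
    data = [
        ("ex:a", SAME_AS, "ex:b"),
        ("ex:b", SAME_AS, "ex:c"),
        ("ex:c", "foaf:name", "Alice"),
        ("ex:d", "foaf:knows", "ex:b"),
    ]
    for triple in consolidate(data):
        print(triple)
```

Because sameAs is treated as symmetric and transitive here, ex:a, ex:b and ex:c collapse to one identifier. A distributed variant would need to partition or replicate the equivalence classes across machines, which this sketch does not attempt.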
An important aspect of Semantic Web technologies is the issue of identity and uniquely identifying resources, which is essential for integrating data across sources. Currently, there is poor agreement on the use of common URIs for the same instances across sources and, as a result, a naively integrated dataset might miss associations between resources. To …