A Novel Approach for Publishing Linked Open Geodata from National Registries with the Use of Semantically Annotated Context Dependent Web Pages
The Geosciences and Geography are not just yet another application area for semantic technologies. The vast heterogeneity of the involved disciplines, ranging from the natural sciences to the social sciences, introduces new challenges in terms of interoperability. Moreover, the inherent spatial and temporal information components also require distinct semantic approaches. For these reasons, geospatial semantics, geo-ontologies, and semantic interoperability have been active research areas over the last 20 years. The geospatial semantics community has been among the early adopters of the Semantic Web, contributing methods, ontologies, use cases, and datasets. Today, geographic information is a crucial part of many central hubs on the Linked Data Web. In this editorial, we outline the research field of geospatial semantics, highlight major research directions and trends, and glance at future challenges. We hope that this text will be valuable for geoscientists interested in semantics research as well as knowledge engineers interested in spatiotemporal data.

Introduction and motivation

While the Web has changed with the advent of the Social Web from mostly authoritative content towards increasing amounts of user-generated information, it is essentially still about linked documents. These documents provide structure and context for the described data and ease their interpretation. In contrast, the evolving Data Web is about linking data, not documents. Such datasets are not bound to a specific document but can be easily combined and used outside of their original creation context. With a growth rate of millions of new facts encoded as RDF triples per month, the Linked Data cloud allows users to answer complex queries spanning multiple, heterogeneous data sources from different scientific domains. However, this uncoupling of data from its creation context makes the interpretation of data challenging.
Thus, research on semantic interoperability and ontologies is crucial to ensure consistency and meaningful results. Space and time are fundamental ordering principles to structure such data and provide an implicit context for their interpretation. Hence, it is not surprising that many linked datasets either contain spatiotemporal identifiers themselves or link out to such datasets, making them central hubs of the Linked Data cloud. Prominent examples include Geonames.org as well as the Linked Geo Data project, which provides an RDF serialization of Points Of Interest from Open Street Map. Besides such Volunteered Geographic Information (VGI), governments and governmental agencies recently started to develop geo-ontologies and publish their data as Linked Spatiotemporal Data. Examples include the US Geological Survey and the UK Ordnance Survey.

1570-0844/12/$27.50 © 2012 – IOS Press and the authors. All rights reserved. K. Janowicz et al. / Geospatial semantics and linked spatiotemporal data – Past, present, and future

Furthermore, myriad other Linked Data sources contain location-based references. For instance, a dataset from the digital humanities may link information about exhibits to places and their historic names. Following outgoing links, scholars can explore these places and learn about events which took place there. This historic events dataset may in turn link to information about physical objects and actors that were involved in these events. Querying data across different data sources requires information about the intended meaning of the terms used. In the example above, the datasets may use the CIDOC conceptual reference model as a common top-level ontology that defines terms such as event or participatesIn. On the domain level, researchers have proposed ontologies, e.g., for Geology, that enrich top-level ontologies such as DOLCE with domain-specific facts.
However, for highly heterogeneous domains and interdisciplinary research, dealing with geospatial data as well as establishing and maintaining such top-level and domain-level ontologies may turn out to be difficult or even impossible. Therefore, a major challenge of semantic research in the context of Linked Data lies in exploiting semantic heterogeneity, instead of resolving it. Datasets and ontologies are just two components of the Geospatial Semantic Web. The formal semantics defined for knowledge representation languages such as the Web Ontology Language (OWL) support reasoning services that can make implicit facts explicit, discover incompatibilities, improve retrieval beyond keyword search, and provide the framework for complex integrity constraint checking that reduces the risk of combining incompatible data and models. Finally, all of this would be of little use if not supported by semantics-driven user interfaces and novel interaction paradigms that support the exploration of data, models, and services. In the following, we outline the research field of geospatial semantics, sketch its major research directions so far, and highlight future challenges. We hope that this overview will be valuable for geoscientists interested in semantics research as well as knowledge engineers interested in the geosciences.

Geospatial semantics in a nutshell

Geospatial semantics is a research area combining Geographic Information Science (GIScience), spatial databases, cognitive science, Artificial Intelligence (AI), and the Semantic Web. It addresses the meaning of digital referents at a geographic scale, such as places, locations, events, and geographic objects in digital maps, geodatabases, and earth models. Geospatial semantics uses a variety of methods ranging from top-down knowledge engineering and logical deduction to bottom-up data mining and induction.
It integrates knowledge engineering with methods specific to GIScience, such as spatial reference systems and spatial reasoning. It also extends methods that originated in cognitive science such as semantic similarity and analogy reasoning, e.g., to enable semantics-based geographic information retrieval. Often, geospatial semantics combines work on conceptual modeling and geo-ontologies with spatial statistics, e.g., to study land cover. The semantic interpretations of geographic information can differ considerably, which frequently causes misunderstandings when using and combining data and services on the Web. Well-studied examples are Web services that provide sensor data, e.g., from weather stations. For instance, in order to simulate the spread of a toxic gas plume, two different services may be queried for wind direction measurements. Both services may be syntactically comparable in that they return a string called "wind direction" as output together with an integer in the range 0–360°. Nevertheless, the two services can have contradictory semantic interpretations of what the returned values refer to: the direction the wind blows to or the direction it blows from. Thus, sending both values to an evacuation simulation running on a Web Processing Service (WPS) will yield misleading results. Other examples include different and evolving conceptualizations of land cover types in the context of the Kyoto protocol as well as geographic feature types such as forests or Points Of Interest. Besides challenges arising from integrating heterogeneous data and combining services, data-model intercomparison plays another crucial role. Finally, time and the resulting change pose another challenge that has to be taken into account. Most concepts are not static but evolve over time or are even dynamically redefined. For the long-term preservation and maintenance of data and ontologies, this leads to research challenges such as how to handle semantic aging.
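Returning to the wind direction example above, the mismatch can be made concrete with a minimal Python sketch. It assumes that each service's convention is known and attached to the value as an explicit annotation; the names WindConvention, WindDirection, and angular_difference are hypothetical and are not taken from any service interface discussed here.

```python
from dataclasses import dataclass
from enum import Enum


class WindConvention(Enum):
    BLOWS_FROM = "from"  # direction the wind originates from
    BLOWS_TO = "to"      # direction the wind is heading towards


@dataclass(frozen=True)
class WindDirection:
    degrees: int                # clockwise from north
    convention: WindConvention  # the annotation a bare integer lacks

    def as_convention(self, target: WindConvention) -> "WindDirection":
        """Express the same physical wind in the target convention."""
        # The two conventions differ by half a turn.
        offset = 0 if self.convention is target else 180
        return WindDirection((self.degrees + offset) % 360, target)


def angular_difference(a: WindDirection, b: WindDirection) -> int:
    """Smallest angle in degrees between two readings, after aligning conventions."""
    x = a.as_convention(WindConvention.BLOWS_FROM).degrees
    y = b.as_convention(WindConvention.BLOWS_FROM).degrees
    return abs((x - y + 180) % 360 - 180)


if __name__ == "__main__":
    # Two hypothetical services return the same integer with different meanings.
    service_a = WindDirection(90, WindConvention.BLOWS_FROM)
    service_b = WindDirection(90, WindConvention.BLOWS_TO)
    print("raw integers equal:", service_a.degrees == service_b.degrees)
    print("difference after alignment:", angular_difference(service_a, service_b))
```

A purely syntactic comparison treats the two readings as identical, whereas aligning the conventions first shows that they describe opposite winds. In practice the convention would have to come from a formal service description rather than being hard-coded as it is here.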
One can distinguish two major strands of scientific thought in geospatial semantics, by analogy with Kuhn's distinction between modeling and encoding on the Semantic Web. One is concerned with the design task of semantic modeling. It addresses the problem of how geographic information should be modeled in an information ontology, i.e., which relations and classes are useful in order to discover, capture, and query the meaning of spatiotemporal and geographic phenomena. Examples include work on geospatial ontology engineering [34,13,60] and the formalization of spatial reasoning. Spatial relations allow querying and localizing complex geometrical objects, such as cities or buildings, relative to other referents, such as countries and roads. It was recognized early that such queries need to deal with indeterminate boundaries of geographic objects. This research strand goes back to a tradition of work on spatial representations and operations in Geographic Information Systems (GIS) as well as on integrity constraints in spatial databases. Another strand is concerned with the task of semantics-based search, integration, and interoperability of geo-referenced information, as discussed in the examples before. It addresses the problem of how geographic referents can be semantically linked to other kinds of information with related meaning. Due to the vast heterogeneity of geo-data and models spanning fields such as human and cognitive geography, ecology, economics, geology, climatology, oceanography, transportation research, and so forth, integration and sharing of georeferenced information requires methods to ensure semantic interoperability. Additionally, geographic information frequently needs to be represented on different levels of abstraction, scale, and granularity, and can be inherently vague and uncertain.
This creates another source of interoperability problems. An important challenge of semantic linking is how geospatial referents, such as events and places, can be automatically discovered in data sources which are not linked or georeferenced. Recent examples of work on querying include GeoSPARQL as a common query language for the Geospatial Semantic Web as well as triple stores that can effectively handle and index Linked Spatiotemporal Data. Other work along these lines also addressed the role of semantic similarity for spatial scene queries [83,71].

Major research directions

In the following, we give a brief introduction to some of the major research topics in geospatial semantics and related areas.

Geo-ontology engineering

Geographic information deals with a variety of phenomena on a certain range of spatial scales. Even though geographic referents are rooted in diverse domains, they share certain semantic characteristics and principles that can be exploited in common approaches towards designing geo-ontologies. For example, such ontologies should support access to phenomena on flexible resolution levels and scales. They also have to deal with the various natures of spatial boundaries. Examples of top-level geo-ontologies that incorporate the principle of spatial granularity include the work of Bittner et al. Usually, such foundation ontologies are extended by domain ontologies, e.g., the SWEET ontology for earth and environmental science. However, it has recently become apparent that geographic concepts are situated and context-dependent, that they can be described from different, equally valid, points of view, and that ontological commitments are arbitrary to a large extent. This makes standard comprehensive approaches towards ontology engineering more likely to fail.
Semantic engineering, however, may be slightly redefined, namely as a method of communicating possible interpretations of data terms by constraining them towards the intended ones, without prescribing ontological commitments. For example, so-called ontology design patterns have been proposed and implemented as modular, flexible, and reusable building blocks (or strategies) that support engineers and scholars in defining local, purpose-driven ontologies. Another approach is based on grounding vague terms with possibly multiple meanings. Additionally, one can also engineer ontologies in a layered fashion [34,25]. Such a layered approach can start with observation procedures on the bottom level and then provide deductive and inductive methods to arrive at more abstract but reproducible ontological categories.

Semantic reference systems

In its most basic definition, geographic information contains a spatial, a temporal, and a thematic component. The usefulness of geographic information lies, to a large extent, in the availability of reference systems for the precise semantic interpretation of these components. Spatial reference systems provide the formal vocabulary to calculate with precise locations, e.g., in the form of points on a mathematical ellipsoid, as well as with their meaning in terms of technical operations. The latter are given in terms of geodetic datums, i.e., standard directions and positions of the ellipsoid, which allow interpreting locations as results of repeatable measurements on the Earth's surface. Both are required to make sense of spatial data. Temporal reference systems, such as calendars, similarly handle the representation of time and allow translating between different calendars. The thematic (also called attributive) component of geographic information requires reference systems as well.
By analogy, Kuhn proposed the generalized notion of Semantic Reference Systems (SRS), which enable a precise interpretation of all components of geospatial data in terms of measurement scales and observation procedures. For example, attribute values such as the wind directions discussed before can be interpreted in terms of reference systems for cardinal wind directions and anemometers. Establishing such SRS, their standard operations as well as their formal vocabularies, is an ongoing research topic [87,93,68], and has been named among the most important and challenging projects of GIScience.

Semantic primitives and information grounding

Related to SRS is the problem of the level of abstraction at which a geospatial dataset can be semantically described in order to convey its meaning and to compare it with other datasets. As discussed above, geospatial ontologies reflect different world views on different levels of abstraction for good reasons. However, in order to compare and link them, one needs a common semantic plane. What are the basic concepts on which the primitive notions in a geospatial ontology should be founded? What are useful cognitive abstractions that can be reused across different ontologies? What are useful semantic backgrounds that enable comparison of different ontologies and conceptualizations with each other? One approach towards geospatial semantic primitives is based on spatial cognitive schemas. For example, Johnson's image schemas, such as container or path, are cross-domain abstractions (i.e., conceptual metaphors) underlying many different kinds of geographic data such as road networks or administrative boundaries. Thus, they can be used for designing core concepts in geospatial ontologies. Lynch's urban patterns and Alexander's design patterns may be seen from a similar angle.
Likewise, Gärdenfors' notion of cognitive categories as convex regions in a conceptual space is a cognitive schema that can be exploited for comparison of geospatial concepts [90,2]. Another approach acknowledges that cognitive concepts are themselves abstractions and, thus, in need of grounding in the sense of Harnad. Geographic information concepts may be grounded in terms of embodied perceptual routines, perception-action cycles, and situated simulations. Perception-action cycles underlie Gibson's meaningful environment and his central notion of affordance. Both can be used to understand geographic media, such as road networks, in terms of the kinds of actions they afford [95,96]. More generally, it is possible to understand the meaning of geographic information in a pragmatic sense, e.g., in terms of repeatable actions taken to generate a dataset, as well as in a teleological sense, i.e., in terms of the underlying purpose. Relevant actions may involve cognitive constructions, which account for abstract notions, as well as perceptual operations, which allow humans to reliably simulate and predicate some phenomenon in jointly observable environmental scenes.

Event discovery and spatiotemporal ontologies

Geographic information is inherently temporal in the sense that geographic assertions, such as partonomic relations between administrative regions or the membership in organizations, are valid only over a certain period. Consequently, research investigates how this temporal dimension can be brought into geospatial data. This is especially crucial for the integration of Linked Data on the Web. To give a concrete example, problems arise when administrative regions are linked via owl:sameAs, and their properties, such as population numbers, are not temporally indexed, e.g., via blank nodes. Over the last few years, a multitude of work on spatiotemporal modeling, temporal GIS, and simple temporal gazetteer models has been published.
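The owl:sameAs problem mentioned above can be illustrated with a small Python sketch. It does not use an RDF library; plain data classes stand in for temporally indexed property values. The class TimedValue, the function value_on, and the population figures are invented purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class TimedValue:
    """A property value together with the interval during which it is valid."""
    value: int
    valid_from: date                 # inclusive
    valid_to: Optional[date] = None  # exclusive; None means still valid

    def holds_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def value_on(observations: Iterable[TimedValue], day: date) -> Optional[int]:
    """Return the value valid on `day`, None if there is none; fail on conflicts."""
    matches = {obs.value for obs in observations if obs.holds_on(day)}
    if len(matches) > 1:
        raise ValueError(f"conflicting values on {day}: {sorted(matches)}")
    return next(iter(matches), None)


# Invented population figures for one region described by two linked datasets.
dataset_a = [TimedValue(1000, date(2000, 1, 1), date(2010, 1, 1))]
dataset_b = [TimedValue(1200, date(2010, 1, 1))]
merged = dataset_a + dataset_b

# Dropping the intervals, as an untimed merge would, leaves two contradictory
# population values attached to what is declared to be the same resource.
untimed_merge = {obs.value for obs in merged}

if __name__ == "__main__":
    print("untimed values:", sorted(untimed_merge))
    print("population in mid-2005:", value_on(merged, date(2005, 6, 1)))
    print("population in mid-2015:", value_on(merged, date(2015, 6, 1)))
```

With validity intervals attached, the merged description answers date-specific queries consistently, which is the role the temporal index plays in the linked datasets.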
Research also addressed event ontology design patterns. However, a particular challenge remains the automated detection of events from observation data on a geographic scale, such as rainstorms or blizzards. Examples of work on geographic event detection and identification algorithms include the work by Agouris and Stefanidis. Nonetheless, open questions remain, for example regarding general formal and computational procedures of geographic event detection, the tight coupling of geospatial ontologies with detected events, and the triggering of data and ontology updates by automatically detected events. These challenges are reflected, to some extent, in ontological questions about the relationship between processes, objects, and events.

Places and trajectories

Place is the human way to understand and refer to space, and it goes well beyond geographic coordinates. Locations as simple coordinates are point-like, ubiquitous, and precise. In contrast, places are not point-like and have fuzzy boundaries determined by physical, cultural, and cognitive processes [106,81]. Furthermore, places, such as downtown, can change their locations over time, just like physical objects. In consequence, locations only insufficiently capture the identity and meaning of places. So far, research in GIScience and geospatial semantics has been focusing on three major dimensions. First, the formalization of place, place data models, and place ontologies, in order to improve geographic information retrieval [57,74,53]. A promising direction of further research is affordance-based approaches towards place. Second, the automated discovery of places, in order to enrich data with georeferences. A traditional direction of research is geoparsing, i.e., the discovery of places in texts by NLP methods, which can also be used to identify place-related activities.
Recently, due to new technologies, research has focused on the discovery of places and user activities by mining (semantic) trajectories, which also has a tradition in ubiquitous computing. Research also investigated how to reconstruct the spatial footprint of places based on geotags in social media, such as Flickr. In the age of Big Data, semantic integration will allow researchers to combine data from heterogeneous sources to gain a more holistic understanding of places by studying location-based social networks, different types of volunteered geographic information, authoritative data from the ground and via remote sensing, and many other data sources. Third, novel research addresses the design of place-based information systems in which traditional operations and methods of GIS need to be redesigned to cope with places as referents. Geographic feature type ontologies are a central part of this vision.

Sensor and observation semantics

Naturally, observations play a key role in the geosciences, and, thus, so do the involved sensors. In order to describe the origin and provenance of geodata, well-designed ontologies about sensors, observation, and measurement are necessary. The so-called Semantic Sensor Web develops ontologies, software, and methods to improve retrieval, access, and integration of observation data as well as sensor metadata. Ontologies, such as the Semantic Sensor Network ontology, provide formal specifications that ease retrieval and integration of data, while semantics-enabled Sensor Observation Services (SOS) provide access and querying capabilities. Work on the Semantic Sensor Web also investigates how to establish and maintain provenance information about sensors, e.g., their survival range, sampling time, used observation procedure, and so forth [113,85]. To reduce manual interaction, sensor Plug & Play investigates how to automatically register sensors and mediate their observation results to fit the needs of specific services.
Other examples of recent work include sensor data mashups and research on stream reasoning. An overview of research challenges for the Semantic Sensor Web was recently published by Corcho and Garcia-Castro.

Similarity, alignment, and translation

Semantic translation [43,65,29,84], semantic similarity measurement [92,90,71,98,83,53], and geo-ontology alignment have been major research topics over the past years. Translation and similarity are both essential for establishing Semantic Reference Systems: while semantic translation maps between vocabularies and can be thought of as the analogy to datum transformation, semantic similarity measures the distance between concepts in a semantic space as an analogy to distance in space and time. Ontology alignment addresses the combination of multiple ontologies to enable data reuse and integration. The fact that most GI analyses, e.g., interpolation, kernel methods, or point pattern analysis, are based on spatial auto-correlation and distance in space shows why semantic similarity is considered essential for making geo-ontologies and semantics first-class citizens of GIS and spatial statistics. Similarity also plays a central role in most of the cognitive approaches introduced before, as these rely on direct mappings between ontologies instead of rigid top-down ontologies. However, as argued by Bittner et al., these views do not contradict but can benefit from each other. Semantic similarity and analogy reasoning also enable novel types of user interfaces that ease navigating and browsing through geo-data and ontologies. Similarity, however, is highly sensitive to context. Consequently, researchers have studied the impact of context and proposed different weights and procedures to account for its effect.
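The idea of similarity as distance in a semantic space, with context expressed as weights, can be sketched in a few lines of Python. This is a generic illustration under simplifying assumptions and not one of the measures cited above: concepts are reduced to single points on two invented quality dimensions, distance is a weighted Euclidean distance, and similarity is taken to decay exponentially with distance. All names, dimension values, and weights are hypothetical.

```python
import math
from typing import Dict

Concept = Dict[str, float]  # quality dimension -> value scaled to [0, 1]
Weights = Dict[str, float]  # quality dimension -> salience in a given context


def weighted_distance(a: Concept, b: Concept, weights: Weights) -> float:
    """Weighted Euclidean distance between two concepts."""
    return math.sqrt(sum(w * (a[d] - b[d]) ** 2 for d, w in weights.items()))


def similarity(a: Concept, b: Concept, weights: Weights) -> float:
    """Similarity in (0, 1] that decays exponentially with distance."""
    return math.exp(-weighted_distance(a, b, weights))


# Invented feature types on two invented dimensions.
river = {"flowing": 1.0, "artificial": 0.0}
canal = {"flowing": 1.0, "artificial": 1.0}
lake = {"flowing": 0.0, "artificial": 0.0}

# Two hypothetical contexts that emphasize different dimensions.
navigation = {"flowing": 0.9, "artificial": 0.1}
origin = {"flowing": 0.1, "artificial": 0.9}

if __name__ == "__main__":
    for name, ctx in (("navigation", navigation), ("origin", origin)):
        print(name,
              "river~canal:", round(similarity(river, canal, ctx), 3),
              "river~lake:", round(similarity(river, lake, ctx), 3))
```

Under the navigation weights, river is closer to canal than to lake; under the origin weights, the ranking reverses. The sketch shows only this sensitivity to weights, not how suitable weights are determined.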
A recent example of such work is Keßler's DIR measure, which identifies the contextual information that has the largest impact on a given setting and, thus, requires an adjustment of similarity measures.

Spatial data infrastructures

Spatial Data Infrastructures (SDI) provide standardized means for publishing, querying, retrieving, and accessing geodata via Web services. Additionally, SDIs offer notification and processing services and, thus, go beyond simple data stores. Data and processing services can be chained to model complex scientific workflows. Ensuring a meaningful chaining, however, requires formal specifications of the service inputs, outputs, side effects, parameters, and so forth. Consequently, semantic markups for Web services have been an active research area for many years [79,31,105]. Examples of SDI-specific research include the work of Lemmens et al., Vaccari et al., and Lutz. While the Geo Web is typically composed of SDI services and uses its own markup languages and protocols, the Semantic Web is based on its own technology stack. This leads to a situation where both infrastructures co-exist separately. For example, it is not possible to use a Semantic Web reasoner for instance classification of geo-data. Therefore, researchers developed different approaches for a semantic enablement of the Geo Web. Janowicz et al., for instance, specified transparent and bi-directional proxies that allow users of both infrastructures to share data and services. Semantic annotations have been proposed to lift existing geo-data to a semantic level [62,80]. In the context of the digital humanities, annotations have been used to create Linked Spatiotemporal Data and to enrich old maps with interlinked information from the global graph. Finally, in the context of eScience and scientific workflows, researchers studied the role of semantic technologies and ontologies for the earth sciences [33,16].