The Combined Approach to Ontology-Based Data Access

Abstract

The use of ontologies for accessing data is one of the most exciting new applications of description logics in databases and other information systems. A realistic way of realising sufficiently scalable ontology-based data access in practice is by reduction to querying relational databases. In this paper, we describe the combined approach, which incorporates the information given by the ontology into the data and employs query rewriting to eliminate spurious answers. We illustrate this approach for ontologies given in the DL-Lite family of description logics and briefly discuss the results obtained for the EL family.

1 Ontology-Based Data Access

The paradigm of ontology-based data access (OBDA) has recently emerged as an exciting application of knowledge representation and reasoning technologies in information management systems [Dolby et al., 2008; Heymans et al., 2008; Poggi et al., 2008a]. In a nutshell, the underlying idea is to facilitate access to data by separating the user from the raw data sources using an ontology that provides a user-oriented view of the data and makes it accessible via queries formulated solely in the language of the ontology, without any knowledge of the actual structure of the data.
To make this idea more precise, let us assume that the ontology, $\mathcal{T}$, is given by a finite set of sentences of (a suitable fragment of) first-order logic (FO) and the data $\mathcal{D}$ by a finite set of ground atoms $P(a_1, \ldots, a_n)$ of FO, where the $a_i$ are individual names (constants) and $P$ is an $n$-ary predicate symbol. A query $q(\vec{x})$ is an FO-formula with free variables $\vec{x}$, called the answer variables. At its core, the OBDA scenario is typical of the logic-based approach to knowledge representation, where logical theories are employed to represent knowledge, and reasoning is required to unlock that knowledge for applications.
In OBDA, the ontology $\mathcal{T}$ is typically used to enrich the data with additional vocabulary for querying, to translate between different data and query vocabularies, and to reconcile the different vocabularies of multiple data sources. The data $\mathcal{D}$ is assumed to be incomplete, so that additional data can be inferred by means of reasoning. This requires, at least in principle, reasoning over a potentially infinite set of possible models of $\mathcal{T}$ and $\mathcal{D}$. Thus, the fundamental query-answering problem we are facing is to decide whether a tuple $\vec{a}$ of individual names from $\mathcal{D}$ is a certain answer to $q$ over $\mathcal{T}$ and $\mathcal{D}$, i.e., whether $q(\vec{a})$ is true in every FO-model $\mathcal{M}$ of $\mathcal{T}$ and $\mathcal{D}$. In contrast, the relational database paradigm presupposes that the data is complete, so answering a query means checking whether it holds in the single model given by $\mathcal{D}$.
As a simple illustration of OBDA, consider the query $\varphi(x) = \exists y, z\, (\mathit{city}(x) \wedge \mathit{has\_airport}(x, y) \wedge \mathit{located\_in}(x, \mathit{US}) \wedge \mathit{named\_for}(y, z) \wedge \mathit{ww2\_hero}(z))$, asking for US cities with an airport named after a WW2 hero. Let us assume that we have a database $\mathit{DB}$ with tables for all the relations mentioned in $\varphi(x)$ except the more abstract concept $\mathit{ww2\_hero}$, and with additional tables for $\mathit{ww2\_deco}$ (WW2 decoration) and $\mathit{recipient\_of}$. Thus, $\mathit{DB}$ contains atoms such as $\mathit{city}(\mathit{Chicago})$, $\mathit{has\_airport}(\mathit{Chicago}, \mathit{ORD})$, $\mathit{located\_in}(\mathit{Chicago}, \mathit{US})$, $\mathit{named\_for}(\mathit{ORD}, \mathit{O'Hare})$, $\mathit{recipient\_of}(\mathit{O'Hare}, \mathit{ww2\_medal\_of\_honor})$ and $\mathit{ww2\_deco}(\mathit{ww2\_medal\_of\_honor})$. As $\mathit{DB}$ contains no data for the relation $\mathit{ww2\_hero}$, no answer to $\varphi(x)$ over $\mathit{DB}$ can be found. However, if we describe WW2 heroes by means of an ontology $\mathcal{H}$ with sentences such as $\forall x, y\, (\mathit{recipient\_of}(x, y) \wedge \mathit{ww2\_deco}(y) \rightarrow \mathit{ww2\_hero}(x))$, then Chicago becomes an answer to $\varphi(x)$ over $\mathcal{H}$ and $\mathit{DB}$.
To be useful in practice, OBDA should scale to large amounts of data and should preferably be as efficient as standard relational database management systems (RDBMSs), into whose scalability decades of research have been invested.
Realistically, this means that we are interested in ontology and query languages for which OBDA is efficiently reducible to tasks that can be executed using existing RDBMSs. Thus, given $\mathcal{T}$, $\mathcal{D}$ and $q(\vec{x})$ as above, we want to compute a finite FO model $\mathcal{D}'$ and an FO query $q'(\vec{x})$ such that
(ans) $\vec{a}$ is an answer to $q'(\vec{x})$ over $\mathcal{D}'$ if, and only if, $\vec{a}$ is a certain answer to $q(\vec{x})$ over $\mathcal{T}$ and $\mathcal{D}$.
To illustrate the type of reduction we have in mind, observe that querying $(\mathcal{H}, \mathit{DB})$ with $\varphi(x)$ can be reduced to evaluating $\varphi(x)$ over the extension $\mathit{DB}'$ of $\mathit{DB}$ with a table for $\mathit{ww2\_hero}$ containing all individuals in $\mathit{DB}$ who are recipients of a WW2 decoration (which can easily be computed). Another possible reduction is to use the same database $\mathit{DB}$ but rewrite the query $\varphi(x)$ to a new query $\varphi'(x)$, obtained from $\varphi(x)$ by replacing the conjunct $\mathit{ww2\_hero}(z)$ with $\mathit{ww2\_hero}(z) \vee \exists v\, (\mathit{recipient\_of}(z, v) \wedge \mathit{ww2\_deco}(v))$.
As the size of the data is normally large and many different queries can be posed to the same database, at least the following two requirements should supplement (ans):
(dat) $\mathcal{D}'$ is computable in polynomial time in $\mathcal{D}$ and does not depend on $q(\vec{x})$;
(que) $q'(\vec{x})$ does not depend on $\mathcal{D}$.
There are various possible refinements of these conditions. The query-rewriting approach of [Calvanese et al., 2007] does not allow modifications of the data, so that (dat) is replaced with $\mathcal{D}' = \mathcal{D}$. As a result, this approach is only applicable to description logics for which query answering is in the class AC$^0$ for data complexity (that is, when only the data is regarded as input, whereas the ontology and the query are regarded as fixed). Although it guarantees the same data complexity as in RDBMSs, the rewriting approach does not impose any restrictions on the size of the rewritten queries $q'(\vec{x})$, which may be exponential in the size of $q$ and therefore prohibitively expensive for RDBMSs to execute.
In this paper, we suggest a different refinement of conditions (dat) and (que) that takes the size of $\mathcal{T}$ into account:
(dat′) $\mathcal{D}'$ is computable in polynomial time in both $\mathcal{T}$ and $\mathcal{D}$, preferably using RDBMSs;
(que′) $q'(\vec{x})$ is polynomial in $\mathcal{T}$ and $q(\vec{x})$.
These conditions emerge from the combined approach to OBDA suggested in [Lutz et al., 2009; Kontchakov et al., 2010] and aim at scenarios where it is permissible to manipulate the source data (which is not always the case in information integration). The motivation for this approach is twofold. First, by allowing $\mathcal{D}' \neq \mathcal{D}$, we gain the advantage of much smaller and more transparent rewritings $q'(\vec{x})$ of the query. The experimental data we discuss below indicates that this leads to significant performance improvements. Second, the approach advocated here is not confined to the languages for which OBDA is in AC$^0$ for data complexity. As shown in [Lutz et al., 2009], it can be successfully applied to ontology languages for which query answering is PTIME-complete for data complexity.
For many years now, the most popular and successful ontology languages have been based on description logics. In fact, the DL-Lite family of description logics [Calvanese et al., 2007; Artale et al., 2009] was specifically designed for the rewriting approach to OBDA and is the logical underpinning of OWL 2 QL, a profile of the OWL 2 Web Ontology Language. In our exposition of the combined approach below, we focus on a simple member of the DL-Lite family, but we also briefly discuss its applicability to the EL family of description logics, which underlies the OWL 2 EL profile of OWL 2.

2 DL-Lite$_{horn}$

To discuss the main ideas behind the combined approach, we consider the description logic DL-Lite$_{horn}$ [Artale et al., 2009], designed to represent relationships between concepts (unary predicates in FO, or classes in OWL) and the domains and ranges of roles (binary relations in FO, or properties in OWL).
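Ontologies in DL-Lite$_{horn}$, as in most other description logics, are called TBoxes (T for terminology) and consist of inclusions between concepts. The expressive power of the logic then depends on the constructors available to build concepts. In the case of DL-Lite$_{horn}$, roles $R$ and concepts $C$ are built from concept names $A_i$ and role names $P_i$, $i \geq 0$, according to the following syntax rules:
$$R ::= P_i \mid P_i^-, \qquad C ::= \top \mid \bot \mid A_i \mid \exists R,$$
and a DL-Lite$_{horn}$ TBox $\mathcal{T}$ is a finite set of concept inclusions (CIs) $C_1 \sqcap \cdots \sqcap C_n \sqsubseteq C$, where the $C_i$ and $C$ are concepts. Every CI can be regarded as a first-order Horn sentence. For example, $\exists P \sqcap \exists P^- \sqsubseteq A$ has exactly the same meaning as $\forall x\, (\exists y\, P(x, y) \wedge \exists y\, P(y, x) \rightarrow A(x))$.
Thus, TBoxes are interpreted in standard FO structures $\mathcal{I} = (\Delta^\mathcal{I}, A_i^\mathcal{I}, P_i^\mathcal{I})_{i \geq 0}$, where $\Delta^\mathcal{I}$ is a non-empty domain, $A_i^\mathcal{I} \subseteq \Delta^\mathcal{I}$ and $P_i^\mathcal{I} \subseteq \Delta^\mathcal{I} \times \Delta^\mathcal{I}$. We set
$$\top^\mathcal{I} = \Delta^\mathcal{I}, \quad \bot^\mathcal{I} = \emptyset, \quad (P_i^-)^\mathcal{I} = \{(x, y) \mid (y, x) \in P_i^\mathcal{I}\}, \quad (\exists R)^\mathcal{I} = \{x \in \Delta^\mathcal{I} \mid \text{there is } y \in \Delta^\mathcal{I} \text{ with } (x, y) \in R^\mathcal{I}\};$$
so $\exists P$ is interpreted as the domain of $P$ and $\exists P^-$ as its range. We write $\mathcal{I} \models \bigsqcap_{i=1}^{n} C_i \sqsubseteq C$ and say that $\bigsqcap_{i=1}^{n} C_i \sqsubseteq C$ is satisfied in $\mathcal{I}$ if $\bigcap_{i=1}^{n} C_i^\mathcal{I} \subseteq C^\mathcal{I}$.
In description logic, ground atoms of the form $A(a)$ and $P(a, b)$, where $A$ is a concept name, $P$ a role name, and $a, b$ individual names, are called concept assertions and role assertions, respectively. An ABox, $\mathcal{A}$, is a finite set of concept and role assertions, which is used to store instance data. In an interpretation $\mathcal{I}$, $a^\mathcal{I}$ is the domain element interpreting the individual name $a$. As usual, $\mathcal{I} \models A(a)$ if $a^\mathcal{I} \in A^\mathcal{I}$, and $\mathcal{I} \models P(a, b)$ if $(a^\mathcal{I}, b^\mathcal{I}) \in P^\mathcal{I}$. We denote by $\mathsf{Ind}(\mathcal{A})$ the set of individual names occurring in $\mathcal{A}$, and often write $P^-(a, b) \in \mathcal{A}$ instead of $P(b, a) \in \mathcal{A}$. A DL-Lite$_{horn}$ knowledge base (KB) is a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$. An interpretation $\mathcal{I}$ is a model of a KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ if $\mathcal{I} \models \alpha$ for all $\alpha \in \mathcal{T} \cup \mathcal{A}$. We write $\mathcal{K} \models \alpha$ whenever $\mathcal{I} \models \alpha$ for all models $\mathcal{I}$ of $\mathcal{K}$. $\mathcal{K}$ is consistent if it has a model.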
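The semantics above can be executed directly on finite structures. As a minimal sketch, the following computes $R^\mathcal{I}$ and $C^\mathcal{I}$ and checks CIs and assertions in one given finite interpretation. The Python encoding of concepts and roles and the class and method names are our own assumptions, not notation from the paper. Note that this is a model checker for a single $\mathcal{I}$, not a reasoner for $\mathcal{K} \models \alpha$, which quantifies over all models:

```python
from dataclasses import dataclass, field

# Hypothetical encoding of DL-Lite_horn concepts: the reserved strings "TOP"
# and "BOT", a concept name such as "A", or ("some", R), where a role R is a
# role name "P" or its inverse, written "P-".


@dataclass
class Interpretation:
    domain: set
    concepts: dict = field(default_factory=dict)     # concept name -> set of elements
    roles: dict = field(default_factory=dict)        # role name -> set of pairs
    individuals: dict = field(default_factory=dict)  # individual name -> element

    def role_ext(self, r):
        """R^I, with (P-)^I = {(x, y) | (y, x) in P^I}."""
        if r.endswith("-"):
            return {(y, x) for (x, y) in self.roles.get(r[:-1], set())}
        return set(self.roles.get(r, set()))

    def concept_ext(self, c):
        """C^I for C ::= TOP | BOT | A | exists R."""
        if c == "TOP":
            return set(self.domain)
        if c == "BOT":
            return set()
        if isinstance(c, tuple) and c[0] == "some":
            return {x for (x, _) in self.role_ext(c[1])}
        return set(self.concepts.get(c, set()))

    def satisfies_ci(self, lhs, rhs):
        """I |= C1 ⊓ ... ⊓ Cn ⊑ C iff C1^I ∩ ... ∩ Cn^I ⊆ C^I."""
        meet = set(self.domain)
        for c in lhs:
            meet &= self.concept_ext(c)
        return meet <= self.concept_ext(rhs)

    def satisfies_concept_assertion(self, concept_name, ind):
        """I |= A(a) iff a^I ∈ A^I."""
        return self.individuals[ind] in self.concepts.get(concept_name, set())

    def satisfies_role_assertion(self, role_name, a, b):
        """I |= P(a, b) iff (a^I, b^I) ∈ P^I."""
        pair = (self.individuals[a], self.individuals[b])
        return pair in self.roles.get(role_name, set())


# A three-element structure: ∃P^I = {1, 2}, (∃P-)^I = {2, 3}, A^I = {2}.
I = Interpretation(
    domain={1, 2, 3},
    concepts={"A": {2}},
    roles={"P": {(1, 2), (2, 3)}},
    individuals={"a": 1, "b": 2},
)
print(I.satisfies_ci([("some", "P"), ("some", "P-")], "A"))  # True
print(I.satisfies_ci([("some", "P")], "A"))                  # False: 1 is in ∃P^I but not in A^I
print(I.satisfies_role_assertion("P", "a", "b"))             # True
```

The example CI $\exists P \sqcap \exists P^- \sqsubseteq A$ holds here because the only element with both a $P$-successor and a $P$-predecessor is 2, which is in $A^\mathcal{I}$.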
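Consistency of DL-Lite$_{horn}$ KBs is known to be PTIME-complete for combined complexity [Artale et al., 2009].
Example 1. Consider the KB $\mathcal{K} = (\mathcal{T}, \{A(a)\})$, where $\mathcal{T} = \{A \sqsubseteq \exists T,\ \exists T^- \sqsubseteq B,\ B \sqsubseteq \exists R,\ \exists R^- \sqsubseteq A\}$. Two models of $\mathcal{K}$, called $\mathcal{G}_\mathcal{K}$ and $\mathcal{U}_\mathcal{K}$, are depicted below: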

DOI: 10.5591/978-1-57735-516-8/IJCAI11-442


@inproceedings{Kontchakov2011TheCA, title={The Combined Approach to Ontology-Based Data Access}, author={Roman Kontchakov and Carsten Lutz and David Toman and Frank Wolter and Michael Zakharyaschev}, booktitle={IJCAI}, year={2011} }