On the Influence of Incoherence in Inconsistency-tolerant Semantics for Datalog±

Abstract

The concept of incoherence naturally arises in ontological settings, especially when integrating knowledge. In this work we study a notion of incoherence for Datalog± ontologies based on the definition of satisfiability of a set of existential rules with respect to the set of integrity constraints in a Datalog± ontology. We show how classical inconsistency-tolerant semantics for query answering behave when dealing with atoms that are relevant to unsatisfiable sets of existential rules, which may hamper the quality of answers even under inconsistency-tolerant semantics; this is to be expected, as they were not designed to confront such issues. Finally, we propose a notion of incoherency-tolerant semantics for query answering in Datalog±, and present a particular one based on the transformation of classic Datalog± ontologies into defeasible Datalog± ones, which use argumentation as their reasoning machinery.

Introduction and Motivation

The problem of inconsistency in ontologies has been widely acknowledged in both the Semantic Web and Database Theory communities, and several methods have been developed to deal with it, e.g., (Arenas, Bertossi, and Chomicki 1999; Lembo et al. 2010; Lukasiewicz, Martinez, and Simari 2012; Black, Hunter, and Pan 2009; Bienvenu 2012; Martinez et al. 2014). The most widely accepted semantics for querying inconsistent databases is that of consistent answers (Arenas, Bertossi, and Chomicki 1999) (or AR semantics in (Lembo et al. 2010) for ontological languages), which yields the set of atoms that can be derived regardless of how the inconsistency is repaired. Under this semantics it is often assumed that the set of ontological knowledge Σ expresses the semantics of the data, and that as such there is no internal conflict in the set of constraints, which is not subject to changes over time.
This means, first, that the set of constraints is always satisfiable, in the sense that their application does not inevitably yield a consistency problem; second, as a result of the previous observation, it must be the case that the conflicts come from the data contained in the database instance, and that this is the part of the ontology that must be modified in order to restore consistency. Although considering the constraints as always satisfiable is a reasonable assumption to make, especially in the case of a single ontology, in this work we focus on a more general setting and consider that both data and constraints can change through time and become conflicting. In this more general scenario, as knowledge evolves (and so does the ontology that represents it), not only data-related issues can appear, but also constraint-related ones. The problem of conflicts among constraints is known in the Description Logics community as incoherence (Flouris et al. 2006; Qi and Hunter 2007). As they were not developed to consider this kind of issue, several of the well-known inconsistency-tolerant semantics for query answering fail to compute good-quality answers in the presence of incoherence. In this paper we focus on a particular family of ontological languages, namely Datalog± (Calì, Gottlob, and Lukasiewicz 2012a). We show how incoherence can arise in Datalog± ontologies, and how the reasoning technique based on the use of defeasible elements in Datalog± and an argumentative semantics introduced by Martinez et al. (2014) can tolerate such issues, thus resulting in a reasoning machinery capable of dealing with both incoherent and inconsistent knowledge.

Copyright © 2015, for this paper by its authors. Copying permitted for private and academic purposes.
This work integrates three different building blocks: first, we introduce the notion of incoherence for Datalog± ontologies, relating it to the problem of satisfiability of concepts in Description Logics; second, we show how this notion affects most well-known inconsistency-tolerant semantics which, since they were not designed to confront such issues, can go as far as not returning any useful answer; finally, we propose a definition of incoherency-tolerant semantics, introducing an alternative semantics based on an argumentative reasoning process over the transformation of Datalog± ontologies into their corresponding defeasible Datalog± ontologies. We show that this semantics behaves satisfactorily in the presence of incoherence, as the process can return as answers atoms that trigger incoherence, which, as we show, classical inconsistency-tolerant semantics cannot do.

Preliminaries

First, we briefly recall some basics on Datalog± (Calì, Gottlob, and Lukasiewicz 2012a). We assume (i) an infinite universe of (data) constants ∆ (which constitute the “normal” domain of a database), (ii) an infinite set of (labeled) nulls ∆_N (used as “fresh” Skolem terms, which are placeholders for unknown values, and can thus be seen as variables), and (iii) an infinite set of variables V (used in queries, dependencies, and constraints). Different constants represent different values (unique name assumption), while different nulls may represent the same value. We assume a lexicographic order on ∆ ∪ ∆_N, with every symbol in ∆_N following all symbols in ∆. We denote by X sequences of variables X1, ..., Xk with k ≥ 0. We assume a relational schema R, which is a finite set of predicate symbols (or simply predicates). A term t is a constant, null, or variable. An atomic formula (or atom) a has the form P(t1, ..., tn), where P is an n-ary predicate, and t1, ..., tn are terms.
A database (instance) D for a relational schema R is a (possibly infinite) set of atoms with predicates from R and arguments from ∆. Given a relational schema R, a tuple-generating dependency (TGD) σ is a first-order formula ∀X∀Y Φ(X,Y) → ∃Z Ψ(X,Z), where Φ(X,Y) and Ψ(X,Z) are conjunctions of atoms over R (without nulls), called the body and the head of σ, respectively. Satisfaction of TGDs is defined via homomorphisms, which are mappings μ : ∆ ∪ ∆_N ∪ V → ∆ ∪ ∆_N ∪ V such that (i) c ∈ ∆ implies μ(c) = c, (ii) c ∈ ∆_N implies μ(c) ∈ ∆ ∪ ∆_N, and (iii) μ is naturally extended to atoms, sets of atoms, and conjunctions of atoms. Consider a database D for a relational schema R, and a TGD σ on R of the form Φ(X,Y) → ∃Z Ψ(X,Z). Then, σ is applicable to D if there exists a homomorphism h that maps the atoms of Φ(X,Y) to atoms of D. Let σ be applicable to D, and let h′ be a homomorphism that extends h as follows: for each Xi ∈ X, h′(Xi) = h(Xi); for each Zj ∈ Z, h′(Zj) = zj, where zj is a “fresh” null, i.e., zj ∈ ∆_N, zj does not occur in D, and zj lexicographically follows all other nulls already introduced. The application of σ on D adds to D the atom h′(Ψ(X,Z)) if it is not already in D. After the application we say that σ is satisfied by D.

The chase for a database D and a set of TGDs Σ_T, denoted chase(D,Σ_T), is the exhaustive application of the TGDs (Calì, Gottlob, and Lukasiewicz 2012b) in a breadth-first (level-saturating) fashion, which leads to a (possibly infinite) chase for D and Σ_T. Since TGDs can be reduced to TGDs with only single atoms in their heads, in the sequel every TGD has, without loss of generality, a single atom in its head.

A conjunctive query (CQ) over R has the form Q(X) = ∃Y Φ(X,Y), where Φ(X,Y) is a conjunction of atoms (possibly equalities, but not inequalities) with the variables X and Y, and possibly constants, but without nulls. In this work we restrict our attention to atomic queries.
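The TGD application step and the level-saturating chase recalled above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the formalization: atoms are encoded as tuples, variables are strings starting with an uppercase letter, fresh nulls are named z1, z2, and so on, and the function names (match, homomorphisms, chase) as well as the max_rounds cut-off, which stops a chase that would otherwise be infinite, are hypothetical. Each trigger (a TGD together with a homomorphism of its body) is applied at most once.

```python
from itertools import count

# Assumed encoding: an atom is a tuple (predicate, term1, ..., termn).
# Variables start with an uppercase letter; constants and nulls do not.

def is_var(term):
    return term[:1].isupper()

def match(atom, fact, h):
    """Extend homomorphism h so that atom maps onto fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    h = dict(h)
    for t, c in zip(atom[1:], fact[1:]):
        if is_var(t):
            if h.setdefault(t, c) != c:
                return None
        elif t != c:
            return None
    return h

def homomorphisms(body, facts, h=None):
    """Yield every homomorphism that maps all atoms of body into facts."""
    h = h or {}
    if not body:
        yield h
        return
    for fact in facts:
        h2 = match(body[0], fact, h)
        if h2 is not None:
            yield from homomorphisms(body[1:], facts, h2)

def chase(facts, tgds, max_rounds=10):
    """Breadth-first chase; a TGD is a pair (body atoms, single head atom)."""
    facts, applied, fresh = set(facts), set(), count(1)
    for _ in range(max_rounds):          # assumed bound, not in the paper
        new = set()
        for i, (body, head) in enumerate(tgds):
            for h in homomorphisms(body, facts):
                trigger = (i, tuple(sorted(h.items())))
                if trigger in applied:
                    continue
                applied.add(trigger)
                h2 = dict(h)
                for t in head[1:]:
                    # existential variables of the head get fresh nulls
                    if is_var(t) and t not in h2:
                        h2[t] = f"z{next(fresh)}"
                new.add((head[0],) + tuple(h2.get(t, t) for t in head[1:]))
        if new <= facts:
            break
        facts |= new
    return facts

# Illustrative TGD: employee(X) -> exists D. works_in(X, D)
tgds = [([("employee", "X")], ("works_in", "X", "D"))]
result = chase({("employee", "jesse")}, tgds)
# result contains employee(jesse) and works_in(jesse, z1)
```

Tracking applied triggers is one simple way to keep the procedure from re-firing the same TGD on the same homomorphism; other chase variants make different choices here.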
A Boolean CQ (BCQ) over R is a CQ of the form Q(), often written as the set of all its atoms, without quantifiers. The set of answers for a CQ Q to D and Σ, denoted ans(Q,D,Σ), is the set of all tuples a such that a ∈ Q(B) for all B ∈ mods(D,Σ), where mods(D,Σ) denotes the set of models of D and Σ. The answer for a BCQ Q to D and Σ is Yes, denoted D ∪ Σ |= Q, iff ans(Q,D,Σ) ≠ ∅. It is important to remark that BCQs Q over D and Σ_T can be evaluated on the chase for D and Σ_T, i.e., D ∪ Σ_T |= Q is equivalent to chase(D,Σ_T) |= Q (Calì, Gottlob, and Lukasiewicz 2012b).

Negative constraints (NCs) are first-order formulas of the form ∀X Φ(X) → ⊥, where the body Φ(X) is a conjunction of atoms (without nulls) and the head is the truth constant false, denoted ⊥. Intuitively, the body of such a constraint must evaluate to false in D under a set of TGDs Σ_T. That is, an NC τ is satisfied by a database D under a set of TGDs Σ_T iff there does not exist a homomorphism h that maps the atoms of Φ(X) to D, where D is such that every TGD in Σ_T is satisfied. As we will see throughout the paper, negative constraints are important to identify inconsistencies in a Datalog± ontology, as their violation is one of the main sources of inconsistency. In this work we restrict our attention to binary negative constraints (or denial constraints), which are NCs whose body is the conjunction of exactly two atoms, e.g., p(X,Y) ∧ q(X,Z) → ⊥. As we will show later, this class of constraints suffices for the formalization of the concept of conflicting atoms.

Equality-generating dependencies (EGDs) are first-order formulas of the form ∀X Φ(X) → Xi = Xj, where Φ(X) is a conjunction of atoms, and Xi and Xj are variables from X. An EGD σ is satisfied in a database D for R iff, whenever there exists a homomorphism h such that h(Φ(X)) ⊆ D, it holds that h(Xi) = h(Xj). In this work we focus on a particular class of EGDs, called separable (Calì, Gottlob, and Lukasiewicz 2012a); intuitively, separability of EGDs w.r.t. a set of TGDs states that, if an EGD is violated, then atoms contained in D are the reason for the violation (and not the application of TGDs); i.e., if an EGD in Σ_E is violated when we apply the TGDs in Σ_T for a database D, then the EGD is also violated in D. Separability is a standard assumption in Datalog± ontologies, as one of the most important features of this family of languages is the focus on decidable (Calì, Lembo, and Rosati 2003) (actually tractable) fragments of Datalog±. EGDs also play an important role in the matter of conflicts in Datalog± ontologies. Note that restricting ourselves to separable EGDs means that certain cases of conflicts are not considered in our proposal; the treatment of such cases, though interesting from a technical point of view, is outside the scope of this work, since we focus on tractable fragments of Datalog± such as the ones mentioned above. Moreover, as in the case of NCs, we restrict EGDs to binary ones; that is, those whose body Φ(X) is the conjunction of exactly two atoms, e.g., p(X,Y) ∧ q(X,Z) → Y = Z.

We usually omit the universal quantifiers in TGDs, NCs and EGDs, and we implicitly assume that all sets of dependencies and/or constraints are finite.

Datalog± Ontologies. A Datalog± ontology KB = (D, Σ), where Σ = Σ_T ∪ Σ_E ∪ Σ_NC, consists of a database D, a set of TGDs Σ_T, a set of separable EGDs Σ_E, and a set of negative constraints Σ_NC. Example 1 illustrates a simple Datalog± ontology.

Example 1 Consider the following KB.
D: {a1: can_sing(simone), a2: rock_singer(axl), a3: sing_loud(ronnie), a4: has_fans(ronnie), a5: manage(band1, richard)}
Σ_NC: {τ1: sore_throat(X) ∧ can_sing(X) → ⊥, τ2: unknown(X) ∧ famous(X) → ⊥}
Σ_E: {ν1: manage(X,Y) ∧ manage(X,Z) → Y = Z}
Σ_T: {σ1: rock_singer(X) → sing_loud(X), σ2: sing_loud(X) → sore_throat(X), σ3: has_fans(X) → famous(X), σ4: rock_singer(X) → can_sing(X)}

Following the classical notion of consistency, we say that a consistent Datalog± ontology has a non-empty set of models.

Consistency. A Datalog± ontology KB = (D,Σ) is consistent iff mods(D,Σ) ≠ ∅. We say that KB is inconsistent otherwise.

Incoherence in Datalog±

The problem of obtaining consistent knowledge from an inconsistent knowledge base is natural in many computer science fields. As knowledge evolves, contradictions are likely to appear, and these inconsistencies have to be handled in such a way that they do not affect the quality of the information obtained from the knowledge base. In the setting of Consistent Query Answering (CQA), database repairing, and inconsistency-tolerant query answering in ontological languages (Arenas, Bertossi, and Chomicki 1999; Lembo et al. 2010; Lukasiewicz, Martinez, and Simari 2012), the assumption is often made that the set of constraints Σ expresses the semantics of the data in the component D, and that as such there is no internal conflict in the set of constraints and these constraints are not subject to changes over time. We argue that it is also important to identify and separate the sources of conflicts in Datalog± ontologies. In the previous section we defined inconsistency of a Datalog± ontology based on the lack of models. From an operational point of view, conflicts appear in a Datalog± ontology whenever an NC or an EGD is violated, that is, whenever the body of one such constraint can be mapped to either atoms in D or atoms that can be obtained from D by the application of the TGDs in Σ_T ⊆ Σ.
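The operational reading of conflicts given above can be tried out on the ontology of Example 1. The following Python sketch is illustrative only and rests on assumptions that are not part of the paper: the tuple encoding of atoms, the helper names (homomorphisms, saturate, violations), and the restriction to TGDs without existential variables, which is enough for Example 1 but not for Datalog± in general. It saturates D under Σ_T and then looks for homomorphisms from the body of an NC, or from the body of an EGD whose two variables are mapped to different constants.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def homomorphisms(body, facts, h=None):
    """Yield every homomorphism that maps all atoms of body into facts."""
    h = h or {}
    if not body:
        yield h
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2 = dict(h)
        if all(h2.setdefault(t, c) == c if is_var(t) else t == c
               for t, c in zip(first[1:], fact[1:])):
            yield from homomorphisms(rest, facts, h2)

def saturate(facts, tgds):
    """Apply TGDs without existential variables until nothing new appears."""
    facts = set(facts)
    while True:
        new = {(head[0],) + tuple(h.get(t, t) for t in head[1:])
               for body, head in tgds
               for h in homomorphisms(body, facts)} - facts
        if not new:
            return facts
        facts |= new

def violations(db, tgds, ncs, egds):
    """List (constraint name, homomorphism) pairs witnessing a conflict."""
    closure = saturate(db, tgds)
    found = [(name, h) for name, body in ncs.items()
             for h in homomorphisms(body, closure)]
    for name, (body, (left, right)) in egds.items():
        found += [(name, h) for h in homomorphisms(body, closure)
                  if h[left] != h[right]]
    return found

# Example 1, in the assumed tuple encoding.
D = {("can_sing", "simone"), ("rock_singer", "axl"), ("sing_loud", "ronnie"),
     ("has_fans", "ronnie"), ("manage", "band1", "richard")}
TGDS = [([("rock_singer", "X")], ("sing_loud", "X")),
        ([("sing_loud", "X")], ("sore_throat", "X")),
        ([("has_fans", "X")], ("famous", "X")),
        ([("rock_singer", "X")], ("can_sing", "X"))]
NCS = {"tau1": [("sore_throat", "X"), ("can_sing", "X")],
       "tau2": [("unknown", "X"), ("famous", "X")]}
EGDS = {"nu1": ([("manage", "X", "Y"), ("manage", "X", "Z")], ("Y", "Z"))}

print(violations(D, TGDS, NCS, EGDS))   # [('tau1', {'X': 'axl'})]
```

On this encoding the only witness found is τ1 with X mapped to axl: rock_singer(axl) yields sore_throat(axl) through σ1 and σ2, and can_sing(axl) through σ4. Dropping rock_singer(axl) from D leaves no witness.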
Besides these conflicts, we will also focus on the relationship between the set of TGDs and the set of NCs and EGDs, as it could happen that (a subset of) the TGDs in Σ_T cannot be applied without always leading to the violation of the NCs or EGDs. Note that in this case the data in the database instance is clearly not the problem, as any database in which these TGDs are applicable will inevitably produce an inconsistent ontology. This issue is related to the problem of unsatisfiability of a concept in an ontology, and it is known in the Description Logics community as incoherence (Flouris et al. 2006; Qi and Hunter 2007). Incoherence can be particularly important when combining multiple ontologies, since the constraints imposed by each one of them over the data could (possibly) represent conflicting modellings of the application at hand. Clearly, the notions of incoherence and inconsistency are highly related; in fact, Flouris et al. (2006) establish a relation between incoherence and inconsistency, considering the former as a particular form of the latter.

Our proposed notion of incoherence states that, given a set of incoherent constraints Σ, it is not possible to find a set of atoms D such that KB = (D,Σ) is a consistent ontology and at the same time all TGDs in Σ_T ⊆ Σ are applicable in D. This means that a Datalog± ontology KB can be consistent even if the set of constraints is incoherent, as long as the database instance does not make those dependencies applicable. On the other hand, a Datalog± ontology KB can be inconsistent even when the set of constraints is coherent. Consider, as an example, the following KB = ({tall(peter), small(peter)}, {tall(X) ∧ small(X) → ⊥}), where the (empty) set of dependencies is trivially coherent; the ontology is, nevertheless, inconsistent.
In the last decades, several approaches to handling inconsistency were developed in Artificial Intelligence and Database Theory (e.g., (Konieczny and Pérez 2002; Delgrande and Jin 2012; Arenas, Bertossi, and Chomicki 1999)). Some of the best-known approaches deal with inconsistency by removing atoms from the theory, or a combination of atoms and constraints or rules. A different approach is to simultaneously consider all possible ways of repairing the ontology by deleting or adding atoms, as in most approaches to Consistent Query Answering (Arenas, Bertossi, and Chomicki 1999) (CQA for short). However, these data-driven approaches might not be adequate for an incoherent theory and may produce meaningless results. As we stated before, an incoherent set Σ renders inconsistent any ontology whose database instance is such that the TGDs are applicable; in particular cases this may lead to the removal of every single atom in a database instance in an attempt to restore consistency, resulting in an ontology without any valuable information, when it could be the case that it is the set of constraints that is ill-defined.

Before formalizing the notion of incoherence that we use in our Datalog± setting, we need to identify the set of atoms relevant to a given set of TGDs. Intuitively, we say that a set of atoms A is relevant to a set T of TGDs if the atoms in A are such that the application of T over A generates the atoms that are needed to apply all dependencies in T, i.e., A triggers the application of every TGD in T. Formally, the definition of atom relevancy is as follows:

Definition 1 (Relevant Set of Atoms for a Set of TGDs) Let R be a relational schema, T be a set of TGDs, and A a (possibly existentially closed) non-empty set of atoms, both over R. We say that A is relevant to T iff for all σ ∈ T of the form ∀X∀Y Φ(X,Y) → ∃Z Ψ(X,Z) it holds that chase(A, T) |= ∃X∃Y Φ(X,Y).
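On small inputs, Definition 1 can be checked mechanically: compute the chase of A under T and test whether the body of every TGD in T has a homomorphism into it. The Python sketch below does this for the TGDs and the atom sets A1 and A2 of Example 2. It is an illustration under stated assumptions rather than a definitive procedure: the tuple encoding of atoms, the names homomorphisms, chase and is_relevant, the null names z1, z2, and so on, and the max_rounds bound are all hypothetical, and because of that bound a negative answer is only reliable when the chase terminates within it.

```python
from itertools import count

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def homomorphisms(body, facts, h=None):
    """Yield every homomorphism that maps all atoms of body into facts."""
    h = h or {}
    if not body:
        yield h
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2 = dict(h)
        if all(h2.setdefault(t, c) == c if is_var(t) else t == c
               for t, c in zip(first[1:], fact[1:])):
            yield from homomorphisms(rest, facts, h2)

def chase(facts, tgds, max_rounds=10):
    """Breadth-first chase; existential head variables get fresh nulls."""
    facts, applied, fresh = set(facts), set(), count(1)
    for _ in range(max_rounds):          # assumed bound, not in the paper
        new = set()
        for i, (body, head) in enumerate(tgds):
            for h in homomorphisms(body, facts):
                trigger = (i, tuple(sorted(h.items())))
                if trigger in applied:
                    continue
                applied.add(trigger)
                h2 = dict(h)
                for t in head[1:]:
                    if is_var(t) and t not in h2:
                        h2[t] = f"z{next(fresh)}"
                new.add((head[0],) + tuple(h2.get(t, t) for t in head[1:]))
        if new <= facts:
            break
        facts |= new
    return facts

def is_relevant(atoms, tgds):
    """A is relevant to T iff chase(A, T) entails the body of every TGD in T."""
    closure = chase(atoms, tgds)
    return all(next(homomorphisms(body, closure), None) is not None
               for body, _ in tgds)

# TGDs sigma1, sigma2, sigma3 and atom sets A1, A2 of Example 2.
SIGMA_T = [
    ([("supervises", "X", "Y")], ("supervisor", "X")),
    ([("supervisor", "X"), ("take_decisions", "X")],
     ("leads_department", "X", "D")),
    ([("employee", "X")], ("works_in", "X", "D")),
]
A1 = {("supervises", "walter", "jesse"), ("take_decisions", "walter"),
      ("employee", "jesse")}
A2 = {("supervises", "walter", "jesse"), ("take_decisions", "gus")}

print(is_relevant(A1, SIGMA_T))   # True
print(is_relevant(A2, SIGMA_T))   # False
```

For A1 the chase produces supervisor(walter), which together with take_decisions(walter) makes the second TGD applicable; for A2 neither the second nor the third TGD ever becomes applicable.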
When it is clear from the context, if a singleton set A = {a} is relevant to T ⊆ Σ_T we just say that atom a is relevant to T. The following example illustrates atom relevancy.

Example 2 (Relevant Set of Atoms) Consider the following constraints:

Σ_T = {σ1: supervises(X,Y) → supervisor(X), σ2: supervisor(X) ∧ take_decisions(X) → leads_department(X,D), σ3: employee(X) → works_in(X,D)}

First, let us consider the set A1 = {supervises(walter, jesse), take_decisions(walter), employee(jesse)}. This set is relevant to the set of TGDs Σ_T = {σ1, σ2, σ3}, since σ1 and σ3 are directly applicable to A1 and σ2 becomes applicable when we apply σ1 (i.e., the chase entails the atom supervisor(walter), which together with take_decisions(walter) triggers σ2). However, the set A2 = {supervises(walter, jesse), take_decisions(gus)} is not relevant to Σ_T. Note that even though σ1 is applicable to A2, the TGDs σ2 and σ3 are never applied in chase(A2,Σ_T), since the atoms in their bodies are never generated in chase(A2,Σ_T). For instance, consider the TGD σ2 ∈ Σ_T. In the chase of Σ_T over A2 we create the atom supervisor(walter), but we still cannot trigger σ2, since we do not have and cannot generate the atom take_decisions(walter), and the atom take_decisions(gus) that is already in A2 does not match the constant value.

We now present the notion of coherence for Datalog±, which adapts the one introduced by Flouris et al. (2006) for DLs. Our conception of (in)coherence is based on the notion of satisfiability of a set of TGDs w.r.t. a set of constraints. Intuitively, a set of dependencies is satisfiable when there is a relevant set of atoms that triggers the application of all dependencies in the set and does not produce the violation of any constraint in Σ_NC ∪ Σ_E, i.e., the TGDs can be satisfied along with the NCs and EGDs in KB.

Definition 2 (Satisfiability of a set of TGDs w.r.t. a set of constraints) Let R be a relational schema, T ⊆ Σ_T be a set of TGDs, and N ⊆ Σ_NC ∪ Σ_E, both over R. The set T is satisfiable w.r.t. N iff there is a set A of (possibly existentially closed) atoms over R such that A is relevant to T and mods(A, T ∪ N) ≠ ∅. We say that T is unsatisfiable w.r.t. N iff T is not satisfiable w.r.t. N. Furthermore, Σ_T is satisfiable w.r.t. Σ_NC ∪ Σ_E iff there is no T ⊆ Σ_T such that T is unsatisfiable w.r.t. some N with N ⊆ Σ_NC ∪ Σ_E.

In the rest of the paper we sometimes write that a set of TGDs is (un)satisfiable, omitting the set of constraints; we do this in the context of a particular ontology where we have a fixed set of constraints Σ_NC ∪ Σ_E. Also, throughout the paper we denote by U(KB) the set of minimal unsatisfiable sets of TGDs in Σ_T for KB (i.e., unsatisfiable sets of TGDs such that every proper subset is satisfiable). The following example illustrates the concept of satisfiability of a set of TGDs in a Datalog± ontology.

Example 3 (Unsatisfiable sets of dependencies) Consider the following sets of constraints.

Σ_NC = {τ: risky_job(P) ∧ unstable(P) → ⊥}
Σ_T = {σ1: dangerous_work(W) ∧ works_in(W,P) → risky_job(P), σ2: in_therapy(P) → unstable(P)}

The set Σ_T is a satisfiable set of TGDs: even though the simultaneous application of σ1 and σ2 may violate some formula in Σ_NC ∪ Σ_E, that does not hold for every relevant set of atoms. Consider as an example the relevant set D1 = {dangerous_work(police), works_in(police, marty), in_therapy(rust)}; D1 is a relevant set for Σ_T and, since mods(D1, Σ_T ∪ Σ_NC ∪ Σ_E) ≠ ∅, Σ_T is

Cite this paper

@inproceedings{Deagustini2015OnTI, title={On the Influence of Incoherence in Inconsistency-tolerant Semantics for Datalog±}, author={Cristhian A. D. Deagustini and Maria Vanina Martinez and Marcelo A. Falappa and Guillermo Ricardo Simari}, booktitle={JOWO@IJCAI}, year={2015} }