Conditional Context-Free Languages of Finite Index
Recently, the deductive database community has paid considerable attention to the specification of database queries in terms of linear datalog programs (also referred to as linearization; see [10] for further references) and to the maximal predicate arities necessary in such specifications [1]. In [2], the current author proposed and studied a new kind of grammar (called “positive programmed grammar”) as a tool to investigate linearization of chain queries. (If L is a set of words over binary predicate symbols, the chain query Q_L is roughly the mapping that, given a database or labeled graph D as input, returns the set of pairs of start and end nodes of paths in D that spell words in L.) Interestingly, that paper revealed a close relation between the index of a grammar (i.e., the maximum number of variables used at each step of each derivation) and the predicate arity of a linear datalog program. Besides being useful for datalog applications, the index of other kinds of grammars and languages has been studied extensively [5,8,9] for language-theoretic purposes.

Recognizing the importance of the index concept, in this paper we present the following two results on index:

There is a strict index hierarchy of positive programmed languages (Theorem 3). In the context of datalog, this implies that, for each positive integer k, there is a chain query which is “addressably computed” by a linear datalog program with maximal predicate arity 2k, but not by any such program with maximal predicate arity less than k.

It is undecidable whether an arbitrary context-free grammar defines a positive programmed language with finite index (Theorem 4). In the context of datalog, this implies that it is undecidable, for an arbitrary context-free language L, whether Q_L is addressably computable by some linear datalog program.

A motivating example that intuitively describes how a linear datalog program corresponds to a positive programmed grammar is given in Section 3.
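To make the notion of a chain query concrete, the following minimal sketch evaluates Q_L for the illustrative language L = {a^n b^n : n >= 1} on a small labeled graph. The language, the two datalog rules in the comments, and the function name `chain_query_anbn` are assumptions chosen for illustration; they are not taken from the paper. The program is linear because the recursive rule contains only one occurrence of the recursive predicate p.

```python
def chain_query_anbn(edges):
    """Evaluate the chain query for L = {a^n b^n : n >= 1}.

    edges: set of (source, label, target) triples describing a labeled graph.
    Returns the set of node pairs (x, y) joined by a path spelling a word in L.
    The rules below are a hypothetical linear datalog program for this L.
    """
    a = {(x, y) for (x, label, y) in edges if label == "a"}
    b = {(x, y) for (x, label, y) in edges if label == "b"}

    # Base rule:      p(x, y) :- a(x, z), b(z, y).
    p = {(x, y) for (x, z) in a for (z2, y) in b if z == z2}

    # Recursive rule: p(x, y) :- a(x, z), p(z, w), b(w, y).
    # Iterate until no new pair is derived (naive fixpoint evaluation).
    while True:
        derived = {
            (x, y)
            for (x, z) in a
            for (z2, w) in p if z == z2
            for (w2, y) in b if w == w2
        }
        if derived <= p:
            return p
        p |= derived


# A path 1 -a-> 2 -a-> 3 -b-> 4 -b-> 5.
graph = {(1, "a", 2), (2, "a", 3), (3, "b", 4), (4, "b", 5)}
print(sorted(chain_query_anbn(graph)))  # [(1, 5), (2, 4)]
```

Here the pair (2, 4) comes from the path spelling "ab" and (1, 5) from the path spelling "aabb"; the evaluation terminates because the set of node pairs is finite.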