Chase Termination Beyond Polynomial Time

The chase is a widely implemented approach to reasoning with tuple-generating dependencies (tgds), used in data exchange, data integration, and ontology-based query answering. However, it is merely a semi-decision procedure, which may fail to terminate. Many decidable conditions have been proposed for tgds to ensure chase termination, typically by forbidding some kind of "cycle" in the chase process. We propose a new criterion that explicitly allows some such cycles, and yet ensures termination of the standard chase under reasonable conditions. This leads to new decidable fragments of tgds that are not only syntactically more general but also strictly more expressive than the fragments defined by prior acyclicity conditions. Indeed, while known terminating fragments are restricted to PTime data complexity, our conditions yield decidable languages of k-ExpTime data complexity, for every k. We further refine our syntactic conditions to obtain fragments of tgds for which an optimised chase procedure decides query entailment in PSpace or k-ExpSpace, respectively.


INTRODUCTION
The chase [1,6,33] is an essential method for reasoning with constraints in databases, with application areas including data exchange [24], constraint implication [5], data cleansing [26], query optimization [2,7,36], query answering under constraints [13,38], and ontological reasoning [4,12]. The basis for this wide applicability is the chase's ability to compute a universal model for a set of constraints [21], which is either used directly (e.g., as a repaired database) or indirectly (e.g., for deciding query entailment).
However, universal models can be infinite, and chase termination is undecidable on a single database [5] as well as in its stricter uniform version over arbitrary databases [27,29]. The root of this problem is a form of constraints known as tuple-generating dependencies (tgds), or existential rules (Horn logic rules with existential quantification in conclusions), since they may require additional domain elements (represented by nulls) to be satisfied.
A large body of research is devoted to finding decidable cases for which termination can be guaranteed, mainly by analysing the data flow, i.e., the propagation of nulls in the chase [14,18,22,24,31,34]. Here we can distinguish graph-based abstractions, such as weak acyclicity [24], from materialisation-based approaches, such as MFA [18]. In general, decidable criteria are sufficient but not necessary for termination; however, recent breakthroughs established the decidability of termination for the linear, guarded, and sticky classes of tgds [9,10,28,37].
Meanwhile, another productive line of recent research clarified the computational power of the chase, characterising the query functions that can be expressed by conjunctive queries under tgd constraints. This expressive power is bounded by data complexity, but can be lower: famously, Datalog does not capture all queries in P, not even those closed under homomorphisms [20]. Results are more satisfying for tuple-generating dependencies. Not only do arbitrary tgds capture the recursively enumerable homomorphism-closed queries [39], but, surprisingly, the decidable homomorphism-closed queries can be captured by tgds for which the standard (a.k.a. restricted) chase terminates [8]. In other words, the standard chase over tgds without any extensions is a universal computational paradigm for database query answering. This is highly encouraging, since a majority of chase implementations already support this version of the chase procedure [3,7,26,36,38,41], often with favourable performance [6].
Naturally, tractable data complexity is often desirable in database applications, and has therefore been the focus of many works in the area. Unfortunately, the potential of the chase for harder computations has meanwhile been neglected. To our knowledge, the only line of research where the chase was used for harder-than-P computations relies on a method for modelling finite sets with tgds [15]. Practical feasibility was shown for ExpTime-complete ontology-based query answering [15], set-terms in answer set programming [25], complex values in Datalog [35], and why-provenance [23]. In essence, all of these works are based on a single set of tgds for which uniform termination is shown directly. Known chase termination criteria fail for this case, since they only recognise queries in P. In fact, most criteria are even known to describe fragments that do not go beyond the expressive power of Datalog [32,42]. This limitation is common to all tgd sets on which the semi-oblivious chase uniformly terminates. We are aware of only one approach so far that studies the more complicated standard chase at all [14], but it also remains in P.
We tackle this challenge with a new method that extends the graph-based termination criterion of joint acyclicity [31] with new decidable conditions that allow for some kinds of cycles. These conditions are specific to the standard chase, and enforce that tgd applications within a cycle are eventually blocked, possibly only after (doubly) exponentially many loops. Our conditions detect the uniform termination of the set-modelling tgds of Carral et al. [15], but significantly extend upon this baseline: even a single strongly connected component in the data-flow graph can correspond to a 2-ExpTime-complete query (whereas Carral et al. deal with exponentially many sets).
Leveraging again the graph-based view, we can further analyse the data flows between cyclic strongly connected components to describe decidable fragments of arbitrary multi-exponential data complexity.Our resulting decidable fragment of saturating tgds therefore has non-elementary complexity.This is already very general, especially when considering that capturing all decidable homomorphism-closed queries can in principle only be achieved with tgd fragments that are not even recursively enumerable [8].
The structure of the data-flow graph enables us to compute more precise k-ExpTime bounds for any saturating tgd set. Refining this analysis further, we then identify cases where the complexity of query answering drops to (k − 1)-ExpSpace. In particular, we obtain a decidable fragment of uniformly terminating tgd sets with PSpace-complete query entailment. To the best of our knowledge, this is the first such fragment.
All of our results establish uniform termination of the standard chase under all chase strategies that prioritise Datalog rules (Krötzsch et al. call this the Datalog-first strategy [32]).By a recent result, this is a stronger requirement than termination under some strategies [16].As of today, however, all known termination criteria imply Datalog-first termination, and preferring Datalog rules is also a common heuristic in practice.
In summary, our main contributions are as follows:
• In Section 3, we refine the dependency graph of Krötzsch and Rudolph [31] to capture data flow in more detail.
• In Section 4, we study the propagation of inferences between nulls related to strongly connected components in the extended dependency graph. We find conditions that suffice to prevent infinite repetitions of inference cycles, define the language of saturating tgds, and show uniform standard chase termination for this fragment.
• In Section 5, we analyse the exact complexity (and size) of the standard chase over saturating sets by assigning ranks to strongly connected components in the extended dependency graph. We distinguish between single and double exponential complexity for individual components, and we establish matching lower bounds to show that query entailment is k-ExpTime-complete for tgd sets of rank k.
• In Section 6, we describe conditions that reduce query entailment for tgd sets of rank k to (k − 1)-ExpSpace. To this end, we discover a tree-like structure within the chase, and we define a syntactic condition called path guardedness that allows us to use an optimised chase procedure. Again, we establish matching lower bounds.
Detailed proofs are included in the appendix.

PRELIMINARIES
We consider a signature based on mutually disjoint, countably infinite sets of constants C, variables V, predicates P, and nulls N. Each predicate name p ∈ P has an arity ar(p) ≥ 0. Terms are elements of V ∪ N ∪ C. We use t to denote a list t_1, . . ., t_|t| of terms, and similarly for special types of terms. An atom is an expression p(t) with p ∈ P, t a list of terms, and ar(p) = |t|. An interpretation I is a set of atoms without variables. A database D is a finite interpretation without nulls, i.e., a finite set of facts (variable-free, null-free atoms). For an interpretation I, we use N(I) = {n ∈ N | p(t) ∈ I, n ∈ t} to denote the nulls used in I.

Rules.
A tuple-generating dependency (tgd) σ is a formula
∀x, y. B[x, y] → ∃z. H[y, z]   (1)
where B and H are conjunctions of atoms using only terms from C or from the mutually disjoint lists of variables x, y, z ⊆ V. We call B the body (denoted body(σ)), H the head (denoted head(σ)), and y the frontier of σ. We may treat conjunctions of atoms as sets, and we omit universal quantifiers in tgds. We require that all variables in y do really occur in B (safety). A tgd without existential quantifiers is a Datalog rule.
Renamings and Substitutions. Without loss of generality, we require that variables in tgd sets Σ are renamed apart, i.e., each variable x ∈ V in Σ is bound by a unique quantifier in a unique tgd σ ∈ Σ. A substitution θ is a partial mapping from variables to terms, and we write Bθ for the result of applying θ to B.
Semantics. We consider a standard first-order semantics. A match of a tgd σ as in (1) in an interpretation I is a substitution θ that maps x ∪ y to terms in I, such that Bθ ⊆ I. A match θ is satisfied if it can be extended to a substitution θ′ over x ∪ y ∪ z such that Hθ′ ⊆ I.
Reasoning with the Chase. An important reasoning task for tgds is conjunctive query (CQ) answering, which can further be reduced to the entailment of Boolean CQs (BCQs), which are formulas ∃x. Q[x] with Q a conjunction of null-free atoms. This task is undecidable in general. A sound and complete (but not always terminating) class of reasoning procedures is the chase, which exists in many variants. We are interested in the standard chase (a.k.a. restricted chase) under Datalog-first strategies.
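The notions of a match and of a satisfied match can be made concrete with a short Python sketch. It assumes an encoding chosen only for illustration: atoms are (predicate, terms) pairs, and variables are strings starting with "?". The hypothetical helper `matches` enumerates, by backtracking, all substitutions that map a list of atoms into an interpretation, and `is_satisfied` checks whether a given match extends to the remaining (existential) variables of a tgd head.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, list of terms)


def is_var(term: str) -> bool:
    # Convention of this sketch only: variables start with "?".
    return term.startswith("?")


def matches(atoms: List[Atom], interp: set,
            theta: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Enumerate all extensions of theta that map every atom in `atoms` into interp."""
    theta = dict(theta or {})
    if not atoms:
        yield theta
        return
    (pred, terms), rest = atoms[0], atoms[1:]
    for p, args in interp:
        if p != pred or len(args) != len(terms):
            continue
        ext, ok = dict(theta), True
        for t, a in zip(terms, args):
            if is_var(t):
                # bind t to a, or check consistency with an existing binding
                if ext.setdefault(t, a) != a:
                    ok = False
                    break
            elif t != a:  # constants must coincide
                ok = False
                break
        if ok:
            yield from matches(rest, interp, ext)


def is_satisfied(head: List[Atom], interp: set, theta: Dict[str, str]) -> bool:
    """A match is satisfied if it extends to the head's unbound variables inside interp."""
    return next(matches(head, interp, theta), None) is not None
```

For I = {R(a, b)} and the tgd R(x, y) → ∃z. R(x, z), the only match maps x to a and y to b, and it is already satisfied (by mapping z to b), so a standard chase would not apply this tgd.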
Definition 1. A (standard) chase sequence for a database D and a tgd set Σ is a potentially infinite sequence of interpretations D_0, D_1, . . . such that
(1) D_0 = D;
(2) for every D_{i+1} with i ≥ 0, there is a match θ for some tgd σ = B[x, y] → ∃z. H[y, z] ∈ Σ in D_i such that both of the following hold true:
(a) θ is an unsatisfied match in D_i (i.e., θ cannot be extended to a substitution θ′ with Hθ′ ⊆ D_i), and
(b) D_{i+1} = D_i ∪ Hθ⁺, where θ⁺ is such that θ⁺(y) = θ(y) for all frontier variables y, and θ⁺(z) ∈ N is a distinct null not occurring in D_i for all existential variables z;
we then say that σ was applied in step i, and we define tgd[i] := σ, θ[i] := θ, and θ⁺[i] := θ⁺;
(3) if a tgd with existential variables is applied in step i, then D_i must satisfy all Datalog rules in Σ;
(4) if θ is a match for a tgd σ ∈ Σ and D_i (i ≥ 0), then there is j > i such that θ is satisfied in D_j.
Item (3) requires rule applications to follow a Datalog-first strategy, and item (4) ensures fairness. The (standard) chase for such a chase sequence then is chase(Σ, D) = ⋃_{i≥0} D_i.
A BCQ q is entailed by Σ and D if and only if chase(Σ, D) ⊨ q. If the chase terminates, this can be determined from the (finite) chase(Σ, D). Termination may depend on the chosen order of tgd applications. The Datalog-first strategy (3) is a common heuristic that tends to improve termination in practice, although one can construct examples where this is not the case [16]. Since Datalog rules introduce no new terms, they can only be applied finitely many times before the next existential tgd application, so Datalog-first does not impair fairness (4).
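To illustrate items (2) and (3) of Definition 1, the following Python sketch runs a standard chase under a Datalog-first strategy with a step budget. It assumes tgds are given as (body, head, existential variables) triples over (predicate, terms) atoms with "?"-prefixed variables, an encoding chosen only for this sketch. It always applies the first unsatisfied match it finds, so it makes no attempt at a fair strategy in the sense of item (4) for non-terminating runs; it simply gives up after `max_steps`. The name `standard_chase` is hypothetical.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Tgd = Tuple[List[Atom], List[Atom], List[str]]  # (body, head, existential variables)


def _match(atoms: List[Atom], interp: set, theta: Dict[str, str]) -> Iterator[Dict[str, str]]:
    # Backtracking search for extensions of theta that map all atoms into interp.
    if not atoms:
        yield theta
        return
    (pred, terms), rest = atoms[0], atoms[1:]
    for p, args in interp:
        if p != pred or len(args) != len(terms):
            continue
        ext = dict(theta)
        if all(ext.setdefault(t, a) == a if t.startswith("?") else t == a
               for t, a in zip(terms, args)):
            yield from _match(rest, interp, ext)


def standard_chase(db: set, tgds: List[Tgd], max_steps: int = 10_000) -> set:
    interp, fresh = set(db), count()
    # Datalog-first: rules without existential variables are tried first (stable sort).
    ordered = sorted(tgds, key=lambda tgd: len(tgd[2]) > 0)
    for _ in range(max_steps):
        step = None
        for body, head, exists in ordered:
            for theta in _match(body, interp, {}):
                # Standard chase: only unsatisfied matches may be applied.
                if next(_match(head, interp, theta), None) is None:
                    step = (head, exists, theta)
                    break
            if step is not None:
                break
        if step is None:
            return interp  # every match is satisfied: the chase has terminated
        head, exists, theta = step
        ext = {**theta, **{z: f"_n{next(fresh)}" for z in exists}}  # fresh nulls
        interp |= {(p, tuple(ext.get(t, t) for t in args)) for p, args in head}
    raise RuntimeError(f"no termination within {max_steps} steps")
```

On the database {R(a, b)} with the tgd R(x, y) → ∃z. R(x, z), this sketch returns the database unchanged, since the only match is already satisfied; an oblivious chase would instead introduce a fresh null.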

THE LABELLED DEPENDENCY GRAPH
The existential dependency graph is used to analyse the data flow between existential variables in a tgd set [31].In particular, a tgd set is jointly acyclic if this graph does not have cycles.In this section, we recall and slightly extend this approach to better suit our needs.
The following definition mostly follows Krötzsch and Rudolph [31], but adds variables as labels to the edges in the graph. Recall that we assume tgd sets to be renamed apart, so that variables can be used to identify tgds.
Definition 2. Let Σ be a tgd set. A predicate position is a pair ⟨p, i⟩ ∈ P × ℕ with 1 ≤ i ≤ ar(p). For a variable x in Σ, let Pos_B(x) (resp. Pos_H(x)) be the set of all predicate positions where x occurs in the body (resp. head) of its unique tgd σ ∈ Σ. For an existential variable z in Σ, let Ω_z be the smallest set of positions such that (i) Pos_H(z) ⊆ Ω_z, and (ii) for every universal variable x, Pos_B(x) ⊆ Ω_z implies Pos_H(x) ⊆ Ω_z.
The labelled existential dependency graph LXG(Σ) of Σ is a directed graph with the existentially quantified variables of Σ as its vertices and, for every existential variable z, an edge z →_y v labelled with y whenever some tgd σ ∈ Σ has an existential variable v and a frontier variable y such that Pos_B(y) ⊆ Ω_z.
Definition 2 slightly sharpens the original definition by introducing edges only based on frontier variables.Moreover, by adding labels, a single edge in the original graph now corresponds to one or more edges in LXG(Σ).Therefore, if the existential dependency graph contains no cycles (i.e., Σ is jointly acyclic), then the labelled existential dependency graph does not contain cycles either.
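The construction of Definition 2 is a simple fixed-point computation. The following Python sketch computes Ω_z for every existential variable z, derives the labelled edges of LXG(Σ), and checks for cycles, so that an acyclic result indicates joint acyclicity. It assumes tgds are renamed apart and encoded as dicts with "body" and "head" (lists of (predicate, terms) atoms, variables prefixed by "?") and "exists" (the existential variables); positions are 1-based (predicate, index) pairs. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def variables(atoms) -> Set[str]:
    return {t for _, args in atoms for t in args if t.startswith("?")}


def positions(atoms, var: str) -> Set[Tuple[str, int]]:
    # predicate positions (p, i), 1-based, at which var occurs in atoms
    return {(p, i) for p, args in atoms for i, t in enumerate(args, start=1) if t == var}


def lxg(tgds: List[dict]) -> Set[Tuple[str, str, str]]:
    """Return labelled edges (z, y, v), read as z --y--> v."""
    # body and head positions of every universal (body) variable of every tgd
    universal = [(positions(t["body"], x), positions(t["head"], x))
                 for t in tgds for x in variables(t["body"])]
    edges = set()
    for t in tgds:
        for z in t["exists"]:
            omega = positions(t["head"], z)          # (i) head positions of z
            changed = True
            while changed:                           # (ii) close under data flow
                changed = False
                for pos_b, pos_h in universal:
                    if pos_b <= omega and not pos_h <= omega:
                        omega |= pos_h
                        changed = True
            for t2 in tgds:                          # frontier variables reached by z
                for y in variables(t2["body"]) & variables(t2["head"]):
                    if positions(t2["body"], y) <= omega:
                        edges |= {(z, y, v) for v in t2["exists"]}
    return edges


def has_cycle(edges) -> bool:
    succ: Dict[str, Set[str]] = defaultdict(set)
    for z, _, v in edges:
        succ[z].add(v)
    state: Dict[str, int] = {}                       # 1 = on DFS stack, 2 = finished

    def dfs(u: str) -> bool:
        state[u] = 1
        for w in succ[u]:
            if state.get(w) == 1 or (w not in state and dfs(w)):
                return True
        state[u] = 2
        return False

    return any(u not in state and dfs(u) for u in list(succ))
```

For P(x) → ∃z. R(x, z) together with R(x2, y2) → ∃w. R(y2, w), the sketch finds the edges z → w and w → w (both labelled y2), so the set is not jointly acyclic.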
LXG(Σ) is useful since it over-estimates the possible data flow in the computation of chase(Σ, D). To make this precise, note that any null n in chase(Σ, D) is introduced in a chase step i as a fresh value n = θ⁺[i](z) of some existential variable z in tgd[i]. We write var(n) for this z, and t ։_y n to indicate that θ[i](y) = t for some term t ∈ C ∪ N and frontier variable y of tgd[i].

TERMINATION WITH CYCLES
In this section, we establish decidable criteria to show that the chase on a tgd set is guaranteed to be finite for all input databases, even though the existential dependency graph has some cycles. Whenever a tgd B[x, y] → ∃z. H[y, z] was applied in step i of chase(Σ, D), the conjunctive query (B ∧ H)[x, y, z] has the answer θ⁺[i] over chase(Σ, D). Likewise, a chain of chase steps (or path in LXG(Σ)) leads to a match for a larger query, defined next.
[ , ] be a variant of the tgd of variable , where variables have been bijectively renamed such that tgds for different steps do not share variables.For 1 ≤ ≤ , let ˜ (and ˜ ) denote the renamed version of (and ), and let ˜ +1 denote a fresh variable.→ , where we omit the numbers of the tgds from variables for simplicity.Then Path( ). Variables marked with ′ and ′′ stem from renamed variants of tgds (3) and (5), respectively.Similarly, Path( ) ,1 = up( , ′ + , ¯ 1 ).Lemma 9 ensures that every chain of nulls in the chase is accompanied by facts of the form Path( ), connecting all nulls of the chain.Figure 1 sketches this situation for a cyclic path (left).
[Figure 1: a cyclic path in LXG(Σ) and a corresponding chain of nulls]
We can use the facts of Path(…) to infer additional information that may help with chase termination for cyclic paths. Indeed, additional information may prevent tgd applications if the head of a tgd is already entailed (Definition 1 (2.a)).
Clearly, we cannot apply twice for the same substitution of its frontier: for all ′ > and substitutions with = [ ], we have D ′ | = ∃ .due to the presence of 0 .However, a cycle in LXG(Σ) can lead to a chain of tgd applications as in Figure 1, where is applied to different frontiers in several steps.Indeed, if is the chase step that introduced 2 0 , then 0 [ ] ≠ 0 [ ], so the matches differ on at least this frontier variable.If we write − := \ { 0 } for the frontier of without 0 , we can think of − [ ] as the "context" for which was applied to 0 ℓ in step .
Our goal is that can never be applied twice to the same context within a single chain.To ensure this, we would like the chase to derive additional facts 1 and 2 as indicated in Figure 1.Fixing a variable order [ 0 , − , ], these facts are . Chains like those in Figure 1, even if finite, can become rather long, and we need facts to be propagated to all nulls ℓ .We therefore require two kinds of conditions: (i) a base case that turns recently derived (forward) 0 into a (backwards) 1 , and (ii) an inductive step that propagates a (backwards) to another (backwards) +1 .The next definition spells out the two conditions.We generalise slightly by allowing that, instead of applying a single tgd several times, the propagation can occur between different tgds along a path.
where the conclusion is the path's first head conjunction Path( A composite path of the form with Intuitively speaking, considering Figure 1, base propagation allows us to infer 1 from 0 , and step propagation allows us to infer 2 from 1 .In contrast to the simplified situation in the figure, the definition clarifies that the propagated head * is not necessarily the head of the tgds and that occur in the current piece of chain we consider.
As we will see below, the extended example achieves universal termination, and in particular also terminates for the database of Example 6.For the case that next is a strict order, tgd (6) ensures that each sequence is assigned a unique level, even with the additional up-facts from tgds (9) and (10).
The complexity of checking Definition 10 is dominated by the ExpTime complexity of Datalog.
Lemma 12. Checking whether a path is base-propagating (or step-propagating) is ExpTime-complete, and P-complete with respect to the length of the path.
The conditions of Definition 10 have the desired impact on the chase: if we find a sequence of nulls that corresponds (by Lemma 4) to a path that is propagating, then satisfied head atoms in the chase are propagated accordingly.Moreover, since propagation is defined based on Σ DL , the Datalog-first chase ensures that the propagation happens before further nulls are introduced.To ensure termination, we require that the cycles in each strongly connected component in LXG(Σ) can be broken by removing a set of edges that are connected by propagating paths: Definition 13.Let be a strongly connected component in LXG(Σ), and let be a set of edges in .An ¯ -path is a path in that (i) contains no edges from , (ii) starts in some ∈ { | → ∈ }, and (iii) ends in some without the edges of is acyclic; (2) for all → , ′ ′ → ∈ , we have = ′ ; (3) all paths in , such that ∈ and is an ¯ -path, are base-propagating; and (4) all paths 1 2 in , such that , ∈ and 1 , 2 are ¯ -paths, are step-propagating for every ∈ .Σ is saturating if all strongly connected components in LXG(Σ) are -saturating for some .In practice, the exponential factors in Theorem 15 might be small, already because strongly connected components are often small.The next result states that tgds in the edge sets can only be applied at most once for each substitution of their "context" variables -an essential ingredient for termination.
Lemma 16.Let be an -saturating strongly connected component in LXG(Σ), and let → ∈ have the corresponding tgd with head ∃ , .[ , , , ].For every database D and chain of nulls The main result of this section is as follows.A more detailed analysis follows in the next section.
Example 18. Theorem 17 also subsumes previous termination results by Carral et al. [15]. The following tgds capture the essence of their approach of simulating finite sets in tgds:
su(x′, u, u) ∧ su(x, u, v) → su(x′, v, v)   (12)
The database can provide elem-facts that define a domain of elements, and a fact set(∅). The tgd (11) constructs new "sets" by creating facts su(x, u, v), which can be read as {x} ∪ u = v. In particular, su(x, v, v) means x ∈ v. The tgd (12) propagates memberships x′ ∈ u from u to a direct superset v, which we recognise as a special case of step propagation. Indeed, the only dependency here is a self-loop on the existential variable of (11), and all paths (1/2) in Definition 13 are empty. Base propagation is achieved by the atom su(x, v, v) in (11) without requiring any Datalog rules.
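Since tgd (11) is not fully legible here, the following Python sketch assumes, purely for illustration, the form set(u) ∧ elem(x) → ∃v. su(x, u, v) ∧ su(x, v, v) ∧ set(v), which fits the reading of su(x, u, v) as {x} ∪ u = v, and uses (12) as printed. A compact Datalog-first standard chase (hypothetical helper names; "empty" stands for ∅) then terminates: a new null is created exactly when an element is added to a set that does not yet contain it, so this particular encoding yields one null per non-empty sequence of distinct elements (4 for two elements, 15 for three).

```python
from itertools import count


def _match(atoms, interp, theta):
    # Backtracking search for extensions of theta mapping all atoms into interp.
    if not atoms:
        yield theta
        return
    (pred, terms), rest = atoms[0], atoms[1:]
    for p, args in interp:
        if p == pred and len(args) == len(terms):
            ext = dict(theta)
            if all(ext.setdefault(t, a) == a if t.startswith("?") else t == a
                   for t, a in zip(terms, args)):
                yield from _match(rest, interp, ext)


def datalog_first_chase(db, tgds, max_steps=100_000):
    # Standard chase: apply some unsatisfied match, trying Datalog rules first.
    interp, fresh = set(db), count()
    ordered = sorted(tgds, key=lambda tgd: len(tgd[2]) > 0)
    for _ in range(max_steps):
        step = next(((head, ex, th) for body, head, ex in ordered
                     for th in _match(body, interp, {})
                     if next(_match(head, interp, th), None) is None), None)
        if step is None:
            return interp
        head, ex, th = step
        th = {**th, **{z: f"_n{next(fresh)}" for z in ex}}
        interp |= {(p, tuple(th.get(t, t) for t in args)) for p, args in head}
    raise RuntimeError("step budget exhausted")


# Hypothetical reading of tgd (11); tgd (12) as printed in Example 18.
TGD_11 = ([("set", ("?u",)), ("elem", ("?x",))],
          [("su", ("?x", "?u", "?v")), ("su", ("?x", "?v", "?v")), ("set", ("?v",))],
          ["?v"])
TGD_12 = ([("su", ("?x1", "?u1", "?u1")), ("su", ("?x2", "?u1", "?v1"))],
          [("su", ("?x1", "?v1", "?v1"))], [])


def set_nulls(elements):
    db = {("elem", (e,)) for e in elements} | {("set", ("empty",))}
    model = datalog_first_chase(db, [TGD_11, TGD_12])
    return {args[0] for p, args in model if p == "set" and args[0] != "empty"}
```

The Datalog-first order matters here: (12) must record all memberships of a new set before (11) is tried on it, otherwise the sketch could create further, redundant nulls.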

COMPLEXITY OF THE SATURATING CHASE
Next, we refine Theorem 17 by deriving specific bounds for the size of the chase over saturating tgd sets Σ, based on the structure of LXG(Σ). For a vertex z of LXG(Σ), we write SCC(z) for the strongly connected component that contains z, and SCC(LXG(Σ)) for the set of all strongly connected components. An edge z → v is incoming for C if z ∉ C and v ∈ C; in this case we write SCC(z) ≺ C (where C = SCC(v)). The reflexive transitive closure of ≺ is the usual induced partial order on SCC(LXG(Σ)).
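To make SCC(z) and the order ≺ concrete, the following Python sketch computes strongly connected components with Kosaraju's algorithm (one standard choice among several) and collects the pairs of components related by incoming edges. Edge labels play no role here and are dropped; the function names are hypothetical, and `is_trivial` checks for a single vertex without a self-loop.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def sccs(vertices: Iterable[Hashable], edges: List[Edge]) -> Dict[Hashable, FrozenSet]:
    """Kosaraju's algorithm: map every vertex to its strongly connected component."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    order, seen = [], set()

    def visit(u):                      # first pass: record DFS finishing order
        seen.add(u)
        for w in succ[u]:
            if w not in seen:
                visit(w)
        order.append(u)

    for u in vertices:
        if u not in seen:
            visit(u)
    comp: Dict[Hashable, Set] = {}

    def assign(u, members):            # second pass on the reversed graph
        comp[u] = members
        members.add(u)
        for w in pred[u]:
            if w not in comp:
                assign(w, members)

    for u in reversed(order):
        if u not in comp:
            assign(u, set())
    return {u: frozenset(c) for u, c in comp.items()}


def scc_order(vertices, edges):
    """Components plus the pairs (SCC(z), SCC(v)) for all incoming edges z -> v."""
    comp = sccs(vertices, edges)
    prec = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, prec


def is_trivial(component, edges) -> bool:
    # a single vertex without a self-loop
    return len(component) == 1 and all(u != v or u not in component for u, v in edges)
```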
Note that conf(C) = 0 implies that C is trivial, i.e., a singleton set without any cycle (self-loop).
Definition 21. Consider a database D and C ∈ SCC(LXG(Σ)). A term t in chase(Σ, D) is a C-input if (i) t is a constant, or (ii) t is a null and C has an incoming edge var(t) → v. A null n with var(n) ∈ C has C-depth d if there is a maximal chain of nulls n_0 ։ n_1 ։ . . . ։ n_d in chase(Σ, D) with n_d = n and var(n_i) ∈ C for 0 ≤ i ≤ d. The C-depth of n is undefined if the length of such chains is unbounded.
The next result limits the number of nulls of bounded C-depth based on the number of C-inputs, thereby also clarifying the significance of homogeneous confluence. Note that this general insight is not restricted to saturating tgd sets.
Lemma 22. Consider a database D and ∈ SCC(LXG(Σ)).If is the number of -inputs in chase(Σ, D), then, for any ≥ 0, the number of nulls at -depth ≤ is at most doubly exponential in and polynomial in .If is homogeneously confluent, then this number is at most exponential in .
We define rank( ) iteratively for = 1, . . ., as follows, where we assume max{} = 0. First, let Then the rank of is
Theorem 24. Let Σ be saturating and D a database. For every existential variable z in Σ, the number of nulls n with var(n) = z in chase(Σ, D) is at most rank(SCC(z))-exponential in the size of D.
Example 14 (and the discussion in Example 3) shows that the upper bound of Theorem 24 can be reached. We can further strengthen this into a hardness result:
Theorem 25. Let Σ be saturating. For every database D, the size of chase(Σ, D) is at most rank(Σ)-exponential in the size of D, and BCQ entailment is rank(Σ)-ExpTime-complete for data complexity.
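For reference, "k-exponential in n" is commonly made precise as being bounded by exp_k(p(n)) for some polynomial p, where exp_0(n) = n and exp_{k+1}(n) = 2^{exp_k(n)}. A minimal Python sketch of this tower function (the name `exp_k` is ours) shows how quickly such bounds grow with the rank:

```python
def exp_k(k: int, n: int) -> int:
    """k-fold exponential: exp_0(n) = n and exp_{k+1}(n) = 2 ** exp_k(n)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    value = n
    for _ in range(k):
        value = 2 ** value
    return value


print([exp_k(k, 2) for k in range(4)])  # [2, 4, 16, 65536]
```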

THE CHASE IN THE FOREST
In this section, we refine Theorem 25 by identifying cases where BCQ answering over saturating tgd sets Σ is not rank(Σ)-ExpTime-complete but merely (rank(Σ) − 1)-ExpSpace-complete, and we design a chase procedure that runs within these complexity bounds. To simplify presentation, we consider tgd sets with a single rank-maximal SCC Ĉ in LXG(Σ) (generalisations are possible; see concluding remarks). If conf(Ĉ) = 1, we can establish a tree-like search space within the chase that follows the edges of LXG(Σ). A new syntactic restriction on tgds, called path guardedness, ensures that a chase that follows this tree-like structure remains complete for conjunctive query answering.
Definition 26. A tgd set Σ is arboreous if it is saturating and has a unique Ĉ ∈ SCC(LXG(Σ)) with rank(Ĉ) = rank(Σ), which satisfies conf(Ĉ) ≤ 1. For such Σ and some chase(Σ, D), the null forest is the directed graph ⟨N̂, ։⟩ with N̂ = {n ∈ N(chase(Σ, D)) | var(n) ∈ Ĉ} the nulls of variables in Ĉ, and m ։ n if m ։_y n for some y.
Lemma 27.For every arboreous Σ and chase(Σ, D), the null forest is indeed a forest (set of trees).
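Lemma 27 can be sanity-checked on a concrete chase. The following Python sketch (a hypothetical helper, assuming the ։ relation between the nulls of the null forest is given as a list of (m, n) pairs) tests the two properties that make a directed graph a forest of trees directed away from their roots: every node has at most one predecessor, and following predecessors never revisits a node.

```python
from typing import Hashable, Iterable, Tuple


def is_forest(nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True iff every node has at most one predecessor and predecessor chains are acyclic."""
    parent = {}
    for m, n in edges:
        if parent.get(n, m) != m:
            return False          # n has two distinct predecessors
        parent[n] = m
    for start in set(nodes) | set(parent):
        seen, n = set(), start
        while n in parent:        # walk up towards a root
            if n in seen:
                return False      # cycle
            seen.add(n)
            n = parent[n]
    return True
```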
Next, we use the special tgds that correspond to the edge set of Definition 13 to partition the null forest into sub-forests. The intuition is that special tgd applications start a new sub-forest (their fresh nulls being the roots), whereas other tgd applications remain within their current sub-forest. By placing all remaining terms of the chase in an additional root node, we obtain a tree structure whose nodes are sets of terms that partition the terms of the chase:
Definition 28. Let Σ, ˆ , ˆ , and ։ be as in Definition 26, let ˆ be the edge set of Definition 13 for ˆ , and let be the set of all terms in chase(Σ, D). An ˆ -tgd is a tgd with an existential variable that is the target of an edge in ˆ . The ˆ -variables ˆ are all existential variables in ˆ that occur in some ˆ -tgd.
For every chase step where an ˆ -tgd was applied, let [ ] be the set of fresh nulls introduced in step .Let ¯ ⊆ ˆ be the set of nulls that are not in any such [ ].Then, for each [ ], let [ ] ⊆ ˆ be the least set that contains [ ] and all ∈ ¯ for which there is Let L 0 = \ ∈ F .The term tree of chase(Σ, D) is the graph ∼ , ։ with ∼ = F ∪ {L 0 }, and ։ extended to ∼ by setting L 0 ։ for all ∈ F that have no ։-predecessor in F .The reflexive transitive closure of ։ is denoted ։ * .For every term , Lemma 29 allows us to use L( ) to denote the unique set L ∈ ∼ with ∈ L.
Example 30. Figure 2 revisits the abstract example from Figure 1, which we assume to be saturating according to Definition 13 using = { ℓ 0 → 0 }.If this is the unique maximal SCC ˆ , then ˆ = , and we obtain three partitions of nulls that are illustrated in Figure 2: ).These partitions are part of a path in the term tree, which is overlaid on the original null forest.
The motivation for defining such a coarser tree structure is that we intend to use this tree to guide the computation of facts during the chase. We will limit its space complexity by storing, at each particular moment during the chase, only those facts that can be represented using terms on a single path of this tree. The coarser the tree structure, the more facts can be considered at any moment, and the more cases can be handled by this limited form of chase.
Besides this general intuition, the factorisation also plays a crucial role in deriving a syntactic criterion to recognise cases where such a tree-based chase can safely be applied, which we will consider next. The difficulty for this endeavour is that any such syntactic condition eventually has to rely on the facts that have induced the tree-like structure in the first place, such as 0 in Figure 2. But these very facts also occur in backwards direction to ensure saturation, as indicated by 1 and 2 in the figure. In the coarser tree structure, such "backward edges" merely lead to the same node of the tree, rather than to a predecessor node from which we could enter parallel branches in forward direction.
The tree structure of terms as such does not constrain the structure of inferred facts in chase(Σ, D), which may relate nulls from arbitrary positions in the null forest.We seek syntactic restrictions that ensure that the chase respects the term tree in the sense that the terms of any fact are on a common path and impose an order on terms that matches their position in the path.To this end, we derive relationships , , on predicate positions such that, for all ( ) ∈ chase(Σ, D) with , ∈ ˆ , we have that L( ) is an ancestor of (or possibly equal to) L( ) in the term tree.The next definition once again uses our assumption that distinct tgds do not share variables.Definition 31.Let Σ be arboreous with ˆ , ˆ , and ˆ as in Definition 28.We will define a (not necessarily transitive) binary relation on predicate positions , .Any such induces a relation on variables of Σ as the reflexive, transitive closure of the set of all such that and occur at positions , and , in a single body atom of some tgd in Σ, and , , .(1) if ∈ ∩ ˆ and ∈ \ ˆ then , , , (2) if ∈ ∩ ˆ and ∈ then , , , if ∈ ∩ ˆ and ∈ with ∉ ( ) then , , , (4) if , ∈ and then , , .
One can construct in polynomial time with a simple greatest fixed point computation.Note that such a construction is anti-monotonic in Σ: more tgds lead to fewer constraints .
body variables in are mutually comparable with respect to the relation , i.e., form a chain in .Σ is path-guarded if all of its tgds are.
Any node L of the term tree induces a unique upwards path path(L) that consists of all nodes L ′ with L ′ ։ * L. For term , we write path( ) for path(L( )).Inferences of path-guarded tgds are situated on such paths: ≠ ∅ then ⊆ path( ) for some ∈ .
Algorithm 1 specifies a non-deterministic chase procedure to check the entailment of a BCQ. It is intended for arboreous, path-guarded tgd sets Σ with variables ˆ as in Definition 28. The input bounds the length of the search: we will determine a rank(Σ)-exponential value for for which the algorithm decides query entailment in (rank(Σ) − 1)-NExpSpace, showing the problem to be in (rank(Σ) − 1)-ExpSpace by Savitch's Theorem. Algorithm 1 maintains a set of inferences I over terms in T, which is a list of sets of terms that corresponds to a path in the term tree. We use operations push, pop, and last, respectively, to add, remove, or read T's last element. The algorithm performs a search for each atom of the query (L3), adding one current path to T's root node after each run (L13). The inner loop (L4) non-deterministically chooses (L5) a tgd to apply to I, or to break the iteration early (even if tgds are applicable). The current path T is pruned so that its last element contains a frontier term of the tgd (L6), before we either add a new term set (L9) or augment the last term set (L11), depending on whether is an ˆ -tgd. Then we add the inferred head and restrict I to atoms with terms in T (L12), where + extends to existentially quantified variables using globally fresh nulls (not used in the algorithm before). Finally, we check if I entails the query (L14).
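The handling of the path T in one iteration of the inner loop can be sketched in Python as follows. This is a hypothetical rendering of the steps described for (L6), (L9)/(L11), and (L12) only, not of Algorithm 1 as a whole: the non-deterministic choice (L5) is replaced by passing in an already chosen tgd application, the Boolean `special` stands for whether the tgd is one of the special tgds of Definition 28, and constants are assumed to live in the root node T[0].

```python
from itertools import count
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
_fresh = count()  # source of globally fresh nulls


def apply_on_path(I: Set[Atom], T: List[Set[str]], head: List[Atom],
                  theta: Dict[str, str], exists: List[str], special: bool):
    """One chosen tgd application; T[0] is the root node holding the constants."""
    T = [set(node) for node in T]  # work on a copy of the current path
    frontier_terms = {theta[t] for _, args in head for t in args if t in theta}
    # (L6) prune the path until its last node contains a frontier term;
    # if no non-root node contains one, only the root node remains
    while len(T) > 1 and not (T[-1] & frontier_terms):
        T.pop()
    fresh = {z: f"_n{next(_fresh)}" for z in exists}
    if special:
        T.append(set(fresh.values()))     # (L9) fresh nulls start a new node
    else:
        T[-1] |= set(fresh.values())      # (L11) fresh nulls join the last node
    sub = {**theta, **fresh}
    I = I | {(p, tuple(sub.get(t, t) for t in args)) for p, args in head}
    on_path = set().union(*T)
    # (L12) keep only atoms whose terms all lie on the current path
    return {(p, args) for p, args in I if set(args) <= on_path}, T
```

Applying a special tgd whose only frontier term is a constant pops the path back to the root, so all atoms over the discarded branch are forgotten; this is how such a procedure can forget inferences and later repeat tgd applications with fresh nulls.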
Given a run of Algorithm 1, we write I (resp. T ) to denote the value of I (resp. T ) after executing (L12) in the th iteration of loop (L3) and the th iteration of (L4). Although Algorithm 1 can forget inferences and repeat the same tgd application with distinct fresh nulls, its computations are correct in the following sense:
Lemma 35. Let I* be the union of all sets I of some run of Algorithm 1. There is a homomorphism from I* to chase(Σ, D), and therefore chase(Σ, D) entails the query whenever Algorithm 1 returns true.
It remains to show that, whenever Σ, D | = , Algorithm 1 admits a run that returns true and is bounded by an as in Lemma 36.Any run corresponds to a sequence of choices for (L5), which consists of | | sequences of tgd applications.Let and define the tgd application used to compute I +1 from I .We say that this tgd application corresponds to chase step if tgd[ ] = , the restriction of to terms in T is injective, and ( ) = − ( [ ] ( )).In this case, we canonically extend to the fresh nulls by ( + ) := + [ ] for all existential variables in .This makes locally injective: Lemma 37.If all tgd applications in a run of Algorithm 1 correspond to chase steps, and in each iteration is the canonical extension for the respective step, then is a homomorphism I * → chase(Σ, D) that is injective on all term sets T that occur during the run.
For completeness of the algorithm, we are interested in runs where all tgd applications correspond to chase steps.Such runs can be specified as a list of | | sequences of chase steps, where repetitions of steps are allowed (and sometimes necessary).
We obtain a suitable choice sequence by scheduling tasks , , to be read as "perform chase step under the assumption that I already contains (isomorphic copies of) all atoms that can be expressed using terms from the first − 1 elements of the path of Lemma 34 (1)." Such tasks may require other tasks to be completed first, since I may not yet contain the whole premise of tgd[ ]: Definition 38.For chase step , let path ( ) denote the path of Lemma 34 (1).For an atom ∈ chase(Σ, D), let path( ) be the smallest path in the term tree that contains all terms of (which are on a path by induction over Lemma 34 (2)).
The subtasks of a task , are all tasks , with ≥ a depth, and < the largest chase step that produced an atom ∈ D +1 \ D with |path( )| = and path( ) ⊆ path ( ).The task tree for , has a root node with label , and the task trees for all subtasks of , as its children.Children of a single parent are ordered by the depth in their label: , < ′ , ′ if < ′ .
The atom in Definition 38 ensures that the application of tgd[ ] in Algorithm 1 will not delete any previous inferences up to depth (through (L6) and (L12)).The order of subtasks ensures that inferences at smaller depths are computed first.Now the required sequence to successfully perform chase step in the inner loop of Algorithm 1 is obtained by traversing the task tree for 1, in a topological, order-respecting way (children before parents, smaller sibling nodes before larger ones), extracting the sequence of chase steps from the second component of the sequence of tasks.
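The traversal of a task tree described above (children before parents, siblings by increasing depth) amounts to a post-order traversal with ordered children. A small Python sketch, with a hypothetical `Task` class holding the depth and chase step of a task label together with its subtasks:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    depth: int                 # first component of a task label
    step: int                  # chase step to perform (second component)
    subtasks: List["Task"] = field(default_factory=list)


def schedule(task: Task) -> List[int]:
    """Children before parents, siblings by increasing depth; returns chase steps."""
    sequence: List[int] = []
    for sub in sorted(task.subtasks, key=lambda t: t.depth):
        sequence.extend(schedule(sub))
    sequence.append(task.step)
    return sequence
```

Chase steps may occur several times in the resulting sequence if they appear in several subtrees, matching the remark that repetitions are allowed.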
Lemma 39. If Σ is arboreous and path-guarded, and is the sequence of chase steps obtained from the task tree with root 1, , then the length | | of is bounded by a rank(Σ)-exponential function.
Lemma 40. If Σ is arboreous and path-guarded, is the sequence of chase steps obtained from the task tree for 1, , and ≥ | |, then Algorithm 1 can choose tgd applications according to .
Combining these results, we obtain the completeness of our tree-based chase. Indeed, whenever ⊆ chase(Σ, D) for some match , there are chase steps 1 , . . ., | | that produce the atoms of . A suitable strategy then executes Algorithm 1 for the choice sequences obtained from the task trees for 1, with = 1, . .
Finally, we use su( , , ) ∧ pos( ) ∧ sat ( ) → sat + ( ) su( , , We check satisfaction by evaluating the tree of literal sets, propagating satisfaction from the leaves towards the root. The tgd (25) handles existential quantification, and tgds (26)-(28) handle universal quantification. Handling the successors of universal states separately ensures that the tgds are path-guarded. Indeed, the relation of Definition 31 for the used tgds contains su, 2 su, 3 . Note that chase( and only if is true.
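Propagating satisfaction from the leaves to the root is the usual recursive evaluation of a quantified Boolean formula. As a point of comparison for tgds (25)-(28), the following Python sketch evaluates a prenex QBF with a CNF matrix over its tree of literal sets. The input representation (a prefix of ("E" | "A", variable) pairs and clauses as sets of signed integers) is our own assumption and is not the tgd encoding used in the construction.

```python
from typing import FrozenSet, List, Tuple

Clause = FrozenSet[int]          # literals as signed integers: v or -v
Prefix = List[Tuple[str, int]]   # quantifier prefix: ("E" | "A", variable)


def sat(prefix: Prefix, clauses: List[Clause],
        literals: FrozenSet[int] = frozenset()) -> bool:
    """Evaluate a prenex QBF with CNF matrix over its tree of literal sets.

    Each node carries the set of literals chosen so far; satisfaction is
    propagated from the leaves (complete assignments) back to the root.
    """
    if not prefix:
        # Leaf: every clause must contain one of the chosen literals.
        return all(clause & literals for clause in clauses)
    quantifier, var = prefix[0]
    # Two children: extend the literal set with var and with its negation.
    results = [sat(prefix[1:], clauses, literals | {lit}) for lit in (var, -var)]
    # Existential nodes need one satisfied child, universal nodes need both.
    return any(results) if quantifier == "E" else all(results)


if __name__ == "__main__":
    clauses = [frozenset({1, 2}), frozenset({-1, -2})]  # (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2)
    print(sat([("A", 1), ("E", 2)], clauses))  # True: choose x2 = ¬x1
    print(sat([("E", 2), ("A", 1)], clauses))  # False
```

The sketch runs in space linear in the number of variables (one recursion frame per quantifier), which matches the intuition behind a PSpace procedure, although it says nothing about the chase-based encoding itself.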

CONCLUSIONS AND OUTLOOK
We have established new criteria for chase termination, which advance the state of the art in two important ways: (1) they can take advantage of the standard chase, and (2) they yield new decidable tgd classes with data complexities that are complete for k-ExpTime and k-ExpSpace, for any k ≥ 0. This is obviously too high for transactional DBMS loads, but it allows us to address a much larger range of complex computational tasks over databases with the chase. Practical problems of this kind include ontology reasoning [15], database provenance computation [23], and querying databases with complex values [35]. We also note that checking our criteria is always dominated by Datalog reasoning, which is practically feasible and of lower complexity than some established criteria [18].
Our work brings up many follow-up questions. First, the new tgd classes are candidates for capturing their respective complexity classes (at least from PSpace upwards), but known proof techniques rely on non-saturating tgds [8]. Second, our techniques require a Datalog-first chase strategy, which is avoidable for sets and complex values [35]. It is open whether similar approaches could apply in our setting. Third, our criteria can be broadened, e.g., the restriction to a single maximal-rank component in Section 6 can be relaxed.
Taking a wider view, a central methodological contribution of our work is the labelled dependency graph and its extensive use for analysing the internal structure of the standard chase. It can be seen as a surrogate for the more syntactic "lineage" of nulls that is available in the semi-oblivious chase (best exposed through the use of Skolem terms [34]), which has been extremely useful in studying that chase variant. It is exciting to ask how our method can be similarly useful in further studying the standard chase, e.g., to detect non-termination or to decide termination in new cases, and whether it can be refined in the style of termination checks based on materialisation or control flow analysis.
) is infinite, it must contain infinitely many nulls. Let ≺ be some total order on nulls of chase(Σ, D) such that ≺ holds whenever was introduced at an earlier chase step than . Let be the directed graph that has all such nulls as its vertices and that contains an edge → if is the ≺-largest null with ։ in chase(Σ, D) for some . Then is acyclic (since ։ implies ≺ ) and a forest (since each vertex has a unique predecessor by definition). Root vertices in correspond to nulls introduced by tgd applications to non-null frontier variables; since there are only finitely many such root nulls, (being infinite) contains an infinite tree . Moreover, is finitely branching: indeed, → ∈ implies that was produced by a tgd application to frontier terms that were introduced in chase(Σ, D) no later than ; there are only finitely many such terms, and hence only finitely many such tgd applications. As a finitely branching, infinite tree, must contain an infinite path by Kőnig's lemma, and this path corresponds to the required chain.
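The forest construction in this proof can be carried out on any finite prefix of a chase. The following Python sketch assigns each null its ≺-largest predecessor as parent and returns a longest branch. The function names and the trace format (nulls listed in order of introduction, plus for each null the nulls it depends on) are assumptions made for illustration. On an infinite chase, Kőnig's lemma is what turns the unboundedly long branches of a finitely branching forest into an infinite one.

```python
from typing import Dict, List, Optional


def null_forest(deps: Dict[str, List[str]], order: List[str]) -> Dict[str, Optional[str]]:
    """Return the parent of each null in the null forest.

    order lists nulls by the chase step that introduced them (playing the role
    of the total order ≺); deps[n] lists the nulls that n depends on.
    The parent of n is the ≺-largest such null, or None if n is a root.
    """
    rank = {n: i for i, n in enumerate(order)}
    parent: Dict[str, Optional[str]] = {}
    for n in order:
        preds = deps.get(n, [])
        # Every dependency must precede n; this is why the graph has no cycles.
        if any(rank[p] >= rank[n] for p in preds):
            raise ValueError(f"{n} depends on a null introduced no earlier than itself")
        parent[n] = max(preds, key=lambda p: rank[p]) if preds else None
    return parent


def longest_chain(parent: Dict[str, Optional[str]]) -> List[str]:
    """A longest root-to-node branch in a finite, non-empty prefix of the forest."""
    depth: Dict[str, int] = {}
    for n, p in parent.items():  # insertion order follows ≺, so parents come first
        depth[n] = 1 if p is None else depth[p] + 1
    node: Optional[str] = max(depth, key=lambda m: depth[m])
    chain: List[str] = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain[::-1]


if __name__ == "__main__":
    order = ["n1", "n2", "n3", "n4", "n5"]
    deps = {"n2": ["n1"], "n3": ["n1", "n2"], "n4": ["n1"], "n5": ["n3", "n4"]}
    print(longest_chain(null_forest(deps, order)))  # ['n1', 'n2', 'n3']
```

Because each null keeps only one parent, the result is a forest even when a null depends on several earlier nulls (here n5 depends on n3 and n4 but is attached only to n4).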

Proof. To check the relevant entailments of the form Σ DL ⊨ → as in (7) and (8), we can check the entailment Σ DL, ′ ⊨ ′, where ′ and ′ are sets of atoms obtained by uniformly replacing variables in and with fresh constants. The claimed complexities are those of Datalog entailment [19].
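The freezing step can be sketched in Python as follows: variables (written with a leading `?`, a convention assumed here) are replaced by fresh constants `c_<name>`, and entailment is decided by naive bottom-up Datalog evaluation. This is a didactic sketch that assumes Datalog rules without existential quantifiers and assumes the fresh constants do not clash with constants in Σ DL; it makes no attempt at efficiency.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, ...]                # (predicate, term, ..., term)
Rule = Tuple[List[Atom], List[Atom]]  # (body, head); Datalog, no existentials


def is_var(term: str) -> bool:
    return term.startswith("?")       # assumed naming convention for variables


def matches(body: List[Atom], facts: Set[Atom], sub: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of sub that maps all atoms of body into facts."""
    if not body:
        yield sub
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(sub)
        # Variables are bound consistently; constants must match exactly.
        if all(new.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(first[1:], fact[1:])):
            yield from matches(rest, facts, new)


def substitute(atom: Atom, sub: Dict[str, str]) -> Atom:
    return (atom[0],) + tuple(sub.get(t, t) for t in atom[1:])


def closure(rules: List[Rule], facts: Set[Atom]) -> Set[Atom]:
    """Naive bottom-up evaluation until no rule derives a new fact."""
    facts = set(facts)
    while True:
        derived = {substitute(h, s) for body, head in rules
                   for s in matches(body, facts, {}) for h in head}
        if derived <= facts:
            return facts
        facts |= derived


def freeze(atom: Atom) -> Atom:
    """Replace each variable ?x by the fresh constant c_x (assumed unused)."""
    return (atom[0],) + tuple("c_" + t[1:] if is_var(t) else t for t in atom[1:])


def entails(rules: List[Rule], body: List[Atom], head: List[Atom]) -> bool:
    """Decide rules ⊨ body → head: the frozen head must follow from the frozen body."""
    return {freeze(a) for a in head} <= closure(rules, {freeze(a) for a in body})


if __name__ == "__main__":
    trans = ([("to", "?x", "?y"), ("to", "?y", "?z")], [("to", "?x", "?z")])
    chain = [("to", "?a", "?b"), ("to", "?b", "?c"), ("to", "?c", "?d")]
    print(entails([trans], chain, [("to", "?a", "?d")]))  # True
    print(entails([trans], chain, [("to", "?d", "?a")]))  # False
```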
Theorem 15. Deciding if is -saturating is ExpTime-complete in the size of . The same complexity applies to deciding if a tgd set Σ is saturating.

Proof. Hardness follows from Lemma 12. For inclusion, note that Definition 13 (1) can be checked in polynomial time for a given . If it holds, there are at most exponentially many ¯ -paths in , leading to exponentially many checks for (3) and (4), each of which is in ExpTime by Lemma 12. The last part of the claim follows since there are polynomially many strongly connected components, each with at most exponentially many candidate sets for .
Lemma 16. Let be an -saturating strongly connected component in LXG(Σ), and let → ∈ have the corresponding tgd with head ∃ , .[ , , , ]. For every database D and chain of nulls

Proof. Let := → . By Lemma 4, the given chain corresponds to a path in LXG(Σ). The first and last edge of is , so all edges of are in . We can view as a path of the form

։
be the part of that corresponds to , and let ( ) be the respective chase step (in particular, (1) = and ( ) = ).
We show by induction that, for all 2 ≤ ≤ , we have . This shows the claim, since it means that Definition 1 (2.a) would not be satisfied. The base case = 2 follows since 1 1 is base-propagating by Definition 13 (3). For the induction step, suppose the claim holds for . The claim follows for + 1 since +1 +1 is step-propagating for by Definition 13 (4).
Theorem 17. If Σ is saturating, then chase(Σ, D) is finite for all databases D.

Proof. Suppose for a contradiction that chase(Σ, D) is infinite. Then there are infinitely many nulls, and some tgd is applied infinitely often. The tgd of an edge ′ → in LXG(Σ) is , the unique tgd with variable . Now let be a strongly connected component of LXG(Σ) such that (1) contains an edge whose tgd is applied infinitely often, and (2) the tgd of every edge ′ → with ′ ∉ and ∈ is applied only finitely often.
Using an argument similar to the proof of Lemma 5, we find that there is an infinite chain ) such that var( ) ∈ for all ≥ 0. By Lemma 4, this chain corresponds to an infinite path in . By assumption, is -saturating for some set , so, by Definition 13 (1), some edge → ∈ occurs infinitely often in . Let be its tgd, and let head( ) = ∃ , .[ , , , ].
By Definition 13 (2) and Lemma 4, applications of can only involve null values for variables of if (i) var( ) ∉ , and (ii) there is an edge from var( ) to in LXG(Σ). By our choice of , the number of such nulls is finite, as is the number of constants in Σ and D, so there are only finitely many possible instantiations of in applications of . Together with Lemma 16, this implies that is applied only finitely many times, a contradiction.

C PROOFS FOR SECTION 5
Lemma 22. Consider a database D and ∈ SCC(LXG(Σ)). If is the number of -inputs in chase(Σ, D), then, for any ≥ 0, the number of nulls at -depth ≤ is at most doubly exponential in and polynomial in . If is homogeneously confluent, then this number is at most exponential in .
Theorem 24. Let Σ be saturating and D a database. For every existential variable in Σ, the number of nulls with var( ) = in chase(Σ, D) is at most rank(SCC( ))-exponential in the size of D.

Proof. Let 0 , . . ., be a topological order as in Definition 23, and let ( ) denote the number of nulls with var( ) = in chase(Σ, D). Moreover, let denote the number of constants in D and Σ. We show the claim for all existential variables with SCC( ) = by induction over = 0, . . ., .
Consider and assume that the claim holds for all with < . Let − 1 , . . ., − ℓ with − ≺ be the direct ≺-predecessors of . Hence, for all 1 ≤ ≤ ℓ, there is < with − = , and, for every ∈ − , ( ) is at most rank( − )-exponential by induction ( * ). Therefore, the number of -inputs is at most in := + ℓ =1 ∈ − ( ), which (by ( * )) is in -exponential for in as in Definition 23. Analogously, the number cxt of -inputs that are nulls with a -incoming edge var( ) → such that there is an edge The claim now follows from Lemma 22, using that the number of -inputs in is in -exponential as noted above, where the single and double exponential dependency on the -depth corresponds to the use of cxt + 1 and cxt + 2 in Definition 23.
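For intuition about the rank-exponential bounds in Theorems 24 and 25, the following Python sketch computes the tower function exp_k(n), with exp_0(n) = n and exp_{k+1}(n) = 2^{exp_k(n)}. We assume the common convention that a k-exponential function is one bounded by exp_k(p(n)) for some polynomial p, so that 0-exponential means polynomial; the name `tower` is ours.

```python
def tower(k: int, n: int) -> int:
    """exp_k(n): exp_0(n) = n and exp_{k+1}(n) = 2 ** exp_k(n)."""
    value = n
    for _ in range(k):
        value = 2 ** value
    return value


if __name__ == "__main__":
    # Growth of k-exponential bounds for small input sizes n = 1..4.
    for k in range(3):
        print(k, [tower(k, n) for n in range(1, 5)])
    # 0 [1, 2, 3, 4]
    # 1 [2, 4, 8, 16]
    # 2 [4, 16, 256, 65536]
```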
Theorem 25. Let Σ be saturating. For every database D, the size of chase(Σ, D) is at most rank(Σ)-exponential in the size of D, and BCQ entailment is rank(Σ)-ExpTime-complete for data complexity.

Proof. Theorem 24 yields a rank(Σ)-exponential bound on the number of terms in chase(Σ, D). The number of atoms in chase(Σ, D), for fixed Σ, is polynomial in the number of terms. Since BCQ entailment can be decided over chase(Σ, D), the claimed rank(Σ)-ExpTime upper bound follows. The lower bound can be shown by reduction from the word problem of -exponentially time-bounded Turing machines (TMs). The simulation of TMs with Datalog rules is standard [19], using a strict total order for time steps and tape cells. To construct such an order of the required length, we expand the construction of Example 14 (i.e., the combined tgds from Examples 3 and 11) with the ։-cycle , we find a ։-cycle in the null forest. This contradicts the acyclicity of the null forest (Lemma 27).
With the additional root element L 0 , F therefore becomes a tree as required.
We make another small observation that was not included in the main text of the paper, but that spares Definition 28 from requiring further special cases.

Proof. Consider some edge → in ˆ . Then ∈ ˆ . Let be the set of existential variables in the tgd that contains . Then every ′ ∈ also satisfies ′ ∈ ˆ , since otherwise ′ would be part of a distinct strongly connected component that would have the same or a greater rank than ˆ , contradicting the requirement that ˆ is the unique SCC of maximal rank (Definition 26).
The previous result clarifies possible uncertainty about the sets [ ] in Definition 28: even when applied to a match that only includes terms that are not in the null forest ˆ , the resulting set of fresh nulls is fully contained in ˆ .

Proof. Let D 0 , D 1 , . . . be the chase sequence of chase(Σ, D). We show the claim for ( ) ∈ D by strong induction on > 0. For D 0 = D, ∉ ˆ so ∈ L 0 , and the claim holds since L 0 is the root of the term tree.
For the induction step D +1 , we only consider the case ∈ ˆ (the case ∉ ˆ works as before), and we show that L( ) ։ * L( ). Note that this implies ∈ ˆ , since there is no edge from L( ) to L 0 .

Proof. We iteratively define for fresh nulls introduced during a run of Algorithm 1, and verify the claimed homomorphism property for each step. Since the outer loop does not matter here, we use I 0 , . . ., I ℓ to denote the entire sequence of values for I as they occur throughout the algorithm.
Initially, is the identity function on constants in Σ and D, which is a homomorphism from the initially empty set I 0 to chase(Σ, D).
Now by way of induction, assume that has been defined so that it is a homomorphism =0 I → chase(Σ, D) for some ≥ 0, and that a further tgd application with tgd and match is chosen in (L5). Since is a match for on I , we find a corresponding match ⊥ of on chase(Σ, D) where ⊥ ( ) = ( ( )) for all variables in the body of . Since this match ⊥ is satisfied in chase(Σ, D), it can be extended to a match + ⊥ such that head( ) + ⊥ ⊆ chase(Σ, D). Therefore, given the extended match + that is used in Algorithm 1 to apply , we define ( + ) for all existential variables in as ( + ) = + ⊥ . Then is a homomorphism
Lemma 36. If Σ is arboreous and path-guarded with rank(Σ) > 0, and ≤ (|D|) for some rank(Σ)-exponential function , then there is a (rank(Σ) − 1)-exponential function such that Algorithm 1 runs in space (|D|), where 0-exponential means polynomial.

Proof. Using binary encoding, the numbers ≤ ≤ (|D|) can be stored in (rank(Σ) − 1)-exponential space. To show that I can be stored in (rank(Σ) − 1)-exponential space, note that the sets of T , other than the root, correspond to nodes L( ) of the term tree in the following sense: if T [ ] ( ∈ {1, . . ., |T |}) is the ( + 1)-th element in T (the first T [0] being the root), then there is a node L ∈ ∼ that is steps away from the root such that (T [ ]) ⊆ L, with as in Lemma 35. The terms in | T | =1 T [ ] are therefore always contained in a single path of the term tree. The length of paths of the null forest (and analogously in the term tree) is bounded by a (rank(Σ) − 1)-exponential function, as shown in the proof of Theorem 24. Indeed, that proof establishes that the -depth of nulls in a strongly connected component is exponentially bounded in cxt and polynomially bounded in in . For rank( ) > 0, the latter corresponds to a ( in − 1)-exponential bound. The claim about ˆ follows since path lengths in the null forest correspond to the ˆ -depth of nulls, and since rank(Σ) = rank( ˆ ) ≥ max{ in , cxt + 1} by Definition 23.
The size of the sets L( ) is polynomial in |D|, since L( ) only contains nulls from applications of non-ˆ -tgds, which do not have a dependency cycle, so that the polynomial data complexity of jointly acyclic tgds applies [31]. Therefore, the size of This bound also applies to the initial value of T from (L2), so that after | | executions of loop (L3), the bound remains (rank(Σ) − 1)-exponential (note that | | is constant with respect to D).
With the overall set of available terms restricted by a (rank(Σ) − 1)-exponential bound in |D|, this bound carries over to the possible atoms in I throughout the computation. The final check in (L14) can also be performed in this space bound, e.g., by iterating over all possible variable bindings with respect to T .
Lemma 37. If all tgd applications in a run of Algorithm 1 correspond to chase steps, and in each iteration is the canonical extension for the respective step, then is a homomorphism I * → chase(Σ, D) that is injective on all term sets T that occur during the run.

Proof. Note that our requirements for "corresponding to a chase step" already include the injectivity of on the terms used in the premise. Nevertheless, the claim is still non-trivial, since the final iteration of each run of the inner loop in Algorithm 1 is not covered by the requirements, and since the algorithm, by virtue of being able to non-deterministically break the computation at any time, can certainly perform runs that satisfy the preconditions. In particular, the required injectivity holds for the initial term set 0 for which was defined as the identity.
We proceed by induction. Consider the tgd application that produces I +1 from I . By assumption, it corresponds to a chase step . Let be the set of existential variables in tgd[ ].
Now suppose for a contradiction that the canonical extension of in this step is not injective. By our definition, induces a bijection + → + [ ], and it is injective on T (a precondition for the application corresponding to step ). Hence, the supposed violation of injectivity requires that there is a null ∈ + and a term ∈ T such that ( ) = ( ). Since ( ) ∈ + [ ], must also be a null (constants are always mapped to themselves in ). By the assumption, has been defined through a series of canonical extensions, so the value ( ) was assigned in a previous tgd application that also corresponded to step (since ( ) ∈ + [ ] can be a fresh null only for this one step). Let + be the extended match used in this tgd application (it has to agree with on universal variables, but must use different nulls), hence ∈ + . But then was added to T in (L9) or (L11). In either case, the whole set + occurs in the same set of T that also contains , so that + ⊆ T . In this case, however, tgd[ ] + ⊆ I , so tgd[ ] is not applicable to obtain I +1 . A contradiction.
Lemma 39. If Σ is arboreous and path-guarded, and is the sequence of chase steps obtained from the task tree with root 1, , then the length | | of is bounded by a rank(Σ)-exponential function.

Proof. Let the knobbly term tree be obtained from the term tree by simultaneously replacing each node set L ∈ ∼ with the union L ∪ L ։L L that also includes all terms in the node's direct children. The tree structure otherwise remains the same, i.e., the term tree and the knobbly term tree are isomorphic. In particular, the length of paths in the knobbly term tree is bounded by a (rank(Σ) − 1)-exponential function, as observed for the term tree in the proof of Lemma 36. Now consider any path 1 , 1 • • • ℓ , ℓ in the task tree, and let denote the atoms of Definition 38 for every subtask , with 1 < ≤ ℓ.
For a path , let | denote the path of the initial nodes in . We claim that for all ∈ {2, . . ., ℓ}, path( )| −1 = path( ℓ )| −1 , i.e., the paths of atoms agree with the path of ℓ , except possibly for the lowest node (Claim ‡). This is trivial for = ℓ. Since |path • ( ℓ )| is bounded by a (rank(Σ) − 1)-exponential function, and since the term sets that constitute the nodes are still of constant size (being unions of terms generated by jointly acyclic sets of tgds, cf. the proof of Lemma 36), the cardinality of path • ( ℓ ) is also bounded by a (rank(Σ) − 1)-exponential function. But then, given the fixed signature of Σ, there are at most (rank(Σ) − 1)-exponentially many atoms that can play the role of (2 ≤ ≤ ℓ) in the above path, and since each atom is produced in just one chase step, the path corresponds to a (strictly decreasing) sequence of at most (rank(Σ) − 1)-exponentially many chase steps.
This shows that the depth of the task tree is bounded by a (rank(Σ) −1)-exponential function, so the size of the task tree (and of the induced sequence of steps) is bounded by a rank(Σ)-exponential function.
Lemma 40. If Σ is arboreous and path-guarded, is the sequence of chase steps obtained from the task tree for 1, , and ≥ | |, then Algorithm 1 can choose tgd applications according to .
2) ensure that tgd[ [ ]] is applicable at step of Algorithm 1 if its head is not already satisfied in I −1 . Note that the latter case can only occur if the same tgd application has been performed before, since earlier chase steps < [ ] have not prevented the application of tgd[ [ ]] in chase(Σ, D). In this situation, we ignore step and continue immediately with the next choice + 1 (if any), and we let I = I −1 . Hence we also obtain (3). Now let ∈ {1, . . ., ℓ} and assume that the induction claim holds true for all ′ < . Consider an arbitrary atom as in the claim. Let := |path( )| be the depth of , and let be the chase step that produced ∈ chase(Σ, D). We claim that ∈ I −1 .
Case (i). If < depth[ ], then depth[ ] > 1. Let be the ancestor node of that is closest to (i.e., lowest in the task tree), such that depth[ ] ≤ . By Definition 38, [ ] > [ ] > , so has a child node with label , where ≥ . Then ∈ I by the induction hypothesis. Moreover, due to the traversal order of children of , all nodes between position and have depth > . This ensures that ∈ I −1 as required (we give a more detailed account of this argument for a slightly more general situation in Case (ii)).
Case (ii). If ≥ depth[ ], then task[ ] has a descendant node in the task tree with label , . This is easy to see for = 1, since the path of depth 1 is unique, so that the condition path( ) ⊆ path ( ) in Definition 38 is tautological if |path( )| = 1. Hence the chase steps for atoms at depth 1 appear within a single path below (with chase steps of such atoms in decreasing order, largest first).
For > 1, we find a similar path below task . Care is needed since the condition in Definition 38 refers to the body path of the immediate parent node, which may not be the body path of , since only the nodes up to depth − 1 are stable so far. We therefore make the following observation: The earliest chase step that produces an atom such that |path( )| = and path( ) ⊆ path ( ) must be the step that introduced the set of nulls denoted [ ] in Definition 28, i.e., that initialised the node [ ] of path ( ) at depth . All other chase steps > that infer an atom with |path( )| = and path( ) ⊆ path ( ) have a frontier variable that is matched to a term in [ ]. Therefore, [ ] is a node in path ( ), and every atom with path( ) ⊆ path ( ) also satisfies path( ) ⊆ path ( ). The chase steps that produce such atoms therefore form a sequence < 1 < . . . < , and we find a corresponding path of tasks , → • • • → , 1 → , in the task tree. This finishes the argument that we find the claimed descendant node of .
Then < since the task tree is traversed in topological order. Let := tgd[ ], := [ ], and + := + [ ]. By the induction hypothesis, the tgd application for this step = [ ] succeeded with a match = − ( ) 3 , and was performed with an extended match + = − ( + ). However, it is possible that the deletions in T between step and step were such that − (which is only defined locally) is not the same at both steps, hence we cannot yet conclude ∈ I but merely that ′ ∈ I for some variant of that might use different fresh nulls. We therefore show by induction that all intermediate steps with < < are such that, after Algorithm 1 has executed (L6), T has length ≥ . This shows that any fresh nulls of step and, otherwise, if = , Σ → contains the tgd:
hpos( , ) ∧ next ( , ′ ) ∧ to ( , + ) → hpos( + , ′ )   (50)
Finally, we define tgds to evaluate which configurations are accepting. The set of tgds Σ eval contains the following tgds, where we introduce further predicates acc and acc for each ∈ Δ.
1. For every state ∈ with ( ) = acc: q( , ) → acc( )
The tgds recursively mark configurations as accepting. Concretely, (51) directly marks all configurations with an accepting state, and (52) marks configurations with an existential state if they have an accepting successor. The tgds (53)-(55) mark a configuration with a universal state if all of its successors are accepting. To achieve this, the successors are traversed in an arbitrary but fixed order 1 , . . ., . Semantically, one could just combine these rules into one, but splitting them ensures path-guardedness. Indeed, the relation of Definition 31 for the above sets of tgds contains to , 2 to , 3 and succ , 1 succ , 2 . Therefore, all tgds are path-guarded. To conclude, let Σ = Σ ≤ ∪ Σ init ∪ Σ + ∪ Σ → ∪ Σ eval . As Σ constructs and evaluates the configuration tree of M, we obtain that M accepts if and only if chase(Σ, D ) ⊨ acc( 0 ). Since Σ is saturating with rank(Σ) = , arboreous, and path-guarded, this shows the claim.
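The marking performed by tgds (51)-(55) can be mirrored by a least-fixpoint computation over a configuration graph. In the Python sketch below, a universal configuration advances a counter over its fixed successor order by at most one accepting successor per round, in the spirit of splitting the universal case into several tgds. The counter-based reading of (53)-(55), the input format, and the function name `accepting` are our assumptions, not the paper's construction.

```python
from typing import Dict, List, Set


def accepting(kind: Dict[str, str], succ: Dict[str, List[str]]) -> Set[str]:
    """Least fixpoint of the acceptance marking on a configuration graph.

    kind[c] is "acc", "rej", "E" (existential) or "A" (universal);
    succ[c] lists the successors of c in an arbitrary but fixed order.
    """
    acc: Set[str] = set()
    # progress[c] = number of leading successors of c already known to accept
    # (a stand-in for the partial marks used in the universal case).
    progress: Dict[str, int] = {c: 0 for c in kind}
    changed = True
    while changed:
        changed = False
        for c, k in kind.items():
            if c in acc:
                continue
            children = succ.get(c, [])
            if k == "A":
                j = progress[c]
                # Move past the next successor only once it is marked accepting.
                if j < len(children) and children[j] in acc:
                    progress[c] = j + 1
                    changed = True
                # All successors passed (vacuously true without successors).
                done = progress[c] == len(children)
            else:
                done = k == "acc" or (k == "E" and any(s in acc for s in children))
            if done:
                acc.add(c)
                changed = True
    return acc


if __name__ == "__main__":
    kind = {"c0": "A", "c1": "E", "c2": "acc", "c3": "rej", "c4": "acc"}
    succ = {"c0": ["c1", "c2"], "c1": ["c3", "c4"]}
    print(sorted(accepting(kind, succ)))  # ['c0', 'c1', 'c2', 'c4']
```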
A tgd is satisfied by I, written I ⊨ , if all matches of on I are satisfied. A fact is satisfied in I, written I ⊨ , if ∈ I. Satisfaction extends to sets of tgds and facts as usual. A tgd or fact is entailed by tgd set Σ and database D if I ⊨ for all I with I ⊨ Σ ∪ D.
where and are lists of fresh variables of length | | = | * | and | | = | * |, respectively; ˜ ℓ+ +1 is the renamed version of the final variable in Path( ); and the other mentioned variables stem from the atom sets Path(

1 1 2
• • • −1 −1 with 1 = = ; 2 , . . ., ∈ ; and an ¯ -path in the sense of Definition 13 for 1 ≤ < . For 1 ≤ ≤ , let ( ) The number of instantiations of and nulls with var( ) = is bounded by (in ) | | . This bound is in -exponential and rank( ) = in , so the claim holds. Case conf( ) ≥ 1. Let denote the set . The -depth of nulls has an upper bound that is polynomial in cxt . Indeed, consider an arbitrary chain of nulls with var( ) ∈ as in Definition 21. By Lemma 16, for every → ∈ with head ∃ , .[ , , , ] of the corresponding tgd , all applications of in use a different instantiation of . By Definition 13 (2), variables in can only be instantiated with values from the ≤ + cxt many -inputs for which has an incoming edge to . Hence, there are at most ( + cxt ) | | applications of in . Using to denote the maximal number of frontier variables in any tgd of , there are at most | | • ( + cxt ) applications of -tgds in . Now by Definition 13 (1), the number of consecutive tgd applications in that correspond to ¯ -edges is at most | ¯ |. Hence, the overall length of is bounded by | ¯ | • (| | • ( + cxt ) + 1). This bound is polynomial in cxt , since | ¯ |, | | and are fixed by Σ. Therefore, the -depth of nulls is at most polynomial in cxt , and hence bounded by an cxt -exponential function.