Containment of Graph Queries Modulo Schema

With multiple graph database systems on the market and a new Graph Query Language standard on the horizon, it is time to revisit some classic static analysis problems. Query containment, arguably the workhorse of static analysis, has already received a lot of attention in the context of graph databases, but much less so in the presence of schemas. We aim to change this. Because there is no universal agreement yet on what graph schemas should be, we rely on an abstract formalism borrowed from the knowledge representation community: we assume that schemas are expressed in a description logic (DL). We identify a suitable DL that captures both basic constraints on the labels of incident nodes and edges, and more refined schema features such as participation, cardinality, and unary key constraints. Building upon and extending the rich body of work on DLs, we solve the containment modulo schema problem for unions of conjunctive regular path queries (UCRPQs) and schemas whose descriptions do not mix inverses and counting. For two-way UCRPQs (UC2RPQs) we solve the problem under additional assumptions that tend to hold in practice: we restrict the use of concatenation in queries and of participation constraints in schemas.


INTRODUCTION
Graph databases are today a mainstream technology with numerous applications in areas such as biology, social sciences, and logistics [35]. For example, in bioinformatics graph databases are commonly used to represent protein, cellular, and drug networks [25,32]. Existing graph query languages, such as SPARQL or Cypher, and the upcoming Graph Query Language standard [19,24] are navigational: they are built around regular path queries (RPQs), which allow one to test whether two nodes are connected by a path whose edge labels match a regular expression [1,6,13,14]. Popular extensions of RPQs include: two-way RPQs (2RPQs), which can traverse edges both forward and backward; conjunctive RPQs (CRPQs), which are the closure of RPQs under conjunction and projection; and UC2RPQs, which combine both extensions with closure under union. These formalisms have been widely studied in multiple classical contexts, one of which is static analysis. Arguably, the fundamental static analysis problem is query containment; that is, checking whether a query necessarily yields a subset of the result of another query. It was shown long ago that the containment problem for UC2RPQs is ExpSpace-complete [13], generalizing a previously established ExpSpace upper bound for CRPQs [23]. More recently, this result has been refined by investigating the containment problem for practical subclasses of CRPQs [22], or by identifying conditions under which UC2RPQ containment becomes tractable [21]. However, none of these works considers containment of navigational queries in the presence of schemas. A notable exception is the work by Deutsch and Tannen [20], in which expressive fragments of first-order logic are used as schema languages; the technical contributions of this work are decidability results, but no tight complexity bounds are provided. The only other exception we are aware of is very recent work on graph database transformations [8], which shows as a side result that containment of UC2RPQs in acyclic UC2RPQs modulo schemas expressed in the Horn fragment of the description logic ALCIF is in 2EXPTIME.

Fig. 1. Schema of a financial graph database.
The scarcity of work on graph query containment in the presence of schemas is perhaps explained by the fact that there is no agreement yet on what graph schemas should be. Recently, however, a community proposal called PG-Schema for the property graph data model has been put forward [2], leveraging an earlier constraint language called PG-Keys [3]. Motivated by this, we revisit the problem of containment of UC2RPQs modulo schema. Following the lead of Boneva et al. [8], we work with schemas expressed in description logics; specifically, in a logic called ALCQI, which extends the basic Boolean-complete description logic ALC with the ability to count (Q) and see edges backwards (I) [5]. Description logics are the go-to formalism for conceptual modelling in the knowledge representation community, conveniently capturing ER models and UML class diagrams [4,7,15,16]. ALCQI specifically has been advocated as a good choice [7,15] and is well aligned with the core features of PG-Schema: over graphs with single labels on edges, ALCQI captures PG-Types (the core of PG-Schema) and a practically relevant subset of PG-Keys, including participation, cardinality, and unary key constraints (properties can be handled via reification). Among the alternatives, ALUNI [16] cannot handle arbitrary combinations of node labels, and DL-Lite [4] cannot talk about both endpoints of an edge simultaneously.
Example 1.1. Schema S in Figure 1 represents a conceptual data model where each customer owns at least one credit card; some credit cards are premier cards and earn rewards for purchases from partner retail companies and their subsidiaries; each premier card participates in at most 3 rewards programs. In Section 2 we show how to express this in ALCQI. We retrieve customers and partners from which they earn rewards using $q_1(x,y) = (\mathsf{Owns}\cdot\mathsf{Earns}\cdot\mathsf{Partner}\cdot\mathsf{Owns}^*)(x,y)$ and $q_2(x,y) = (\mathsf{Owns}\cdot\mathsf{Earns}\cdot\mathsf{Partner})(x,z) \wedge \ldots(z) \wedge \mathsf{Owns}^*(z,y)$; we use $\cdot$ for concatenation and $^*$ for unbounded iteration. Without schema, $q_2$ is contained in $q_1$, but not conversely; modulo S, $q_1$ is contained in $q_2$ as well.
Using description logics as an abstract schema specification language allows us to reuse some ideas and techniques, but only up to a point. The main obstacle is the attitude to infinity: while database theory focuses on finite structures, knowledge representation traditionally embraces infinite models. For containment of navigational queries these two worlds are far apart: even for very limited logics, the answer depends on whether infinite graphs are allowed. Hence, previous results on query containment modulo constraints expressed in description logics [12,17] do not apply in our case.

Our contribution. We solve containment modulo schema for the following combinations of query and schema languages: (1) UCRPQs and either ALCI or ALCQ, (2) simple UC2RPQs and ALCQ, (3) UC2RPQs and ALCQI without participation constraints, where 'simple' essentially means no concatenation in regular expressions. In all cases the problem is in 2EXPTIME; in items (1) and (2) the bound is tight by earlier results discussed below. We leave open the more general combination of UC2RPQs and ALCQI: handling backward edges turned out to be rather subtle, both in queries and in schemas. However, the combinations we support already provide the expressivity needed in many settings. Item (2) is particularly useful. On the one hand, recent studies of query logs [9,10] show that a vast majority of queries are simple. On the other hand, because the query language supports backward edges, some backward constraints can be captured by reversing the corresponding edges; for example, one-to-many relationships can be supported.
Our approach, detailed in Section 3, relies on a non-trivial reduction of query containment to a related problem of finite entailment, which asks whether a given query is satisfied in every finite graph that extends a given graph and satisfies a given schema. We then significantly extend the limited available results on finite entailment [18,27,28] to cover the combinations of schema and query languages we aim at (Sections 5-6). As a technical highlight, let us point out a general method of building structures that avoid a given UC2RPQ (Section 4), akin to the large-girth method of avoiding conjunctive queries [33], based on a novel construction that we call the coil.

PRELIMINARIES
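Graphs. We fix a recursively enumerable set Γ of node labels and a recursively enumerable set Σ of edge labels. We model graph databases as labeled directed graphs in which nodes can have multiple labels, while edges have a single label. Parallel edges are allowed, as long as they have different labels. We present such graphs as relational structures over unary relation symbols Γ and binary relation symbols Σ. That is, a graph $G$ is a pair $\langle \mathrm{dom}(G), \cdot^G \rangle$ where $\mathrm{dom}(G)$ is the set of nodes of $G$ and the function $\cdot^G$ maps each $A \in \Gamma$ to a set $A^G \subseteq \mathrm{dom}(G)$ and each $r \in \Sigma$ to a binary relation $r^G \subseteq \mathrm{dom}(G) \times \mathrm{dom}(G)$. Graph $G$ is finite if $\mathrm{dom}(G)$ is finite and $A^G$ and $r^G$ are empty for all but finitely many $A \in \Gamma$ and $r \in \Sigma$. Graph $G$ is a subgraph of graph $G'$, written $G \subseteq G'$, if $\mathrm{dom}(G) \subseteq \mathrm{dom}(G')$, $A^G \subseteq A^{G'}$ and $r^G \subseteq r^{G'}$ for all $A \in \Gamma$ and $r \in \Sigma$. A homomorphism $h$ from graph $G$ to graph $G'$, written $h : G \to G'$, is a function $h : \mathrm{dom}(G) \to \mathrm{dom}(G')$ such that $u \in A^G$ iff $h(u) \in A^{G'}$, and $(u,v) \in r^G$ implies $(h(u), h(v)) \in r^{G'}$, for all $u, v \in \mathrm{dom}(G)$, $A \in \Gamma$, and $r \in \Sigma$; that is, $h$ also preserves the absence of node labels.

We use $\bar A$ for complement node labels: we say a node has label $\bar A$ iff it does not have label $A$, and let $\bar A^G = \mathrm{dom}(G) \setminus A^G$; we write $\Gamma^\pm$ for $\Gamma \cup \{\bar A \mid A \in \Gamma\}$. A type is a subset of $\Gamma^\pm$ that contains at most one of $A$ and $\bar A$ for all $A \in \Gamma$. A type over $\Gamma_0$ is a type that is a subset of $\Gamma_0^\pm$. A node $u$ is of type $\tau$ in $G$ if $u \in A^G$ for all $A \in \tau$. Thus, every node is of type $\emptyset$, and for every homomorphism $h : G \to G'$, if $u$ is of type $\tau$ in $G$ then $h(u)$ is of type $\tau$ in $G'$. A graph realizes type $\tau$ if it contains a node of type $\tau$. A graph $G$ respects a set $\Theta$ of types if each node in $G$ is of some type from $\Theta$. We use $r^-$ for inverse edges and let $(r^-)^G = \{(v,u) \mid (u,v) \in r^G\}$, $\Sigma^- = \{r^- \mid r \in \Sigma\}$, and $\Sigma^\pm = \Sigma \cup \Sigma^-$.

Queries. Let $V$ be an enumerable set of variables. We work with conjunctive two-way regular path queries (C2RPQs), which are conjunctions of unary atoms $A(x)$ with $A \in \Gamma^\pm$ and binary atoms $\varphi(x,y)$, where $\varphi$ is a regular expression over $\Gamma^\pm \cup \Sigma^\pm$ and $x, y \in V$. We write $\mathrm{var}(q)$ for the set of variables of $q$. A match of $q$ in $G$ is a mapping $\pi : \mathrm{var}(q) \to \mathrm{dom}(G)$ such that $\pi(x) \in A^G$ for each unary atom $A(x)$ and, for each binary atom $\varphi(x,y)$, there are nodes $u_0, \dots, u_n$ and labels $\ell_1, \dots, \ell_n$ such that (1) $u_0 = \pi(x)$ and $u_n = \pi(y)$, (2) for all $i \in \{1, \dots, n\}$, either $\ell_i \in \Sigma^\pm$ and $(u_{i-1}, u_i) \in \ell_i^G$, or $\ell_i \in \Gamma^\pm$ and $u_{i-1} = u_i \in \ell_i^G$, and (3) the word $\ell_1 \dots \ell_n$ matches the regular expression $\varphi$.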
We say that $q$ is satisfied in $G$, and write $G \models q$, if there is a match of $q$ in $G$. Owing to homomorphisms preserving complement node labels, if $G \models q$ and $G$ maps homomorphically to $G'$, then $G' \models q$.
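The following minimal Python sketch spells out the definitions of homomorphism and type. It assumes a hypothetical encoding of a finite graph as a dict with keys "nodes" (a set), "labels" (node to set of labels) and "edges" (triples `(u, r, v)`); the function names and the labels Cust and CC are illustrative only. Requiring equal label sets is how the sketch enforces that homomorphisms also preserve the absence of node labels.

```python
def is_homomorphism(h, G, H):
    """Check that the dict h maps dom(G) into dom(H) so that u has label A iff h(u)
    has label A (presence and absence of labels are preserved) and every r-edge
    (u, v) of G is sent to an r-edge (h(u), h(v)) of H."""
    if set(h) != G["nodes"] or any(h[u] not in H["nodes"] for u in G["nodes"]):
        return False
    if any(G["labels"].get(u, set()) != H["labels"].get(h[u], set()) for u in G["nodes"]):
        return False
    return all((h[u], r, h[v]) in H["edges"] for (u, r, v) in G["edges"])


def is_of_type(G, u, tau):
    """A type is encoded as a dict from a label A to True (literal A) or False
    (literal complement of A); a dict cannot contain both A and its complement."""
    labels = G["labels"].get(u, set())
    return all((A in labels) == positive for A, positive in tau.items())


def respects(G, types):
    """G respects a set of types if every node of G is of at least one of them."""
    return all(any(is_of_type(G, u, tau) for tau in types) for u in G["nodes"])


# Hypothetical example graphs.
G = {"nodes": {1, 2}, "labels": {1: {"Cust"}}, "edges": {(1, "owns", 2)}}
H = {"nodes": {"a", "b"}, "labels": {"a": {"Cust"}}, "edges": {("a", "owns", "b")}}
```

Mapping node 2 to a node that carries an extra label (for example CC) is rejected, because the unlabelled node 2 would gain a label.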
We also use unions of C2RPQs (abbreviated as UC2RPQs), represented as sets of C2RPQs $Q = \{q_1, \dots, q_m\}$, and extend the notion of satisfaction to UC2RPQs in the natural fashion. By (unions of) conjunctive regular path queries, abbreviated as (U)CRPQs, we mean (U)C2RPQs that do not use labels from $\Sigma^-$ in regular expressions. By (two-way) regular path queries, abbreviated as (2)RPQs, we mean binary atoms of C(2)RPQs. A query is test-free if it does not use labels from $\Gamma^\pm$ in regular expressions. A query is simple if it only uses regular expressions of the forms $a_1 + \dots + a_k$ and $(a_1 + \dots + a_k)^*$ with $a_1, \dots, a_k \in \Sigma^\pm$.

Following [28], we sometimes work with UC2RPQs represented by means of a (nondeterministic) semiautomaton [26] $\mathcal{A} = (S, \Delta, \delta)$, where $S$ is a finite set of states, $\Delta \subseteq \Gamma^\pm \cup \Sigma^\pm$ is a finite alphabet, and $\delta \subseteq S \times \Delta \times S$ is the transition relation. A semiautomaton is essentially a nondeterministic finite automaton without initial and final states; a run of a semiautomaton $\mathcal{A}$ over a word $w$ is defined just like for a nondeterministic finite automaton, except that it can begin in any state and there is no notion of accepting runs. Under this representation, (2)RPQs are atoms of the form $\mathcal{A}_{s,s'}(x,x')$, where $s, s' \in S$ are states of $\mathcal{A}$. In the definition of a match we rephrase item (3) as follows: (3') some run of $\mathcal{A}$ over $\ell_1 \dots \ell_n$ begins in $s$ and ends in $s'$.
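As an illustration of item (3'), here is a minimal sketch of evaluating a 2RPQ $\mathcal{A}_{s,t}(x,y)$ by breadth-first search over the product of a graph with a semiautomaton. The graph is encoded as a dict with nodes, labels and edge triples, and letters are encoded as ("fwd", r), ("bwd", r), ("pos", A), ("neg", A); these encodings and all names are assumptions made for this example. The example semiautomaton encodes owns · owns⁻, and a single state with a self-loop encodes the simple query owns*.

```python
from collections import deque


def step(G, u, letter):
    """Nodes reachable from u by reading one letter of Gamma± ∪ Sigma±.
    ("fwd", r) follows an r-edge, ("bwd", r) follows an r-edge backwards,
    ("pos", A) / ("neg", A) are node tests that stay at u."""
    kind, name = letter
    if kind == "fwd":
        return {v for (x, r, v) in G["edges"] if x == u and r == name}
    if kind == "bwd":
        return {x for (x, r, v) in G["edges"] if v == u and r == name}
    has_label = name in G["labels"].get(u, set())
    if kind == "pos":
        return {u} if has_label else set()
    if kind == "neg":
        return set() if has_label else {u}
    raise ValueError(f"unknown letter kind: {kind}")


def eval_2rpq(G, delta, s, t):
    """Pairs (u, v) such that some run of the semiautomaton with transitions delta
    begins in state s, ends in state t, and reads the labels of a walk from u to v."""
    answers = set()
    for u in G["nodes"]:
        seen = {(u, s)}
        queue = deque([(u, s)])
        while queue:
            v, q = queue.popleft()
            if q == t:  # the run read so far ends in t
                answers.add((u, v))
            for (q1, letter, q2) in delta:
                if q1 != q:
                    continue
                for w in step(G, v, letter):
                    if (w, q2) not in seen:
                        seen.add((w, q2))
                        queue.append((w, q2))
    return answers


# Hypothetical example: two customers owning the same card.
G = {"nodes": {1, 2, 3}, "labels": {2: {"CC"}},
     "edges": {(1, "owns", 2), (3, "owns", 2)}}
# owns · owns⁻ as a semiautomaton with states 0 -> 1 -> 2
delta = {(0, ("fwd", "owns"), 1), (1, ("bwd", "owns"), 2)}
```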
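Each UC(2)RPQ $Q$ can be effectively rewritten as a UC(2)RPQ $Q'$ expressed by means of a (nondeterministic) semiautomaton $\mathcal{A}$ of size linear in the total size of regular expressions in $Q$, by replacing each regular expression in $Q$ with $\mathcal{A}_{s,s'}$ for some states $s, s'$ of $\mathcal{A}$. Simple UC(2)RPQs correspond to disjoint unions of single-edge automata and single-state automata, with $\Delta \subseteq \Sigma^\pm$.

Description logics. We work with graph properties expressed in the description logic ALCQI (and its fragments) [5]. In description logics, elements of Γ and Σ are called concept names and role names, respectively. ALCQI allows building more complex concepts with the grammar $C, D ::= A \mid \neg C \mid C \sqcap D \mid \exists^{\le n} r.C$, where $A \in \Gamma^\pm$, $r \in \Sigma^\pm$, and $n \in \mathbb{N}$. We extend the interpretation function $\cdot^G$ to complex concepts as follows: $(\neg C)^G = \mathrm{dom}(G) \setminus C^G$, $(C \sqcap D)^G = C^G \cap D^G$, and $(\exists^{\le n} r.C)^G = \{u \in \mathrm{dom}(G) \mid |\{v \mid (u,v) \in r^G,\ v \in C^G\}| \le n\}$. We also use additional operators that are redundant but useful when defining fragments; for brevity we introduce them as syntactic sugar: $\top = A \sqcup \neg A$ for any $A \in \Gamma$, $\bot = \neg\top$, $C \sqcup D = \neg(\neg C \sqcap \neg D)$, $\exists^{\ge n} r.C = \neg\exists^{\le n-1} r.C$ for $n \ge 1$, $\exists r.C = \exists^{\ge 1} r.C$, and $\forall r.C = \exists^{\le 0} r.\neg C$. A TBox is a finite set of concept inclusions (CIs) $C \sqsubseteq D$; a graph $G$ satisfies a TBox $\mathcal{T}$, written $G \models \mathcal{T}$, if $C^G \subseteq D^G$ for all CIs $C \sqsubseteq D$ in $\mathcal{T}$. In the logic ALCQ we disallow inverse roles: in expressions of the form $\exists^{\le n} r.C$ (and all derived expressions), we require that $r \in \Sigma$. In ALCI, we disallow counting: in expressions $\exists^{\le n} r.C$ we require $n = 0$; this corresponds to allowing only $\exists r.C$ and $\forall r.C$. If both restrictions are imposed, we obtain the logic ALC.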
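To make the semantics concrete, the sketch below computes the extension $C^G$ of a concept over a finite graph, treating ⊔, ∃, ∀ and ∃^{≥n} as syntactic sugar over the core constructors, and checks satisfaction of a TBox given as a list of pairs (C, D). It is a hypothetical illustration: the tuple-based concept syntax, the ("fwd", r) / ("bwd", r) encoding of roles and inverse roles, and the labels Cust and CC are assumptions, not taken from the paper.

```python
def role_successors(G, u, role):
    """role is ("fwd", r) for r in Sigma or ("bwd", r) for the inverse role r⁻."""
    direction, r = role
    if direction == "fwd":
        return {v for (x, s, v) in G["edges"] if x == u and s == r}
    return {x for (x, s, v) in G["edges"] if v == u and s == r}


def ext(G, C):
    """Extension C^G of a concept in core syntax:
    ("name", A) | ("not", C) | ("and", C, D) | ("atmost", n, role, C)."""
    kind = C[0]
    if kind == "name":
        return {u for u in G["nodes"] if C[1] in G["labels"].get(u, set())}
    if kind == "not":
        return set(G["nodes"]) - ext(G, C[1])
    if kind == "and":
        return ext(G, C[1]) & ext(G, C[2])
    if kind == "atmost":
        _, n, role, D = C
        d_ext = ext(G, D)
        return {u for u in G["nodes"] if len(role_successors(G, u, role) & d_ext) <= n}
    raise ValueError(f"unknown constructor: {kind}")


# Syntactic sugar, expressed in the core syntax above.
def Or(C, D):
    return ("not", ("and", ("not", C), ("not", D)))


def AtLeast(n, role, C):   # at least n role-successors in C: not (at most n-1)
    return ("not", ("atmost", n - 1, role, C))


def Exists(role, C):       # some role-successor in C
    return AtLeast(1, role, C)


def Forall(role, C):       # no role-successor outside C
    return ("atmost", 0, role, ("not", C))


def satisfies(G, tbox):
    """G satisfies the TBox iff C^G is a subset of D^G for every CI (C, D)."""
    return all(ext(G, C) <= ext(G, D) for (C, D) in tbox)


# Hypothetical example: customers c1, c2 and credit cards k1, k2.
G = {"nodes": {"c1", "c2", "k1", "k2"},
     "labels": {"c1": {"Cust"}, "c2": {"Cust"}, "k1": {"CC"}, "k2": {"CC"}},
     "edges": {("c1", "owns", "k1"), ("c1", "owns", "k2")}}
owns, owned_by = ("fwd", "owns"), ("bwd", "owns")
Cust, CC = ("name", "Cust"), ("name", "CC")
```

Here the CI "every customer owns a credit card" fails because of c2, while "every credit card is owned by a customer" holds.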
A TBox in each of these description logics can be normalized; that is, up to introducing auxiliary concept names, it can be expressed equivalently in the same logic using only CIs of the forms $K \sqsubseteq M$, $A \sqsubseteq \exists r.B$, $A \sqsubseteq \forall r.B$, $A \sqsubseteq \exists^{\le n} r.B$, and $A \sqsubseteq \exists^{\ge n} r.B$, where $A, B \in \Gamma^\pm$, $r \in \Sigma^\pm$, $K$ is $\top$ or an intersection of concept names and complement concept names, and $M$ is $\bot$ or a union of concept names and complement concept names (see e.g. [29, Prop. 1]).
Example 2.1. The following CIs capture key features of the schema S from Example 1.1 (with self-explanatory abbreviations):

In order to fully capture schema S we also need to specify that only the depicted relationships are allowed. For instance, $\mathsf{PremCC} \sqsubseteq \forall \mathsf{earns}.\mathsf{RwrdProg}$ and $\mathsf{RwrdProg} \sqsubseteq \forall \mathsf{earns}^-.\mathsf{PremCC}$, and similarly for partner and owns. We must also ensure that entities do not overlap, unless explicitly allowed by generalization relationships. The use of the inverse role $\mathsf{earns}^-$ can be avoided by flipping the CI to the contrapositive: $\overline{\mathsf{PremCC}} \sqsubseteq \forall \mathsf{earns}.\overline{\mathsf{RwrdProg}}$.
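The contrapositive flip can be sanity-checked mechanically. The sketch below enumerates all 256 graphs with two nodes, the labels PremCC and RwrdProg, and the role earns (self-loops included), and confirms that both CIs hold in exactly the same graphs. This is an exhaustive check on tiny graphs only, not a proof; in general both CIs state that for every earns-edge (u, v), if v is a RwrdProg then u is a PremCC.

```python
from itertools import product

NODES = (0, 1)
PAIRS = [(u, v) for u in NODES for v in NODES]   # possible earns-edges, loops included
LABEL_SETS = [frozenset(), frozenset({"PremCC"}), frozenset({"RwrdProg"}),
              frozenset({"PremCC", "RwrdProg"})]


def ci_with_inverse(labels, earns):
    """RwrdProg ⊑ ∀earns⁻.PremCC: every earns-predecessor of a RwrdProg node is PremCC."""
    return all("PremCC" in labels[u] for (u, v) in earns if "RwrdProg" in labels[v])


def ci_contrapositive(labels, earns):
    """complement(PremCC) ⊑ ∀earns.complement(RwrdProg): no node without PremCC
    has an earns-successor labelled RwrdProg."""
    return all("RwrdProg" not in labels[v] for (u, v) in earns if "PremCC" not in labels[u])


def small_graphs():
    """Every two-node graph over the labels PremCC, RwrdProg and the role earns."""
    for labs in product(LABEL_SETS, repeat=len(NODES)):
        for mask in range(2 ** len(PAIRS)):
            earns = {p for i, p in enumerate(PAIRS) if (mask >> i) & 1}
            yield dict(zip(NODES, labs)), earns


def agree_on_small_graphs():
    return all(ci_with_inverse(l, e) == ci_contrapositive(l, e) for l, e in small_graphs())
```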

CONTAINMENT AND ENTAILMENT
We focus on the Boolean variant of the containment problem. Given queries $p$ and $Q$ and a TBox $\mathcal{T}$, we write $p \subseteq_{\mathcal{T}} Q$ if $G \models p$ implies $G \models Q$ for every finite graph $G$ that satisfies $\mathcal{T}$. In the problem of containment modulo schema the input is $p$, $Q$, and $\mathcal{T}$, and the question is whether $p \subseteq_{\mathcal{T}} Q$.
Without loss of generality we can focus on the containment of a connected C2RPQ in a union of connected C2RPQs; by a slight abuse of terminology, we call a UC2RPQ connected if it contains only connected C2RPQs. We also assume that the TBox is normalized, as explained in Section 2.

TBoxes without participation constraints
To warm up, let us see how to solve the containment problem for TBoxes that do not use participation constraints; that is, CIs of the form $A \sqsubseteq \exists r.B$ and $A \sqsubseteq \exists^{\ge n} r.B$. We rely on a simple model property established in [17] in the context of unrestricted satisfiability of C2RPQs in the presence of an ALCIF TBox. We use a reformulation of the simple model property given by Boneva et al. [8] in terms of Lee and Streinu's sparse graphs [31].

CIs of the form $A \sqsubseteq \forall r.B$ carry over because if a node in $G_p$ has label $A$ and an $r$-successor without label $B$, so does its image in $G$. CIs of the form $A \sqsubseteq \exists^{\le n} r.B$ carry over because if a node in $G_p$ has label $A$ and more than $n$ $r$-successors with label $B$, so does its image in $G$. Consequently, when deciding $p \subseteq_{\mathcal{T}} Q$ for $\mathcal{T}$ without participation constraints, it suffices to look for $|p|$-sparse counterexamples. This is easy because every $|p|$-sparse graph is a tree up to removing at most $|p| + 1$ edges and reversing edges so that they point away from the root. Assuming that the endpoints of each removed edge are indicated with unique markers, and reversed edges are suitably labelled, we can construct a tree automaton recognizing trees resulting from $|p|$-sparse counterexamples. To solve the containment problem, we test whether the language recognized by the automaton is empty.

Theorem 3.2. Containment of UC2RPQs modulo ALCQI TBoxes without participation constraints is in 2EXPTIME.
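The tree-shape argument can be illustrated with a minimal sketch that splits the edges of a graph into a spanning forest of the underlying undirected graph plus the remaining edges, and then orients the forest away from a root while recording which edges had to be reversed. It assumes edges are encoded as triples (u, r, v). It does not test Lee and Streinu's sparsity condition and does not build the tree automaton, so it only shows the shape of the encoding, not the decision procedure.

```python
def split_spanning_forest(nodes, edges):
    """Split the edges (u, r, v) into a spanning forest of the underlying undirected
    multigraph and the remaining edges (self-loops always remain). For a connected
    graph, len(extra) == len(edges) - len(nodes) + 1."""
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    tree, extra = [], []
    for (u, r, v) in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            extra.append((u, r, v))         # closes a cycle: to be "removed" and marked
        else:
            parent[ru] = rv
            tree.append((u, r, v))
    return tree, extra


def orient_away_from_root(tree, root):
    """Orient tree edges away from root. Each result (parent, child, r, flipped)
    records whether the original r-edge pointed towards the root (flipped=True)."""
    adj = {}
    for (u, r, v) in tree:
        adj.setdefault(u, []).append((v, r, False))  # original edge u -> v
        adj.setdefault(v, []).append((u, r, True))   # same edge, traversed against it
    oriented, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        for (v, r, flipped) in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
                oriented.append((u, v, r, flipped))
    return oriented


# Hypothetical example: a directed triangle with a pendant edge.
edges = [(1, "a", 2), (2, "b", 3), (3, "c", 1), (3, "d", 4)]
tree, extra = split_spanning_forest({1, 2, 3, 4}, edges)
```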

The entailment problem
We approach containment modulo schemas with participation constraints through a related problem of entailment.
Given a finite graph $G$, a TBox $\mathcal{T}$, and a query $Q$, we say that $Q$ is entailed by $G$ and $\mathcal{T}$, written $G, \mathcal{T} \models Q$, if $G' \models Q$ for every graph $G'$ such that $G' \models \mathcal{T}$ and $G \subseteq G'$. We say that $Q$ is finitely entailed by $G$ and $\mathcal{T}$, written $G, \mathcal{T} \models_{\mathrm{fin}} Q$, if the above holds for every finite graph $G'$. The finite entailment problem is to decide whether $G, \mathcal{T} \models_{\mathrm{fin}} Q$ for given $G$, $\mathcal{T}$, and $Q$. (The traditional formulation uses a finite set of ground facts, called the ABox, instead of $G$.) Finite entailment can be seen as a special case of containment modulo schema, via the well-known correspondence between conjunctive queries and graphs. As we will see, under certain assumptions on the TBox, the converse reduction is also possible. This will allow us to leverage existing knowledge on the finite entailment problem. The following results will be relevant.

Theorem 3.3. Finite entailment is 2EXPTIME-complete for the following combinations of query languages and description logics: (1) test-free UCRPQs and ALC [28], (2) UCQs with transitive atoms and ALCI or ALCQ [27]. The lower bound in (2) holds already for ALC.
Recall that test-free means no labels from $\Gamma^\pm$ in regular expressions. UCQs with transitive atoms are simple UCRPQs that only use regular expressions of the forms $a$ and $a^+$ with $a \in \Sigma$.

Main results
Our main results are summarized in the following theorem.

Theorem 3.4. Containment modulo schema is 2EXPTIME-complete for the following combinations of query and schema languages: (1) UCRPQs and either ALCI or ALCQ, (2) simple UC2RPQs and ALCQ.
The 2EXPTIME lower bound of Theorem 3.3 carries over immediately to containment of simple CRPQs modulo ALC TBoxes; the lower bounds in Theorem 3.4 follow. To establish the upper bounds, in the remainder of this section we show how to reduce containment modulo schema to finite entailment. Then, in Sections 4-6, we solve the finite entailment problem in the three cases needed to establish Theorem 3.4.

From containment to entailment
We give an algorithm for containment modulo schema that uses finite entailment as an oracle. As for TBoxes without participation constraints, we first prove a simple countermodel property and then show how to decide whether a simple countermodel exists.
We build upon Theorem 3.1. Essentially, we show that if we start from a graph $G$ that satisfies $\mathcal{T}$, then $G_p$ can be extended to a simple graph that satisfies $\mathcal{T}$ and still maps homomorphically to $G$, provided that $\mathcal{T}$ is an ALCI or ALCQ TBox. A graph $H$ is star-like if it consists of $k$ disjoint graphs $H_1, \dots, H_k$ (called peripheral parts) and a graph $H_0$ (called the central part) that shares exactly one node with each $H_i$ for $i = 1, \dots, k$, and the shared node has identical labels in both parts. In the lemma below, illustrated in Fig. 2, the central part corresponds to $G_p$ and the peripheral parts are copies of $G$ attached to provide witnesses for the participation constraints.

Proof. Let $G$ be a counter-model. By Theorem 3.1 there is a $|p|$-sparse graph $G_p$ that satisfies $p$ and locally embeds into $G$ via a homomorphism $h$. For each node $u$ in $G_p$, each role $r \in \Sigma^\pm$ involved in a participation constraint in $\mathcal{T}$, and each $r$-successor $v$ of $h(u)$ in $G$ that is not an image via $h$ of an $r$-successor of $u$ in $G_p$, extend $G_p$ by adding a fresh copy of $G$ and an $r$-edge from $u$ to the copy of $v$. Let $H$ be the resulting graph, with the copies of $G$ treated as peripheral parts. Graph $H$ still maps homomorphically to $G$, as $h$ extends naturally. Hence, $H \not\models Q$. It is easy to verify that $H \models \mathcal{T}$: while one of the $r^-$-successors of the copy of $v$ is duplicated, we know that $r$ is involved in a participation constraint. If $\mathcal{T}$ is an ALCI TBox, the duplication cannot be detected, because the logic does not count. If $\mathcal{T}$ is an ALCQ TBox, then $r^- \in \Sigma^-$, so the logic does not see $r^-$-successors at all. The remaining three conditions from the statement of the lemma hold as well, by construction. □

To decide whether such a graph $H$ exists, we separate the existence of the central part from the existence of suitable peripheral parts. To this end, we replace the global conditions $H \models \mathcal{T}$ and $H \not\models Q$ with local conditions over parts of $H$. We refer to this technique as factorizing the TBox $\mathcal{T}$ and the query $Q$.
To factorize $Q$, we replace it with a UC2RPQ $\hat Q$ such that (1) $\hat Q$ is factorized; that is, $\hat Q$ holds in a star-like graph iff it holds in at least one of its parts.
Of course, we cannot expect $\hat Q$ to be equivalent to $Q$. We use fresh node labels in $\hat Q$ and ensure that (2) $Q$ holds in a graph $G$ iff $\hat Q$ holds in every graph $G'$ equal to $G$ up to the fresh node labels in $\hat Q$.
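This is sufficient to replace the condition $H \not\models Q$ in Lemma 3.5 with a condition on the parts of $H$: some graph $H'$ equal to $H$ up to the fresh node labels in $\hat Q$ has no part satisfying $\hat Q$. For $\hat Q$ we can take the union of the following CRPQs:

A UC2RPQ $\hat Q$ satisfying conditions (1) and (2) can be constructed in exponential time, while ensuring that $\hat Q$ is a union of C2RPQs of polynomial size, and if $Q$ is simple or one-way, so is $\hat Q$.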
Proof. We begin by developing some auxiliary notions. A pointed C2RPQ $(p, x)$ is a connected C2RPQ $p$ along with a distinguished variable $x \in \mathrm{var}(p)$. We say that $(p, x)$ matches in graph $G$ at node $u$ if there is a match $\pi$ of $p$ in $G$ such that $\pi(x) = u$. Consider a match of a pointed C2RPQ $(p, x)$ in a star-like graph $H$ with central part $H_0$ and peripheral parts $H_1, H_2, \dots, H_k$, with $x$ matched in $H_0$. Intuitively, the match breaks $(p, x)$ down into fragments matched in the respective parts of $H$. However, unlike for conjunctive queries, these fragments are not simply subsets of atoms of $p$, because paths witnessing 2RPQs can move back and forth between parts of $H$.
Let us split each 2RPQ $\mathcal{A}_{s,t}(y, z)$ in $p$ into three 2RPQs $\mathcal{A}_{s,s'}(y, y')$, $\mathcal{A}_{s',t'}(y', z')$, and $\mathcal{A}_{t',t}(z', z)$, for fresh variables $y'$ and $z'$ and suitable states $s'$ and $t'$, corresponding to three segments of the path witnessing $\mathcal{A}_{s,t}(y, z)$ in $H$: the maximal prefix within a peripheral part, the middle segment, and the maximal suffix within a peripheral part. (If any of the segments is empty, adjust the split accordingly.) For each shared node, replace all variables mapped to it with a fresh variable. For $i = 1, 2, \dots, k$, consider the pointed C2RPQ $(p_i, x_i)$ collecting all atoms witnessed entirely within $H_i$, with $x_i$ being the variable mapped to the shared node in $H_i$, and the pointed C2RPQ $(p_0, x')$ collecting all remaining atoms (if none are left, take $\mathcal{A}_{s,s}(x', x')$ for any $s$), with $x'$ being either $x$ itself or the variable that replaced it. Note that each $(p_i, x_i)$ matches in $H_i$ at the shared node, whereas $(p_0, x')$ need not match in $H_0$, as some of its 2RPQs may be witnessed by paths detouring into peripheral parts.
A unary factor of $(p, x)$ is any pointed C2RPQ that can be obtained as $(p_i, x_i)$ for some match of $(p, x)$ in a star-like graph, as well as any pointed query of the form $(\mathcal{A}_{s,s'}(y, y), y)$.
We aim to discover matches of $(p, x)$ based on information about matches of its unary factors $(p'', y)$, encoded in fresh node labels $P_{p'',y}$, dubbed permissions. Let $(p', x')$ be obtained from $(p_0, x')$ above by adding atoms $P_{p_i,x_i}(x_i)$ for $i = 1, 2, \dots, k$ and extending the underlying semiautomaton with 'shortcut' transitions from state $s$ to state $s'$ over the node label $P_{\mathcal{A}_{s,s'}(y,y),y}$, to account for detours. A central factor of $(p, x)$ is any pointed C2RPQ that can be obtained as $(p', x')$ for some match of $(p, x)$ in a star-like graph.
A unary factor of a connected C2RPQ $q$ is a unary factor of any $(q, x)$ with $x \in \mathrm{var}(q)$. Factors are connected C2RPQs, and a unary factor of a unary factor of $q$ is a unary factor of $q$ itself. In the case of simple C2RPQs, detours to peripheral parts are pointless, so we keep the underlying semiautomaton unchanged, thus ensuring that factors are also simple.
We can now define $\hat Q$ for a connected UC2RPQ $Q$ as the union of two families of queries: one query for each unary factor $(p, x)$ of some $q \in Q$ and each central factor $(p', x')$ of $(p, x)$, and one query for each $q \in Q$ and $x \in \mathrm{var}(q)$. Note that $\hat Q$ is connected, and if $Q$ is simple or one-way, so is $\hat Q$. Given $Q$, one can compute $\hat Q$ in exponential time, and each C2RPQ in $\hat Q$ has polynomial size. It is routine, if a bit tedious, to check that $\hat Q$ satisfies conditions (1) and (2). □

Factorizing the TBox is easy. Let $\mathcal{T}_0$ be $\mathcal{T}$ with all participation constraints dropped. Because we assume that each peripheral part of $H$ satisfies $\mathcal{T}$, we can replace the condition $H \models \mathcal{T}$ with
• the central part of $H$ satisfies $\mathcal{T}_0$ and each of its nodes satisfies all participation constraints in $\mathcal{T}$ unless it is shared with a peripheral part.
The only delicate point is that CIs of the form $A \sqsubseteq \exists^{\le n} r.B$ with $r \in \Sigma$ carry over from the parts of $H$ to the whole of $H$. This is the case because, by the additional condition for ALCQ in the third item of Lemma 3.5, all outgoing edges of any node belong to a single part of $H$. Note that for ALCQI this would not work, as we would also have to deal with incoming edges. Let $\mathrm{Tp}_{\mathcal{T},\hat Q}$ be the set of maximal types over labels used in $\mathcal{T}$ and $\hat Q$ that are realized in finite graphs that satisfy $\mathcal{T}$ but not $\hat Q$. We can now reformulate the criterion from Lemma 3.5 as follows: $p \not\subseteq_{\mathcal{T}} Q$ iff there is a $|p|$-sparse graph $H_0$ such that $H_0 \models p$, $H_0 \models \mathcal{T}_0$, $H_0 \not\models \hat Q$, and each node violating a participation constraint from $\mathcal{T}$ is of some type from $\mathrm{Tp}_{\mathcal{T},\hat Q}$, has only one incident edge, and, in the ALCQ case, has no outgoing edges. Given $\mathrm{Tp}_{\mathcal{T},\hat Q}$, we can adapt the automata-based argument behind Theorem 3.2 to decide whether such $H_0$ exists. It remains to compute $\mathrm{Tp}_{\mathcal{T},\hat Q}$. For that we note that $\tau \in \mathrm{Tp}_{\mathcal{T},\hat Q}$ iff $G_\tau, \mathcal{T} \not\models_{\mathrm{fin}} \hat Q$, where $G_\tau$ is a graph consisting of a single isolated node of type $\tau$. That is, we can compute $\mathrm{Tp}_{\mathcal{T},\hat Q}$ by solving an instance of finite entailment for each maximal type over labels used in $\mathcal{T}$ and $\hat Q$. This completes the reduction of containment modulo schema to finite entailment. Note that we have actually shown that it suffices to solve a variant of finite entailment that asks whether a given type $\tau$ belongs to $\mathrm{Tp}_{\mathcal{T},\hat Q}$, that is, whether $\tau$ can be realized in a finite graph $G$ such that $G \models \mathcal{T}$ and $G \not\models \hat Q$.
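The last step enumerates maximal types and queries a finite-entailment oracle once per type. The sketch below shows only that enumeration loop; the oracle `not_finitely_entailed` is a hypothetical parameter standing in for a decision procedure for finite entailment on a single-node graph, and the toy oracle in the usage is made up. Over a label set $\Gamma_0$ there are $2^{|\Gamma_0|}$ maximal types, so the number of oracle calls is exponential in the number of labels.

```python
from itertools import product


def maximal_types(labels):
    """All maximal types over the given labels: exactly one of A or its complement
    for each label A, encoded as a dict from label to True (A) or False (complement)."""
    labels = sorted(labels)
    for signs in product((True, False), repeat=len(labels)):
        yield dict(zip(labels, signs))


def single_node_graph(tau):
    """The graph G_tau: one isolated node carrying exactly the positive labels of tau."""
    return {"nodes": {0}, "labels": {0: {A for A, pos in tau.items() if pos}}, "edges": set()}


def compute_tp(labels, not_finitely_entailed):
    """Collect the maximal types tau whose single-node graph is NOT finitely entailing
    the (factorized) query together with the TBox. not_finitely_entailed(graph) is a
    hypothetical oracle; this sketch does not decide finite entailment itself."""
    return [tau for tau in maximal_types(labels)
            if not_finitely_entailed(single_node_graph(tau))]
```

For example, `compute_tp({"A", "B"}, lambda g: "A" not in g["labels"][0])` keeps exactly the two maximal types containing the complement of A.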

ASSEMBLING COUNTERMODELS
In Section 3, we solved containment modulo schema using a reduction to finite entailment, relying on a star-like countermodel property and on factorizing queries and TBoxes. Our approach to finite entailment is similar, but we reduce to simpler instances of the same problem, and countermodels have a richer structure. In this section we prepare our tools. We begin with concrete frames, which will be used in Sections 5-6 to represent and manipulate complex decompositions of countermodels. Next, we lift our method of factorizing queries from star-like graphs to graphs represented by frames (TBoxes are handled in Sections 5-6). Finally, we pass to abstract frames, which will be used in Sections 5-6 to reduce a complex instance of finite entailment to multiple simpler ones.
In what follows we use the notion of a pointed graph, which is just a graph with a distinguished node. Two pointed graphs are considered isomorphic if they are isomorphic as graphs and the isomorphism preserves the distinguished node.

Concrete frames
A concrete frame is a finite graph without self-loops whose nodes represent disjoint components of a graph and whose edges represent edges between these components. More precisely, each frame node $v$ is labelled with a pointed graph $G_v$ with distinguished node $d_v$, and each edge originating in a frame node $v$ is labelled with a pair $(u, r)$ where $u \in \mathrm{dom}(G_v)$ and $r \in \Sigma^\pm$. We assume that $\mathrm{dom}(G_v) \cap \mathrm{dom}(G_w) = \emptyset$ whenever $v \neq w$, and that different edges with labels $(u, r)$ and $(u, s)$ have different targets. For every frame node $v$ and node $u \in \mathrm{dom}(G_v)$ we define $C_{v,u}$ as the pointed graph obtained as follows. For the distinguished node we take $u$, with labels inherited from $G_v$. For each edge from $v$ to $w$ with label $(u, r)$ we add to $C_{v,u}$ the distinguished node $d_w$ of $G_w$ (with labels inherited from $G_w$) and an $r$-edge from $u$ to $d_w$. If $r = s^- \in \Sigma^-$, this results in an $s$-edge from $d_w$ to $u$. We call the graphs $G_v$ the components and the graphs $C_{v,u}$ the connectors of the frame. While components are arbitrary pointed graphs, connectors are very simple pointed graphs with a single edge between the distinguished node and each non-distinguished node, no loops, and no edges between non-distinguished nodes. Every concrete frame $\mathcal{F}$ represents a graph $G_{\mathcal{F}}$, obtained by taking the union of all its components and connectors. Edges in $G_{\mathcal{F}}$ that result from connectors are called frame edges; a frame edge and the corresponding edge in $\mathcal{F}$ may have opposite directions.
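The following minimal sketch builds the graph represented by a concrete frame (the union of its components and connectors) and a single connector, and checks the structural assumptions stated above: disjoint components, no self-loops, and distinct targets for edges labelled (u, r) and (u, s). The encoding of a frame as a dict of pointed graphs plus a list of labelled frame edges `(v, w, u, (direction, r))` is an assumption made for illustration, as are the example labels.

```python
def represented_graph(components, frame_edges):
    """components: frame node v -> {"nodes", "labels", "edges", "dist"} (pointed graph).
    frame_edges: list of (v, w, u, (direction, r)) for a frame edge from v to w
    labelled (u, r); direction is "fwd" for r in Sigma, "bwd" for the inverse role."""
    nodes, labels, edges = set(), {}, set()
    for comp in components.values():
        assert nodes.isdisjoint(comp["nodes"]), "components must be disjoint"
        nodes |= comp["nodes"]
        labels.update(comp["labels"])
        edges |= comp["edges"]
    targets = set()
    for (v, w, u, (direction, r)) in frame_edges:
        assert v != w, "concrete frames have no self-loops"
        assert u in components[v]["nodes"], "u must belong to the source component"
        assert (v, u, w) not in targets, "edges labelled (u, r), (u, s) need distinct targets"
        targets.add((v, u, w))
        d_w = components[w]["dist"]
        # the frame edge; for an inverse role it points from d_w back to u
        edges.add((u, r, d_w) if direction == "fwd" else (d_w, r, u))
    return {"nodes": nodes, "labels": labels, "edges": edges}


def connector(components, frame_edges, v, u):
    """Connector C_{v,u}: distinguished node u plus the distinguished node d_w of every
    target w of a frame edge from v labelled (u, r), each joined to u by one edge."""
    labels = {u: set(components[v]["labels"].get(u, set()))}
    edges = set()
    for (v2, w, u2, (direction, r)) in frame_edges:
        if (v2, u2) == (v, u):
            d_w = components[w]["dist"]
            labels[d_w] = set(components[w]["labels"].get(d_w, set()))
            edges.add((u, r, d_w) if direction == "fwd" else (d_w, r, u))
    return {"nodes": set(labels), "labels": labels, "edges": edges, "dist": u}


# Hypothetical two-component frame with one frame edge labelled (b, earns⁻).
components = {
    "v1": {"nodes": {"a", "b"}, "labels": {"a": {"Cust"}},
           "edges": {("a", "owns", "b")}, "dist": "a"},
    "v2": {"nodes": {"c"}, "labels": {"c": {"RwrdProg"}}, "edges": set(), "dist": "c"},
}
frame_edges = [("v1", "v2", "b", ("bwd", "earns"))]
```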
A concrete frame $\mathcal{F}$ realizes a type $\tau$ if the distinguished node of some component of $\mathcal{F}$ is of type $\tau$. Clearly, if $\mathcal{F}$ realizes type $\tau$, so does $G_{\mathcal{F}}$.

Factorizing queries over concrete frames
We aim to replace the condition $G_{\mathcal{F}} \not\models Q$ by a local property of the frame $\mathcal{F}$, depending exclusively on the set of isomorphism types of the components and connectors of $\mathcal{F}$. In particular, this property must not depend on how the components and connectors are arranged in $\mathcal{F}$.
Let $Q$ be a connected UC2RPQ.

Proof. We proceed by induction on the depth of the tree. Suppose $G_{\mathcal{F}} \models \hat Q$. If $\mathcal{F}$ has depth 0, the unique component of $\mathcal{F}$ is equal to $G_{\mathcal{F}}$, so it satisfies $\hat Q$. Otherwise, $G_{\mathcal{F}}$ is a star-like graph whose central part is the root component $G_{v_0}$ of $\mathcal{F}$ and each peripheral part is a star-like graph whose central part is a connector $C_{v_0,u}$, for some $u$ in $G_{v_0}$, and whose peripheral parts are pointed graphs represented by subtrees of $\mathcal{F}$ rooted at children of $v_0$. Applying twice the fact that $\hat Q$ is factorized, we conclude that $\hat Q$ is satisfied either in $G_{v_0}$, or in $C_{v_0,u}$ for some $u \in \mathrm{dom}(G_{v_0})$, or in the graph represented by a subtree $\mathcal{F}'$ of $\mathcal{F}$ rooted at a child of $v_0$. In the first two cases we are done. In the third case, by the induction hypothesis, $\hat Q$ is satisfied in a component or connector of $\mathcal{F}'$, and we are done too. □

In general, concrete frames that weakly refute $Q$ need not actually refute $Q$, but some can be restructured to actually refute $Q$. To achieve this, we apply a novel general graph-theoretical construction, which we describe next.

The coil
Our aim is to unravel a given graph sufficiently, without making it infinite. To this end, the coil construction performs a bounded-recall (or sliding-window) unravelling of a graph. We think of paths as sequences of nodes and edges. Unless specified otherwise, paths are directed.
For a graph $G$ and $n \ge 0$, let $\mathrm{Paths}(G, n)$ denote the set of paths (not necessarily simple) of length at most $n$ in $G$, including paths of length 0 consisting of a single node. For a node $u$ in $G$, let $\mathrm{Paths}(G, n, u)$ denote the set of paths in $\mathrm{Paths}(G, n)$ that originate in $u$.
For a node $u$ in a graph $G$ and $n > 0$, the graph $\mathrm{Unravel}(G, n, u)$ is a tree with nodes $\mathrm{Paths}(G, n, u)$ and an edge $(\pi, \pi')$ whenever $\pi'$ is an extension of $\pi$ by one edge. The label of a node $\pi$ in $\mathrm{Unravel}(G, n, u)$ is inherited from the last node of the path $\pi$, and the label of an edge $(\pi, \pi')$ is inherited from the last edge of the path $\pi'$.
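A direct, hypothetical Python rendering of Paths and Unravel for small graphs follows. A path is encoded as a pair (start node, tuple of edge triples), so that a path of length 0 is `(u, ())`; this encoding and the function names are assumptions for illustration. The enumeration is exponential in n in general, so the sketch is meant for toy inputs only.

```python
def last_node(path):
    """A path is (start, edges) with edges a tuple of triples (x, r, y)."""
    start, es = path
    return es[-1][2] if es else start


def paths_from(edges, u, n):
    """Paths(G, n, u): all directed paths (not necessarily simple) of length <= n from u."""
    frontier = [(u, ())]
    result = list(frontier)
    for _ in range(n):
        frontier = [(s, es + (e,)) for (s, es) in frontier
                    for e in edges if e[0] == last_node((s, es))]
        result.extend(frontier)
    return result


def unravel(node_labels, edges, u, n):
    """Unravel(G, n, u): tree whose nodes are the paths from u of length <= n, with an
    edge from p to p' when p' extends p by one edge. Node labels come from the last
    node of the path; edge labels come from the last edge of p'."""
    nodes = paths_from(edges, u, n)
    labels = {p: node_labels.get(last_node(p), set()) for p in nodes}
    tree_edges = {((s, es[:-1]), es[-1][1], (s, es)) for (s, es) in nodes if es}
    return nodes, labels, tree_edges
```

For a single node with a self-loop, `unravel({1: {"A"}}, {(1, "a", 1)}, 1, 3)` is a path with 4 nodes and 3 edges, all nodes labelled A.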
By the $n$-suffix of a path $\pi$ we mean the suffix of length $n$ of $\pi$ if $\pi$ has length at least $n$, and the whole path $\pi$ otherwise. For a graph $G$ and $n > 0$, $\mathrm{Coil}(G, n)$ is the graph with nodes $\mathrm{Paths}(G, n) \times \{0, \dots, n\}$ and an edge $((\pi, \ell), (\pi', \ell'))$ whenever $\ell' \equiv \ell + 1 \pmod{n+1}$ and $\pi'$ is the $n$-suffix of an extension of $\pi$ by one edge. The label of a node $(\pi, \ell)$ in $\mathrm{Coil}(G, n)$ is inherited from the last node of $\pi$, and the label of an edge $((\pi, \ell), (\pi', \ell'))$ is inherited from the last edge of the path $\pi'$.
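The coil can likewise be built explicitly for toy graphs. The sketch below, with paths encoded as (start node, tuple of edge triples), constructs Coil(G, n) and then checks on examples that the map to last nodes is a surjective, edge-preserving map, together with two necessary consequences of the local tree property: within distance n - 1 of any node the coil is a tree with as many nodes as the corresponding unravelling. These checks illustrate the properties stated below; they are not a proof, and all names are hypothetical.

```python
def last_node(path):
    start, es = path
    return es[-1][2] if es else start


def all_paths(nodes, edges, n):
    """Paths(G, n): every directed path of length <= n, encoded as (start, edges)."""
    frontier = [(u, ()) for u in nodes]
    result = list(frontier)
    for _ in range(n):
        frontier = [(s, es + (e,)) for (s, es) in frontier
                    for e in edges if e[0] == last_node((s, es))]
        result.extend(frontier)
    return result


def n_suffix(path, n):
    """The suffix of length n if the path has length >= n, else the whole path."""
    start, es = path
    if len(es) <= n:
        return path
    tail = es[len(es) - n:]
    return (tail[0][0], tail)


def coil(nodes, edges, n):
    """Coil(G, n): nodes (p, l); for every edge e = (x, r, y) leaving the last node of p,
    an r-edge from (p, l) to (n-suffix of p extended by e, (l + 1) mod (n + 1))."""
    assert n > 0
    coil_nodes = [(p, l) for p in all_paths(nodes, edges, n) for l in range(n + 1)]
    coil_edges = set()
    for (p, l) in coil_nodes:
        start, es = p
        for e in edges:
            if e[0] == last_node(p):
                q = n_suffix((start, es + (e,)), n)
                coil_edges.add(((p, l), e[1], (q, (l + 1) % (n + 1))))
    return coil_nodes, coil_edges


def h_c(c):
    """h_C maps a coil node (p, l) to the last node of p."""
    return last_node(c[0])


def ball(coil_edges, c, k):
    """Coil nodes reachable from c by directed paths of length <= k."""
    seen, frontier = {c}, {c}
    for _ in range(k):
        frontier = {d for (x, _, d) in coil_edges if x in frontier} - seen
        seen |= frontier
    return seen


def check_properties_1_and_2(nodes, edges, n):
    """Property 1 (edge-preserving surjection onto G) and, for Property 2, that each
    (n-1)-ball is a tree with as many nodes as Paths(G, n-1, h_C(c))."""
    cn, ce = coil(nodes, edges, n)
    if not all((h_c(c), r, h_c(d)) in edges for (c, r, d) in ce):
        return False
    if {h_c(c) for c in cn} != set(nodes):
        return False
    for c in cn:
        b = ball(ce, c, n - 1)
        inner = [e for e in ce if e[0] in b and e[2] in b]
        if len(inner) != len(b) - 1 or len(b) != len(all_paths({h_c(c)}, edges, n - 1)):
            return False
    return True
```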
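The coil has three key properties. Let $h_C : \mathrm{Coil}(G, n) \to G$ map a node $(\pi, \ell)$ in $\mathrm{Coil}(G, n)$ to the last node on $\pi$.

Property 1. The mapping $h_C$ is a surjective homomorphism from $\mathrm{Coil}(G, n)$ to $G$.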
Proof. That $h_C$ is a homomorphism follows directly from the construction of $\mathrm{Coil}(G, n)$. Surjectivity is witnessed by nodes $(\pi, 0)$ for paths $\pi$ of length 0 consisting of just one node. □

Property 2. For every node $c$ in $\mathrm{Coil}(G, n)$, the subgraph induced by all nodes reachable from $c$ by paths of length at most $n - 1$ is isomorphic to $\mathrm{Unravel}(G, n - 1, h_C(c))$.
Proof. We begin with two preparatory observations. We can extend the function $h_C$ to also map an edge $((\pi, \ell), (\pi', \ell'))$ in $\mathrm{Coil}(G, n)$ to the last edge on the path $\pi'$; note that it is an edge from $h_C((\pi, \ell))$ to $h_C((\pi', \ell'))$ in $G$. Then, for every node $c$ in $\mathrm{Coil}(G, n)$, $h_C$ induces a bijection between the edges outgoing from $c$ and the edges outgoing from $h_C(c)$. This follows straight from the construction: for a node $u$ in $G$, a path $\pi$ ending in $u$, and an edge $e$ outgoing from $u$, there is a unique $n$-suffix of the extension of $\pi$ with the edge $e$.
We can also extend $h_C$ to a length-preserving mapping from paths in $\mathrm{Coil}(G, n)$ to paths in $G$. By construction, for each edge $e = ((\pi, \ell), (\pi', \ell'))$ in $\mathrm{Coil}(G, n)$, we have $\ell' \equiv \ell + 1 \pmod{n+1}$, and $\pi'$ is the $n$-suffix of the extension of $\pi$ with the edge $h_C(e)$. In consequence, for a path $\rho$ from $(\pi, \ell)$ to $(\pi', \ell')$ of length $k$ in $\mathrm{Coil}(G, n)$, we have $\ell' \equiv \ell + k \pmod{n+1}$, and $\pi'$ is the $n$-suffix of the path obtained by concatenating $\pi$ with $h_C(\rho)$. Now, consider a node $c' = (\pi', \ell')$ reachable from $c = (\pi, \ell)$ by a path $\rho$ of length at most $n$ in $\mathrm{Coil}(G, n)$. Then, by the second observation above, the length of $\rho$ is $k = (\ell' - \ell) \bmod (n+1)$ and $h_C(\rho)$ is the $k$-suffix of $\pi'$. In the light of the first observation above, this means that $\rho$ is the unique path of length at most $n$ from $c$ to $c'$. It follows that the subgraph of $\mathrm{Coil}(G, n)$ induced by the set of nodes reachable from $c$ by paths of length at most $n - 1$ is a tree. Using the first observation again, it is easy to show that this tree is isomorphic to $\mathrm{Unravel}(G, n - 1, h_C(c))$ via the mapping $c' \mapsto h_C(\rho)$, where $\rho$ is the unique path from $c$ to $c'$. □

For a node $(\pi, \ell)$ in $\mathrm{Coil}(G, n)$, we refer to the value $\ell$ as the level of the node. A subgraph of $\mathrm{Coil}(G, n)$ visits level $\ell$ if it contains a node of level $\ell$.

Property 3. Every connected subgraph $H$ of $\mathrm{Coil}(G, n)$ that visits $k \le n$ levels maps homomorphically to $\mathrm{Unravel}(G, k - 1, u)$ for some node $u$ in $G$.
Proof. Since H does not visit all levels and is connected, there exists a unique level ℓ_0 such that H visits level ℓ_0 but does not visit level (ℓ_0 − 1) mod (n + 1). For each node (π, ℓ) in H, let us call the value (ℓ − ℓ_0) mod (n + 1) the H-level of (π, ℓ). The H-levels of nodes in H range from 0 to m − 1. A mapping g from (π, ℓ) to the suffix of π of length equal to the H-level of (π, ℓ) is the required homomorphism. For each edge ((π, ℓ), (π′, ℓ′)) in H, π′ is the n-suffix of the extension of π with one edge; since the H-level of (π, ℓ) is not n, g((π, ℓ)) is a prefix of g((π′, ℓ′)). Thus, since H is connected, there exists a unique node u in F such that g(x) begins in u for all nodes x in H. It is straightforward to verify that g is indeed a homomorphism from H to Unravel(F, n − 1, u). □
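To make the re-indexing in the proof of Property 3 concrete, the following Python sketch computes, from the set of levels visited by a connected subgraph, the starting level ℓ_0 and the relative level of every visited level. It is only an illustration under our own assumptions: the function name `relative_levels` is hypothetical, levels are plain integers in range(n + 1), and the subgraph itself is abstracted away to its set of visited levels.

```python
def relative_levels(visited, n):
    """Given the levels (elements of range(n + 1)) visited by a connected
    subgraph of Coil(F, n) that does not visit every level, return (l0, rel):
    l0 is the unique visited level whose predecessor modulo n + 1 is not
    visited, and rel maps each visited level l to (l - l0) mod (n + 1)."""
    size = n + 1
    visited = set(visited)
    if not visited or len(visited) == size:
        raise ValueError("expected a nonempty proper subset of the levels")
    # Frame edges join consecutive levels, so a connected subgraph visits a
    # cyclic interval of levels; its start is the level with no visited predecessor.
    starts = [l for l in visited if (l - 1) % size not in visited]
    if len(starts) != 1:
        raise ValueError("visited levels do not form a single cyclic interval")
    l0 = starts[0]
    return l0, {l: (l - l0) % size for l in visited}
```

For example, in Coil(F, 4) a connected subgraph visiting levels {4, 0, 1} starts at ℓ_0 = 4, and its relative levels are 0, 1, 2, matching the range 0 to m − 1 for m = 3.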

Avoiding UC2RPQs
We aim to restructure a given concrete frame that weakly refutes Q into one that actually refutes Q, while preserving all local properties of the frame, such as weakly refuting a query. (We shall see later that all other properties required of countermodels amount to local properties of frames.) We call two concrete frames locally isomorphic if the sets of isomorphism types of their components and connectors are equal. Locally isomorphic concrete frames share local properties. Lemma 4.3 below allows us to avoid UC2RPQs, just as the large-girth method does for conjunctive queries [33]. While in the case of conjunctive queries the construction depends only on the size of the queries, for UC2RPQs we rely on a more refined measure, defined below.
An undirected path in the graph G_F represented by a concrete frame F induces an undirected path in F. The span of an undirected path in frame F is the maximum absolute difference between the number of edges traversed forward and the number of edges traversed backward by an infix of the path. The span of a 2RPQ in F is the maximum span of undirected paths induced in F by paths witnessing the 2RPQ in G_F.
Lemma 4.3. If a concrete frame F weakly refutes a connected UC2RPQ Q such that all 2RPQs in Q̂ have bounded span in F, then some frame locally isomorphic to F actually refutes Q.
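The span of an undirected path depends only on the sequence of directions in which it traverses frame edges. A minimal Python sketch, assuming the path is given as a list of +1 (edge traversed forward) and −1 (edge traversed backward) steps; `span` is a hypothetical helper name. Since the balance of an infix is a difference of two prefix sums, the maximum absolute balance equals the largest prefix sum minus the smallest one, so a single pass suffices.

```python
def span(directions):
    """Span of an undirected path given as a sequence of +1 (forward step)
    and -1 (backward step): the maximum, over all infixes, of
    |number of forward steps - number of backward steps|."""
    prefix = 0
    lo = hi = 0  # smallest and largest prefix sums, including the empty prefix
    for d in directions:
        if d not in (1, -1):
            raise ValueError("directions must be +1 or -1")
        prefix += d
        lo = min(lo, prefix)
        hi = max(hi, prefix)
    return hi - lo
```

For instance, a path that zigzags forward and backward along one frame edge has span 1, whereas two consecutive forward steps already give span 2.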
Proof. For a frame F and n > 1, let F_n be obtained from Coil(F, n) by relabelling to ensure that all components have disjoint domains: for each node x in Coil(F, n) with label G_u, change its label to a fresh isomorphic copy G̃_x of G_u, and change the labels of all outgoing edges from (v, r) to (ṽ, r), where ṽ is the copy of v in G̃_x. Properties 1 and 2 ensure that F_n is a frame and that it is locally isomorphic to F. The notion of levels of nodes carries over from Coil(F, n) to F_n.
The homomorphism h_F : Coil(F, n) → F induces a homomorphism h̃ : G_{F_n} → G_F that preserves frame edges; that is, an edge in G_{F_n} is a frame edge iff it is mapped by h̃ to a frame edge in G_F.
Claim 1. The span of a 2RPQ in F_n is bounded from above by its span in F.
Proof of Claim 1. Suppose that a witnessing path in G_{F_n} induces an undirected path ρ in F_n. Because F_n differs from Coil(F, n) only in the labelling of nodes and edges, we can view ρ as an undirected path in Coil(F, n); this change of perspective does not affect the span of ρ. Applying h̃ to the witnessing path in G_{F_n}, we obtain a witnessing path in G_F such that the induced undirected path ρ′ in F can be obtained from ρ by applying h_F. As the span of a path is preserved by homomorphisms, ρ and ρ′ have the same span. This proves that the span of the 2RPQ in F bounds its span in F_n from above. □
Consider a match of a C2RPQ q in G_{F_n} along with witnessing paths for all 2RPQs. Let H be the subgraph of F_n built from all nodes x such that some witnessing path contains a node in the component of x or some variable is mapped to a node in the component of x, and all edges corresponding to the frame edges in G_{F_n} traversed by some witnessing path. We call H a match of q in F_n. Note that H is a frame and G_H |= q.
Claim 2. If q is connected and consists of k 2RPQs, each of which has span in F_n bounded by s, then every match H of q in F_n visits at most ks + 1 levels in F_n.
Proof of Claim 2. The set of levels visited by an undirected path in F_n of span at most s is of the form {(ℓ + i) mod (n + 1) : i = 0, …, j} for some level ℓ and j ≤ s. Because q is connected, H is a connected union of k undirected paths of span at most s. Consequently, the set of levels visited by H is of the form {(ℓ + i) mod (n + 1) : i = 0, …, j} for some level ℓ and j ≤ ks. □
We are now ready to prove the lemma. Let k = max{|q| : q ∈ Q̂} and let s > 0 be an upper bound on the span in F of all 2RPQs occurring in Q̂. We will show that F_{ks+1} actually refutes Q.
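The shape of the level sets in the proof of Claim 2 can be checked mechanically. The sketch below uses a hypothetical helper `visited_levels` and assumes, as in the coil construction, that every frame edge goes from some level ℓ to level (ℓ + 1) mod (n + 1); an undirected path is represented by its start level and its sequence of +1/−1 steps.

```python
def visited_levels(start, directions, n):
    """Levels of Coil(F, n) (equivalently F_n) visited by an undirected path
    of frame edges that starts at level `start`: traversing an edge forward
    (+1) raises the level by one, traversing it backward (-1) lowers it by
    one, both modulo n + 1."""
    size = n + 1
    level = start % size
    seen = {level}
    for d in directions:
        if d not in (1, -1):
            raise ValueError("directions must be +1 or -1")
        level = (level + d) % size
        seen.add(level)
    return seen
```

For instance, in Coil(F, 4) the path with steps +1, +1, −1, +1 (span 2) starting at level 4 visits the levels {4, 0, 1}, that is, (4 + i) mod 5 for i = 0, 1, 2: at most span + 1 levels, as used in the claim.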

Abstract frames
An abstract frame F over Γ_F ⊆ Γ is essentially a specification of a concrete frame: it is a finite graph without self-loops, just like a concrete frame, and its nodes and edges still represent components of a graph and edges between them, but the representation is symbolic rather than concrete. Instead of a pointed graph G_u, each frame node u holds an abstract specification of a pointed graph, consisting of a set Θ_u of maximal types over Γ_F to be respected, a distinguished type τ_u ∈ Θ_u to be realized, a TBox T_u to be satisfied, and a query Q_u to be avoided. We call (τ_u, T_u, Θ_u, Q_u) an (abstract) component of F.
Each edge originating in frame node u is labelled with a pair (τ, r) for some τ ∈ Θ_u and r ∈ Σ±. Intuitively, it represents multiple edges originating in nodes of type τ. As in a concrete frame, different edges originating in u, labelled with (τ, r) and (τ, s) for any r, s ∈ Σ±, must have different targets. Connectors C_{u,τ} are defined as for concrete frames, with v replaced by τ, except that we need to materialize the type τ and the types τ_{u′} of the target components: we use fresh nodes for this purpose.
A witnessing graph for component (τ_u, T_u, Θ_u, Q_u) is any finite pointed graph G such that G respects Θ_u, the distinguished element of G is of type τ_u, G |= T_u, and G ⊭ Q_u. An abstract frame F represents a concrete frame F′ if F′ can be obtained from F by replacing each (τ_u, T_u, Θ_u, Q_u) with an arbitrary witnessing graph G_u, and each edge labelled (τ, r) from u to u′ with edges labelled (v, r) from u to u′, for v ranging over all nodes of type τ in G_u. (Note that connector C_{u,v} is then isomorphic to C_{u,τ} for each node v of type τ in G_u.) An abstract frame F represents a graph G if it represents a concrete frame F′ such that G_{F′} = G.
An abstract frame F realizes a type τ if τ ⊆ τ_u for some component (τ_u, T_u, Θ_u, Q_u) of F. Clearly, if F realizes type τ, then so does every graph and every concrete frame represented by F.
An abstract component is productive if it has a witnessing graph.An abstract frame is productive if all its components are productive.Each productive abstract frame represents at least one concrete frame and at least one graph.Testing productivity of an abstract component is essentially a special case of finite entailment; testing if an abstract frame is productive leads to multiple instances of finite entailment.
To lift the notion of weakly refuting from concrete to abstract frames, we rephrase the first condition as
• for each component (τ_u, T_u, Θ_u, Q_u), if a graph does not satisfy Q_u, then it refutes Q; that is, Q_u contains Q̂ in the query containment sense;
and additionally require that Γ_F includes all node labels used in Q̂. It follows that if an abstract frame weakly refutes Q, so do all concrete frames it represents.

ENTAILMENT OF ONE-WAY QUERIES
In this section we establish the following result. The lower bound is inherited from finite entailment of simple UCRPQs in ALC [27]. The upper bound for ALCQ follows by eliminating tests from the query, encoding the type of each node in the label of each outgoing edge, and carefully inspecting the proof for ALC [28]. Here, we show the upper bound for ALCI. As noted in Section 3, for Theorem 3.4 it suffices to decide if a given type τ (over labels used in T and Q) can be realized in a finite graph that satisfies T and refutes Q. As finite entailment is a special case of containment modulo schema, this also suffices for Theorem 5.1.
Proc. ACM Manag. Data, Vol. 2, No. 2 (PODS), Article 77. Publication date: May 2024.
The general idea is to decompose countermodels into components in which it is enough to reason exclusively about forward (outgoing) edges or exclusively about backward (incoming) edges; this will ultimately allow us to reduce finite entailment in ALCI to finite entailment in ALC.In order to confine RPQs to a limited number of components, while at the same time providing both forward and backward witnesses required in the ALCI TBox, we shall alternate between forward and backward components.

Alternating frames
To distinguish forward and backward components we use a fresh node label C_→. For the sake of symmetry, we refer to its complement C̄_→ as C_←. In any graph, nodes with label C_→ are called forward and those with label C_← are called backward. A concrete frame is alternating if
• every component G_u consists either only of forward nodes or only of backward nodes;
• every connector C_{u,v} is directed, that is, all its edges are directed from backward nodes to forward nodes, and either C_→ or C_← occurs only in the distinguished node.
For abstract frames we replace the first condition with
• for every component (τ_u, T_u, Θ_u, Q_u), either each type in Θ_u contains C_→ or each type in Θ_u contains C_←.
In a graph represented by an alternating frame, components have only incoming or only outgoing frame edges. Hence, an RPQ can traverse at most one frame edge; that is, its span is at most 1.
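The directedness condition on connectors is a simple local check. The Python sketch below is illustrative only: the function `is_directed_connector` and its input encoding (one Boolean per node telling whether it carries C_→ or C_←, and edges as (source, target) pairs in their actual orientation) are our own assumptions.

```python
def is_directed_connector(is_forward, edges, distinguished):
    """Check the 'directed' condition for a connector.

    is_forward: dict mapping each connector node to True if it carries the
        label C_→ (forward) and to False if it carries C_← (backward).
    edges: iterable of (source, target) pairs, oriented as the actual edges.
    distinguished: the distinguished node of the connector.
    """
    # Every edge must go from a backward node to a forward node.
    if any(is_forward[s] or not is_forward[t] for s, t in edges):
        return False
    forward_nodes = {v for v, f in is_forward.items() if f}
    backward_nodes = set(is_forward) - forward_nodes
    # Either C_→ or C_← occurs only in the distinguished node.
    return forward_nodes <= {distinguished} or backward_nodes <= {distinguished}
```

A forward distinguished node with incoming edges from backward nodes passes the check; an edge leaving the forward distinguished node does not.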

Factorizing ALCI TBoxes over alternating frames
In order to separate reasoning about forward and backward edges, we shall require that all forward witnesses of forward nodes be provided in components and all backward witnesses in connectors, and symmetrically for backward nodes.This leads to the following definition.
Let T_→ be the ALC TBox obtained from T by dropping all participation constraints involving inverse roles, that is, CIs of the form A ⊑ ∃r⁻.B, and flipping CIs involving universal restrictions over inverse roles: A ⊑ ∀r⁻.B is replaced by B̄ ⊑ ∀r.Ā. The TBox T_← is defined symmetrically. Note that while T_← is not an ALC TBox, it uses only inverse roles. Hence, by treating inverse roles as role names, we can turn it into an ALC TBox. An alternating concrete frame satisfies T if
• each component consisting of forward nodes satisfies T_→, and each component consisting of backward nodes satisfies T_←;
• in each connector C_{u,v}, the distinguished node satisfies T_← if it is forward, and T_→ if it is backward.
For an alternating abstract frame F, we replace the first item with
• for each component (τ_u, T_u, Θ_u, Q_u), either T_u entails T_→ and each type in Θ_u contains C_→, or T_u entails T_← and each type in Θ_u contains C_←,
and additionally require that Γ_F includes all concept names used in T. As connectors in alternating frames are directed and CIs are suitably flipped in T_← and T_→, the second item implies that each connector satisfies all CIs of the form A ⊑ ∀r.B or A ⊑ ∀r⁻.B in T.
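The passage from T to T_→ and T_← is a purely syntactic rewriting of normalized CIs. The following Python sketch assumes a hypothetical encoding of CIs as tuples, with 'r-' standing for the inverse role r⁻ and '~A' for the complement label Ā; it keeps CIs without roles, drops participation constraints pointing in the wrong direction, and flips universal restrictions pointing in the wrong direction. The function names are ours.

```python
def neg(concept):
    """Complement of a concept name; '~A' encodes the complement label of A."""
    return concept[1:] if concept.startswith("~") else "~" + concept


def is_inverse(role):
    return role.endswith("-")


def directional_tbox(tbox, forward=True):
    """Build T_→ (forward=True) or T_← (forward=False) from normalized CIs:
    ('sub', A, B) for A ⊑ B, ('exists', A, role, B) for A ⊑ ∃role.B, and
    ('forall', A, role, B) for A ⊑ ∀role.B."""
    result = []
    for ci in tbox:
        if ci[0] == "sub":
            result.append(ci)  # CIs without roles are kept in both TBoxes
            continue
        kind, a, role, b = ci
        # Inverse roles are "wrong" in T_→, plain roles are "wrong" in T_←.
        wrong = is_inverse(role) == forward
        if kind == "exists":
            if not wrong:  # wrong-direction participation constraints are dropped
                result.append(ci)
        elif kind == "forall":
            if wrong:
                # A ⊑ ∀r⁻.B becomes ~B ⊑ ∀r.~A (and symmetrically for T_←).
                base = role[:-1] if is_inverse(role) else role
                flipped = base if forward else base + "-"
                result.append(("forall", neg(b), flipped, neg(a)))
            else:
                result.append(ci)
        else:
            raise ValueError(f"unexpected CI: {ci!r}")
    return result
```

For T = {A ⊑ ∃r⁻.B, A ⊑ ∀r⁻.B}, the forward variant contains only B̄ ⊑ ∀r.Ā.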
In the definition above, we provide all backward witnesses to forward nodes and all forward witnesses to backward nodes in connectors, even if some are also provided in components. This is correct because ALCI cannot detect duplicate witnesses.
Lemma 5.2. If an alternating frame F satisfies an ALCI TBox T, so does each graph F represents.

Finite entailment by way of frames
We are now ready to characterize finite entailment in terms of abstract frames.
Lemma 5.3. For a unary type τ, an ALCI TBox T, and a connected UCRPQ Q, type τ is realized in a finite graph that satisfies T and refutes Q iff τ is realized in a productive alternating abstract frame that satisfies T and weakly refutes Q. [Proof: A.1]
Using Lemma 5.3, we can compute the set of such types τ using a greatest fixed-point procedure (a variant of type elimination [30,34]). Testing productivity of the abstract components of the witnessing frames is reduced to finite entailment of test-free CRPQs in ALC [28] (see Appendix A.2).

ENTAILMENT OF TWO-WAY QUERIES
We now move to two-way queries and show the following theorem.Theorem 6.1.Finite entailment of simple UC2RPQs in ALCQ is 2EXPTIME-complete.
The lower bound follows from Theorem 3.3. Here we establish the upper bound. As in Section 5, we work with the variant of finite entailment that asks if a type τ can be realized in a finite graph that satisfies a TBox T and refutes a simple connected UC2RPQ Q.
The overall strategy is to successively reduce the number of role names used in the TBox until none are left. The case with no roles is easy (see Appendix B.1). In order to reduce the number of roles by one, we perform an intermediate step that neutralizes certain atoms in the query. For Σ_0 ⊆ Σ, by a Σ_0-reachability atom we mean an atom of the form (a_1 + ⋯ + a_k)^*(x, y) with {a_1, …, a_k} = Σ_0. Let Σ_T be the set of role names used in the input TBox T. Intuitively, we want to reduce an instance of the problem to multiple instances that do not involve Σ_T-reachability atoms, and then reduce each of those to multiple instances with fewer role names in the TBox. However, rather than modifying the query, we modify the problem. We say that a graph G refutes Q modulo Σ_0-reachability if G ⊭ Q̂ mod Σ_0, where Q̂ mod Σ_0 is the query obtained from Q̂ by dropping all Σ_0-reachability atoms. The problem we will be solving, dubbed finite entailment modulo Σ_0-reachability, is to decide if a type τ can be realized in a finite graph that satisfies T, respects Θ, and refutes Q modulo Σ_0-reachability, where Σ_0 ⊇ Σ_T. (We recover the original problem by taking Θ = {∅} and Σ_0 = Σ_T ∪ {s} for some fresh role name s.) The intermediate step will amount to replacing Σ_0-reachability with Σ_T-reachability. Next, we explain how to factorize ALCQ TBoxes over frames, which will be needed in both steps.
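Dropping Σ_0-reachability atoms is again a syntactic operation on the query. A Python sketch under a hypothetical encoding: a UC2RPQ is a list of disjuncts, each a list of atoms, and a reachability atom (a_1 + ⋯ + a_k)^*(x, y) is stored as ('reach', frozenset({a_1, …, a_k}), x, y). Under our reading, only atoms whose role set equals Σ_0 exactly are removed, which is why choosing Σ_0 = Σ_T ∪ {s} with a fresh s leaves the query unchanged.

```python
def mod_reachability(query, sigma0):
    """Return the query with every Σ_0-reachability atom dropped.

    query: list of disjuncts, each a list of atoms. A reachability atom is
    ('reach', frozenset_of_roles, x, y); any other atom, for instance
    ('rpq', regex, x, y), is kept unchanged.
    """
    sigma0 = frozenset(sigma0)
    return [
        [atom for atom in disjunct
         if not (atom[0] == "reach" and frozenset(atom[1]) == sigma0)]
        for disjunct in query
    ]
```

A disjunct whose atoms are all dropped becomes the empty conjunction, which every graph satisfies.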

Factorizing ALCQ TBoxes
Let T be an ALCQ TBox.Let Γ T be a set of fresh concept names  ,, for each role name  and concept name  involved in an at-least or at-most restriction in T , and each  ≤  where  is one plus the maximal number used in T .Let T = be the TBox obtained from T by dropping all CIs involving roles, and adding  ,, ⊑ ∃ ≥  .for each  ,, ∈ Γ T ,  ,, ⊑ ∃ ≤  .for each  ,, ∈ Γ T with  <  , C,, ⊑ ∃ ≤−1  .for each   ,, ∈ Γ T .
For every graph that satisfies all CIs from T that do not involve an at-least restriction, there is a unique way to place the labels from Γ_T that makes T_= satisfied. In particular, there is one for every subgraph of a graph that satisfies T. We also define and require that Γ_F includes all node labels used in Q̂.
An abstract frame respects a set Θ of types if, for each component (τ_u, T_u, Θ_u, Q_u), every type from Θ_u contains a type from Θ.
Lemma 6.3. Consider a type τ, an ALCQ TBox T, a set Θ of types, a connected simple UC2RPQ Q, and Σ_0 ⊇ Σ_T. Type τ is realized in a finite graph that satisfies T, respects Θ, and refutes Q modulo Σ_0-reachability iff τ is realized in a productive abstract frame that is a tree, satisfies T, respects Θ, and weakly refutes Q modulo (Σ_T, Σ_0)-reachability. [Proof: B.2]
Witnessing abstract frames from Lemma 6.3 can be constructed bottom-up from suitable productive abstract components (Appendix B.3). Testing productivity of an abstract component amounts to finite entailment modulo Σ_T-reachability, which we handle next.

Entailment modulo Σ T -reachability
This time we decompose countermodels into components that use one role name fewer.In different components we eliminate different role names, in a round-robin fashion, ensuring that the span of every simple 2RPQ is at most |Σ T |, unless it is a Σ T -reachability atom.
Let r_1, r_2, …, r_m be an enumeration of Σ_T. For convenience, let r_{m+1} = r_1. For each r ∈ Σ_T, let C_r be a fresh concept name. By an r-node we mean a node that has label C_r and labels C̄_s for all s ∈ Σ_T \ {r}. A concrete frame is role-alternating if
• every component G_u satisfies for some r ∈ Σ_T;
• every connector C_{u,v} is role-directed, that is, for some i, the distinguished node is an r_i-node, all remaining nodes are r_{i+1}-nodes, and all edges are r_i-edges originating in the distinguished node.
For abstract frames we replace the first condition with
• for every component (τ_u, T_u, Θ_u, Q_u), there is r ∈ Σ_T such that each τ ∈ Θ_u contains C_r, all C_{0,r,A} ∈ Γ_T, and all C̄_s with s ∈ Σ_T \ {r}.
Being role-alternating is a local property of a frame: it is inherited by locally isomorphic frames.
Lemma 6.4. For every role-alternating concrete frame F, the span in F of a simple 2RPQ that is not a Σ_T-reachability atom is at most |Σ_T|. [Proof: B.4]
Lemma 6.5. Consider a type τ, an ALCQ TBox T, a set Θ of types, and a connected simple UC2RPQ Q. Type τ is realized in a finite graph that satisfies T, respects Θ, and refutes Q modulo Σ_T-reachability iff τ is realized in a productive role-alternating abstract frame that satisfies T, respects Θ, and weakly refutes Q modulo (Σ_T, Σ_T)-reachability. [Proof: B.5]
The existence of a witnessing abstract frame from Lemma 6.5 can be decided by a greatest fixed-point procedure, as in Section 5 (see Appendix B.6).
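The role-directedness condition on connectors, including the wrap-around r_{m+1} = r_1, can be checked locally. In the Python sketch below, the function name and the encoding are hypothetical: each node is mapped to the unique role r for which it is an r-node, and edges are (source, role, target) triples.

```python
def is_role_directed(marker, edges, distinguished, roles):
    """Check that a connector is role-directed.

    marker: dict mapping each node to the role r such that the node is an r-node.
    edges: list of (source, role, target) triples.
    distinguished: the distinguished node of the connector.
    roles: the enumeration r_1, ..., r_m of Σ_T; r_{m+1} is taken to be r_1.
    """
    i = roles.index(marker[distinguished])
    r_i = roles[i]
    r_next = roles[(i + 1) % len(roles)]  # wrap-around: r_{m+1} = r_1
    # All non-distinguished nodes must be r_{i+1}-nodes ...
    others_ok = all(marker[v] == r_next for v in marker if v != distinguished)
    # ... and all edges must be r_i-edges leaving the distinguished node.
    edges_ok = all(s == distinguished and r == r_i for s, r, _ in edges)
    return others_ok and edges_ok
```

With Σ_T enumerated as a, b, c, a connector whose distinguished node is a c-node must have only a-nodes as its other nodes.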
Testing productivity of the involved abstract components reduces to an instance of finite entailment modulo Σ_T-reachability. Each component of a role-alternating abstract frame that satisfies T effectively forbids some role r, indicated by the concept name C_r present in the type of the distinguished node. We can eliminate role r entirely from the TBox, arriving at an instance of finite entailment modulo Σ_T-reachability with one role fewer. Apart from that, the reduction only affects the type τ to realize and the set Θ of allowed types, which must account for the newly added concept names from Γ_T. The size of Γ_T is exponential in the size of the original TBox (not the current TBox T that is the result of previous reductions). This allows us to use a recursive call to our decision procedure without additional blowup (see Appendix B.7).

DISCUSSION
In this paper we have made significant progress on the graph query containment problem modulo schemas. Along the way, we also extended existing results on finite entailment of graph queries in expressive description logics. We believe that our methods can be adapted to handle reasoning about parallel edges with different labels and multiple labels on edges, possibly at the cost of increased complexity. Handling full ALCQI schemas, on the other hand, is much more challenging. Similarly, handling full UC2RPQs, even for the basic logic ALC, seems to require new ideas. Besides this, it would also be interesting to develop techniques that are better suited for implementation, since our current approach relies on automata- and type-based techniques, which always incur the worst-case complexity. Another direction is ontology-mediated query containment [11] of navigational queries. In this setting, it would already be challenging to consider queries that only allow reachability.
that are realized in a component of such a frame, not necessarily in the distinguished node. Indeed, for a type τ realized by a node v in a component G_u, we can add a fresh frame node u′ labelled with G_u with the distinguished node changed to v, copying all the outgoing edges from u, and the result will still be an alternating concrete frame that satisfies T and weakly refutes Q.
For a set Ψ of maximal types over Γ_0, let Ψ_→ and Ψ_← denote the sets of types from Ψ that contain C_→ and C_←, respectively. The set Ψ_{T,Q} is the greatest (w.r.t. inclusion) set Ψ of maximal types over Γ_0 such that for each τ ∈ Ψ:
• if τ ∋ C_→, then the component (τ, T_→, Ψ_→, Q̂) is productive, and there is a directed connector refuting Q whose distinguished node is of type τ and satisfies T_←, and whose non-distinguished nodes are of types from Ψ_←;
• if τ ∋ C_←, then the component (τ, T_←, Ψ_←, Q̂) is productive, and there is a directed connector refuting Q whose distinguished node is of type τ and satisfies T_→, and whose non-distinguished nodes are of types from Ψ_→.
Note that, for a fixed τ, the conditions are monotone w.r.t. Ψ: that is, if they hold for Ψ, they also hold for each Ψ′ ⊇ Ψ. Consequently, we can compute Ψ_{T,Q} by a simple greatest fixed-point procedure. For i = 1, 2, 3, …, we iteratively compute the set Ψ_i of maximal types over Γ_0 that satisfy the conditions above with Ψ = Ψ_{i−1}, starting from the set Ψ_0 of all maximal types over Γ_0. We stop when Ψ_{i+1} = Ψ_i and let Ψ_{T,Q} = Ψ_i. Then, it suffices to check whether Ψ_{T,Q} contains a type σ such that σ ⊇ τ.
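The iteration above is a standard greatest fixed-point (type elimination) loop. A generic Python sketch, in which the conditions on a type are abstracted into a caller-supplied predicate `survives(tau, psi)`, a hypothetical stand-in for the productivity and connector tests that is assumed to be monotone in psi:

```python
def greatest_fixed_point(all_types, survives):
    """Type elimination: start from all types and repeatedly keep only the
    types tau with survives(tau, psi) w.r.t. the current set psi, until the
    set stops changing. Monotonicity of `survives` in psi guarantees that the
    sets shrink and that the result is the greatest fixed point."""
    psi = frozenset(all_types)
    while True:
        nxt = frozenset(t for t in psi if survives(t, psi))
        if nxt == psi:
            return psi
        psi = nxt
```

Each round either stops or removes at least one type, so the number of rounds is bounded by the number of types, matching the counting argument that follows. As a toy usage, if each type needs one specific "witness" type to be present, types on a cycle of witnesses survive while chains ending in a missing witness are eliminated one round at a time.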
For a given τ and Ψ = Ψ_→ ∪ Ψ_←, the conditions can be verified in 2EXPTIME. Indeed, testing if an abstract component is productive amounts to solving the corresponding instance of finite entailment; it either is an instance with an ALC TBox, or it can easily be turned into one, as we have explained in the main part of the paper. (In the latter case, we also need to adjust the query: reverse the transitions in the underlying semiautomaton and replace each atom A_{p,p′}(x, x′) with A_{p′,p}(x′, x).) Moreover, we can eliminate tests from Q̂ by encoding node types in the labels of the outgoing edges. Hence, we can apply the algorithm from [28]. While this algorithm has doubly exponential complexity, it is in fact only doubly exponential in the maximal size of any involved CRPQ. Each of our instances involves a union of exponentially many CRPQs of polynomial size, and a TBox of exponential size (due to the elimination of tests). It follows that the overall complexity of each productivity test is still doubly exponential. The existence of a suitable connector for τ and Ψ = Ψ_→ ∪ Ψ_← can also be decided in 2EXPTIME. Indeed, observe that it suffices to consider connectors with at most one non-distinguished node for each participation constraint in T. There are only doubly exponentially many such connectors, even though the number of available types is already doubly exponential. Checking that a given connector refutes Q amounts to evaluating Q̂ over the connector, which can be done in time polynomial in |Q̂|, in the size n of the connector, and in the size s of the semiautomaton underlying Q̂, and exponential only in k, the maximal size of a CRPQ in Q̂ (linear in the maximal size of a CRPQ in Q). Checking the remaining conditions is straightforward. Hence, each iteration of the greatest fixed-point procedure takes doubly exponential time. The number of iterations is also at most doubly exponential, because we begin from a doubly exponential set of types, and in each iteration at least one type is eliminated.
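Evaluating a single 2RPQ atom over a small graph such as a connector can be done by a breadth-first search over pairs (node, semiautomaton state). The Python sketch below is a textbook product-graph evaluation, not necessarily the procedure behind the bound above; the function name and the transition map `delta` (keyed by (state, label), where label 'r-' means an r-edge traversed backward) are hypothetical.

```python
from collections import deque


def eval_2rpq_atom(edges, delta, p, q):
    """Pairs (x, y) such that some path from x to y reads a word that takes
    the semiautomaton from state p to state q.

    edges: list of (source, role, target) triples over role names.
    delta: dict mapping (state, label) to a set of successor states.
    Only nodes incident to some edge are considered.
    """
    succ, nodes = {}, set()
    for s, r, t in edges:
        nodes.update((s, t))
        succ.setdefault(s, []).append((r, t))        # forward traversal
        succ.setdefault(t, []).append((r + "-", s))  # backward traversal
    result = set()
    for x in nodes:
        seen = {(x, p)}
        queue = deque([(x, p)])
        while queue:
            v, state = queue.popleft()
            if state == q:
                result.add((x, v))
            for label, w in succ.get(v, []):
                for nxt in delta.get((state, label), ()):
                    if (w, nxt) not in seen:
                        seen.add((w, nxt))
                        queue.append((w, nxt))
    return result
```

For a connector, the graph has at most one node per participation constraint besides the distinguished node, so this search is cheap; the expensive part is enumerating variable assignments for a whole CRPQ.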

B PROOFS FOR SECTION 6: ENTAILMENT OF TWO-WAY QUERIES
B.1 No roles
We need to solve the following problem: given a type τ, a TBox T mentioning no roles, a set Θ of types, a connected simple UC2RPQ Q, and Σ_0 ⊆ Σ, decide if τ is realized in a graph that satisfies T, has only nodes of types from Θ, and does not satisfy Q̂ mod Σ_0. Because T does not mention any roles, we can restrict our attention to graphs consisting of a single isolated node. Moreover, we can assume that the graph uses only node labels mentioned in τ, T, Θ, and Q̂. As the number of labels is exponential, the number of such graphs is doubly exponential. Checking if one such graph is a witness we are looking for can be done in time exponential in the size of the input.
Proof. Just like for Q̂, we can prove that Q̂ mod Σ_0 is factorized. Given that, the claim follows immediately from Lemma 4.2 for P = Q̂ mod Σ_0. □
Proof of Lemma 6.3. For the left-to-right implication, consider a graph G that realizes τ, satisfies T, has only nodes of types from Θ, and refutes Q modulo Σ_0-reachability. Without loss of generality we can assume that G only uses edge labels from Σ_T.
For each node v in G, let G_v be a pointed graph obtained by taking a fresh copy of the strongly connected component (SCC) of G that contains v, with v as the distinguished node. Consider a graph F_0 with a node u_v labelled with G_v for each node v in G. Whenever there is an r-edge from v to w, with v and w in different SCCs of G, add an edge from u_{v′} to u_w with label (v̂, r) for every v′ whose SCC in G contains v, with v̂ being the copy of v in G_{v′}. Graph F_0 need not be a tree. In fact, it may not even be a concrete frame, because it may contain parallel edges that share the first component of the label. However, because F_0 is acyclic, we can turn it into a tree by unravelling it from any component whose distinguished node is of type τ (acyclicity ensures that the unravelling is finite), and adjusting the labels so that all components are disjoint. The resulting graph F is a concrete frame and also a tree. It obviously realizes τ, and all its components only contain nodes of types in Θ. In each component of F we add the labels from Γ_T in the unique way that ensures that F satisfies T. As the added labels are fresh, this does not affect Θ.
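The first step of this construction, grouping the nodes of G by strongly connected component and keeping only the edges between different components, is the classical condensation of a directed graph. A Python sketch using Kosaraju's algorithm (our choice; the proof does not prescribe an algorithm), with a hypothetical function name and edges given as (source, role, target) triples:

```python
def condensation(nodes, edges):
    """Strongly connected components via Kosaraju's algorithm (iterative).

    nodes: list of (hashable, non-None) nodes; edges: list of
    (source, role, target) triples. Returns (comp, cross): comp maps each
    node to an SCC index, and cross is the set of edges joining distinct
    SCCs, tagged with the indices of their endpoints' SCCs.
    """
    adj = {v: [] for v in nodes}
    radj = {v: [] for v in nodes}
    for v, _, w in edges:
        adj[v].append(w)
        radj[w].append(v)

    # Pass 1: depth-first search, recording nodes in order of completion.
    order, visited = [], set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            v, successors = stack[-1]
            nxt = next((w for w in successors if w not in visited), None)
            if nxt is None:
                stack.pop()
                order.append(v)
            else:
                visited.add(nxt)
                stack.append((nxt, iter(adj[nxt])))

    # Pass 2: search the reversed graph in reverse completion order;
    # each search collects exactly one SCC.
    comp, count = {}, 0
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = count
        stack = [root]
        while stack:
            v = stack.pop()
            for u in radj[v]:
                if u not in comp:
                    comp[u] = count
                    stack.append(u)
        count += 1

    cross = {(comp[v], comp[w], (v, r, w)) for v, r, w in edges if comp[v] != comp[w]}
    return comp, cross
```

The edges in `cross` are exactly those that give rise to frame edges of F_0, and the graph of SCCs they induce is acyclic, which is what makes the finite unravelling possible.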
Up to the labels from Γ_T, every component of F is isomorphic to an SCC in G, and every connector of F is isomorphic to the one-step unravelling of a subgraph of G formed by a node and all its successors. It follows that all components and connectors refute Q modulo Σ_0-reachability. Moreover, because each component of F is strongly connected via edges with labels from Σ_T ⊆ Σ_0, it actually refutes Q modulo Σ_T-reachability. That is, F weakly refutes Q modulo (Σ_T, Σ_0)-reachability.
To conclude the proof of the left-to-right implication, we turn F into a suitable abstract frame F′ as in the proof of Lemma 5.3, with the following differences. For Γ_{F′} we take the set of node labels used in τ, T, Θ, and Q̂, plus all labels from Γ_T used in T_= and T_+. For each frame node u, we let T_u = T_= and Q_u = Q̂ mod Σ_T.
For the right-to-left implication, take such a productive abstract frame and consider some concrete frame F it represents. Obviously, G_F realizes τ and contains only nodes of types from Θ. Because F weakly refutes Q modulo (Σ_T, Σ_0)-reachability and Σ_T ⊆ Σ_0, F also weakly refutes Q modulo (Σ_0, Σ_0)-reachability. By Lemma B.1, G_F refutes Q modulo Σ_0-reachability. By Lemma 6.2, G_F |= T. This completes the proof of the right-to-left implication. □
B.3 Finding witnesses guaranteed by Lemma 6.3
It remains to test if an abstract frame satisfying the conditions from Lemma 6.3 exists. This time we use a least fixed-point algorithm. We compute the set Ψ_{T,Θ,Q} of types of distinguished elements in components of abstract frames satisfying the conditions of Lemma 6.3. This set coincides with the set of all unary types that occur in interpretations represented by such abstract frames. We

Lemma 3.5. For every connected C2RPQ q, UC2RPQ Q, and ALCI or ALCQ TBox T, q ⊈_T Q iff there is a finite star-like graph G such that G |= T, G ⊭ Q, and
• the central part of G is |q|-sparse and satisfies q,
• all peripheral parts of G satisfy T,
• in the central part, shared elements have only one incident edge and, in the ALCQ case, no outgoing edges.
A finite connected graph G with n nodes and m edges is k-sparse, for an integer k ≥ −1, if m ≤ n + k. A graph H locally embeds into a graph G if there exists a homomorphism h : H → G such that for all r ∈ Σ± and (v, v_1), (v, v_2) ∈ r^H, if v_1 ≠ v_2 then h(v_1) ≠ h(v_2). For every connected C2RPQ q and every graph G that satisfies q, there exists a |q|-sparse graph G_q that satisfies q and locally embeds into G. Theorem 3.1 is the key to the containment problem. Indeed, for every UC2RPQ Q, if G ⊭ Q, then G_q ⊭ Q because G_q maps homomorphically into G. Moreover, for every ALCQI TBox T without participation constraints, if G |= T then G_q |= T. Indeed, let us fix a homomorphism witnessing that G_q locally embeds into G. With participation constraints forbidden, T can only contain CIs of three forms: A ⊑ B, A ⊑ ∀r.B, and A ⊑ ∃≤ℓ r.B. CIs of the form A ⊑ B carry over from G to G_q because homomorphisms preserve types of nodes. CIs of the form A ⊑ ∀r.B carry over because if a node in
Intuitively,  detects if label   is missing in a node  -reachable from  or label   is missing in a node from which  is  -reachable, or some node has both label   and label   . Suppose that all edges in the graph  in Figure 2 have label  . Then,  ̸ |= . Let  be obtained from  by adding label   to all nodes reachable from the unique node with label , and label   to all nodes from which the unique node with label  can be reached. Then,  ̸ |= . Now, let  be any graph obtained from  by adding labels   and   in some nodes. Suppose that  satisfies  2 =   () ∧  * (, ) ∧ C (). Then, on the path witnessing this there is an edge from a node with label   to a node without label   . It follows that some part of  satisfies  2 . For other disjuncts of  the argument is either analogous or trivial. Hence, if  |=  then some part of  satisfies .
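Both k-sparsity and local embedding are easy to test on explicit graphs. The Python sketch below uses hypothetical helper names; edges are (source, role, target) triples over role names, and the r⁻-successors of a node are derived from the r-edges entering it.

```python
def is_k_sparse(num_nodes, num_edges, k):
    """A finite connected graph with n nodes and m edges is k-sparse if m <= n + k."""
    if k < -1:
        raise ValueError("k must be an integer >= -1")
    return num_edges <= num_nodes + k


def locally_embeds(h, edges_h, edges_g):
    """Check that h (a dict from nodes of H to nodes of G) is a homomorphism
    from H to G that never maps two distinct r-successors of the same node to
    the same node, for any r in Σ± (r-successors and r⁻-successors)."""
    g = set(edges_g)
    if any((h[s], r, h[t]) not in g for s, r, t in edges_h):
        return False  # not a homomorphism
    images = {}  # (node, role in Σ±) -> {image of a successor: that successor}
    for s, r, t in edges_h:
        # t is an r-successor of s, and s is an r⁻-successor of t.
        for key, succ in (((s, r), t), ((t, r + "-"), s)):
            seen = images.setdefault(key, {})
            if seen.get(h[succ], succ) != succ:
                return False  # two distinct successors share an image
            seen[h[succ]] = succ
    return True
```

Trees are exactly the (−1)-sparse connected graphs. A local embedding may identify distant nodes: mapping the path x → y → z onto a 2-cycle a → b → a sends x and z to a, yet every node keeps distinct images for its distinct successors.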
) ∧  * (, ) ∧   (), C () ∧ () . Note that complement node labels are crucial in this construction. Lemma 3.7. Given a connected UC2RPQ Q, a connected UC2RPQ Q̂ satisfying conditions (1) and ( .
A graph G refutes Q if G ⊭ Q̂ (in particular, G ⊭ Q). A concrete frame F weakly refutes Q if
• each component G_u refutes Q; that is, G_u ⊭ Q̂;
• each connector C_{u,v} refutes Q; that is, C_{u,v} ⊭ Q̂.
We say that F actually refutes Q if G_F refutes Q. 'Weakly refuting Q' is a local property, and it works for frames that are trees.
Lemma 4.1. If a concrete frame is a tree and weakly refutes a connected UC2RPQ Q, then it actually refutes Q.
Lemma 4.1 follows immediately from the lemma below applied to Q̂.
Lemma 4.2. Let P be a factorized UC2RPQ and let F be a concrete frame that is a tree. Then G_F |= P iff some component or connector of F satisfies P.
Towards a contradiction, suppose that G_{F_{ks+1}} |= Q̂. Then there is a match H of some q from Q̂ in F_{ks+1}. Since Q̂ is connected, so are q and H. By Claims 1 and 2, H visits at most ks + 1 out of the ks + 2 levels in F_{ks+1}. Let H′ be the subgraph of Coil(F, ks + 1) corresponding to H (equal to H up to relabelling). By Property 3, H′ maps homomorphically to Unravel(F, ks, u) for some u in F. Let U be a frame obtained from Unravel(F, ks, u) by relabelling to ensure that all components have disjoint domains, as was done for F_{ks+1}. It follows that G_H maps homomorphically to G_U, which means that G_U |= Q̂. But at the same time, U is locally isomorphic to a subframe of F, so U weakly refutes Q. Because U is a tree, by Lemma 4.1, U actually refutes Q; that is, G_U ⊭ Q̂. The obtained contradiction shows that G_{F_{ks+1}} ⊭ Q̂. □
T + as the TBox obtained from T by replacing each  ⊑ ∃ ≤  . with + can be normalized at the cost of introducing additional concept names, polynomially many in the size of T and  .
A concrete frame satisfies an ALCQ TBox T if
• each component G_u satisfies T_=;
• in each connector C_{u,v}, the distinguished node satisfies T_+ and has no incoming edges.
For abstract frames we rephrase the first condition as
• for each component (τ_u, T_u, Θ_u, Q_u), T_u entails T_=.
Lemma 6.2. If a frame satisfies an ALCQ TBox T, each graph it represents satisfies T.
From Σ_0-reachability to Σ_T-reachability
Without loss of generality we can assume that countermodels only use role names from Σ_T. The general idea of this step is to decompose countermodels into strongly connected components. Within each component, all Σ_T-reachability atoms are trivially satisfied, so if the component refutes Q modulo Σ_0-reachability, it also refutes Q modulo Σ_T-reachability (because Σ_T ⊆ Σ_0). Replacing Σ_0-reachability with Σ_T-reachability makes the conditions imposed on witnessing countermodels more demanding, but we are guaranteed to find such countermodels if any countermodels exist. A delicate aspect here is that we will be testing the existence of suitable components by invoking a simpler instance of the decision problem, and there is no way to ensure that they are strongly connected. The point is, however, that this is not needed: any component, strongly connected or not, that refutes Q modulo Σ_T-reachability also refutes Q modulo Σ_0-reachability (provided Σ_T ⊆ Σ_0). This leads to the following strengthening of the notion of weakly refuting. For a connected simple UC2RPQ Q and Σ_1 ⊆ Σ_2 ⊆ Σ, a concrete frame weakly refutes Q modulo (Σ_1, Σ_2)-reachability if
• each component G_u refutes Q modulo Σ_1-reachability; that is, G_u ⊭ Q̂ mod Σ_1;
• each connector C_{u,v} refutes Q modulo Σ_2-reachability; that is, C_{u,v} ⊭ Q̂ mod Σ_2.
For an abstract frame F we rephrase the first condition as
• for each component (τ_u, T_u, Θ_u, Q_u), Q_u contains Q̂ mod Σ_1 in the query containment sense;