Evaluating Datalog over Semirings: A Grounding-based Approach

Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring $\sigma$, what is the tightest possible runtime? To this end, our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over $\sigma$: first ground the program into an equivalent system of polynomial equations (i.e. grounding) and then find the least fixpoint of the grounding over $\sigma$. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. Next, efficient algorithms for fixpoint evaluation are introduced over two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings of total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.


INTRODUCTION
Datalog is a recursive query language that has gained prominence due to its expressivity and rich applications across multiple domains, including graph processing [43], declarative program analysis [42,44], and business analytics [23]. Most of the prior work focuses on Datalog programs over the Boolean semiring (this corresponds to the standard relational join semantics): popular examples include same generation [5], cycle finding, and pattern matching. Many program analysis tasks such as Dyck-reachability [31,32,41], context-free reachability [36], and Andersen's analysis [4] can also be naturally cast as Datalog programs over the Boolean domain. However, modern data analytics frequently involves aggregations over recursion. Seminal work by Green et al. [24] established the semantics of Datalog over the so-called $K$-relations, where tuples are annotated by the domain of a fixed semiring. Recently, Abo Khamis et al. [28] proposed Datalog$^\circ$, an elegant extension of Datalog, and established key algebraic properties that govern convergence of Datalog$^\circ$. The authors further made the observation that the convergence rate of Datalog can be studied by confining to the class of naturally-ordered semirings (formally defined in Section 2).
A parallel line of work has sought to characterize the complexity of Datalog evaluation. The general data complexity of Datalog is P-complete [11,20], with the canonical P-complete Datalog program: ( 1 ) ← ( 1). (1)
Some fragments of Datalog can have lower data complexity: the evaluation for non-recursive Datalog is in AC$^0$, whereas evaluation for Datalog with linear rules is in NC and thus efficiently parallelizable [3,45]. However, all such results do not tell us how efficiently we can evaluate a given Datalog program. As an example, the program (1) can be evaluated in linear time with respect to the input size [22]. A deeper understanding of precise upper bounds for Datalog is important, given that it can capture practical problems (all of which are in P) across various domains as mentioned before.
Authors' addresses: Hangdong Zhao, University of Wisconsin-Madison, Madison, USA, hangdong@cs.wisc.edu; Shaleen Deep, Microsoft Gray Systems Lab, USA, shaleen.deep@microsoft.com; Paraschos Koutris, University of Wisconsin-Madison, Madison, USA, paris@cs.wisc.edu; Sudeepa Roy, Duke University, Durham, USA, sudeepa@cs.duke.edu; Val Tannen, University of Pennsylvania, Philadelphia, USA, val@seas.upenn.edu.
Unfortunately, even for the Boolean semiring, the current general algorithmic techniques for Datalog evaluation typically aim for an imprecise polynomial bound instead of specifying the tightest possible exponent. Semi-naïve or naïve evaluation only provides upper bounds on the number of iterations, ignoring the computational cost of each iteration. For example, program (1) can be evaluated in at most ( ) iterations, but the cost of an iteration can be as large as Θ( ). Further, going beyond the Boolean semiring, obtaining tight bounds for evaluating general Datalog programs over popular semirings (such as absorptive semirings with a total order, which are routinely used in data analytics [39,40]) is unclear. In particular, proving correctness of program evaluation is not immediate and the impact of the semiring on the evaluation time remains unknown.
Endeavors to pinpoint the exact data complexity for Datalog fragments have focused on the class of Conjunctive Queries (CQs) [26,29,37] and unions of CQs [8], where most effort has been dedicated to developing faster algorithms. When recursion is involved, however, exact runtimes are known only for restricted classes of Datalog. Seminal work of Yannakakis [50] established an $O(n^3)$ runtime for chain Datalog programs (formally defined in Section 2.3), where $n$ is the size of the active domain. Such programs have a direct correspondence to context-free grammars and capture a fundamental class of static analysis known as context-free reachability (CFL). When the chain Datalog program corresponds to a regular grammar, the runtime can be further improved to $O(m \cdot n)$.¹ An $O(n^3)$ algorithm was proposed for the Datalog program that captures Andersen's analysis [36]. Recently, Casel and Schmid [9] studied the fine-grained complexity of evaluation, enumeration, and counting problems over regular path queries (also a Datalog fragment) with similar upper bounds. However, none of the above techniques generalize to arbitrary Datalog programs.
Our Contribution. In this paper, we ask: given a Datalog program $P$ over a naturally-ordered semiring $\sigma$, what is the tightest possible runtime as a function of the input size $m$ and the size of the active domain $n$? Our main contributions are as follows.
We propose a general, two-phase framework for the evaluation of $P$ over $\sigma$. The first phase grounds $P$ into an equivalent system of polynomial equations $G$. Though constructing groundings naïvely is rather straightforward, we show that groundings of smaller size are attainable via tree decomposition techniques. The second phase evaluates the least fixpoint of $G$ over the underlying semiring $\sigma$. We show that finite-rank semirings and absorptive semirings with total order, two classes of semirings routinely used in practice, admit efficient algorithms for least fixpoint computation. We apply our framework to prove state-of-the-art and new results for practical Datalog fragments (e.g. linear Datalog).
Further, we establish tightness of our results by demonstrating a matching lower bound on the running time (conditioned on the popular min-weight ℓ-clique conjecture) and size of the grounded program (unconditional) for a class of Datalog programs.

PRELIMINARIES

Datalog
We review standard Datalog [1] that consists of a set of rules over a set of extensional and intensional relations (simply referred to as EDB and IDB respectively, henceforth). EDBs correspond to relations in a given database, each comprising a set of EDB tuples, whereas IDBs are derived by the rules. The head of each rule is an IDB, and the body consists of zero or more EDBs and IDBs as a conjunctive query defining how the head IDB is derived. We illustrate with the example of transitive closure on a binary EDB denoting edges in a directed graph, a single IDB, and two rules, referred to below as program (2). In standard Datalog, IDBs are derived given EDBs (equivalently, evaluation is over the Boolean semiring), whereas, as discussed in Section 1, the evaluation can also be considered over other semirings where the EDBs are annotated with elements of that semiring.
¹ $m$ denotes the size of the input data and $n$ denotes the size of the active domain for a Datalog program throughout the paper.
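$\omega$-complete & $\omega$-continuous Semirings. An $\omega$-chain in a poset is a sequence $a_0 \sqsubseteq a_1 \sqsubseteq \ldots$ with elements from the poset. A naturally-ordered semiring is $\omega$-complete if every (infinite) $\omega$-chain has a least upper bound. A strictly increasing sequence $a_0 \sqsubset a_1 \sqsubset \cdots \sqsubset a_r$ has rank $r$. We say that a semiring has rank $r$ if every strictly increasing sequence has rank at most $r$. Any semiring over a finite domain has constant rank (e.g. $\mathbb{B}$ has rank 1).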

Datalog over Semirings
Next we describe the syntax of Datalog over a semiring $\sigma$. Here EDBs are considered as $\sigma$-relations, i.e., each tuple in each EDB is annotated by an element from the domain of $\sigma$. Tuples not in the EDB are annotated by $0$ implicitly. Standard relations are essentially $\mathbb{B}$-relations. When a Datalog program is evaluated on such $\sigma$-relations, IDB $\sigma$-relations are derived. Here, conjunction ($\wedge$) is interpreted as $\otimes$, and disjunction as $\oplus$ of $\sigma$, as discussed in [24].
Sum-product Queries. The Datalog program (2) has two rules with the same head, thus an IDB tuple can be derived by either rule (i.e. a disjunction over rules). Following Abo Khamis et al. [28], we combine all rules with the same IDB head into one using disjunction. Hence the program under a semiring $\sigma$ is written as a single rule, Eqn. (3), where $\oplus$ corresponds to alternative usage (or disjunction) [24] and the implicit $\exists x_3$ in the second rule of (2) is made explicit by $\bigoplus_{x_3}$. Let $\ell$ be the number of variables $x_1, x_2, \ldots, x_\ell$ in a Datalog program as in Eqn. (3). For $I \subseteq [\ell]$, where $[\ell] = \{1, 2, \ldots, \ell\}$, $\mathbf{x}_I$ denotes the set of variables $\{x_i \mid i \in I\}$. In the rest of the paper, w.l.o.g., we consider a Datalog program as a set of rules with distinct IDB heads, where each rule has the following form:
$$T(\mathbf{x}_H) \leftarrow \varphi_1(\mathbf{x}_H) \oplus \varphi_2(\mathbf{x}_H) \oplus \cdots \qquad (4)$$
Here $\mathbf{x}_H$ denotes the set of head variables. The body of a rule is a sum (i.e., $\oplus$) over one or more sum-product queries (defined below) $\varphi_1(\mathbf{x}_H), \varphi_2(\mathbf{x}_H), \ldots$ corresponding to different derivations of the IDB $T(\mathbf{x}_H)$ over a semiring $\sigma$. Formally, a sum-product (in short, sum-prod) query $\varphi$ over $\sigma$ has the following form:
$$\varphi(\mathbf{x}_H) : \bigoplus_{\mathbf{x}_{[\ell] \setminus H}} \; \bigotimes_{J \in \mathcal{E}} T_J(\mathbf{x}_J) \qquad (5)$$
where (1) $\mathbf{x}_H$ is the set of head variables, (2) each $T_J$ is an EDB or IDB predicate, and (3) $([\ell], \mathcal{E})$ is the associated hypergraph of $\varphi$ with hyperedges $\mathcal{E} \subseteq 2^{[\ell]}$ and vertex set $[\ell]$. For every Datalog program in the sum-product query form, we will assume that there is a unique IDB that we identify as the target (or output) of the program. We use arity($P$) to denote the maximum number of variables contained in any IDB of a program $P$.
Monadic, Linear & Chain Datalog. We say that a Datalog program $P$ is monadic if every IDB is unary (i.e. arity($P$) = 1); a Datalog program is linear if every sum-prod query in every rule contains at most one IDB (e.g., the program in Eq. (3) is linear). A chain query is a sum-prod query over binary predicates as follows:
$$\varphi(x_1, x_{p+1}) : \bigoplus_{x_2, \ldots, x_p} T_1(x_1, x_2) \otimes T_2(x_2, x_3) \otimes \cdots \otimes T_p(x_p, x_{p+1})$$
A chain Datalog program is a program where every rule consists of one or a sum of multiple chain queries. A chain Datalog program corresponds to a Context-Free Grammar (CFG) [50].
Least Fixpoint Semantics. The least fixpoint semantics of a Datalog program $P$ over a semiring $\sigma$ is defined as for standard Datalog, through its immediate consequence operator (ICO). An ICO applies all rules in $P$ exactly once over a given instance and uses all derived facts (and the instance) for the next iteration. The naive evaluation algorithm iteratively applies the ICO until a least fixpoint is reached (if any). In general, naive evaluation of Datalog over a semiring may not converge, depending on $\sigma$. Kleene's Theorem [13] shows that if the ICO is $\omega$-continuous and the semiring is $\omega$-complete, then the naive evaluation converges to the least fixpoint. Following prior work [16,24], we assume that $\sigma$ is $\omega$-continuous and $\omega$-complete.
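The naive evaluation loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Semiring` record, the `ico` and `naive_eval` helpers, and the predicate names `E` (edges) and `T` (closure) are hypothetical names chosen for this sketch, and the single rule hard-coded in `ico` is the transitive-closure rule in its combined sum-product form.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    add: Callable   # the (+) operation
    mul: Callable   # the (x) operation
    zero: object    # annotation of absent tuples

# Two semirings used in the text: Boolean and tropical (min, +).
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False)
TROP = Semiring(min, lambda a, b: a + b, float("inf"))

def ico(sr, E, T):
    """Apply the rule T(x, y) <- E(x, y) (+) SUM_z T(x, z) (x) E(z, y) once."""
    new = dict(E)  # first sum-product query: E(x, y)
    for (x, z), t in T.items():
        for (z2, y), e in E.items():
            if z == z2:  # second query: join T(x, z) with E(z, y)
                new[(x, y)] = sr.add(new.get((x, y), sr.zero), sr.mul(t, e))
    return new

def naive_eval(sr, E):
    """Iterate the ICO from the empty IDB until nothing changes."""
    T = {}
    while True:
        nxt = ico(sr, E, T)
        if nxt == T:
            return T
        T = nxt

# Over the tropical semiring the fixpoint holds shortest-path lengths.
paths = naive_eval(TROP, {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5})
```

Each iteration rescans every tuple, which is exactly the per-iteration cost that an iteration-count bound ignores.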
$\sigma$-equivalent. Two Datalog programs $P$, $P'$ are $\sigma$-equivalent if for any input EDB instance annotated with elements of a semiring $\sigma$, the targets of $P$ and $P'$ are identical $\sigma$-relations when least fixpoints are reached for both.
Parameters & Computational Model. We use $m$ to denote the sum of sizes of the input EDB $\sigma$-relations to a Datalog program (henceforth referred to as the "input size is $m$"). We use $n$ to denote the size of the active domain of EDB $\sigma$-relations (i.e., the number of distinct constants that occur in the input). If arity($P$) = $k$, then $n \le k \cdot m$. We assume data complexity [47], i.e. the program size (the total number of predicates and variables) is a constant. We use the standard word-RAM model with $O(\log m)$-bit words and unit-cost operations [10] for all complexity results. $\tilde{O}$ hides poly-logarithmic factors in the size of the input.

FRAMEWORK AND MAIN RESULTS
This section presents a general framework to analyze the data complexity of Datalog programs. We show how to use this framework to obtain both state-of-the-art and new algorithmic results.
A common technique to measure the runtime of a Datalog program is to multiply the number of iterations until the fixpoint is reached with the cost of each ICO evaluation. Although this method can show a polynomial bound in terms of data complexity, it cannot generate the tightest possible upper bound. Our proposed algorithmic framework allows us to decouple the semiring-dependent fixpoint computation from the structural properties of the Datalog program. It splits the computation into two distinct phases, where the first phase concerns the logical structure of the program, and the second the algebraic structure of the semiring.
Grounding Generation. In the first phase, the Datalog program is transformed into a $\sigma$-equivalent grounded program, where each rule contains only constants. A grounding $G$ of a grounded Datalog program is a system of polynomial equations where we assign to each grounded EDB atom a distinct coefficient in the semiring and to each grounded IDB atom a distinct variable. Our goal in this phase is to construct a $\sigma$-equivalent grounding $G$ of the smallest possible size $|G|$. The size of $G$ is measured as the total number of coefficients and variables in the system of polynomial equations.
Grounding Evaluation. In the second phase, we need to deal with evaluating the least fixpoint of $G$ over the semiring $\sigma$. The fixpoint computation now depends on the structure of the semiring $\sigma$, and we can completely ignore the underlying logical structure of the program. Here, we need to construct algorithms that are as fast as possible w.r.t. the size of the grounding $|G|$. For example, it is known that over the Boolean semiring, the least fixpoint of $G$ can be computed in time $O(|G|)$ [22].

Grounding Generation
The first phase of the framework generates a grounded program.
Example 3.1. We will use as an example a variation of program (3), over an arbitrary semiring $\sigma$, in which we introduce a unary atom to capture properties of the nodes in the graph. Consider an instance where the edge relation contains two edges and the unary relation contains three nodes. One possible $\sigma$-equivalent grounded Datalog program (over any semiring $\sigma$) corresponds to a system of polynomial equations. As there is a one-to-one correspondence between a grounded program and its grounding, we will only use the term grounding from now on.
The naive way to generate a $\sigma$-equivalent grounded program is to take every rule and replace the variables with all possible constants. However, this may create a grounding of very large size. Our key idea is that we can optimize the size of the generated grounding by exploiting the logical structure of the rules.
Upper Bounds. Our first main result (Section 5) considers the case where the body of every sum-product query in the Datalog program is acyclic (formally defined in Section 5); we call such a program rulewise-acyclic.
Theorem 3.2. Let $P$ be a rulewise-acyclic Datalog program over some semiring $\sigma$, with input size $m$, active domain size $n$, and arity($P$) $\le k$. Then, we can construct a $\sigma$-equivalent grounding $G$ in time (and has size) $O(n^{k-1} \cdot (m + n^k))$.
A direct result of the above theorem is that for monadic Datalog, where arity($P$) = 1, we obtain a grounding of size $O(m)$, which is essentially optimal. The main technical idea behind Theorem 3.2 is to construct the join tree corresponding to each rule, and then decompose the rules following the structure of the join tree.
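To make the size gap concrete, the following Python sketch contrasts the naive substitution of all constants with a grounding that only emits monomials whose EDB atom is present in the input. It is not the join-tree construction behind Theorem 3.2; the rule (plain transitive closure), the predicate names `E` and `T`, and all function names are assumptions made for illustration.

```python
from itertools import product

def naive_grounding(domain):
    """Ground T(x, y) <- E(x, y) (+) SUM_z T(x, z) (x) E(z, y)
    by substituting every constant for every variable."""
    eqs = {}
    for x, y in product(domain, repeat=2):
        monomials = [(("E", x, y),)]
        monomials += [(("T", x, z), ("E", z, y)) for z in domain]
        eqs[("T", x, y)] = monomials
    return eqs

def edb_guided_grounding(domain, edges):
    """Same rule, but skip monomials whose EDB atom is absent: absent
    tuples are annotated with 0, so those monomials contribute nothing."""
    eqs = {}
    for (x, y) in edges:
        eqs.setdefault(("T", x, y), []).append((("E", x, y),))
    for (z, y) in edges:
        for x in domain:
            eqs.setdefault(("T", x, y), []).append((("T", x, z), ("E", z, y)))
    return eqs

def size(eqs):
    """Total number of coefficient and variable occurrences on the right-hand sides."""
    return sum(len(mono) for monos in eqs.values() for mono in monos)

domain = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
naive_size = size(naive_grounding(domain))
guided_size = size(edb_guided_grounding(domain, edges))
```

On this toy instance the naive grounding has one equation per pair of constants, while the guided one grows with the number of edges times the domain size.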
It turns out that, in analogy to conjunctive query evaluation, we can also get good bounds on the size of the grounding by considering a width measure of tree decompositions of the rules. We show in Section 6 the following theorem, which relates the grounding size to the maximum submodular width (Section 6) across all rules.
Theorem 3.3. Let $P$ be a Datalog program where arity($P$) $\le k$, subw is the maximum submodular width across all rules of $P$, and $\sigma$ is a dioid, and suppose the input size is $m$ and the active domain size is $n$. Then, we can construct a $\sigma$-equivalent grounding $G$ in time (and has size) $\tilde{O}(n^{k-1} \cdot (m^{\mathsf{subw}} + n^{k \cdot \mathsf{subw}}))$.
The above theorem requires the semiring to be idempotent w.r.t. the $\oplus$ operation (i.e. a dioid). For a general semiring, the best we show is that we can replace subw with the weaker notion of fractional hypertree width (Proposition 6.4). Sections 5 and 6 show refined constructions that improve the grounding size when the Datalog program has additional structure (e.g., linear rules); we summarize these results in Table 1.
Lower Bounds. One natural question is: do Theorems 3.2 and 3.3 attain the best possible grounding bounds? This is unlikely to be true for specific Datalog fragments (e.g. linear Datalog has a tighter grounding as shown in Proposition 6.3); however, we can show optimality for a class of programs (see Appendix D for the proof).
Theorem 3.4. Take any integer $k \ge 2$ and any rational number $w \ge 1$ such that $k \cdot w$ is an integer. There exists a (non-linear) Datalog program $P$ over the tropical semiring Trop$^+$ with arity($P$) $\le k$, such that: (1) subw($P$) = $w$, (2) any Trop$^+$-equivalent grounding has size $\Omega(n^{k-1+k \cdot w})$, (3) under the min-weight $\ell$-Clique hypothesis [33], no algorithm that evaluates $P$ has a runtime of $O(n^{k-1+k \cdot w-o(1)})$.

Grounding Evaluation
Given a grounding $G$ for a Datalog program, we now turn to the problem of computing the (least fixpoint) solution for this grounding. In particular, we study how fast we can evaluate $G$ under different types of semirings as a function of its size. Ideally, we would like to have an algorithm that computes the fixpoint using $O(|G|)$ semiring operations; however, this may not be possible in general. In Section 4, we will show fast evaluation strategies for two different classes of semirings: (i) semirings of finite rank, and (ii) semirings that are totally ordered and absorptive. Our two main results can be stated as follows.
Theorem 3.5. We can evaluate a grounding $G$ over any semiring of rank $r$ using $O(r \cdot |G|)$ semiring operations.
Theorem 3.6. We can evaluate a grounding $G$ over any absorptive semiring with total order using $O(|G| \log |G|)$ semiring operations.
Many semirings of practical interest are captured by the above two classes. The Boolean semiring in particular is a semiring of rank 1. Hence, we obtain as a direct corollary the result in [22]:
Corollary 3.7. We can evaluate a grounding $G$ over the Boolean semiring in time $O(|G|)$.
The set semiring $(2^{D}, \cup, \cap, \emptyset, D)$ for a finite set $D$ has rank $|D|$. In fact, all bounded distributive lattices have constant rank. As another example, the access control semiring $(\{P, C, S, T, 0\}, \min, \max, 0, P)$ [19] employs a constant number of security classifications, where $P$ = PUBLIC, $C$ = CONFIDENTIAL, $S$ = SECRET, $T$ = TOP-SECRET and $P \sqsubset C \sqsubset S \sqsubset T \sqsubset 0$ is the total order for levels of clearance. For all these semirings, their grounding can be evaluated in time $O(|G|)$ by Theorem 3.5. The class of absorptive semirings with total order contains Trop$^+$, which has infinite rank. Hence Theorem 3.5 cannot be applied, but by Theorem 3.6, we can evaluate $G$ over Trop$^+$ in time $O(|G| \log |G|)$.

Applications
Finally, we discuss algorithmic implications of our general framework. In particular, we demonstrate that our approach captures as special cases several state-of-the-art algorithms for tasks that can be described in Datalog. Table 2 summarizes some of our results that are straightforward applications of our framework.
To highlight some of our results, Dijkstra's algorithm for single-source shortest path is a special case of applying Theorem 3.6 after we compute the grounding of the program. As another example, Yannakakis [50] showed that a binary (i.e. EDB/IDB arities are at most 2) rulewise-acyclic program can be evaluated in time $O(n^3)$. By Theorem 3.2 (let $k = 2$), an equivalent grounding of size $O(n^3)$ can be constructed in time $O(n^3)$ for such programs. Then by Corollary 3.7, we can evaluate the original program in $O(n^3)$ time, thus recovering the result of Yannakakis.
If the binary rulewise-acyclic Datalog program is also linear, it can be evaluated in $O(m \cdot n)$ time [50]. We generalize this result to show that the $O(m \cdot n)$ bound holds for any linear rulewise-acyclic Datalog program with IDB arity at most 2, even if the EDB relations are of higher arity (see Table 2, Proposition 5.4).
As another corollary of Theorem 3.3 and Corollary 3.7, monadic Datalog can be evaluated in time $\tilde{O}(m^{\mathsf{subw}})$. This is surprising in our opinion, since $\tilde{O}(m^{\mathsf{subw}})$ is the best known runtime for Boolean CQs. Hence, this result tells us that the addition of recursion with unary IDBs does not increase the asymptotic runtime of evaluation.

GROUNDING EVALUATION
This section presents algorithms for evaluating a grounding over two types of commonly-used semirings [28,39,40]. First, Section 4.1 presents a procedure that transforms the grounding to one with a more amenable structure, called a 2-canonical grounding. Then, we present evaluation algorithms for semirings of rank $r$ (Section 4.2), and for absorptive semirings that are totally ordered (Section 4.3).

2-canonical Grounding
We say that a grounding $G$ is 2-canonical if every equation in $G$ is of the form $x = y \oplus z$ or $x = y \otimes z$. As a first step of our evaluation, we transform the given grounding into a 2-canonical form using the following Lemma 4.1.

Table 2 (excerpt). Monadic Datalog [21,22]: Boolean semiring, monadic programs, $\tilde{O}(m^{\mathsf{subw}})$. Linear Datalog [25]: Boolean semiring, IDB arity $\le k$, $\tilde{O}(n^{k-1} \cdot m^{\mathsf{subw}-1} \cdot (m + n^k))$.

This transformation increases the size of the system by at most a factor of two. Indeed, the new equations contribute at most twice what the original monomial contributes to $|G|$. After the first step, we are left with equations that are either a product of two elements, or a sum of the form $x = y_1 \oplus y_2 \oplus \ldots$ (w.l.o.g., we can assume no equations of the form $x = y$). Here, we apply the same idea as above, replacing multiplication with addition. This transformation will increase the size of the system by another factor of two using the same argument as before. The equivalence of the new grounding follows from the associativity property of both $\oplus$ and $\otimes$.
We demonstrate Lemma 4.1 with an example.
Here, two new IDB variables are introduced to make the system of polynomial equations 2-canonical.
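The transformation can be sketched as follows in Python. The encoding is an assumption made for this illustration: a grounding is a dictionary from each variable to a list of monomials, each monomial a list of symbol names, and fresh variables are named `_y0`, `_y1`, and so on (assumed not to clash with existing names). A degenerate single-symbol equation, which the text assumes away, is kept as a `copy`.

```python
from itertools import count

def to_two_canonical(grounding):
    """Rewrite every equation as a chain of binary (+) or (x) equations."""
    fresh = (f"_y{i}" for i in count())
    out = {}

    def fold(op, symbols, target=None):
        """Left-fold symbols with a binary op, naming each intermediate result.
        If target is given, the final result is stored under that name."""
        acc = symbols[0]
        for i, s in enumerate(symbols[1:], start=1):
            last = i == len(symbols) - 1
            name = target if (last and target is not None) else next(fresh)
            out[name] = (op, acc, s)
            acc = name
        return acc

    for head, monomials in grounding.items():
        if len(monomials) == 1:
            mono = monomials[0]
            if len(mono) == 1:
                out[head] = ("copy", mono[0])   # degenerate case x = y
            else:
                fold("mul", mono, target=head)
        else:
            # First make every monomial binary, then chain the sum.
            terms = [fold("mul", mono) for mono in monomials]
            fold("add", terms, target=head)
    return out

# x = a*y*z (+) b (+) c*x  becomes five binary equations.
canon = to_two_canonical({"x": [["a", "y", "z"], ["b"], ["c", "x"]]})
```

Every fresh variable is introduced once per original symbol occurrence, which is why the size grows only by a constant factor.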

Finite-rank Semirings
We now present a grounding evaluation algorithm (Algorithm 1) over a semiring of rank $r$, which is used to prove Theorem 3.5.
Algorithm 1: Grounding evaluation over a rank $r$ semiring
The key idea of the algorithm is to compute the least fixpoint in a fine-grained way. Instead of updating all equations in every iteration, we will carefully choose a subset of equations to update their left-hand side variables. In particular, at every step we will pick a new variable (Line 7) and then only update the equations that contain this variable (Lines 8-10). Because the semiring has rank $r$, the value of each variable cannot be updated more than $r$ times; hence, we are guaranteed that each equation will be visited at most $2r$ times. Moreover, since the grounding is 2-canonical, updating each variable needs only one $\oplus$ or $\otimes$ operation; the latter property would not be possible if we had not previously transformed the grounding into a 2-canonical one.
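A worklist evaluation in this spirit might look like the Python sketch below. It is not a transcription of Algorithm 1: the encoding of a 2-canonical grounding as `head -> (op, left, right)` with `op` in `{"add", "mul"}`, the separate `coeff` dictionary for EDB coefficients, and the function name are all assumptions of this sketch.

```python
import operator
from collections import defaultdict, deque

def eval_finite_rank(eqs, coeff, add, mul, zero):
    """Least fixpoint of a 2-canonical grounding by change propagation.
    eqs: head -> (op, a, b); coeff: coefficient name -> semiring value."""
    val = {v: zero for v in eqs}
    val.update(coeff)
    # occ[s] lists the equations that mention symbol s on the right-hand side.
    occ = defaultdict(list)
    for head, (_, a, b) in eqs.items():
        occ[a].append(head)
        if b != a:
            occ[b].append(head)
    queue = deque(coeff)      # coefficients count as "just changed"
    queued = set(coeff)
    while queue:
        s = queue.popleft()
        queued.discard(s)
        for head in occ[s]:
            op, a, b = eqs[head]
            new = add(val[a], val[b]) if op == "add" else mul(val[a], val[b])
            if new != val[head]:          # value strictly increased
                val[head] = new
                if head not in queued:
                    queue.append(head)
                    queued.add(head)
    return {v: val[v] for v in eqs}

# Set semiring over {1, 2, 3}: (+) is union, (x) is intersection.
result = eval_finite_rank(
    {"x": ("add", "a", "y"), "y": ("mul", "x", "b")},
    {"a": frozenset({1, 2}), "b": frozenset({2, 3})},
    operator.or_, operator.and_, frozenset(),
)
```

An equation is re-evaluated only when one of its two operands changes, so in a semiring of rank $r$ it is touched at most $2r$ times after initialization.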

Absorptive Semirings with Total Order
We present a Dijkstra-style algorithm (Algorithm 2) for grounding evaluation over an absorptive semiring with ⊑ being a total order to prove Theorem 3.6.It builds upon prior work [39] by further optimizing the original algorithm to achieve the almost-linear runtime.
The algorithm follows the same idea as Algorithm 1 by carefully updating only a subset of equations at every step. However, there are two key differences. First, while Algorithm 1 is agnostic to the order in which the newly updated variables are propagated to the equations (hence the use of a queue), Algorithm 2 needs to always pick the variable with the current maximum value w.r.t. the total order $\sqsubseteq$. To achieve this, we need to use a priority queue instead of a queue, which is the reason for the additional logarithmic factor in the runtime. The second difference is that once a variable is updated, it gets "frozen" and never gets updated again (see Algorithm 2). We show in the detailed proof of correctness (Appendix B) that it is safe to do this and still reach the desired fixpoint.
Algorithm 2: Grounding evaluation over an absorptive semiring with total order
Input: a grounding $G$
1 transform $G$ to be 2-canonical via Lemma 4.1
2 construct a hash table such that, for every variable $x$ in $G$, the entry for $x$ is the set of all equations that contain $x$ in the right-hand side; init F ← ∅
insert $x$ into the priority queue with value ℎ($x$)
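For the tropical semiring Trop$^+$ (non-negative reals with $\min$ as $\oplus$ and $+$ as $\otimes$), a Dijkstra-style evaluation can be sketched in Python as below. This is an illustration rather than Algorithm 2 itself: it hard-codes Trop$^+$, reuses the hypothetical `head -> (op, left, right)` encoding of a 2-canonical grounding, and uses lazy deletion in the heap. Note that the maximum w.r.t. the natural order of Trop$^+$ is the numerically smallest value, so a min-heap picks the right element.

```python
import heapq
from collections import defaultdict

INF = float("inf")  # the 0 element of Trop+

def eval_tropical(eqs, coeff):
    """Least fixpoint over Trop+ of a 2-canonical grounding.
    eqs: head -> (op, a, b) with op in {"add", "mul"}; coeff: name -> weight >= 0."""
    val = {v: INF for v in eqs}
    val.update(coeff)
    occ = defaultdict(list)
    for head, (_, a, b) in eqs.items():
        occ[a].append(head)
        if b != a:
            occ[b].append(head)
    heap = [(w, s) for s, w in coeff.items()]
    heapq.heapify(heap)
    frozen = set()
    while heap:
        w, s = heapq.heappop(heap)
        if s in frozen or w > val[s]:
            continue                      # stale heap entry
        frozen.add(s)                     # the value of s is now final
        for head in occ[s]:
            if head in frozen:
                continue
            op, a, b = eqs[head]
            new = min(val[a], val[b]) if op == "add" else val[a] + val[b]
            if new < val[head]:
                val[head] = new
                heapq.heappush(heap, (new, head))
    return {v: val[v] for v in eqs}

# Shortest paths from a source: a->b (1), b->c (2), a->c (5).
dist = eval_tropical(
    {"Db": ("mul", "src", "Eab"), "p1": ("mul", "Db", "Ebc"),
     "p2": ("mul", "src", "Eac"), "Dc": ("add", "p1", "p2")},
    {"src": 0, "Eab": 1, "Ebc": 2, "Eac": 5},
)
```

Each symbol is frozen once and each equation is re-evaluated at most twice, so the heap operations account for the logarithmic factor.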

GROUNDING OF ACYCLIC DATALOG
In this section, we study how to find an efficient grounding (in terms of space usage and time required) of a Datalog program $P$ over a semiring $\sigma$. First, we introduce rulewise-acyclicity of a program using the notion of tree decompositions.
Tree Decompositions. Let $\varphi$ be a sum-prod query (5) and $([\ell], \mathcal{E})$ be its associated hypergraph. A tree decomposition of $\varphi$ is a tuple $(\mathcal{T}, \chi)$ where (i) $\mathcal{T}$ is an undirected tree, and (ii) $\chi$ is a mapping that assigns to every node $t \in V(\mathcal{T})$ a set of variables $\chi(t) \subseteq [\ell]$, called the bag of $t$, such that (1) every hyperedge $J \in \mathcal{E}$ is a subset of some $\chi(t)$, $t \in V(\mathcal{T})$; (2) (running intersection property) for each $i \in [\ell]$, the set $\{t \mid i \in \chi(t)\}$ is a non-empty (connected) sub-tree of $\mathcal{T}$. A tree decomposition is a join tree if $\chi(t) \in \mathcal{E}$ for all $t \in V(\mathcal{T})$.
Acyclicity. A sum-prod query $\varphi$ is said to be acyclic if its associated hypergraph admits a join tree. The GYO reduction [51] is a well-known method to construct a join tree for $\varphi$. We say that a rule of a Datalog program is acyclic if every sum-prod query in its body is acyclic, and a program is rulewise-acyclic if it has only acyclic rules.
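As a small illustration of the GYO reduction, the Python sketch below tests whether a hypergraph is acyclic by repeatedly removing vertices that occur in a single hyperedge and hyperedges contained in another one. It only decides acyclicity and does not build the join tree; the function name and the encoding of hyperedges as sets of variable indices are choices made for this sketch.

```python
def is_acyclic(hyperedges):
    """GYO-style test: the hypergraph is acyclic iff it reduces to nothing."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed and edges:
        changed = False
        # Rule 1: drop vertices that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges

# Body of the transitive-closure rule: atoms over {x1, x3} and {x3, x2}.
path_ok = is_acyclic([{1, 3}, {3, 2}])
# A 4-cycle join is not acyclic.
cycle_ok = is_acyclic([{1, 2}, {2, 3}, {3, 4}, {4, 1}])
```

The order in which hyperedges are eliminated by Rule 2 is what a full implementation would record to obtain the join tree.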
We formally prove Theorem 3.2, by introducing a grounding algorithm for rulewise-acyclic programs attaining the desired bounds; the pseudocode is in Algorithm 3 in the appendix.
The -equivalent grounding is constructed rule-wise from .Take any acyclic rule from with head (x ).Note that its arity | | ≤ .We ground each sum-prod query (x ) in this rule one by one.To ground a single (which is of the form ( 5)), we construct a join tree (T , ) of .Each bag in the join tree will correspond to an atom in the body of .We root T from any atom that contains at least one of the variables in x (i.e. head variables), say 0 .This orients the join tree.For any node , define ( , E ), where ⊆ [ℓ], E ⊆ E, to be the hypergraph constructed by only taking the bags from the subtree of T rooted at as hyperedges.Note that ( 0 , E 0 ) = ( [ℓ], E) is the associated hypergraph of .Let ⊆ be the head variables that occur in this subtree (so at root, 0 = ).Further, rename the head atom (x ) to be (⊥, 0 ) (x ), with ⊥ being an imaginary parent node of the root and (⊥, 0 ) = .
Next, we use a recursive method G to ground .Let be the parent of in T .A G ( , ( , ) ) call at a node grounds the sum-prod query Hence, calling G ( 0 , (⊥, 0 ) ) at 0 suffices to ground .We describe the procedure G ( , ( , ) ), which has three steps: ( ) Refactor.Define for each child of (i.e. ( , ) ∈ (T ), (T ) being the directed edges of T ), We refactor as follows: where (1) regroups the product into the node and subtrees rooted at each child of , (2) safely pushes aggregation over every child (since any ∈ \ ( , ) only occurs in the subtree rooted at , otherwise by the running intersection property, ∈ ( ) ∩ ( ) ⊆ ( , ) , a contradiction), and (3) replaces every child sum-prod query by introducing a new IDB ( , ) (x ( , ) ) and a corresponding query: ( ) Ground.Instead of grounding the sum-prod query (6) as a whole, we ground the -equivalent (but refactored) query Notice that is exactly the set of variables appear in the body of the refactored query.We ground this query as follows: for each possible tuple (say c ( ) ) in ( ) , we add a grounded rule for each tuple (say c \ ( ) ) that can be formed over the schema x \ ( ) using the active domain of each variable.This is done by taking the attribute values for each variable from c ( ) and c \ ( ) , and substituting it in every predicate (EDB, IDB, or the head) of the query.
(iii) Recurse. Now every child of the current node has an introduced IDB. For each child, we recursively call the grounding method to ground the new rule (7). If there are no children (i.e. the current node is a leaf), the call terminates immediately.
Fig. 1. A join tree with its corresponding rewriting.
We argue the grounding size and time at the grounding step ( ), for any node when grounding the refactored query.Recall that is the parent node of .We consider all possible tuples over x (variables in the body of the refactored query).First, ( ) is of size ( ) if it is an EDB, or ( ) if it is an IDB.Next, for the number of tuples that can be constructed over x \ ( ) , note that ( −1 • ( + )), i.e. the product of the cardinality of ( ) and the number of possible tuples that can be formed over the schema x \ ( ) .
Example 5.2.We illustrate Theorem 3.2 with the acyclic rule: A join tree is drawn in Figure 1 with new intermediate IDBs and rules (one per edge of the join tree) introduced recursively by the proof of Theorem 3.2.In the transformed program, each rule produces ( + ) = ( ) groundings.Hence, its overall size is ( ).A step-by-step walkthrough of this example (Example C.1) and more applications of Theorem 3.2 are in Appendix C.
Free-connexity. Take a tree decomposition $(\mathcal{T}, \chi)$ of a sum-prod query $\varphi(\mathbf{x}_H)$ rooted at a node $r \in V(\mathcal{T})$. Let $\mathrm{TOP}_r(x)$ be the highest node in $\mathcal{T}$ containing the variable $x$ in its bag. We say that $(\mathcal{T}, \chi)$ is free-connex w.r.t. $r$ if for any head variable $x$ and any non-head variable $y$, $\mathrm{TOP}_r(y)$ is not an ancestor of $\mathrm{TOP}_r(x)$ [48]. We say that $(\mathcal{T}, \chi)$ is free-connex if it is free-connex w.r.t. some $r \in V(\mathcal{T})$. An acyclic query is free-connex if it admits a free-connex join tree. If every sum-prod query in every rule of a program $P$ admits a free-connex join tree, then $P$ is rulewise free-connex acyclic. Proposition 5.3 (see Table 1) shows that in such a case, there is a much tighter bound (dropping the $n^{k-1}$ term).
Proposition 5.3. Let $P$ be a rulewise free-connex acyclic Datalog program over some semiring $\sigma$ with input size $m$, active domain size $n$, and arity($P$) at most $k$. Then, a $\sigma$-equivalent grounding $G$ can be constructed in time (and has size) $O(m + n^k)$.
Linear Acyclic Programs. If arity($P$) $\le 2$, Theorem 3.2 states that a rulewise-acyclic Datalog program admits a grounding of size $O(n^3)$. If the program is also linear (e.g. same generation [5]), we strengthen the upper bound to $O(m \cdot n)$ (see Table 1). The proof and an example (Example C.4) can be found in Appendix C.
Proposition 5.4. Let $P$ be a linear rulewise-acyclic Datalog program over some semiring $\sigma$ with input size $m$, active domain size $n$, and arity($P$) at most 2. Then, we can construct a $\sigma$-equivalent grounding $G$ in time (and has size) $O(m \cdot n)$.

GROUNDING OF GENERAL DATALOG
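This section introduces an algorithm for grounding any Datalog program over an arbitrary dioid via the PANDA algorithm [29] (or PANDA, for short). PANDA was introduced to evaluate CQs (i.e. sum-prod queries over the Boolean semiring).
Submodular width. A function $f: 2^{[\ell]} \to \mathbb{R}_{\ge 0}$ is a non-negative set function on $[\ell]$. It is monotone if $f(X) \le f(Y)$ whenever $X \subseteq Y$, and submodular if $f(X \cup Y) + f(X \cap Y) \le f(X) + f(Y)$ for all $X, Y \subseteq [\ell]$. A non-negative, monotone, submodular set function $h$ such that $h(\emptyset) = 0$ is a polymatroid. Let $\varphi$ be a sum-prod query (5). Let $\Gamma_\ell$ be the set of all polymatroids $h$ on $[\ell]$ such that $h(J) \le 1$ for all $J \in \mathcal{E}$. The submodular width of $\varphi$ is
$$\mathsf{subw}(\varphi) := \max_{h \in \Gamma_\ell} \; \min_{(\mathcal{T}, \chi) \in \mathcal{F}} \; \max_{t \in V(\mathcal{T})} h(\chi(t)) \qquad (8)$$
where $\mathcal{F}$ is the set of all non-redundant tree decompositions of $\varphi$. A tree decomposition is non-redundant if no bag is a subset of another. Abo Khamis et al. [29] proved that non-redundancy ensures that $\mathcal{F}$ is finite, hence the inner minimum is well-defined. We define the free-connex submodular width of $\varphi$, $\mathsf{fsubw}(\varphi)$, by restricting $\mathcal{F}$ to the set of all non-redundant free-connex tree decompositions.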
This section describes the key ideas of the algorithm underlying Theorem 3.3 (formally as Algorithm 4).We demonstrate the grounding algorithm step-by-step on the following concrete example.
Example 6.1.The following Datalog computes a diamond-pattern reachability starting from some node a set ( 1 ): ( 1) ← ( 1), We ground the second rule that involves a 4-cycle join.In particular, our grounding algorithm constructs a grounding of size ( 3/2 ).
( ) Grounding Bags.A naïve grounding of (say) [3] is of size ( • ) (cartesian product of 32 with domain of 1 ), which is suboptimal.Instead, we use PANDA to ground the new IDBs.
Let (x [4] ) be the set of tuples such that a tuple c [4] ∈ if and only if its annotation (c 3 ) ⊗ ( ) Grounding Acyclic Sub-queries.With the grounded bags, we now rewrite ( 1 ) as a sum of two acyclic sum-prod queries: We ground the two acyclic sub-queries (one per decomposition) using the construction in the proof of Theorem 3.2.For example, if we root both trees at 1 , we get a rewriting: guaranteeing that every intermediate IDB has a grounding of size ( 3/2 ).Thus, the total size of the grounding for is ( 3/2 ).
We show a free-connex version (Proposition 6.2) by restricting the first refactoring step to use only free-connex decompositions. Though fsubw $\geq$ subw, an $n^{k-1}$ factor is shaved off from the bound.

Proposition 6.2. Let $P$ be a Datalog program with input size $m$, active domain size $n$, and arity($P$) $\leq k$, and let fsubw be its free-connex submodular width. Let $\sigma$ be a dioid. Then, a $\sigma$-equivalent grounding can be constructed in time (and has size) $\tilde{O}(m^{\mathrm{fsubw}} + n^{k \cdot \mathrm{fsubw}})$.
Linear Programs. For linear programs, a careful analysis yields the following improved result (the proof is in Appendix E.4).

Proposition 6.3. Let $P$ be a linear Datalog program where arity($P$) $\leq k$, with input size $m$ and active domain size $n$. Let $\sigma$ be a dioid. Then, a $\sigma$-equivalent grounding can be constructed in time (and has size) $\tilde{O}(n^{k-1} \cdot m^{\mathrm{subw}-1} \cdot (m + n^k))$.

Fractional Hypertree-width. So far, all our results only apply to dioids. However, we can extend our results to any semiring by using the InsideOut algorithm [2]. Similar to subw($\varphi$) (8), the fractional hypertree-width of a sum-prod query $\varphi$, i.e. fhw($\varphi$), is defined as

$$\mathrm{fhw}(\varphi) := \min_{(\mathcal{T}, \chi) \in \mathcal{F}} \; \max_{h \in \Gamma_\ell} \; \max_{t \in V(\mathcal{T})} h(\chi(t)).$$

The fractional hypertree-width of a program $P$, denoted fhw($P$), is the maximum fhw over all its sum-prod queries. We show Proposition 6.4 for the grounding size over any semiring (see Table 1).

Proposition 6.4. Let $P$ be a Datalog program with input size $m$, active domain size $n$, and arity($P$) $\leq k$. Let $\sigma$ be a naturally-ordered semiring. Then, a $\sigma$-equivalent grounding can be constructed in time (and has size) $O(n^{k-1} \cdot (m^{\mathrm{fhw}} + n^{k \cdot \mathrm{fhw}}))$.
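As a rough illustration of how the two width measures translate into grounding bounds, the following worked comparison instantiates them on the 4-cycle join of Example 6.1. It is a sanity check under stated assumptions rather than part of the formal development: it assumes the standard values fhw $= 2$ and subw $= 3/2$ for the 4-cycle, takes the arity of the program in Example 6.1 to be $k = 1$, and writes $m$ for the input size and $n$ for the active domain size as in Table 1.

```latex
\begin{align*}
\text{Proposition 6.4 (any naturally-ordered semiring):}\quad
  & O\bigl(n^{k-1}\cdot(m^{\mathrm{fhw}} + n^{k\cdot\mathrm{fhw}})\bigr)
    = O\bigl(n^{0}\cdot(m^{2} + n^{2})\bigr)
    = O(m^{2} + n^{2}), \\
\text{Example 6.1 (dioid, via PANDA):}\quad
  & \tilde{O}\bigl(m^{\mathrm{subw}}\bigr)
    = \tilde{O}\bigl(m^{3/2}\bigr).
\end{align*}
```

Under these assumptions, the gap between the two exponents reflects the cost of giving up the idempotence of $\oplus$: a single tree decomposition must be used per sum-prod query, so the bound is governed by fhw rather than subw.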

RELATED WORK
Complexity of Datalog. Data complexity of special fragments of Datalog has been explicitly studied. Gottlob et al. [22] showed that monadic acyclic Datalog can be evaluated in linear time w.r.t. the size of the program plus the size of the input data. Gottlob et al. [21] defined the fragment Datalog LITE as the set of all stratified Datalog queries whose rules are either monadic or guarded. The authors showed that this fragment can also be evaluated in linear time, as is the case for monadic acyclic Datalog. This result also follows from a generalization of Courcelle's theorem [12] by Flum et al. [18]. Our framework subsumes these results as an application. Lutz et al. [34] studied efficient enumeration of ontology-mediated queries that are acyclic and free-connex acyclic. The reader may refer to Green et al. [23] for an in-depth survey of Datalog rewriting, Datalog with disjunctions, and Datalog with integrity constraints. However, there is no principled study of general Datalog programs in parameterized complexity, despite its prevalence in the literature on join query evaluation [2,29,35,38].
Datalog over Semirings. Besides Datalog$^\circ$ [28], recent work [25] has looked at the convergence rate for linear Datalog$^\circ$. The evaluation of Datalog over absorptive semirings with a total order has also been studied [39], where the key idea is to transform the program (and its input) into a weighted hypergraph and use Knuth's algorithm [30] for evaluation. Our paper recovers and extends this result by formalizing the proof of correctness via the concept of asynchronous Kleene chains (Appendix B.1), and by showing a precise runtime for evaluating a grounding over such semirings. None of the works mentioned above focuses on finding the smallest possible grounding, a key ingredient for showing the tightest possible bounds.
Datalog Provenance Computation and Circuits. Algorithms for Datalog provenance computation provide an alternative way to think of Datalog evaluation. Deutch et al. [14] initiated the study of circuits for database provenance. They show that for a Datalog program having | | groundings for IDBs, a circuit for representing Datalog provenance for the Sorp($X$) semiring (absorptive semirings are a special case of Sorp) can be built using only | | + 1 layers. As an example, for APSP, the circuit construction and evaluation cost $O(n^4)$ time. An improvement of this result [27] showed that for APSP, a monotone arithmetic circuit of size $O(n^3)$ can be constructed by mimicking the dynamic programming nature of the Bellman-Ford algorithm. We use this improvement to show an unconditional circuit lower bound on the grounding size (see Appendix D).

CONCLUSION
This paper introduces a general two-phased framework that uses the structure of a Datalog program to construct a tight grounding, and then evaluates it using the algebraic properties of the semiring. Our framework successfully recovers state-of-the-art results for popular programs (e.g. chain programs, APSP), and uncovers new results (e.g. for linear Datalog). We also show a matching lower bound (both for running time and space requirement) for a class of Datalog programs. Future work includes efficient evaluation over broader classes of semirings [28], circuit complexity of Datalog over semirings, and general grounding lower bounds.
The first inequality follows from the inductive hypothesis, and the second inequality is implied by the monotonicity of the Kleene chain.Both inequalities use the monotonicity of the function .
B.2 Proof of Theorem 3.5

In this part, we complete the proof of the theorem by showing that Algorithm 1 correctly computes the solution with the desired runtime.

Proof of Theorem 3.5. Algorithm 1 terminates when no variable changes value, in which case we have obtained a fixpoint. By Proposition B.2, we have reached the least fixpoint (since we can view the updates as an asynchronous Kleene chain).
We now analyze its runtime. The hash table construction and the variable initialization cost $O(|G|)$ time. For the while loop, observe that each operation in Lines 7-10 costs $O(1)$ time. Hence, it suffices to bound the number of times we visit each equation in $G$. For this, note that to consider the equation $x = y \circledast z$, either $y$ or $z$ needs to be updated. However, since the semiring has rank $r$, each variable can be updated at most $r$ times. Hence, an equation can be considered at most $2r$ times in the while loop, and the loop takes $O(r \cdot |G|)$ time overall.
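The update-driven loop analyzed above can be sketched as a worklist algorithm. This is an illustration only, not the paper's Algorithm 1: it assumes a grounding in which every IDB variable has exactly one equation of the form `x = y (+) z` or `x = y (*) z`, that every other variable is an EDB variable with a known annotation, and that the semiring has finite rank so the loop terminates. The name `least_fixpoint` and the dictionary encoding are hypothetical.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Hashable, Tuple

Var = Hashable
Equation = Tuple[str, Var, Var]  # (op, y, z) with op in {"+", "*"}


def least_fixpoint(equations: Dict[Var, Equation],
                   edb: Dict[Var, object],
                   plus: Callable, times: Callable, zero: object) -> Dict[Var, object]:
    """Worklist evaluation of equations x = y op z over a finite-rank semiring."""
    val = dict(edb)
    for x in equations:
        val[x] = zero  # every IDB variable starts at the additive identity

    # uses[v] lists the equations (by head variable) that mention v in the body
    uses = defaultdict(list)
    for x, (_, y, z) in equations.items():
        uses[y].append(x)
        uses[z].append(x)

    work = deque(equations)
    queued = set(equations)
    while work:
        x = work.popleft()
        queued.discard(x)
        op, y, z = equations[x]
        new = plus(val[y], val[z]) if op == "+" else times(val[y], val[z])
        if new != val[x]:
            val[x] = new
            # only equations that read x can be affected by this update
            for w in uses[x]:
                if w not in queued:
                    queued.add(w)
                    work.append(w)
    return val


# Boolean semiring: reachability from a source a over edges a->b, b->c, c->b.
eqs = {
    "p1": ("*", "t_a", "e_ab"),
    "p2": ("*", "t_c", "e_cb"),
    "t_b": ("+", "p1", "p2"),
    "t_c": ("*", "t_b", "e_bc"),
}
edb = {"t_a": True, "e_ab": True, "e_cb": True, "e_bc": True}
result = least_fixpoint(eqs, edb, lambda a, b: a or b, lambda a, b: a and b, False)
print(result["t_b"], result["t_c"])
```

An equation is re-examined only when one of its two body variables changes, which mirrors the counting argument in the proof: if each variable can change a bounded number of times, so can the number of visits per equation.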

B.3 Proof of Theorem 3.6
In the following, we assume to be a 2-canonical grounding after applying Lemma 4.1.First, we establish the following two lemmas throughout the while-loop of Algorithm 2. The first one is the following: , (2) , . . .are the values of in the asynchronous Kleene chain constructed from the while-loop of Algorithm 2.

Proof of Theorem 3.6. We show that Algorithm 2 terminates and yields a fixpoint; by Proposition B.2, this fixpoint is the least fixpoint, because the Dijkstra-style while-loop is a special case of an asynchronous Kleene chain. First, it is easy to see that Algorithm 2 terminates because the size of the queue always decreases by 1 (popped IDB variables will not be pushed back into the queue, using F for bookkeeping of already popped variables).
We now show that the returned $h(\cdot)$ is a fixpoint. To this end, we show that once the queue pops off an IDB variable $x$ with some value $h(x)$, the equation with $x$ on the left-hand side in $G$, say $x = y \circledast z$, always holds from that point onwards. We prove this by contradiction. Suppose that $x = y \circledast z$ causes the earliest violation in the underlying asynchronous Kleene chain. By Lemma B.3, the values of $y, z$ are non-decreasing w.r.t. $\sqsubseteq$. Therefore, the only way to violate $x = y \circledast z$ is that the value of an IDB variable $y$ (or $z$) strictly increases after $x$ is popped off the queue. On the other hand, the queue will eventually pop off that IDB variable ($y$ or $z$) with a value $\sqsupset h(x)$, since Algorithm 2 always terminates and the values of $y, z$ cannot decrease by Lemma B.3. Yet, this directly contradicts Lemma B.4.
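To make the Dijkstra-style loop concrete, here is a minimal sketch specialized to one absorptive, totally ordered semiring: the tropical semiring $(\min, +)$ over non-negative numbers, where the largest element w.r.t. $\sqsubseteq$ is the numerically smallest one, so a min-heap plays the role of the max-priority queue. It is not the paper's Algorithm 2; the function name `dijkstra_fixpoint`, the equation encoding, and the choice of semiring are assumptions made for illustration.

```python
import heapq
import math
from typing import Dict, Hashable, Tuple

Var = Hashable
Equation = Tuple[str, Var, Var]  # (op, y, z) with op in {"min", "+"}


def dijkstra_fixpoint(equations: Dict[Var, Equation],
                      edb: Dict[Var, float]) -> Dict[Var, float]:
    """Least fixpoint of equations x = y op z over (min, +) on non-negative numbers."""
    h = {x: math.inf for x in equations}  # inf is the additive identity of (min, +)

    def value(v):
        return edb[v] if v in edb else h[v]

    def evaluate(x):
        op, y, z = equations[x]
        return min(value(y), value(z)) if op == "min" else value(y) + value(z)

    uses: Dict[Var, list] = {}
    for x, (_, y, z) in equations.items():
        for v in {y, z}:
            uses.setdefault(v, []).append(x)

    # Initial values: evaluate every equation with all IDB variables at inf.
    initial = {x: evaluate(x) for x in equations}
    heap = []
    for x, d in initial.items():
        h[x] = d
        if d < math.inf:
            heapq.heappush(heap, (d, x))

    done = set()  # bookkeeping of already popped variables
    while heap:
        d, x = heapq.heappop(heap)
        if x in done or d > h[x]:
            continue  # stale heap entry
        done.add(x)
        for w in uses.get(x, []):
            if w in done:
                continue
            cand = evaluate(w)
            if cand < h[w]:
                h[w] = cand
                heapq.heappush(heap, (cand, w))
    return h


# Shortest distances from s with edges s->a (1), a->b (2), s->b (5), b->a (1).
eqs = {
    "q1": ("+", "d_s", "w_sa"),
    "q2": ("+", "d_b", "w_ba"),
    "d_a": ("min", "q1", "q2"),
    "p1": ("+", "d_a", "w_ab"),
    "p2": ("+", "d_s", "w_sb"),
    "d_b": ("min", "p1", "p2"),
}
edb = {"d_s": 0, "w_sa": 1, "w_ab": 2, "w_sb": 5, "w_ba": 1}
dist = dijkstra_fixpoint(eqs, edb)
print(dist["d_a"], dist["d_b"])
```

The sketch uses lazy deletion (stale heap entries are skipped) instead of a decrease-key operation, a common substitute when the priority queue, as with Python's `heapq`, offers no in-place update.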
Lastly, we provide the runtime analysis.Using similar arguments in the proof of Theorem 3.  along with a new rule for the intermediate IDB: ( ) (Ground) We ground the refactored rule 4 ( 4 ) ← By Theorem 3.2, we can obtain an equivalent grounded Datalog program of size ( 3 ) in time ( 3 ) since for binary EDB predicates, ( + 2 ) = ( 2 ).
Next, we ground an acyclic Datalog program for the language $a^k b^k c^k$, where $k \geq 1$. Note that this language is not context-free.
Example C.3.Suppose we have a directed labeled graph, where each directed edge is labeled as , , or (possibly more than one labels for a single edge).We construct a Datalog program over the natural number semiring N = (N, +, •, 0, 1) to compute: for every pair of node ( , ) in this graph, the number of directed labeled paths from to that satisfies the expression for ≥ 1.The target in the following program (of arity 6) is the IDB and the EDB predicates are , , , where for each tuple ( , ) in the EDB , its natural number annotation ( , ) = 1 (since it means that there is a directed edge from node to node labeled as in the graph), and similarly for EDBs and .Tuples absent from the EDB predicates are assumed to have annotation 0 (i.e.no such labeled edge in the graph).The program is as follows: path ( , , , ) ← ( , , , , , ), = , = .
We will use our grounding method to bound the cost of evaluating this Datalog program.The third rule obviously has a grounding of size 4 .For the second rule that involves selection predicates, one can rewrite it as the acyclic rule path ( , , , ) ← ( , , , , , ), ( , ), ( , ).where ( , ) are new EDB predicates that represent the identity function over the active domain, i.e. ( , ) = 1 if = and 0 otherwise.Hence, ( , ) and ( , ) are of size ( ) and this rule has a grounding of size ( 2 • • ) = ( 4 ).Now we ground the first rule (also acyclic) as follows: the number of groundings of the first sum-prod query ( , ) • ( , ) • ( , ) is • 4 .For the second sum-prod query, we can decompose it as follows: It is easy to see that the groundings for each rule are bounded by • 5 .Hence, the overall grounding is of size C.3 Rulewise free-connex acyclic Datalog programs P P 5.3.The construction is similar to that of Algorithm 3, except that we root the join tree at the node , where the tree is free-connex w.r.t. .Now, we give a more finegrained argument on the grounding size at Line 12-20 of Algorithm 3 when Ground( , ( , ) (x ( , ) )) is called.Recall that we want to ground the intermediate rule where ( , ) = ( ( ) ∩ ( )) ∪ .Follow the proof of Theorem 3.2, it suffices to cover ( ) ∪ and for that, we split into the following two cases: • If ∈ (T ) contains only head variables (i.e. ( ) ⊆ ).Then, ( ) ∪ ⊆ and taking all instantiations in the active domain of ( ) ∪ suffices to ground the body for the above rule.There are ( ) such instantiations since | | ≤ .
• Otherwise, contains at least one non-head variable (say ).Then, ⊆ ( ) since if there is a head variable ∉ ( ), but it occurs in the subtree rooted at , then TOP ( ) must be an ancestor of TOP ( ), which contradicts the fact that the join tree is free-connex w.r.t. .Thus, ( ) ∪ = ( ) and we can trivially cover ( ) by ( + ) groundings.For both cases, the size of the groundings inserted into for the intermediate rule is ( + ).
C.4 Linear rulewise-acyclic Datalog programs of arity at most 2

Proof of Proposition 5.4. We illustrate the grounding technique for a single acyclic sum-prod query containing at most one IDB. If the query has no IDB (so it is essentially a sum-prod query over EDBs only), then there is a grounding of size $O(m \cdot n)$ that can be constructed in time $O(m \cdot n)$, since now every atom has a grounding of size $O(m)$ instead of $O(m + n^2)$ in the proof of Theorem 3.3. Thus, we only analyze the case where the query contains exactly one binary IDB. We also assume that no EDB atom contains both variables of this IDB (otherwise, we can ground the IDB by $O(m)$ and reduce back to the sum-prod query case).
Similar to Algorithm 3, we construct a join tree (T , ) rooted at a node containing the head variable .If now the IDB is a leaf node, then we simply follow Algorithm 3 to get a grounding of size ( • + 2 ) = ( • ).Otherwise, let T be the subtree rooted at , where ( ) corresponds to the IDB atom , and be the set of head variables that occur in T .If ∈ ( ), or if ∉ , then we subsitute T in ( , ) by a new IDB ′ (x ( ) ) and ground a new free-connex acyclic rule By Theorem 5.3, we can construct for this rule a grounding of size ( + 2 ) in time ( + 2 ).
Then, we reduce back to the previous case when we ground ( , ) since the only IDB ′ becomes a leaf node.The trickier case is when ∈ \ ( ).First, we identify the joining variable between and its parent atom.Observe that there can not be more than one join variables because we assumed T E.2.The above evaluation algorithm evaluates a sum-prod query over a dioid in time ( subw • | (x )|), where subw is the submodular width of the query, is the sum of sizes of the input -relations (x ) and (x ) is the output -relation.

E.2 Continuing Example 6.1
Example E.3. In Example 6.1, we ground the following Datalog program that finds nodes that are reachable via a diamond pattern from some node in a set : A simple grounding is to ground the 4-cycle join first, materializing the opposite nodes of the cycle, i.e.
Cycle( 1 , 3 ) ← The key idea of Algorithm 4 is to group atoms in a (possibly cyclic) sum-prod query (x ) into an acyclic one using tree decompositions, then ground the acyclic sum-prod query using Algorithm 3. Suppose that (x ) has a tree decomposition (T , ) with an assignment function : E → (T ).Then, we can rewrite (x ) as where we substitute ( ) (x ( ) ) by ∈ E: ( )= (x ).The resulting sum-prod query is acyclic because T is its join tree.If using multiple (say ) tree decompositions as in Algorithm 4, we can rewrite (x ) as where the second equality uses the idempotency of ⊕.To ground the new IDB ( ) (x ( ) ), we apply the PANDA algorithm over the dioid to get a view * ( ) (x ( ) ), for every ∈ (T ) and where Π (x \ , c ) is the relation obtained by first selecting tuples that agree with the tuple c on x and then projecting on x and | is a natural number.A cardinality constraint in our setting is therefore a degree constraint with | ∅ = .Similarly, the active domain of for a variable ( ∈ ) is a degree constraint with { } | ∅ = .In fact, Proposition 6.3 is an improved grounding results for linear Datalog, using the constraints of active domains on each variable.
In addition, a functional dependency $X \to Y$ is a degree constraint with $N_{X \cup Y \mid X} = 1$. This can capture primary-key constraints on EDBs, as well as repeated variables in IDBs (e.g. an IDB atom in which the same variable occupies the first two positions essentially says that the first position determines the second).
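The relationship between cardinality constraints, functional dependencies, and degree constraints can be illustrated on a concrete relation. The sketch below is an assumption-laden illustration: the helper `degree` and its positional-index interface are hypothetical, and it simply computes, for a relation stored as a set of tuples, the largest number of distinct projections onto the $Y \setminus X$ positions among tuples that agree on the $X$ positions.

```python
from collections import defaultdict
from typing import Iterable, Sequence, Tuple


def degree(relation: Iterable[Tuple], x_idx: Sequence[int], y_idx: Sequence[int]) -> int:
    """Maximum, over all X-values, of the number of distinct Y-projections.

    x_idx: positions of the conditioning variables X.
    y_idx: positions of the remaining variables Y \\ X.
    """
    groups = defaultdict(set)
    for t in relation:
        key = tuple(t[i] for i in x_idx)          # select on X
        groups[key].add(tuple(t[i] for i in y_idx))  # project on Y \ X
    return max((len(g) for g in groups.values()), default=0)


R = {(1, "a"), (2, "b"), (3, "a")}

# Cardinality constraint: X is empty, so the degree is the relation size.
print(degree(R, [], [0, 1]))

# Functional dependency (first position -> second position): degree 1.
print(degree(R, [0], [1]))

# The reverse direction is not functional here: "a" maps to two values.
print(degree(R, [1], [0]))
```

A relation satisfies a degree constraint with bound $N$ exactly when the computed value is at most $N$; the functional-dependency case corresponds to the bound 1.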
The evaluation algorithm presented in Appendix E.1 uses PANDA as a black box; thus it naturally inherits from the original PANDA algorithm the capability of handling degree constraints as in (13), which allows it to evaluate a sum-prod query over a dioid more efficiently. Its runtime is governed by the degree-aware submodular width. We refer the reader to [29] for the formal definition of the degree-aware submodular width and an in-depth analysis.
Using PANDA as a subroutine, our grounding algorithm (Algorithm 4) also handles degree constraints for tighter groundings. For example, we show an improved grounding for linear Datalog, using the refined constraint that each variable in an IDB has an active domain of size $O(n)$ (see Table 1), instead of $O(n^k)$ for every IDB predicate.

Proof of Proposition 6.3. We follow the exact proof of Theorem 3.3. However, instead of the crude $\tilde{O}(m^{\mathrm{subw}} + n^{k \cdot \mathrm{subw}})$ bound for each view obtained at Line 9 of Algorithm 4, we show a tighter bound of $\tilde{O}(m^{\mathrm{subw}} \cdot (n^k / m + 1))$.
By the properties of PANDA [29] and the fact that all rules are linear, the fine-grained cardinality constraints are: (i) there is at most one IDB in each sum-prod query, with an active domain of size $O(n)$ for each of its variables; (ii) all other atoms are EDB atoms of size $O(m)$. Note that (i) is a stronger constraint than the IDB cardinality constraint of $O(n^k)$.
We can assume that PANDA terminates in time ( − • ) and the size of each resulting view * ( ) (x ( ) ) has a more accurate size bound of ( − • ) = ( • ( / ) ), where ≤ subw and ≤ 1.One may refer to Lemma 5.3 of [29] for the primal and dual linear programs that justifies the assumed bound.
Proof of Proposition 6.4. We still use Algorithm 4 to ground $P$, with two slight modifications: (i) for each sum-prod query $\varphi$, pick one tree decomposition $(\mathcal{T}, \chi)$ that attains fhw($\varphi$); and (ii) use the InsideOut algorithm instead of PANDA when grounding the bags. Thus, the size of the grounding for each bag is $O(m^{\mathrm{fhw}} + n^{k \cdot \mathrm{fhw}})$, without the polylog factor. The rest of the proof is similar to that of Theorem 3.3, but without using the idempotence of $\oplus$, since only one tree decomposition is used for each sum-prod query (i.e., the number of decompositions is 1 in that proof).

Similar to the free-connex submodular width, the free-connex fractional hypertree-width of a sum-prod query $\varphi(\mathbf{x}_H)$ is defined as

$$\mathrm{ffhw}(\varphi(\mathbf{x}_H)) := \min_{(\mathcal{T}, \chi) \in \mathcal{F}} \; \max_{h \in \Gamma_\ell} \; \max_{t \in V(\mathcal{T})} h(\chi(t)),$$

where $\mathcal{F}$ is the set of all non-redundant free-connex tree decompositions of $\varphi(\mathbf{x}_H)$. Since the minimum ranges over a restricted set of decompositions, ffhw($\varphi(\mathbf{x}_H)$) $\geq$ fhw($\varphi(\mathbf{x}_H)$). We obtain the following corollary directly from Proposition 5.3 and Proposition 6.4.

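Corollary E.4. Let $P$ be a Datalog program with input size $m$ and active domain size $n$, where arity($P$) is at most $k$. Let ffhw be the free-connex fractional hypertree-width of $P$. Let $\sigma$ be a naturally-ordered semiring. Then, a $\sigma$-equivalent grounding can be constructed of size (and in time) $O(m^{\mathrm{ffhw}} + n^{k \cdot \mathrm{ffhw}})$.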

Lemma 4.1. A grounding $G$ can be transformed into a $\sigma$-equivalent 2-canonical grounding of size at most $4|G|$ in time $O(|G|)$.
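One way to picture such a transformation is to binarize every polynomial equation by introducing fresh intermediate variables. The sketch below is a hypothetical construction, not the one used in the paper's proof: it assumes a grounding is given as a map from each head variable to a non-empty list of monomials (each a non-empty list of variable names), treats "2-canonical" as "every equation has the form `x = y op z`", and uses a reserved constant variable named `ZERO` (standing for the additive identity) to pad single-term equations. It makes no claim about matching the $4|G|$ size bound.

```python
from itertools import count
from typing import Dict, List, Tuple

Equation = Tuple[str, str, str]  # (op, y, z) with op in {"+", "*"}


def to_two_canonical(grounding: Dict[str, List[List[str]]]) -> Dict[str, Equation]:
    """Rewrite x = sum of products into binary equations x = y op z.

    Fresh variables are named _aux0, _aux1, ...; these names and the
    constant "ZERO" are assumed not to occur in the input.
    """
    fresh = (f"_aux{i}" for i in count())
    out: Dict[str, Equation] = {}

    def fold(op: str, operands: List[str], target: str) -> str:
        # Left-to-right chain ((o1 op o2) op o3) ..., whose last link defines target.
        acc = operands[0]
        for i, nxt in enumerate(operands[1:], start=2):
            name = target if i == len(operands) else next(fresh)
            out[name] = (op, acc, nxt)
            acc = name
        return acc

    for head, monomials in grounding.items():
        terms = []
        for mono in monomials:
            if len(mono) == 1:
                terms.append(mono[0])  # a single variable needs no product
            else:
                terms.append(fold("*", mono, next(fresh)))
        if len(terms) == 1:
            out[head] = ("+", terms[0], "ZERO")  # pad with the additive identity
        else:
            fold("+", terms, head)
    return out


# T = a*b*c + d  and  U = a
canonical = to_two_canonical({"T": [["a", "b", "c"], ["d"]], "U": [["a"]]})
for var, eq in canonical.items():
    print(var, "=", eq)
```

In this construction the number of emitted equations never exceeds the number of variable occurrences in the input, so the rewriting is linear in the size of the grounding.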
holds since ⊆ if is a child of .Observe that if , then | | ≤ −1; otherwise (i.e., = ), since we have rooted the join tree to a node that contains at least one head variable, at least one of the head variables in must belong to ( ) (again by the running intersection property), in which case also | \ ( )| ≤ −1.Thus, | \ ( )| ≤ |( ( ) ∪ ) \ ( )| = | \ ( )| ≤ − 1.Hence, the groundings inserted for any intermediate rule is of size (and in time)

By arguments similar to those in the proof of Theorem 3.5, Lines 1-7 of Algorithm 2 can execute in $O(|G|)$ time. For its while-loop, each iteration binds an IDB variable $x$ and calls the insert operations of the priority queue at most $\deg(x)$ times, where $\deg(x)$ denotes the number of equations in $G$ that contain $x$. A 2-canonical grounding satisfies $\sum_{x} \deg(x) = 2|G|$. Using a classic max-heap implementation of a priority queue, inserts (or updates) of values run in $O(\log |G|)$ time. Thus, the overall runtime is $O(|G| + \sum_{x} \deg(x) \cdot \log |G|) = O(|G| \cdot \log |G|)$.

C MISSING DETAILS FROM SECTION 5
In this part, we include:
(C.1) Algorithm 3 for grounding a rulewise-acyclic Datalog program and a step-by-step walkthrough on Example 5.2.
(C.2) Two additional acyclic Datalog applications (Example C.2 and Example C.3) of Theorem 3.2.
(C.3) Missing details for rulewise free-connex acyclic Datalog programs.
(C.4) Missing details for linear rulewise-acyclic Datalog programs of arity at most 2.

Table 1. Summary of the grounding results. The notation $\tilde{O}$ hides polylog factors in $m$ (total input size) and $n$ (size of the active domain). $k$ denotes the arity of the program. See Section 6 for definitions of fhw, subw.

Table 2. Summary of runtime results. The notation $\tilde{O}$ hides polylog factors in $m$ (total input size) and $n$ (size of the active domain). $k$ denotes the arity of the program. See Section 6 for the definition of subw.