Qr-Hint: Actionable Hints Towards Correcting Wrong SQL Queries

We describe a system called Qr-Hint that, given a (correct) target query Q* and a (wrong) working query Q, both expressed in SQL, provides actionable hints for the user to fix the working query so that it becomes semantically equivalent to the target. It is particularly useful in an educational setting, where novices can receive help from Qr-Hint without requiring extensive personal tutoring. Since there are many different ways to write a correct query, we do not want to base our hints completely on how Q* is written; instead, starting with the user's own working query, Qr-Hint purposefully guides the user through a sequence of steps that provably lead to a correct query, which will be equivalent to Q* but may still "look" quite different from it. Ideally, we would like Qr-Hint's hints to lead to the "smallest" possible corrections to Q. However, optimality is not always achievable in this case due to some foundational hurdles such as the undecidability of SQL query equivalence and the complexity of logic minimization. Nonetheless, by carefully decomposing and formulating the problems and developing principled solutions, we are able to provide provably correct and locally optimal hints through Qr-Hint. We show the effectiveness of Qr-Hint through quality and performance experiments as well as a user study in an educational setting.


INTRODUCTION
In an era of widespread database usage, SQL remains a fundamental skill for those working with data. Yet, SQL's rich features and declarative nature can make it challenging to learn and understand. When students encounter difficulties in debugging their SQL queries, they often turn to instructors and teaching assistants for guidance. However, this one-on-one approach is limited in scalability. Syntax errors are easy to fix, but many queries contain subtle semantic errors that may require careful and time-consuming debugging. To save time, the teaching staff is often tempted to give hints based on how the reference solution query is written, ignoring what students have written themselves, but doing so misses opportunities for learning. A SQL query can be written in many ways that are different in syntax but nonetheless equivalent semantically. Seasoned teaching staff knows how to guide students through a sequence of steps that, starting with their own queries, lead them to a corrected version that is equivalent to the solution query but may still look quite different from it.
Second, it is often impossible to declare a part of Q as "wrong," since one could instead fix the remainder of Q to compensate for it. For example, we could argue that s1.price>s2.price in Q is "wrong," but there exists a correct query containing precisely this condition, e.g., with (s1.price>s2.price OR s1.price=s2.price). Hence, it is difficult to formally define what "wrong" means. Instead of basing our approach heuristically on calling out "wrong" parts, we formulate the problem as finding the "smallest repairs" to Q that make it correct.
Third, hints are for human users, so for a query with multiple issues, which is often the case in practice, we must be aware of the cognitive burden on users and not overwhelm them by asking them to make multiple fixes simultaneously. This desideratum introduces the challenge of planning the sequence of hints and defining appropriate intermediate goals.
Finally, effective hinting faces several fundamental barriers. Realistically, we cannot hope to always provide "optimal" hints because doing so entails solving the query equivalence problem for SQL, which is undecidable [1, 27, 41, 53]; even for decidable query fragments, Boolean expression minimization is known to be on the second level of the polynomial hierarchy (precisely Σ₂ᵖ [16]). To address these challenges, we propose Qr-Hint, a system that, given a target query Q★ and a working query Q, follows the logical execution flow (i.e., FROM → WHERE → GROUP BY → HAVING → SELECT) and produces step-by-step hints for the user to edit the working query to eventually achieve Q★. The sequence of steps is guaranteed to lead the user on a correct path to eventual correctness. The following example shows how Qr-Hint helps fix the query in Example 1.
Example 2. Continuing with Example 1, Qr-Hint automatically generates the sequence of hints below. Currently built for the teaching staff, Qr-Hint only generates the "repairs" below; using these repairs, the teaching staff would then hint the user in natural language. With the recent advances in generative AI chatbots, it would not be difficult to automate the natural language hints as well; the advantage of using Qr-Hint in that setting would be to provide provable guarantees on the quality of hints, which otherwise would be difficult, if not impossible, for generative AI to achieve by itself. Note the sequential nature of the hints above; the working query constantly evolves. Qr-Hint first focuses on FROM and will only proceed to WHERE after FROM is "viable." After adding Frequents to FROM, the user will also need to add appropriate join conditions in WHERE; if these were not added correctly, the second step above would suggest additional repairs. It turns out that for this example, only the above two hints are needed to fix the query. In particular, Qr-Hint knows not to suggest spurious hints such as adding Frequents.drinker to GROUP BY or changing s2.beer to Likes.beer in SELECT.
We make the following contributions:
• We develop a novel framework that allows Qr-Hint to provide step-by-step hints to fix a working SQL query with the goal of making it equivalent to a target query. This framework formalizes the notion of "correctness" for a sequence of hints, allowing Qr-Hint to guarantee that every hint is actionable and is on the right path to achieve eventual correctness. Further, by formulating the hinting problem in terms of finding repair sites in Q with viable fixes, we are able to quantify the quality of the hints.
• Since the optimality of hints, in general, is impossible to achieve due to the foundational hurdles discussed earlier, we aim to provide guarantees on the "local" optimality of Qr-Hint in each step. We design practical algorithms with sensible trade-offs between optimality and efficiency.
• We evaluate the performance and efficacy of Qr-Hint experimentally. We further perform a user study involving students from current/past database courses offered at the authors' institution. Our findings indicate that Qr-Hint finds repairs that are optimal or close to optimal in practice under reasonable time, and they lead to hints that are helpful for students.

RELATED WORK
Debugging Query Semantics. There are two main lines of work toward debugging query semantics (as opposed to syntax or performance). The first line helps debug a query but without knowing the correct (reference) query; in this regard, it differs fundamentally from Qr-Hint. Qex [54] is a tool for generating input relations and parameter values for unit-testing parameterized SQL queries. SQLLint [10-13, 31] detects suspected semantic errors in a query, alerting users to what may be indicative of efficiency, logical, or runtime errors. The work highlights a list of common semantic errors made by students and SQL users [12], but it does not suggest edits, and fixing the suspected errors will not guarantee that the query is correct. Habitat [25, 32] is a query execution visualizer that allows users to highlight parts of a query and view their intermediate results. While it helps users spot possible errors, it gives no edit suggestions if errors exist. More recently, QueryVis [42] turns queries into intuitive diagrams, helping users better understand the semantics of the queries and spot potential errors.
The second line of work, more directly related to Qr-Hint, focuses on checking a query against a reference query and/or helping to explain their difference. However, previous work has not been able to suggest small fixes that will make the user query equivalent to the reference query. XData [19] checks the correctness of a query by running the query on self-generated testing datasets based on a set of pre-defined common errors, but it provides no guarantees beyond this pre-defined set. Cosette [21-23] uses constraint solvers and theorem provers to establish the equivalence of two queries or construct arbitrary instances that differentiate them. From a large database instance, RATest [45] utilizes data provenance to generate a small, illustrative instance to differentiate queries. C-instances [30] aims at constructing small abstract instances based on c-tables [37] that can differentiate two given queries in all possible ways. While Cosette, RATest, and c-instances can provide examples illustrating how two queries are semantically different, they can only indirectly help users pinpoint errors in the original query; none of them is able to suggest fixes. Chandra et al. [18] developed a grading system that canonicalizes queries by applying rewrite rules and then decides partial credits based on a tree-edit distance between logical plans. However, as query syntax differs significantly from canonicalized plans after rewrite, edits on a canonicalized plan do not translate naturally to small fixes on the original query, making it hard for users to use these edits as hints. Finally, SQLRepair [48] fixes simple errors in an SPJ query using constraint solvers to synthesize/remove WHERE conditions until the query produces correct outputs over all testing instances. Its scope of error is much narrower than what we consider, and its tests-driven nature offers no guarantee of query equivalence.
Program Repair and Feedback for GPLs. In the domain of program repair for general-purpose programming languages (GPLs), several types of approaches have been developed, but none of them can be directly applied or easily transferred to cover SQL. First, a wrong program is usually aligned with reference program(s) [3, 33, 55], and fixes are generated based on the selected reference program using various techniques. Such an approach is similar to Qr-Hint, but SQL is essentially different from GPLs, as SQL is declarative and GPLs are usually procedural. While it is possible to write programs in a GPL to simulate the execution of a specific SQL query, there is no well-defined mapping between the syntax of SQL and any GPL. As a result, it is impossible to apply such program repair techniques to SQL in general. Another approach is to leverage test cases to synthesize "patches" for the wrong program so that it returns the same output as the reference program for all test cases [36, 46, 49, 52, 57]. However, such an approach heavily relies on the test cases to cover all possible errors and thus usually fails to guarantee semantic equivalence. Besides the traditional approaches, recent work explores ML algorithms to provide feedback and correction [8, 9, 20, 34, 35, 43, 47]. In addition, large language models such as GPT-3 [15] have shown an ability to explain the semantics of SQL queries, but they do not guarantee the correctness of fixes.
Testing query equivalence. While the query equivalence problem in general is undecidable [1, 5, 51, 53], tools and algorithms have been developed to check the equivalence of various classes of queries with restrictions and assumptions [4, 17, 21-23, 38-40, 50, 56, 58]. Although they give a deterministic answer on equivalence, these tools/algorithms cannot provide any explanation of which parts of the users' queries cause semantic differences from the reference queries.

THE QR-HINT FRAMEWORK
Queries. We consider SQL queries that are select-project-join queries with an optional single level of grouping and aggregation. For simplicity of presentation, we assume these are single-block SQL queries with SELECT, FROM (without JOIN operators), and WHERE (with condition defaulting to TRUE if missing) clauses (see Footnote 2), together with optional GROUP BY and HAVING clauses. We refer to such a query as an SPJA query if it contains grouping or aggregation or DISTINCT; otherwise, we will call it an SPJ query.
We assume the default bag (multiset) semantics of SQL. Given query Q, let F(Q) denote the cross product of Q's FROM tables (including multiple occurrences of the same table, if any); and let FW(Q) denote the query that further filters F(Q) by Q's WHERE condition (i.e., FW(Q) is a SELECT * query with the same FROM and WHERE clauses as Q). Furthermore, if Q is SPJA, let FWG(Q) denote the (non-relational) query (see Footnote 3) that further groups the result rows of FW(Q) according to Q's GROUP BY expressions (or ∅ if there are none but Q contains aggregation nonetheless, in which case all result rows belong to a single group). Finally, if Q is SPJA, let FWGH(Q) denote the (non-relational) query that filters the groups of FWG(Q) according to Q's HAVING condition (which defaults to TRUE if missing). When discussing equivalence (denoted ≡) among the above queries, we require that they return the same bag of result rows (ignoring row and column ordering) for any underlying database instance, and additionally, for queries returning groups, that they return the same partitioning of result rows (ignoring group ordering).
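To make the F/FW decomposition concrete, here is a small runnable sketch using SQLite; the schema and data are illustrative (not from the paper). FW(Q) is simply Q with its SELECT list replaced by *:

```python
import sqlite3

# Hypothetical schema/data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Serves(bar TEXT, beer TEXT, price REAL);
INSERT INTO Serves VALUES ('b1','ipa',5.0), ('b1','lager',4.0), ('b2','ipa',6.0);
""")

# Original query Q (SELECT projects columns):
q = "SELECT bar FROM Serves WHERE price > 4.5"
# FW(Q): same FROM and WHERE clauses, but SELECT *:
fw_q = "SELECT * FROM Serves WHERE price > 4.5"

rows_q = conn.execute(q).fetchall()
rows_fw = conn.execute(fw_q).fetchall()
print(rows_fw)  # the filtered bag of rows, before projection
```

Here F(Q) would be the cross product of the FROM tables (just Serves itself in this one-table example), and FW(Q) retains exactly the rows of F(Q) that pass the WHERE condition.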
SMT Solvers. As with previous work [21, 45, 58], we leverage satisfiability modulo theories (SMT) solvers to implement various primitives used by our system. Such a solver can decide whether a formula, modulo the theories it references, is satisfiable, unsatisfiable, or unknown (beyond the solver's capabilities). Specifically, we use the popular SMT solver Z3 [24] to implement the following three primitives. Given two quantifier-free expressions, IsEquiv(e1, e2) tests whether e1 ⇔ e2 (for logic formulae such as those in WHERE) or e1 = e2 (for value expressions such as those in SELECT or GROUP BY). Given a logic formula φ, IsSatisfiable(φ) and IsUnSatisfiable(φ) return whether φ is satisfiable or unsatisfiable, respectively. All of the above primitives may return "unknown" when Z3 is unsure about its answer. However, when they return true, Z3 guarantees that the answer is not a false positive. Our algorithms in subsequent sections act only on (true) positive answers from these primitives. For complex uses, it is often convenient to frame equivalence/satisfiability testing using a context C, i.e., a set of logical assertions (e.g., type declarations, known constraints, and inference rules) under which testing is done. We use subscripts to specify the context: e.g., IsUnSatisfiable_C(φ) is a shorthand for IsUnSatisfiable((∧ψ∈C ψ) ∧ φ).
Example 3. Consider a query with a WHERE condition stipulating that x > 100 for an INT-typed column x, as well as a HAVING condition MAX(x) ≥ 101. We might wonder whether the HAVING condition is unnecessary. To this end, we call IsUnSatisfiable_C(¬(MAX(A) ≥ 101)) with an appropriate context C. The first two assertions in C are derived from the type of x and the WHERE condition; here the array-typed A refers to the collection of x values. The last two specify (some) general inference rules on the SQL aggregate function MAX. Z3 correctly returns true, meaning that MAX(A) ≥ 101 must be true under C and is therefore unnecessary.
Our use of Z3 for reasoning with SQL aggregation, such as the example above, goes beyond the practice in previous work, where aggregation functions are mostly treated as uninterpreted functions. For example, to test the equality of two aggregates, [58] conservatively checks whether the input value sets or multisets for the aggregate function are equal. In contrast, we encode properties of SQL aggregation functions in a way that allows Z3 to reason with them. As formulae become more complicated, e.g., with quantifiers and arrays, Z3 no longer offers a complete decision procedure (as there exists no decision procedure for first-order logic) and may return "unknown" more often. Nonetheless, practical heuristics employed by Z3 allow it to handle many cases of practical use to Qr-Hint.

Approach
Given a (syntactically correct) working query Q and a target query Q★, Qr-Hint provides hints in stages to help the user edit the working query incrementally until it becomes semantically equivalent to Q★. Each stage focuses on one specific syntactic fragment of the working query. Qr-Hint gives actionable hints for the user to edit this fragment with the aim of bringing Q a step "closer" to being equivalent to Q★. Qr-Hint strives to suggest the smallest edits possible and avoid suggesting unnecessary edits. Upon passing a viability check, the working query Q clears the current stage and moves on to the next. After clearing all stages, Qr-Hint guarantees that Q ≡ Q★ (even if syntactically they are still different).
We now briefly outline the concrete stages of Qr-Hint; the details will be presented in the subsequent sections.
For an SPJ query, there are three stages. (1) We start with Q's FROM clause (Section 4) and make sure that its list of tables can eventually lead to a correct query; following this stage, F(Q) ≡ F(Q★). (2) Next, we provide hints to repair Q's WHERE clause (Section 5) such that FW(Q) ≡ FW(Q★), i.e., the repaired query returns the same sub-multiset of rows as Q★ that satisfy the WHERE condition, ignoring SELECT. (3) Finally, we handle Q's SELECT clause and ensure the working query returns correct output column values. Importantly, we make inferences of equivalence under the premise that all rows before SELECT already satisfy WHERE; this use of WHERE allows us to infer more equivalent cases and avoid spurious hints.
For an SPJA query, there are five stages. (1) The first stage handles FROM as in the SPJ case. (2) The second stage handles WHERE, but with a twist. As we have seen from Example 1, some conditions can appear in either WHERE or HAVING, and it would be misleading to hint that their absence from WHERE is wrong; hence, Qr-Hint will look "ahead" at the two queries' HAVING and GROUP BY clauses to avoid misleading the user. At the end of this stage, instead of insisting that FW(Q) ≡ FW(Q★) for the original Q★, we may first rewrite Q★ (by legally moving some conditions between WHERE and HAVING) as needed. (3) The third stage is GROUP BY, where we provide hints to edit Q's GROUP BY expressions to achieve equivalent grouping, i.e., FWG(Q) ≡ FWG(Q★). Here, we infer equivalence under the premise that the rows to be grouped all satisfy WHERE. (4) The fourth stage is HAVING, where we provide hints to repair Q's HAVING condition in the same vein as WHERE; however, inferences in this stage additionally consider both WHERE and GROUP BY, and they are more challenging because of aggregation functions. After this stage, we have FWGH(Q) ≡ FWGH(Q★). (5) The fifth and final stage is SELECT, which is similar to the SPJ case, but with the challenge of handling aggregation functions while simultaneously considering WHERE, GROUP BY, and HAVING.
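The "look-ahead" in stage (2) rests on the fact that a condition referencing only grouping columns can legally live in either WHERE or HAVING. A small SQLite illustration (schema and data are ours, for demonstration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Serves(bar TEXT, beer TEXT, price REAL);
INSERT INTO Serves VALUES ('b1','ipa',5.0), ('b1','lager',4.0),
                          ('b2','ipa',6.0), ('b3','stout',3.0);
""")

# A condition that references only GROUP BY columns may appear in
# either WHERE or HAVING without changing the result:
q_where = "SELECT bar, COUNT(*) FROM Serves WHERE bar <> 'b3' GROUP BY bar"
q_having = "SELECT bar, COUNT(*) FROM Serves GROUP BY bar HAVING bar <> 'b3'"

r1 = sorted(conn.execute(q_where).fetchall())
r2 = sorted(conn.execute(q_having).fetchall())
print(r1 == r2)  # True
```

This is why hinting that such a condition is "missing from WHERE" could mislead the user: the working query may already express it correctly in HAVING.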
Progress and Correctness. Note that to clear a stage, the user only needs to come up with a fix to pass the viability checks up to this stage. Even though Qr-Hint may examine the queries in their entirety, the user does not have to think ahead about how to make the entire query correct (see Footnote 4). Moreover, once a stage is cleared, Qr-Hint never requires the user to come back to fix the same fragment again. This stage-by-stage design with "localized" hints helps limit the cognitive burden on the user.
The following theorem formalizes the intuition that this stage-based approach leads to steady, forward progress toward the goal of fixing the working query. It follows from the observation that our solution for each stage ensures the properties asserted below, which we will show stage by stage in the subsequent sections. We relegate all proofs in this paper to the appendix.
Optimality. Ideally, we would like Qr-Hint to suggest the "best possible" hints, e.g., those leading to minimum edits to the working query. Unfortunately, it is impossible for any system to provide such a guarantee in general, because doing so entails being able to determine the equivalence of SQL queries: if Q ≡ Q★ to begin with, the system should not suggest any fix. It is well known that the equivalence of first-order queries with only equality comparisons is undecidable [1]. Under bag semantics, even the decidability of equivalence of conjunctive queries has not been completely resolved [41]. Once we open up to the full power of SQL, which can express integer arithmetic, even equivalence of selection predicates becomes undecidable via a simple reduction to the satisfiability of Diophantine equations [27].
Given the foundational hurdles above, Qr-Hint seeks a pragmatic solution. Instead of offering any global guarantee on the optimality of its hints, which is impossible, Qr-Hint establishes, for each stage, guarantees on the necessity or minimality of its hints under certain assumptions. For example, for the FROM stage, Qr-Hint guarantees its suggested fixes are optimal for SPJ queries, but for some SPJA queries, it may suggest a fix that turns out to be unnecessary. As another example, for the WHERE stage, the optimality of Qr-Hint depends on, among other things, Z3-based primitives offering complete decision procedures. In each subsequent section, we will state any such assumption explicitly.
Finally, it is important to note that Qr-Hint's progress and correctness properties (Theorem 3.1) do not rely on these assumptions. In the worst case, the user may be hinted to make some fixes that are unnecessary or unnecessarily big, but Qr-Hint will still ensure that the user gets a correct working query in the end.
Limitations. Following Theorem 3.1, Qr-Hint is guaranteed to generate correct hints for select-project-join queries with an optional single level of grouping and aggregation. On the other hand, Qr-Hint currently has several limitations. 1) Qr-Hint may sometimes suggest suboptimal or even unnecessary fixes (even though they still lead to correct queries), as discussed above; the reason lies in fundamental hurdles due to the undecidability of SQL query equivalence and the use of heuristics to tame complexity. 2) Qr-Hint currently does not handle NULL values and assumes that all database columns are NOT NULL. With some additional effort and complexity, Qr-Hint can be extended to handle NULL using the technique in [58] of encoding each column with a pair of variables in Z3 (one for its value and the other a Boolean representing whether it is NULL). The same applies to OUTER JOIN. 3) Except for the case of aggregation-free subqueries in FROM mentioned in Footnote 2, Qr-Hint does not support subqueries in general. Subqueries involving aggregation in general cannot be folded into the outer query block. Subquery constructs such as NOT EXISTS and NOT IN entail supporting queries involving the difference operator, which we have not yet studied. If we do not care about the number of duplicates in the result, positive subqueries with EXISTS and IN could be rewritten as part of the join in the outer select-project-join query and supported as such. However, this approach is unsatisfactory, especially since our handling of FROM (Section 4) does assume that duplicates matter. 4) Finally, Qr-Hint does not consider database constraints such as keys and foreign keys. While we can, in theory, encode some constraints as logical assertions and include them as part of the context when calling Z3, these assertions (with quantifiers) can significantly hamper Z3's performance. Future work is needed to develop more robust algorithms for incorporating constraints.

FROM STAGE
This stage aims to ensure F(Q) ≡ F(Q★). Recall that a FROM clause may reference a table T multiple times, and each reference is associated with a distinct alias (which defaults to the name of T). Each column reference must resolve to exactly one of these aliases. Let Tables(Q) denote the multiset of tables in the FROM clause of Q, and let Aliases(Q) denote the set of aliases they are associated with in Q. With a slight abuse of notation, given table T, let Aliases(Q, T) denote the subset of Aliases(Q) associated with T (a non-singleton Aliases(Q, T) implies a self-join involving T). Given an alias a ∈ Aliases(Q), let Table(Q, a) denote the table that a is associated with in Q.
The viability check (Theorem 3.1, stage 1) for FROM is simple: Tables(Q) = Tables(Q★), where = here denotes multiset equality. If the working query Q fails the viability check, Qr-Hint simply hints, for each table T whose counts in Tables(Q) and Tables(Q★) differ (including cases where T is used in one query but not the other), that the user should consider using T more or less to make the counts the same. It is straightforward to see that this hint leads to a fix that makes Tables(Q) and Tables(Q★) equal as multisets, which enables the user to further edit Q into some Q' ≡ Q★ without retouching FROM: at the very least, one can make Q' isomorphic to Q★ up to the substitution of table references with those in Aliases(Q). This observation establishes the progress and correctness properties (see Theorem 3.1) of FROM-stage hints, which we state below along with the remark that F(Q) ≡ F(Q★) after this stage.
Lemma 4.1. Qr-Hint's FROM-stage hint leads to a fixed working query Q1 that (1) passes the viability check Tables(Q1) = Tables(Q★) (as multisets); (2) satisfies F(Q1) ≡ F(Q★); and (3) leads to eventual correctness.
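The FROM-stage check and hint can be sketched in a few lines; the function name and hint-message format are illustrative choices of ours:

```python
from collections import Counter

def from_stage_hints(tables_q, tables_q_star):
    """Sketch of the FROM viability check: compare Tables(Q) and
    Tables(Q*) as multisets; for each table whose counts differ,
    hint to use it more or fewer times."""
    cq, cs = Counter(tables_q), Counter(tables_q_star)
    hints = []
    for t in sorted(set(cq) | set(cs)):
        diff = cs[t] - cq[t]
        if diff > 0:
            hints.append(f"use {t} {diff} more time(s)")
        elif diff < 0:
            hints.append(f"use {t} {-diff} fewer time(s)")
    return hints  # empty list <=> viability check passes

print(from_stage_hints(["Serves", "Serves"],
                       ["Serves", "Serves", "Frequents"]))
# -> ['use Frequents 1 more time(s)']
```

An empty hint list means Tables(Q) and Tables(Q★) already agree as multisets, so the stage is cleared.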
While the correctness of the FROM-stage hint is straightforward, its optimality is surprisingly strong. The following lemma states that the viability check is, in fact, necessary, regardless of what could be done in WHERE and SELECT, under reasonable assumptions.
Lemma 4.2. Two SPJ queries Q★ and Q cannot be equivalent under bag semantics if Tables(Q★) and Tables(Q) are not equal as multisets, assuming no database constraints are present and there exists some database instance for which either Q★ or Q returns a non-empty result.
If the queries have no self-joins, it is straightforward to establish this mapping by table names. With self-joins, however, it can be tricky because we must match multiple roles played by the same table across queries. The information contained in FROM alone would be insufficient for matching. One approach is to explore every possible table mapping and select the one that leads to the minimum fix. Doing so would blow up complexity by a factor exponential in the number of self-joined tables. Qr-Hint instead opts for a heuristic that picks the single most promising table mapping. Here we describe the heuristic briefly. For each alias, we build a "signature" that captures how its columns are used by various parts of the query in a canonical fashion. We define a distance (cost) metric for the signatures. Then, for each table involved in self-joins, to determine the mapping between its aliases in Q and Q★, we construct a bipartite graph consisting of these aliases and solve the minimum-cost bipartite matching problem. We illustrate the high-level idea using the example below. For example, S1.beer's WHERE/HAVING signature says that it is involved in an equality comparison with both L.beer and S2.beer; the latter is inferred, as Qr-Hint automatically adds column references and constants that obviously belong to the same equivalence class. Likewise, S1's GROUP BY signature includes both bar and beer, with the latter added because of its equivalence to the GROUP BY column L.beer. When comparing signatures, all aliases are replaced by table names (which is a heuristic simplification); therefore, all four WHERE/HAVING signatures above for beer are considered the same. In this case, what makes the difference in bipartite matching turns out to be the SELECT signatures for bar, which clearly favor the mapping with S1 ↦ s2 and S2 ↦ s1.
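A toy sketch of the matching step: brute-force minimum-cost matching stands in for a proper bipartite-matching algorithm, and a made-up distance table stands in for real signature distances (all names here are illustrative):

```python
from itertools import permutations

def best_alias_mapping(aliases_q, aliases_qstar, cost):
    """Pick the alias bijection minimizing total signature distance.
    Brute force is exponential in the number of aliases, which is
    acceptable for the small alias sets arising from self-joins; a real
    implementation could use the Hungarian algorithm instead."""
    best, best_cost = None, float('inf')
    for perm in permutations(aliases_qstar):
        c = sum(cost(a, b) for a, b in zip(aliases_q, perm))
        if c < best_cost:
            best, best_cost = dict(zip(aliases_q, perm)), c
    return best

# Toy signature distances favoring S1 -> s2 and S2 -> s1, mirroring the
# SELECT-signature tiebreak described in the text:
sig = {('S1', 's2'): 0, ('S1', 's1'): 1, ('S2', 's1'): 0, ('S2', 's2'): 1}
mapping = best_alias_mapping(['S1', 'S2'], ['s1', 's2'],
                             lambda a, b: sig[(a, b)])
print(mapping)  # {'S1': 's2', 'S2': 's1'}
```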
Once we have selected the table mapping M, we can then "unify" Q★ and Q. For convenience, we simply rename each alias a in Q★ to M(a); in subsequent sections, we shall assume that Q★ and Q have consistent column references.

WHERE STAGE
WHERE is our most involved stage, aimed at making small edits to the WHERE condition of Q so that it becomes logically equivalent to that of Q★, thereby ensuring FW(Q) ≡ FW(Q★) (recall from Section 3.1). Let p and p★ denote the WHERE predicates in Q and Q★, respectively. We assume that they have already been unified by the selected table mapping to have the same set of column references, as discussed in Section 4. The viability check for the WHERE stage (Theorem 3.1, stage 2) is simply that p is logically equivalent to p★. There are many different ways to modify p so that it becomes logically equivalent to p★, and it is impossible to declare any part of p as definitively "wrong." Therefore, we suggest the smallest possible edits on p to reduce the cognitive burden on the user. We formalize the notion of "small edits" below. We represent p and p★ using syntax trees. The syntax tree of p is shown in Figure 1; the syntax tree of p★ would have the positions of nodes 2 and 3 reversed and the red nodes replaced with =, >10, and <5, respectively.
Definition 2 (Repair for SQL Predicate). Given a quantifier-free logical formula p represented as a tree, a repair of p is a pair (S, F), where S is a set of disjoint subtrees of p called the repair sites, and F is a function that maps each site s ∈ S to a new formula F(s) called the fix for s. Given a target predicate p★, a repair (S, F) for p is correct if applying it to p, i.e., replacing each s ∈ S with F(s), results in a formula p′ such that p′ ⇔ p★.
Definition 3 (Cost of a Repair). Given target predicate p★, the cost of a repair (S, F) for p is the total edit distance between the sites and their fixes, Σs∈S dist(s, F(s)), normalized by |p| + |p★|, plus a per-site penalty ω · |S|, where ω ∈ R+ controls the relative weights of the cost components.
Here, we simply define dist(·, ·) to be the number of nodes deleted and inserted by the repair; other notions of edit distance could be used too. The denominator under dist(·, ·) serves to normalize the measure relative to the sizes of the queries. Also, note that the ω · |S| term adds a fixed penalty for each additional repair site. Intuitively, Qr-Hint will present all repair sites (without the associated fixes) to the user as a hint. Even a moderate number of repair sites will pose a significant cognitive challenge; if there were so many issues with p, we might as well ask the user to rethink the whole predicate (which would be a single repair site at the root). In our experiments (Section 9), we set ω = 1/6, and the number of repair sites per WHERE rarely goes above two or three.
Example 6. Consider Figure 1. One correct repair for p consists of three sites (nodes 4, 10, and 12) and the corresponding fixes (=, >10, <5). Another correct repair for p consists of two sites (nodes 5 and 3) and the corresponding fixes <5 ∨ >10 ∨ <7 and = ∧ (≠ ∨ >).
Algorithm 1 is our overall procedure for computing a minimum-cost repair for a predicate. It considers all possible sets of repair sites, prioritizing smaller ones because the number of repair sites heavily influences the repair cost, and stopping once the lowest cost found so far is no greater than a conservative lower bound on the cost of the repairs to be considered. In the worst case, the number of repairs to be considered is exponential in the size of p, but in practice, the early stopping condition usually kicks in when the number of repair sites is 2 or 3, so the number of repairs considered is usually quadratic or cubic in |p|.
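Under our reading of Definition 3, the cost computation itself is a one-liner; the function name and the example distances below are illustrative, not taken from the paper:

```python
def repair_cost(site_dists, size_p, size_p_star, omega=1/6):
    """Sketch of Definition 3 (as we read it): the sum of per-site edit
    distances, normalized by the combined predicate sizes, plus a fixed
    penalty omega for each repair site."""
    return sum(site_dists) / (size_p + size_p_star) + omega * len(site_dists)

# Three repair sites with node-edit distances 2, 2, and 3, against
# predicates of size 13 each (hypothetical numbers):
cost3 = repair_cost([2, 2, 3], 13, 13)
# A competing single-site repair with a larger distance 9:
cost1 = repair_cost([9], 13, 13)
print(cost3, cost1)
```

Note how the per-site penalty ω lets a single larger fix beat several scattered small ones, matching the intuition that many repair sites overwhelm the user.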
The two key building blocks of Algorithm 1 are CreateBounds and DeriveFixes, which we describe in more detail in the remainder of this section. Intuitively, CreateBounds (Section 5.1) provides a quick and "exact" test to determine whether a given set of repair sites could ever lead to a correct repair. If yes, DeriveFixes (Section 5.2) then finds the "optimal" fixes for these repair sites. Our algorithms use Z3, so their exactness and optimality depend on Z3's completeness for the types of predicates they are given. DeriveFixes's optimality further hinges on a Boolean minimization procedure (MinBoolExp) that it also uses. On the other hand, since Z3 inferences are sound, progress and correctness (Section 3.1) are guaranteed.
Lemma 5.1. The WHERE-stage hint leads to a fixed working query Q2 with a WHERE condition p that (1) passes the viability check p ⇔ p★; (2) satisfies FW(Q2) ≡ FW(Q★); and (3) leads to eventual correctness.
Lemma 5.2. Given p and p★, assuming that Z3 inference is complete with respect to the logic exercised by p and p★, and that MinBoolExp finds a minimum-size Boolean formula equivalent to its given input, the repair returned by RepairWhere(p, p★, |p|) is optimal (i.e., has the lowest possible cost) if there exists an optimal repair that either contains a single site or has all its sites sharing the same parent in p.
Note that Lemma 5.2 provides optimality for two important cases that commonly arise in practice: 1) φ makes a single (presumably small) mistake; 2) φ is either conjunctive or disjunctive (because all atomic-predicate nodes then share the same ∧ or ∨ parent node).

Viability of Repair Sites
The key idea is that, given a set of repair sites in φ, we can quickly compute a "bound" that precisely delimits what can be accomplished by any fixes at these sites (and only at these sites). We first define bounds and introduce some notation. Given quantifier-free logical formulae φ⊥, φ, and φ⊤ such that φ⊥ ⇒ φ ⇒ φ⊤, we write φ ∈ [φ⊥, φ⊤]; we call φ⊤ an upper bound of φ and φ⊥ a lower bound of φ.
CreateBounds(φ, S) (Algorithm 2) computes a precise bound for any predicate that can be obtained by fixing φ at the given set S of repair sites. It works by computing a bound for each node in φ in a bottom-up fashion, starting from the repair sites or leaves of φ; a repair site, which can be fixed into anything, receives the trivial bound [false, true]. For a node n in φ, S[n] denotes the subset of S belonging to the subtree rooted at n. The following shows that the repair bounds computed by CreateBounds are valid. The proof uses induction on the structure of φ.

Lemma 5.3 (Validity of Repair Bounds). Given a predicate φ and a set S of repair sites, CreateBounds(φ, S) outputs two predicates φ⊥ and φ⊤ such that applying any repair (S, F) (with the given S) results in a predicate in [φ⊥, φ⊤].

Lemma 5.3 immediately yields a method for deciding whether a candidate set S of repair sites is viable: if the target formula φ★ ∉ [φ⊥, φ⊤] given S, then there does not exist a set of correct fixes F for S. The next natural question to ask is: if the target formula φ★ ∈ [φ⊥, φ⊤] given S, is it always possible to find some correct fixes? The answer is yes, and Section 5.2 provides a constructive proof. Hence, repair bounds provide an exact test of whether a set S of repair sites is viable.
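As a toy illustration, the bound computation and viability test can be sketched in a few lines of Python, with brute-force enumeration over independent atoms standing in for Z3 and repair sites restricted to leaves for brevity; the tree encoding is our own:

```python
from itertools import product

# Predicates are trees: an atom name (str), ('const', b), or ('and'/'or', l, r).

def create_bounds(node, sites):
    """Bottom-up bound computation in the spirit of Algorithm 2: a repair
    site can be fixed into anything, so it gets the trivial bound
    [false, true]; other nodes combine their children's bounds (valid
    because 'and' and 'or' are monotone in both arguments)."""
    if isinstance(node, str):
        if node in sites:
            return ('const', False), ('const', True)
        return node, node
    op, l, r = node
    (ll, lu), (rl, ru) = create_bounds(l, sites), create_bounds(r, sites)
    return (op, ll, rl), (op, lu, ru)

def ev(node, env):
    """Evaluate a predicate tree under a truth assignment for the atoms."""
    if isinstance(node, str):
        return env[node]
    if node[0] == 'const':
        return node[1]
    op, l, r = node
    return (ev(l, env) and ev(r, env)) if op == 'and' else (ev(l, env) or ev(r, env))

def viable(phi, sites, phi_star, atoms):
    """S is viable iff lb => phi_star and phi_star => ub hold under every
    assignment (Lemma 5.3's test, with enumeration replacing Z3)."""
    lb, ub = create_bounds(phi, sites)
    for vals in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, vals))
        t = ev(phi_star, env)
        if (ev(lb, env) and not t) or (t and not ev(ub, env)):
            return False
    return True
```

For example, fixing only the right leaf of `a AND b` can never reach the target `a OR b` (the bound tops out at `a`), while fixing both leaves can.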
For example, continuing from Example 7, using Z3 it is easy to verify that φ★ ∈ [= ∧ <7, ≠ ∨ > ∨ =]; therefore, {n4, n10, n12} is a viable set of repair sites for φ with respect to φ★.

Algorithm 3: DeriveFixes(φ, S, L★, U★)
Input: a predicate φ, a set S of disjoint subtrees (repair sites) of φ, and a target bound [L★, U★] for φ to achieve by fixes
Output: a repair, represented as a set of (n, f) pairs, one for each repair site in S

Derivation of Fixes
Suppose the target formula φ★ falls within the repair bound [φ⊥, φ⊤] computed by CreateBounds(φ, S). We now introduce DeriveFixes (Algorithm 3), which computes correct fixes F for S. The idea is to traverse φ's syntax tree top-down and derive a target bound for each node n. As long as we repair the subtrees rooted at n's children such that the resulting predicates fall within their respective target bounds, we will have a repair that makes n's resulting predicate fall within n's target bound. We start from φ's root with the desired target bound [φ★, φ★] and "push it down"; whenever we reach a repair site, its fix is simply the smallest formula (found by MinFix) that falls within the target bound derived for that repair site.
The intuition behind "pushing down" the target bound at a node n to its children is as follows. First, the repair bound on a child c of n dictates what repairs are possible: the target bound we set for c must lie within its repair bound. Beyond that, however, we want to tighten the repair bound as little as possible, because a looser target bound gives MinFix more freedom in finding a small formula. As a simple example, consider the target bound [p1 ∧ p2 ∧ p3, p1 ∧ p2], where p1, p2, p3 are independent atomic predicates. The smallest formula within this bound is p1 ∧ p2. However, if the target bound were looser, e.g., [p1 ∧ p2 ∧ p3, (p1 ∧ p2) ∨ p3], the smallest formula within this new bound would be just p3, smaller than before.
Lines 15-22 of Algorithm 3 spell out our strategy. We illustrate the key ideas with Example 7 and Figure 1. Consider pushing down the target bound [φ★, φ★] at n1 to n2 and n3. Our choices of target bounds for n2 and n3 are constrained by their respective repair bounds in the table of Example 7; in general, we need to raise these lower bounds and/or lower these upper bounds such that any repairs on n2 and n3 within these bounds ensure that n1's target bound is met. Let us focus on setting the target bound for n2. As argued above, we would like it to be as loose as possible. Thankfully, because n1 = n2 ∨ n3, n3 can help "cover" some of n1. Specifically, no matter how we end up repairing n3, we know it is lower-bounded by = ∧ <7 (denote this formula by φ′). Hence, n3 will certainly cover the φ★ ∧ φ′ part of φ★, leaving n2 responsible for covering only φ★ ∧ ¬φ′. This observation motivates us to set the lower target bound for n2 by raising its lower repair bound (denote it by ℓ) to ℓ ∨ (φ★ ∧ ¬φ′) (Line 20) instead of all the way up to ℓ ∨ φ★. On the other hand, n3 does not help with setting the upper target bound for n2: we have to set it to φ★, because if n2 "overshoots" φ★, ∨-ing it with any n3 formula will not bring it back down. In sum, we set the target bound for n2 to [ℓ ∨ (φ★ ∧ ¬φ′), φ★]. A symmetric argument leads to setting the target bound for n3 to [(= ∧ <7) ∨ φ★, φ★] (in this case n2 offers no help to n3 because it is lower-bounded only by false). The intuition behind pushing a target bound through ∧ is analogous, but boils down to lowering upper bounds as little as possible (as opposed to raising lower bounds). Completing the rest of Example 7, Table 1 shows the target bounds derived by DeriveFixes for φ given repair sites {n4, n10, n12}; nodes without repair sites keep the same bounds as in the original predicate.
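The ∨ push-down rule can be sketched as follows; the predicate encoding and the `ev` evaluator are toy stand-ins, with enumeration in place of Z3:

```python
from itertools import product

def ev(n, env):
    """Evaluate a predicate tree: atom name, ('const', b), ('not', x),
    or ('and'/'or', l, r)."""
    if isinstance(n, str):
        return env[n]
    if n[0] == 'const':
        return n[1]
    if n[0] == 'not':
        return not ev(n[1], env)
    a, b = ev(n[1], env), ev(n[2], env)
    return (a and b) if n[0] == 'and' else (a or b)

def push_down_or(child_lb, sibling_lb, target_lb, target_ub):
    """Target bound for one child of an OR node (cf. Algorithm 3, lines
    15-22): the sibling's guaranteed lower bound covers part of the
    target, so the child's lower repair bound is raised only by the
    uncovered part, target_lb AND NOT sibling_lb. The upper bound stays
    at target_ub, since overshooting cannot be undone by OR-ing."""
    new_lb = ('or', child_lb, ('and', target_lb, ('not', sibling_lb)))
    return new_lb, target_ub
```

For a target bound [a ∨ b, a ∨ b] and a sibling lower-bounded by b, the pushed lower bound simplifies to a ∧ ¬b: the child need only cover the part of the target that the sibling cannot.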
Another aspect of DeriveFixes worth mentioning is its handling of the case where multiple repair sites share the same ∧ or ∨ parent (which is common because many queries in practice are conjunctive, so their trees have only two levels: the root and the leaves). Since ∧ and ∨ are commutative, all such sites can be combined into effectively one site (n in Algorithm 3) to be fixed. In Example 5 above, n10 and n12 are handled in this manner. Once we obtain a fix f for n using MinFix (in conjunctive normal form for ∧ or disjunctive normal form for ∨), DistributeFixes distributes f's clauses among the repair sites (Line 22) based on syntactic similarities between them.
The following is the main result of this section, which affirms that as long as a candidate set S of repair sites passes the repair bound check of Section 5.1, there must exist a correct set of fixes F, and DeriveFixes will find one. This lemma and Lemma 5.3 together imply that our repair bound check is exact.

Lemma 5.4 (Existence of Correct Repair). Suppose φ★ lies within the bound output by CreateBounds(φ, S). Then DeriveFixes(φ, S, φ★, φ★) returns F such that applying (S, F) to φ yields a formula equivalent to φ★.
In the remainder of this section, we first focus on MinFix, which DeriveFixes uses to find the smallest formula within a target bound. We end with a discussion of complexity, optimality, and, when we cannot guarantee optimality, techniques to mitigate suboptimality.
Finding the Smallest Formula within a Bound. Given a target bound [L★, U★] for a repair site, MinFix needs to find a formula f of the smallest possible size such that f ∈ [L★, U★]. This goal is intimately related to the Boolean minimization problem, which is well studied and known to be hard [16]. Many practically effective tools have been developed over the years, so our strategy is to leverage these tools for Qr-Hint. There are two technical challenges: 1) Boolean minimization is formulated in terms of expressions over independent Boolean variables, while our formulae involve atomic predicates whose truth values are not independent; 2) our minimization problem is given a bound, as opposed to the single expression that Boolean minimization typically expects.
To address 1), we run a heuristic procedure using Z3 to identify a set A of "unique" atomic predicates that appear in L★ and U★; those that are logically equivalent to others, or can be expressed easily in terms of others (e.g., with a negation), are excluded. This procedure does not need to detect or remove intricate dependencies (e.g., one inequality following from two others by transitivity); any such dependencies will still be caught later. Then, we map each predicate in A to a unique Boolean variable and convert L★ and U★ into Boolean expressions over these variables.
To address 2), we note that many practical Boolean minimization tools accept the specification of Boolean expressions as truth tables with possible don't-care output entries. Our idea is to use don't-cares to encode the constraint implied by the target bound. Specifically, we generate a truth table whose rows correspond to truth assignments of the Boolean variables for A. If a particular assignment is infeasible (which is testable in Z3) due to interacting atomic predicates, we mark the output for the row as don't-care. For each feasible assignment, if L★ and U★ evaluate to the same truth value, we designate the output for that row to be this value. If L★ evaluates to false and U★ evaluates to true, we mark the output as don't-care, reflecting the flexibility offered by the bound. (Note that because L★ ⇒ U★, the case where L★ and U★ evaluate to true and false, respectively, cannot occur.) The current implementation of Qr-Hint uses ESPRESSO [14] as the primitive MinBoolExp for finding a minimum-size Boolean expression given a truth table with don't-cares.
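The truth-table construction can be illustrated with a toy sketch, where a hand-supplied feasibility test stands in for Z3; here atoms x and y play the roles of interacting predicates such as v>5 and v>10, so y implies x:

```python
from itertools import product

def ev(n, env):
    """Evaluate a predicate tree: atom name, ('const', b), or
    ('and'/'or', l, r)."""
    if isinstance(n, str):
        return env[n]
    if n[0] == 'const':
        return n[1]
    op, l, r = n
    return (ev(l, env) and ev(r, env)) if op == 'and' else (ev(l, env) or ev(r, env))

def build_truth_table(atoms, lb, ub, feasible):
    """Truth table fed to MinBoolExp: '-' (don't-care) for infeasible
    assignments and for rows where the bound leaves slack (lb false,
    ub true); otherwise the agreed value of lb and ub."""
    table = {}
    for vals in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, vals))
        if not feasible(env):
            table[vals] = '-'
            continue
        lo, hi = ev(lb, env), ev(ub, env)
        assert not (lo and not hi), "lower bound must imply upper bound"
        table[vals] = '-' if (not lo and hi) else ('1' if lo else '0')
    return table
```

With bound [y, x] and the feasibility rule "y implies x", the row y ∧ ¬x is marked don't-care as infeasible, and the row x ∧ ¬y is marked don't-care because the bound leaves slack there.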
Complexity and Optimality. In the analysis below, let N denote the combined size of the formulae φ and φ★. DeriveFixes's main cost comes from calls to MinFix and Z3. The number of times MinFix is invoked is |S|, which is O(N) but usually a small constant in practice. MinFix runs in time exponential in the number of Boolean variables, which is capped at N. To construct the input truth table for MinBoolExp, MinFix also calls Z3 O(2^N) times. Each Z3 call may take time exponential in the length of its input, though in practice we time out with an inconclusive answer. Finally, as discussed at the beginning of Section 5, the number of calls to DeriveFixes by RepairWhere can be worst-case exponential in N, but in practice it is O(N^3). Regardless, the overall complexity of RepairWhere is exponential in the size of the WHERE predicates. Although this worst-case complexity seems daunting, we have found that Qr-Hint delivers acceptable performance in practice: thankfully, N is often small, and the structures of φ and φ★ and the interdependencies among their atomic predicates tend to be much simpler than in, e.g., our Example 5.
The optimality result was presented earlier as Lemma 5.2. Intuitively, the guarantees (which still depend on the primitives Z3 and MinBoolExp) stem from two observations: 1) if the repair is limited to a single site, the target bound computed by DeriveFixes is indeed the best one can do; and 2) if all sites share the same parent, DeriveFixes effectively processes them as a single site. However, target bounds for non-combinable repair sites cannot be set optimally in an independent manner; the approach taken by DeriveFixes, which essentially assumes that siblings receive the least amount of help possible from each other when pushing down target bounds, cannot guarantee a minimum-size repair. Indeed, our running Example 5 with repair sites {n4, n10, n12} is an instance where DeriveFixes fails to set target bounds optimally, because n4 has a different parent from n10 and n12. To mitigate this problem, we have developed a more sophisticated algorithm (called DeriveFixesOPT) for finding fixes for multiple sites holistically; a full discussion is in the appendix. DeriveFixesOPT increases the complexity by another factor of 2^|S|. It is heuristic in nature (it prioritizes repair sites by how constrained they are) and cannot guarantee optimality beyond Lemma 5.2, but it does well in practice and better than DeriveFixes; on this example, it finds the optimal fixes n4 ↦ =, n10 ↦ >10, n12 ↦ <5. Since |S| is small in practice, the complexity overhead is a good price to pay.

GROUP BY STAGE
We check GROUP BY equivalence assuming Q★ and Q have equivalent FROM and WHERE clauses. We focus on ensuring FWG(Q) ≡ FWG(Q★), regardless of the order and the number of expressions involved in their GROUP BY clauses.
In the following, we consider the case where both Q and Q★ have grouping and/or aggregation. Suppose we have unified the WHERE conditions and GROUP BY expressions in the two queries according to the table mapping. Let φ denote the resulting formula for Q★'s WHERE condition (which at this point is logically equivalent to Q's), and let g⃗ and g⃗★ denote the resulting lists of GROUP BY expressions for Q and Q★, respectively. Note that the ordering of the GROUP BY expressions is unimportant. Also, if a query involves aggregation but has no GROUP BY, we take its list of GROUP BY expressions to be empty. The same column references across φ, g⃗, and g⃗★ are treated as the same variables. Our goal is to compute a subset Δ− of GROUP BY expressions to be removed from g⃗, as well as a set Δ+ of additional GROUP BY expressions to be added to g⃗, such that the resulting query always produces the same grouping of intermediate result tuples (produced by FROM-WHERE) as Q★. In practice, we may not want to reveal Δ+, but instead simply hint that Q misses some GROUP BY expressions; we may repeat the hinting process several times until GROUP BY is completely fixed. The procedure FixGrouping takes as input the formula φ and the two expression lists g⃗ and g⃗★, and outputs the pair (Δ−, Δ+). Repairing grouping is trickier than it seems, because seemingly very different GROUP BY lists can produce equivalent grouping, as illustrated by the following example: the two queries are equivalent, even though no pair of their GROUP BY expressions is equivalent when examined in isolation.
To address this challenge, instead of comparing pairs from g⃗★ and g⃗ in isolation, we consider these lists holistically, together with the WHERE condition, and go back to the definition of GROUP BY as computing a partitioning of the intermediate result tuples. Formally, the viability check for this stage is that g⃗ and g⃗★ achieve the same partitioning; that is, any two tuples satisfying φ agree on all expressions in g⃗ if and only if they agree on all expressions in g⃗★. Note that instead of referring to the tuples t1 and t2 directly, we simply refer to variables representing their column values.
In FixGrouping, to find Δ−, the "wrong" expressions in g⃗, we check, for each gi, whether two tuples t1 and t2 can satisfy φ and agree on every expression in g⃗★ while disagreeing on gi. If so, gi is wrong with respect to g⃗★, because while t1 and t2 should belong to the same group per g⃗★, grouping by gi alone would force them into separate groups. After identifying all wrong expressions in g⃗ and removing them, we are left with a partitioning potentially coarser than g⃗★'s but otherwise consistent with it. We then find the expressions Δ+ to be added in a similar fashion.

Lemma 6.2. Say that two lists of GROUP BY expressions are equivalent if they produce the same partitioning over any database instance. Let (Δ−, Δ+) = FixGrouping(φ, g⃗, g⃗★). Assuming that the subroutine IsSatisfiable returns no false positives, we have the following. Correctness: the GROUP BY-stage hint leads to a fixed working query Q3 that 1) passes the viability check (g⃗ and g⃗★ are equivalent); 2) satisfies FWG(Q3) ≡ FWG(Q★); and 3) leads to eventual correctness.
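The Δ− check can be sketched with a brute-force satisfiability test over a toy tuple domain standing in for Z3's IsSatisfiable; the tuple and expression encodings below are hypothetical:

```python
from itertools import product

def find_delta_minus(where, g_star, g_work, tuples):
    """An expression g in the working GROUP BY list is wrong if two tuples
    can satisfy WHERE and agree on every target GROUP BY expression while
    disagreeing on g: grouping by g alone would split a group that the
    target keeps together. Enumeration over `tuples` stands in for the
    satisfiability check."""
    wrong = []
    for g in g_work:
        if any(where(t1) and where(t2)
               and all(e(t1) == e(t2) for e in g_star)
               and g(t1) != g(t2)
               for t1, t2 in product(tuples, repeat=2)):
            wrong.append(g)
    return wrong
```

For instance, if the target groups by A+B while the working query groups by A and B separately, both working expressions are flagged: tuples (0, 1) and (1, 0) share a target group but disagree on each of A and B.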
Further assuming that IsSatisfiable returns no false negatives, Δ− is strongly minimal and Δ+ is weakly minimal. The strong minimality of Δ− means that we can hint each expression therein as a "must-fix." The weak minimality of Δ+ is a perfect fit for our purposes, since we simply hint that the wrong query needs some additional GROUP BY expressions.

HAVING STAGE
At the HAVING stage, we aim to further ensure that FWGH(Q) ≡ FWGH(Q★), assuming that Q★ and Q are unified by a table mapping and have equivalent FROM, WHERE, and GROUP BY. While HAVING can also be modeled as a logical formula, there are new challenges: 1) unlike WHERE, the inputs to HAVING formulae are arrays of tuples [t1, ..., tk] instead of single tuples; 2) we need to consider aggregate functions; and 3) we cannot test HAVING alone without considering WHERE's effect. Our strategy is to construct two formulae h★ and h for the HAVING conditions of Q★ and Q, respectively, such that the equivalence of h★ and h implies FWGH(Q) ≡ FWGH(Q★). To this end, for each reference to a GROUP BY column in HAVING, we replace it with a variable from the same domain, and we translate HAVING expressions outside aggregate function calls in the same way as we handle WHERE (e.g., A>B+3 becomes the corresponding comparison over these variables). For each reference to a column not in GROUP BY, we introduce an array variable to capture the fact that it refers to a collection of values from rows in the same group. Moreover, for each aggregate function call, we introduce a new array variable to represent the collection of input values if they are computed from an expression, and we use a universally quantified assertion to relate this variable to the source column values.
We test their equivalence under a context C of assertions that encode generic properties of aggregate functions useful for inferring equivalences. Only those relevant to Example 10 are listed here; for a complete list, see [28].
The viability check for HAVING (Theorem 3.1, stage 4) is that h is logically equivalent to h★ under the HAVING base context C. Note that this check implicitly applies to all groups. If the constraint solver fails to establish equivalence, we invoke exactly the same procedures as for WHERE to find a repair.

Lemma 7.1. The HAVING-stage hint leads to a fixed working query Q4 with a HAVING condition that 1) passes the viability check; 2) satisfies FWGH(Q4) ≡ FWGH(Q★); and 3) leads to eventual correctness.
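The idea that the equivalence check implicitly applies to all groups can be illustrated by enumerating toy groups in place of Z3 reasoning under context C; the predicates below are hypothetical examples, not from the paper:

```python
from itertools import product

def equivalent_having(h1, h2, groups):
    """Two HAVING predicates are equivalent iff they agree on every group;
    here we check a finite family of sample groups, standing in for
    solver reasoning over all groups under the aggregate axioms in C."""
    return all(h1(g) == h2(g) for g in groups)

# Toy universe: every group of 1-3 values drawn from 0..3.
groups = [list(g) for n in (1, 2, 3) for g in product(range(4), repeat=n)]
```

Over integer inputs, SUM(x) > 5 and SUM(x) >= 6 agree on every group (an inference a solver would make from integrality), whereas SUM(x) > 5 and MAX(x) > 5 do not.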
As with WHERE, the correctness of the above lemma relies only on the facts that Z3 inference is sound with respect to the logic exercised by h, h★, and C, and that MinBoolExp always finds a Boolean formula equivalent to its given input. We could additionally guarantee optimality, as in Lemma 5.2, by making the same assumptions therein (completeness of Z3 inference and optimality of MinBoolExp) plus the additional assumption that the context C encodes all properties of aggregate functions relevant to the inference.

SELECT STAGE
This stage aims at fixing SELECT as needed to ensure Q ≡ Q★, assuming that the two queries already have equivalent FROM, WHERE, GROUP BY, and HAVING. We test the equivalence between SELECT expressions under a context C that depends on the type of the query: if the queries are SPJ, we simply assert the WHERE condition in C; if the queries are SPJA, we use the same C defined in the HAVING stage.
Let s⃗ and s⃗★ denote the resulting ordered lists of SELECT expressions for Q and Q★, respectively. The viability check (stage 5) is that both SELECTs have the same number of expressions and that expressions at the same index position are equivalent. If the SELECT clauses of Q★ and Q are not equivalent, our goal becomes to compute a set Δ− of SELECT expressions to be removed from Q and a set Δ+ of expressions to be added to Q, each at the corresponding index position.

The algorithm checks the equivalence between s⃗[i] and s⃗★[i] and adds them to Δ− and Δ+, respectively, if they are inequivalent. Finally, surplus expressions in s⃗ or s⃗★ are also added to Δ− and Δ+, respectively. After fixing SELECT, we guarantee Q ≡ Q★.
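The positional comparison can be sketched as follows; the function name and encoding are illustrative, with the equivalence test passed in as a parameter (Qr-Hint uses the context-dependent check described above):

```python
def fix_select(sel_work, sel_star, equiv):
    """Compare SELECT lists position by position: inequivalent or surplus
    expressions in the working list go to delta_minus, and those in the
    target list go to delta_plus, each tagged with its index position."""
    d_minus, d_plus = [], []
    for i in range(max(len(sel_work), len(sel_star))):
        e_w = sel_work[i] if i < len(sel_work) else None
        e_s = sel_star[i] if i < len(sel_star) else None
        ok = e_w is not None and e_s is not None and equiv(e_w, e_s)
        if e_w is not None and not ok:
            d_minus.append((i, e_w))
        if e_s is not None and not ok:
            d_plus.append((i, e_s))
    return d_minus, d_plus
```

For example, comparing a working list ['a', 'b', 'c'] against a target ['a', 'x'] under plain syntactic equality flags positions 1 and 2 for removal and position 1 for addition.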

EXPERIMENTS
We evaluate three aspects of Qr-Hint: coverage, accuracy, and running time. For coverage, we test the ability of Qr-Hint to fix wrong queries that arise in real-world classroom settings. For accuracy and running time, we focus on Algorithm 1, which is the bottleneck of Qr-Hint due to its calls to DeriveFixes or DeriveFixesOPT. Because fix minimization incurs exponential time, we examine 1) how the number of unique predicates affects running time; 2) how close the generated repairs are to optimal when queries are not conjunctive; and 3) how DeriveFixes and DeriveFixesOPT compare in running time and optimality. In general, DeriveFixesOPT strives for smaller fixes and hence incurs longer running times than DeriveFixes.
Implementation/Test Environment. We implemented Qr-Hint in Python 3.10, using Apache Calcite [6] to parse SQL queries and the Z3 SMT solver [24] to test constraint satisfiability. We use ESPRESSO in PyEDA [26] for fix minimization. We ran the experiments locally on a 64-bit Ubuntu 20.04 LTS server with a 3.20GHz Intel Core i7-8700 CPU and 32GB of 2666MHz DDR4 memory.
Test Data Preparation. To prepare the first test dataset, denoted Students, we examined 2,000+ real student queries from one semester of an undergraduate database course at the first author's institution. These queries came from 4 introductory-level SQL questions (with 4 reference queries) and altogether included 341 wrong queries. Of these, 35 (11%) used SQL features not supported by Qr-Hint (see the limitations at the end of Section 3), leaving 306 supported wrong queries in Students. (At the time of writing, we are still exploring with the institutional review board the possibility of making this dataset publicly available.) To further expand the coverage of errors, we cross-checked the Students queries against the list of SQL issues indicative of semantic errors categorized by Brass et al. [12] (which did not publish a query dataset). Of the 43 issues in [12], 18 involve SQL features not currently supported by Qr-Hint, but they make up only a small minority (11.4%) of the observed instances reported in [12]. Of the 25 issues Qr-Hint should support, 17 are already represented in the 306 Students queries. To cover the remaining 8, we handcrafted two queries per issue and added them to the dataset; we also handcrafted corresponding reference queries (free from any issue in [12]). We denote the resulting dataset Students+, with 322 queries having errors/issues.
Our second test dataset, denoted TPCH, is based on the TPC-H [7] schema and queries, with synthetic errors injected. This dataset allows us to stress-test Qr-Hint with queries more complex than those in Students. Also, because the errors are synthetic, we have "ground-truth" repair sites and fixes, allowing us to easily assess the optimality of Qr-Hint's fixes. Most WHERE conditions in TPC-H queries are conjunctive: we chose 7 TPC-H queries with conjunctions of 4, 5, 6, 7, 9, 10, and 11 atomic predicates (TPC-H Queries 4, 3, 10, 9, 5, 8, and 21, respectively). Since we did not find a TPC-H query with exactly 8 predicates, we synthesized one by removing one atomic predicate. For each query, we then introduced errors into two atomic predicates to make the wrong query, which remained conjunctive. Thus, each pair of wrong and reference queries has 6-13 unique atomic predicates. Furthermore, to test cases beyond conjunctive WHERE conditions, we chose TPC-H Query 7, whose WHERE contains multiple nested ANDs and ORs, and created 5 wrong queries by injecting 1-5 errors, changing atomic predicates or logical operators. For a fair comparison, we ensured that the number of unique atomic predicates between the reference query and each wrong query is always 10.

Results and Discussion
Students+. To test the coverage and optimality of Qr-Hint, we ran it on the 322 Students+ queries with errors/issues, along with their reference queries, and examined all Qr-Hint fixes. The 25 issues in [12] that Qr-Hint should support were handled in three ways: 1) 11 of them were indeed errors, and Qr-Hint correctly identified and fixed them all; 2) 3 of them were efficiency/stylistic issues where the queries were semantically still correct (e.g., a logically correct WHERE containing tautological conditions, such as A >= B OR A < B), and Qr-Hint did not flag any error; 3) the remaining 11 were also efficiency/stylistic issues (e.g., unnecessarily joining a primary key with its corresponding foreign key while only projecting the foreign key column), but Qr-Hint failed to detect query equivalence in these cases and suggested some fixes. This last category is the only case where Qr-Hint showed suboptimal behavior, though its suggested fixes still lead to correct queries, with the interesting side effect of resolving the efficiency/stylistic issues. A detailed analysis can be found in the appendix. It is worth noting that Qr-Hint handles all of the 10 most common issues in [12] perfectly. Qr-Hint's average running time per query on Students+ is 0.2 seconds, using DeriveFixes. However, most Students+ queries are rather simple, with conjunctive WHERE (which does not need DeriveFixesOPT for optimality) and at most 5 unique atomic predicates. Therefore, we further stress-tested Qr-Hint using TPCH.
TPCH, conjunctive WHERE with a varying number of atomic predicates. Here, we study Qr-Hint's running time and optimality (measured by repair cost; lower is better) as we vary the number of atomic predicates involved in repairing WHERE. We compare versions of Qr-Hint using DeriveFixes vs. DeriveFixesOPT, both set to explore up to two repair sites. Figure 2a confirms that for conjunctive queries, both always return optimal repairs according to the ground truth, regardless of the size of WHERE. (Note that the repair cost is not proportional to the number of atomic predicates because it is normalized by the query sizes per Equation (1).) Figure 2b shows that, as expected, both have running times exponential in the number of unique atomic predicates, but DeriveFixes runs much faster than DeriveFixesOPT. Furthermore, the plot labeled "1st Repair Sites" shows that it takes less than one second for Qr-Hint to find the first viable (not necessarily optimal) repair site, so there is additional room to trade optimality for faster running time.
TPCH, WHERE with nested AND/OR and a varying number of injected errors. As shown in Figure 3a, when the optimal repair (according to the ground truth) involves only one repair site (a single error), both DeriveFixes and DeriveFixesOPT find this optimal repair, confirming Lemma 5.2. With more errors, DeriveFixes returns suboptimal repairs, while DeriveFixesOPT still finds optimal or near-optimal repairs (for the cases of 2 and 3 errors, respectively). However, with 4-5 errors, which are arguably not the cases Qr-Hint targets, both suffer from suboptimality because they are set to explore up to two repair sites; in fact, both decided that it was best to simply repair the whole WHERE condition. Figure 3b shows that DeriveFixesOPT's better optimality comes at the expense of slower speed than DeriveFixes, however. Interestingly, with 4-5 errors, both run faster than with 2-3 errors, because the large number of errors severely limits the number of possible single- and two-site repairs, speaking to the effectiveness of CreateBounds in quickly spotting and bailing out of difficult situations.
Finally, Figure 4 shows all unpruned viable repairs found during Qr-Hint's course of execution, in terms of when they were found and how much they cost; there is one trace per execution. The traces for 1 (blue), 4 (red), and 5 (purple) errors degenerate into single dots because Qr-Hint eventually finds only one solution when viable repair options are limited. Recall that we heuristically prioritize the viable repairs to consider, but there is no guarantee that a cheaper repair will always be found earlier. Hence, the repair costs fluctuate over time, although the general trends are upward, confirming the effectiveness of our heuristic; note in particular that the lowest-cost repairs tend to surface early during execution. In closing, while the total and worst-case running times of Qr-Hint grow exponentially in query size, in practice the running times are reasonable considering that Qr-Hint is intended for educational settings, where returning hints instantaneously may not be necessary, or even desirable for learning. Given that Qr-Hint often returns some low-cost repairs early, we can offer them as preliminary hints to get students thinking while Qr-Hint continues to look for better repairs.

USER STUDY
We conducted a small-scale user study to evaluate Qr-Hint along two dimensions: 1) whether students can understand what is wrong with a query given the suggested hints, and 2) how the hints generated by Qr-Hint compare with those provided by "expert users" (teaching assistants, in our study).
Participants. We recruited 38 students who have taken or are taking a graduate or undergraduate database course. Aside from the incentives of receiving a small gift card and practicing SQL, participation was voluntary. In the end, we collected 15 complete and valid responses. A possible explanation for the low completion rate is the significant effort required to debug SQL queries with subtle mistakes (we observed that some participants took more than an hour to finish). We considered recruiting participants from other sources (e.g., Amazon Mechanical Turk) but decided against it because they would not represent our target population (students). Furthermore, given the significant effort required of participants, it would be hard to incentivize participants who are not actively learning SQL: a low reward would turn them away, while a high reward might encourage undesirable behaviors.
Preparation. To design the survey, we first analyzed the Students queries to get a sense of the common errors. Overall, most errors came from WHERE and HAVING (130 out of 341 queries are wrong due to WHERE); students often missed join conditions in queries involving many tables. Other common errors include incorrect/redundant/missing tables in FROM, incorrect ordering and missing/redundant expressions in SELECT, and incorrect expressions in GROUP BY. We decided not to reuse queries from Students, as our participants had done the same or similar homework previously, which might bias the results. Nonetheless, based on these observations, we designed four SQL questions over the DBLP schema (details in the appendix). For each question, we crafted a wrong solution containing one or more mistakes: two WHERE errors for Q1, one GROUP BY error and one SELECT error for Q2, one WHERE error for Q3, and one WHERE error and one HAVING error for Q4. Even though the queries are over a different schema, these errors faithfully reflect real errors from Students, and they are consistent with the common errors found by others [2, 12].
Then, we performed a small study with four graduate teaching assistants (TAs) to generate hints for these queries. Each TA was asked to pinpoint all mistakes in each query and offer hints, as if they were helping students debug wrong queries. To simulate an office-hour setting, we asked the TAs to finish all four questions in one sitting, with no help from Qr-Hint. We collected all hints provided by the TAs as "expert" hints.
Next, we ran Qr-Hint on all the wrong queries to obtain repair sites and fixes. We removed the fixes and showed only the repair sites to participants as hints. To prevent participants from recognizing the source of a hint (expert vs. Qr-Hint) by its wording, we paraphrased all hints into a common template, "In [SQL clause], [hint]", with standardized wording.
Tasks. Using the four queries, each participant saw and completed three questions. Participants were required to complete questions on Q1 and Q2, and they completed one of Q3 and Q4 at random. For each question, participants were given the database schema, the problem statement in English, and the wrong SQL query, and were asked to explain what is wrong with the query. To create treatment and control groups, participants received hints from Qr-Hint for either Q1 or Q2 (not both) at random; for the other, they were asked to detect errors without any hints. The order of the two questions (with and without hints) was also chosen at random. For the last question, participants received Q3 or Q4 at random; we showed the union of hints (mixed together) generated by the TAs and by Qr-Hint, and asked participants to categorize each hint as one of the following: "Unhelpful or incorrect," "Helpful but requires thinking," or "Obvious and giving away the answer."
Participants were asked to finish all questions in one sitting, and we recorded the time each participant spent on each question. In our study, for Q1, 8 students answered with no hints and 7 with hints from Qr-Hint; for Q2, these numbers are 7 and 8, respectively. For the third question, 7 participants received Q3 and 8 received Q4.
Result and Analysis. Our results for Q1 and Q2 show that participants were better at identifying at least one error in the query when given the hints provided by Qr-Hint than with no hints. As shown in Figures 5a and 5b, 100% and 87.3% of the participants were able to identify at least one of the two errors in the wrong query for Q1 and Q2, respectively, after receiving hints from Qr-Hint, as opposed to 14.3% and 71.4% who were able to do so without a hint. While a single participant correctly identified both errors without any hint for Q1, this participant spent more than 20 minutes doing so, whereas most participants spent no more than 10 minutes on the same question without hints.
Q3 and Q4 are used to evaluate whether Qr-Hint provides hints comparable in quality to those given by teaching assistants. For Q3, there are four TA hints and one hint from Qr-Hint; for Q4, there are four TA hints and two hints generated by Qr-Hint. Over all responses, we sum up the number of times participants voted for each of the three hint categories: "Obvious", "Unhelpful", and "Helpful". The results are shown in Figures 6a and 6b. In summary, the quality of the TAs' hints varies greatly as perceived by participants, whereas Qr-Hint is consistently perceived as "helpful but require thinking", which may be best suited for classroom settings.

CONCLUSION AND FUTURE WORK
We presented Qr-Hint, a framework for automatically generating hints and suggested fixes for a wrong SQL query with respect to a reference query. We developed techniques to fix all clauses in a query and gave theoretical guarantees. There are multiple intriguing directions for future work, including support for more complex constructs such as subqueries, outer joins (NULL), and database constraints. Several steps of the framework evaluate all possible options (e.g., repair sites), so improving the scalability of the system is another direction. It would also be interesting to develop techniques that work around the limitations of SMT solvers in our framework. We are implementing a graphical user interface so that Qr-Hint can better assist students and TAs in database courses, and conducting a larger-scale user study to further understand the effectiveness of Qr-Hint is also important future work.

A RELATED WORK SUPPLEMENT
Extended Discussion on Testing Query Equivalence. There are several classical results on query equivalence. Chandra et al. [17] show that equivalence testing of conjunctive queries is NP-complete. Aho et al. [4] propose the tableau representation of queries, which is used to give algorithms for checking the equivalence of SPJ queries. Sagiv et al. [50] give a procedure for testing the equivalence of Select-Project-Join-Difference-Union (SPJDU) queries. Klug [40] presents algorithms for checking the equivalence of conjunctive queries with inequalities. The equivalence problem for some classes of queries under bag semantics has been proved undecidable [38, 39]. While the query equivalence problem in general is undecidable [1, 53], tools have been developed to check the equivalence of various classes of queries under restrictions and assumptions. Cosette [21-23] transforms SQL queries into algebraic expressions and uses a decision procedure and rewrite rules to check whether the resulting expressions of two queries are equivalent. EQUITAS [58] develops a symbolic representation of SQL queries in first-order logic and uses satisfiability modulo theories (SMT) to check query equivalence. WeTune [56] builds a query equivalence verifier by utilizing the U-semiring structure [21]. None of these tools gives hints on how to modify one query to become equivalent to another.

B FROM STAGE SUPPLEMENT

B.1 Finding Table Mapping
In this section, we describe our heuristics for determining a table mapping. Since FROM alone does not carry enough information to determine a mapping (e.g., when a self-join exists), we gather information from other clauses (WHERE, GROUP BY, SELECT) to make a decision. Given that Q and Q★ share the same multiset of tables, the general idea is to cast table mapping as establishing a one-to-one matching between the elements of the two multisets.
To uniquely identify each table in both multisets, we first introduce the table signature. We describe the construction of a table signature for a table in query Q: (1) Scanning Q's WHERE and HAVING: for each selected operator op (=, <, >, ≤, ≥, LIKE) and each attribute A in the table, we create the set of attributes that "interact" with A in some atomic predicate in WHERE and/or HAVING (note: we rewrite predicates so that A is on the left-hand side of the operator in the case of an inequality). If A does not appear in WHERE or HAVING, or it does not appear in a predicate with the selected operator, the corresponding set is empty. After creating each set, we expand it to the entire equivalence class of its current attributes.
We then replace each attribute in the set with the name of the original table it belongs to. (2) Scanning Q's GROUP BY, we create the set of attributes from the table that appear in any GROUP BY expression. (3) Scanning Q's SELECT, for each attribute A in the table, we create the set of indices of the SELECT expressions in which A appears. In summary, letting T denote a table in query Q, a table signature is a triple σ_T = (f_W, S_G, f_S), where f_W is a function mapping each pair (A, op), with A ∈ Attributes(T) and op ∈ {=, <, >, ≤, ≥, LIKE}, to the set of tables that interact with A in Q through op; S_G ⊆ Attributes(T) is the set of attributes that appear in at least one GROUP BY expression of Q; and f_S is a function mapping each A ∈ Attributes(T) to the set of indices i such that the i-th SELECT expression of Q contains A.
With table signatures in hand, let O = {=, <, >, ≤, ≥, LIKE} denote the set of operators. We define the following normalized similarity between two signatures σ = (f_W, S_G, f_S) and σ′ = (f′_W, S′_G, f′_S) of a table T:

sim(σ, σ′) = (1 / (|O| · |Attributes(T)|)) Σ_{A ∈ Attributes(T), op ∈ O} J(f_W(A, op), f′_W(A, op)) + J(S_G, S′_G) + (1 / |Attributes(T)|) Σ_{A ∈ Attributes(T)} J(f_S(A), f′_S(A)),

where J denotes the Jaccard similarity between two sets. Each component is a normalized Jaccard similarity between the corresponding sets, and we take the sum of the three (for WHERE/HAVING, GROUP BY, and SELECT, respectively) as our final similarity metric. Note that when two sets are both empty, we count their Jaccard similarity as 1.
With the similarity metric, we build a bipartite graph where: (i) partitions 1 and 2 contain the table signatures for the tables in Q★ and Q, respectively; and (ii) each node in partition 1 is connected to every node in partition 2 that refers to the same table, with edge weight equal to the similarity between their signatures. Given the bipartite graph, we select the table mapping with the maximum cumulative similarity by negating the edge weights and solving a minimum-weight perfect matching problem. We now use an example to demonstrate the heuristic.
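To make the heuristic concrete, the following is a minimal Python sketch (the names `signature_similarity` and `best_table_mapping`, and the simplified signature encoding, are ours, not Qr-Hint's actual implementation). It scores simplified signatures by normalized Jaccard similarity and finds the best alias matching by brute force, which suffices for the handful of tables in a FROM clause:

```python
from itertools import permutations

def jaccard(a, b):
    """Jaccard similarity between two sets; two empty sets count as 1."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def signature_similarity(sig1, sig2):
    """Sum of three normalized Jaccard similarities (WHERE/HAVING,
    GROUP BY, SELECT).  A simplified signature is a triple:
    (where: {(attr, op): set_of_tables}, groupby: set_of_attrs,
     select: {attr: set_of_indices})."""
    w1, g1, s1 = sig1
    w2, g2, s2 = sig2
    wkeys = set(w1) | set(w2)
    where_sim = (sum(jaccard(w1.get(k, set()), w2.get(k, set()))
                     for k in wkeys) / len(wkeys)) if wkeys else 1.0
    skeys = set(s1) | set(s2)
    select_sim = (sum(jaccard(s1.get(k, set()), s2.get(k, set()))
                      for k in skeys) / len(skeys)) if skeys else 1.0
    return where_sim + jaccard(g1, g2) + select_sim

def best_table_mapping(sigs_star, sigs):
    """Brute-force maximum-weight perfect matching between the aliases
    of Q* and Q (fine for the small number of tables in a FROM clause)."""
    names_star, names = list(sigs_star), list(sigs)
    best, best_score = None, float("-inf")
    for perm in permutations(names):
        score = sum(signature_similarity(sigs_star[a], sigs[b])
                    for a, b in zip(names_star, perm))
        if score > best_score:
            best, best_score = dict(zip(names_star, perm)), score
    return best, best_score
```

Two aliases whose signatures agree on all three components score 3.0 per pair; negating the scores turns the maximization into the minimum-weight perfect matching formulation used in the text.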
Example 12. Continuing with Example 1, the initial signatures for S1, S2, s1, s2 are shown in Example 4; the final signatures are obtained by replacing attribute names with table names. To compute the weight of the edge between S1 and s1, we compare the WHERE and HAVING components of their signatures: the Jaccard similarity between f_W(bar, =) and f′_W(bar, =) is 0, as the two sets have no common element. Since the other four operators do not involve bar, their corresponding sets are all empty and each yields a Jaccard similarity of 1, so the total for bar is 4. Similarly, we obtain 5 and 3 for beer and price, respectively, and the sum of these similarities is normalized. The Jaccard similarity for GROUP BY is simply 1/2, as there is only one common element. For SELECT, beer and price have the same sets between S1 and s1, so the normalized Jaccard similarity is 2/3. The weights of the remaining edges are computed in the same fashion. By negating the weight of each edge, we convert the problem of finding the mapping with maximum cumulative similarity into a minimum-weight perfect matching problem.

B.2 Proof for Lemma 4.2
Proof of Lemma 4.2. Without loss of generality, suppose that Q★ returns a non-empty result over some database instance D. We construct a new database instance D′ as follows: (1) for each unique table T ∈ Tables(Q★), duplicate each row in T a number of times equal to a distinct prime p_T (so that p_{T1} ≠ p_{T2} whenever T1 ≠ T2); (2) for each table T ∉ Tables(Q★), make T empty. We then have |Q★(D′)| = |Q★(D)| · Π_{T ∈ Tables(Q★)} p_T^{|Aliases(Q★, T)|}. For any Q equivalent to Q★, it must be the case that Tables(Q) ⊆ Tables(Q★) (disregarding counts); otherwise Q(D′) would be empty by construction of D′. Furthermore, for Q to be equivalent to Q★, we must have |Q(D′)| = |Q(D)| · Π_{T ∈ Tables(Q)} p_T^{|Aliases(Q, T)|} = |Q★(D′)|. It then follows from the unique-prime-factorization theorem [29] that Tables(Q) and Tables(Q★) contain the same set of tables, and that for each unique T ∈ Tables(Q), |Aliases(Q, T)| = |Aliases(Q★, T)|. In other words, Tables(Q) and Tables(Q★) are equal as multisets. □
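The counting argument in the proof can be replayed mechanically: duplicating each base table by a distinct prime multiplies the output cardinality by that prime raised to the table's alias count, so the alias counts are recoverable by trial division. A small self-contained sketch (the table names, primes, and counts below are hypothetical):

```python
def blowup_factor(alias_counts, primes):
    """|Q(D')| / |Q(D)| when every row of table T is duplicated p_T
    times: each alias of T contributes one factor of p_T."""
    f = 1
    for table, count in alias_counts.items():
        f *= primes[table] ** count
    return f

def recover_alias_counts(factor, primes):
    """Invert the blow-up factor by trial division by the chosen
    primes; unique factorization guarantees the recovered counts
    |Aliases(Q, T)| are the only possibility."""
    counts = {}
    for table, p in primes.items():
        c = 0
        while factor % p == 0:
            factor //= p
            c += 1
        if c:
            counts[table] = c
    assert factor == 1, "factor must factor over the chosen primes"
    return counts
```

For example, with primes {R: 2, S: 3} and a query using R twice and S once, the blow-up factor is 2² · 3 = 12, and factoring 12 recovers {R: 2, S: 1}; equal blow-up factors therefore force equal alias multisets.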

C WHERE STAGE SUPPLEMENT
In this section, we present the complete derivation of fixes (Algorithm 3) and its optimization, and give proofs of Lemmas 5.1, 5.2, 5.3, and 5.4.

C.1 DeriveFixes Revisited
C.1.1 Target Bound Derivation. We first map out the full story for lines 15-22 of Algorithm 3, which contain the strategy for pushing down target bounds based on the repair bounds of the children of a node. For illustration and without loss of generality, we assume that all ∧, ∨ nodes have only two children c and c′. While addressing a particular node x, we use the same notation as in Section 5: [l_c, u_c] and [l_{c′}, u_{c′}] denote the repair bounds of the children c and c′, respectively; [l★, u★] denotes x's target bound; and [l★_c, u★_c] and [l★_{c′}, u★_{c′}] denote the target bounds of c and c′, respectively. The principle that guides the formulation of target bounds is simple: knowing that we can only pick a formula within the repair bounds at any node (as the repair bounds dictate the possible formulas), we would like to tighten the repair bounds as little as possible when forming target bounds, so that MinFix has more freedom to find a small formula. We use an example to demonstrate why loose bounds can yield small formulas.
Example 13. Let l ≡ (p1 ∨ p2) ∧ p3 and u ≡ p1 ∨ p2 be the lower and upper bounds, where p1, p2, p3 are independent atomic predicates. The minimal formula within [l, u] is p1 ∨ p2. However, let u′ ≡ p1 ∨ p2 ∨ p3 be the new upper bound; the minimal formula within [l, u′] becomes p3, which is smaller than p1 ∨ p2 in terms of the size of their syntax trees. Here we expand the range of [l, u] by raising the upper bound (i.e., adding a disjunct to it).
Symmetrically, we can create a case where range expansion is done by restricting the lower bound (i.e., adding a conjunct to it). Let l ≡ p1 ∧ p2 and u ≡ (p1 ∧ p2) ∨ p3 be the lower and upper bounds. If we restrict l further by constructing l′ ≡ p1 ∧ p2 ∧ p3, the minimal formula moves from p1 ∧ p2 (for [l, u]) to p3 (for [l′, u]).
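Both examples can be verified by brute force over truth assignments of the independent atoms p1, p2, p3. The following sketch (the helper names `implies` and `within` are ours) checks that p3 falls within the bound only after the upper bound is raised, as in Example 13:

```python
from itertools import product

def implies(f, g, nvars=3):
    """True iff f => g under every truth assignment of nvars atoms."""
    return all(g(*vs) for vs in product([False, True], repeat=nvars) if f(*vs))

def within(f, lower, upper):
    """True iff lower => f => upper, i.e., f lies within the bound."""
    return implies(lower, f) and implies(f, upper)

# Example 13's bounds: l = (p1 v p2) ^ p3, u = p1 v p2.
l = lambda p1, p2, p3: (p1 or p2) and p3
u = lambda p1, p2, p3: p1 or p2
p1_or_p2 = lambda p1, p2, p3: p1 or p2
p3_only = lambda p1, p2, p3: p3

assert within(p1_or_p2, l, u)       # p1 v p2 fits [l, u]
assert not within(p3_only, l, u)    # the smaller p3 does not ...
u_loose = lambda p1, p2, p3: p1 or p2 or p3
assert within(p3_only, l, u_loose)  # ... until the upper bound is raised
```

The symmetric lower-bound case can be checked the same way with l′ ≡ p1 ∧ p2 ∧ p3.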
With such a principle, the next question becomes how to tighten children's repair bounds to form target bounds such that we still guarantee combining the children yields the target formula.Situations differ based on the logical operator at each node.
When x is rooted at ∧, the formula has the general form c ∧ c′. Assuming both c and c′ contain repair sites, we want the formulas at the children, f_c and f_{c′}, to satisfy the following constraints after fixes are applied: (1) f_c ∈ [l_c, u_c] and f_{c′} ∈ [l_{c′}, u_{c′}], i.e., f_c and f_{c′} are within their respective repair bounds. (2) f_c ∧ f_{c′} ∈ [l★, u★], i.e., combining f_c and f_{c′} using ∧ yields a formula within the target bounds at x. (3) The target bounds to be pushed down to c and c′ are contained in their repair bounds. Given these constraints, the relationship among target bounds and repair bounds is depicted in Figure 8a, which helps us determine how to tighten [l_c, u_c] and [l_{c′}, u_{c′}].

To compute a fix within a target bound, MinFix proceeds as follows. It (1) scans through both l★ and u★ and extracts all semantically unique atomic predicates; (2) determines a mapping that maps each semantically unique atomic predicate in l★ and u★ to a unique Boolean variable representing it (e.g., x = y is semantically equivalent to x + 1 = y + 1, so both are mapped to the same Boolean variable); and (3) constructs a mapping M that maps any subexpression of a predicate (formula) to a Boolean function, so that M(l★) and M(u★) return the Boolean functions representing the truth tables of l★ and u★, respectively. Given the Boolean functions for l★ and u★ (line 2), MinFix calls a subroutine BuildTruthTable to construct the Boolean function representing the truth table of the minimal formula f ∈ [l★, u★] (line 3 in MinFix) by going through all possible truth-value assignments for l★ and u★ (line 2 in BuildTruthTable) with the following criteria:
• If the conjunction of all atomic predicates in a row is not satisfiable (e.g., x = y and x > y cannot both be true simultaneously), we mark the truth value of f as "*" (don't-care), since such a situation can never happen (lines 4-5 in BuildTruthTable).
• If the conjunction of all atomic predicates in a row is satisfiable (lines 6-9 in BuildTruthTable): (1) if l★ and u★ evaluate to the same truth value (both true or both false), then the same truth value is assigned to f; (2) if l★ and u★ evaluate to different truth values (i.e., false for l★ and true for u★), then a don't-care is assigned to f. Such don't-care assignments give the subsequent minimization step the flexibility to minimize f as much as possible. Note that the situation where l★ evaluates to true and u★ to false can never happen, because l★ ⇒ u★.

After obtaining the Boolean function (i.e., truth table) for f, MinFix feeds it to a subroutine MinBoolExp, which minimizes a given Boolean function with possible don't-care outputs (denoted *). Boolean minimization in general is NP-complete, but good heuristics exist that often find near-optimal solutions even for a large number of variables. Our implementation uses ESPRESSO [14], but alternatives such as the Quine-McCluskey algorithm [44] can also be used. We demonstrate how MinFix works using the following example.

Example 14. Consider the following bounds for a minimal formula f: l★ ≡ (x ≥ y ∧ x = z) ∨ x = y, and u★ ≡ x = y ∨ x = z ∨ x > y. We can construct a truth table for each formula, shown in Figure 9. Based on the truth tables, a contradiction occurs when:
• x = y and x > y are both true;
• either x = y or x > y is true but x ≥ y is false;
• x ≥ y is true but both x = y and x > y are false.
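The row-labeling logic of BuildTruthTable can be sketched as follows (a simplification: the satisfiability check, which the real system delegates to an SMT solver, is supplied here as a hand-written function, and minimization of the resulting table is left to a separate tool such as ESPRESSO):

```python
from itertools import product

def build_truth_table(atoms, lower, upper, satisfiable):
    """Sketch of BuildTruthTable: map each truth assignment of the
    atomic predicates to True, False, or '*' (don't-care).
    lower/upper take an assignment dict and evaluate l*/u*;
    satisfiable decides whether the conjunction of the (possibly
    negated) atoms can hold (an SMT solver in the real system)."""
    table = {}
    for vals in product([False, True], repeat=len(atoms)):
        row = dict(zip(atoms, vals))
        if not satisfiable(row):
            table[vals] = "*"         # contradictory row can never occur
        elif lower(row) == upper(row):
            table[vals] = lower(row)  # bounds agree: value is forced
        else:
            table[vals] = "*"         # l* false, u* true: minimizer's choice
    return table

# Three comparison atoms over the same pair of values x, y:
atoms = ["x_eq_y", "x_gt_y", "x_ge_y"]
def sat(row):
    # x >= y holds iff x = y or x > y; x = y and x > y are exclusive
    return (row["x_ge_y"] == (row["x_eq_y"] or row["x_gt_y"])
            and not (row["x_eq_y"] and row["x_gt_y"]))

table = build_truth_table(atoms,
                          lambda r: r["x_eq_y"],                 # l*
                          lambda r: r["x_eq_y"] or r["x_gt_y"],  # u*
                          sat)
```

Rows violating the arithmetic relationship among =, >, and ≥ come out as don't-cares, mirroring the kinds of contradictions enumerated in Example 14; feeding the resulting table to a minimizer would then produce the minimal formula.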
Given MinFix, the fixes for the repair sites in Example 8 can then be computed.

C.2 Optimization for Algorithm 3
The suboptimality of multiple fixes comes from the independent derivation of the target formula bounds (Algorithm 3). Using the same notation as in Algorithm 3, consider the target bounds for the children of an ∨ node with only two children, and their Venn diagram in Figure 8b, which illustrates the source of suboptimality. The regions representing u★_c and u★_{c′} overlap (and likewise for l★_c and l★_{c′}). Although their union matches exactly the region of u★ (respectively, l★), the overlap indicates an overlap in their semantics, meaning that semantic redundancy exists in the target bounds of Children(x). This implies that any combination of f_c ∈ [l★_c, u★_c] and f_{c′} ∈ [l★_{c′}, u★_{c′}] may have semantic overlap, causing suboptimality in the subsequent derivation of fixes. To illustrate, reconsider Example 8: the predicates a = b, b > 10, and c < 7 appear in the fix for s4 unnecessarily.
To reduce such suboptimality, we propose a new procedure that replaces Algorithm 3 and considers all repair sites simultaneously, leveraging the fact that Algorithm 3 returns an optimal fix for a single repair site (Lemma 5.2). The overall routine is shown in Algorithm 7.
The general intuition of Algorithm 7 is to start with [l★, u★] derived from the reference formula and update the bounds in the same manner as Algorithm 3 down to the lowest common ancestor (LCA) of all repair sites. Treating the LCA as a single repair site, the corresponding optimal fix lies within its [l★, u★], and Algorithm 7 aims to assemble this optimal fix by collectively deriving fixes for all actual repair sites. We next describe the major steps of Algorithm 7. For a concise and clear illustration, we denote the optimal single fix at the LCA by f.

Build consistency table.
Example 15. Let the repair sites in φ be the atomic predicates a = 3 and b = 1, with Boolean variables s1 and s2, and let their lowest common ancestor be the top-level ∨. We can obtain l★ ⇔ u★ ⇔ φ★. The consistency table can then be constructed as shown in Figure 10.

Figure 10: The consistency table for Example 15
The consistency table gives a direct view of the cases where f and φ′ are consistent (blue highlights: they evaluate to the same truth value) and inconsistent (red highlights: they evaluate to different truth values). This helps us determine how to construct each fix in later steps, since we want to avoid any inconsistency between f and φ′ in order to achieve f ⇔ φ′.

Build constraint table.
After acquiring the consistency table, the next question is how to use the available atomic predicates to construct fixes for each repair site while avoiding inconsistencies. For this purpose, we treat all atomic predicates as "constraints" on the Boolean variables that represent the repair sites, constructing a "constraint table" (line 7). A constraint table is a truth table whose inputs are all the atomic predicates in the consistency table, and whose output is the concatenation of truth values of the Boolean variables. For each row (i.e., truth assignment of all atomic predicates), the constraint table aggregates and lists all truth assignments of the Boolean variables under which f and φ′ are consistent; these are the potential truth values to be assigned individually to each Boolean variable.
Example 16. Consider the consistency table in Figure 10 from Example 15. The corresponding constraint table is shown in Figure 11.
Compute minimal fixes. The final step is to compute a fix for each repair site according to the constraint table. While the constraint table lists all possible simultaneous truth assignments for all repair sites (e.g., the last column of Figure 11), we cannot make independent choices for the truth value of each repair site, as dependencies exist. For example, in the highlighted row of Figure 11, if s1 is assigned 0, then s2 can only be assigned 1 for consistency; if s1 is assigned 1, then s2 can be assigned either 0 or 1. Since it is unclear a priori how truth values should be assigned to each repair site to obtain minimum fixes, we follow a greedy procedure (lines 10-14): (1) Randomly pick a repair site s_i.
(2) Iterate over each row in the last column of the constraint table and give s_i the most flexible assignment. (3) Update the available options in the last column of the constraint table based on the truth assignment of s_i.

Example 17. Continuing from Example 16, the derivation process for s1 and s2 is shown in an extended constraint table in Figure 12. Following the above procedure, we first give s1 maximum flexibility for constructing its formula, which yields the fix a = 1. We then update the don't-cares accordingly before computing s2; for s2, we follow the same procedure and derive its truth assignment based on the truth values of s1, which finally yields the fix b = 3. These are indeed the optimal fixes. In addition, running Qr-Hint-optimized over Example 5 yields the optimal fixes a = b and b > 10 ∧ c < 5 for repair sites s4 and (s10, s12).
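The greedy procedure can be sketched as follows (a simplification of lines 10-14: we fix sites in index order rather than randomly, and represent the constraint table directly as sets of feasible joint assignments; the table contents below are illustrative, not those of Figure 11):

```python
def greedy_fixes(constraint_table, n_sites):
    """Greedy assignment sketch.  constraint_table: one set per row of
    the feasible joint assignments (tuples of 0/1, one position per
    repair site) under which phi and phi' stay consistent.  A site's
    value in a row is forced only when all remaining feasible
    assignments agree; otherwise it becomes a don't-care ('*'),
    leaving the minimizer maximum flexibility."""
    fixes = []
    table = [set(opts) for opts in constraint_table]
    for i in range(n_sites):
        column = []
        for opts in table:
            vals = {o[i] for o in opts}
            column.append(vals.pop() if len(vals) == 1 else "*")
        # Step (3): prune joint assignments inconsistent with the
        # values just committed for site i ('*' rows keep all options).
        table = [{o for o in opts if column[r] == "*" or o[i] == column[r]}
                 for r, opts in enumerate(table)]
        fixes.append(column)
    return fixes
```

In the real system, each resulting column (with '*' as don't-cares) would then be minimized into the formula assigned to that repair site.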

C.3 Proof of Lemma 5.1
Proof of Lemma 5.1. We prove the correctness of the WHERE stage by proving the correctness of the repair returned by Algorithm 1, i.e., applying the repair sites and fixes returned by Algorithm 1 yields a new formula φ′ such that φ★ ⇔ φ′. The proof has two steps. Step 1: Algorithm 2 returns the correct repair sites. Assume that there exists a set of fixes F for a set of repair sites S in φ. By Lemma 5.3, we can create a lower bound φ⊥ and an upper bound φ⊤ such that φ′ ∈ [φ⊥, φ⊤], where φ′ is the formula obtained by applying the fixes to the repair sites. Since φ′ ⇔ φ★, we have φ★ ∈ [φ⊥, φ⊤]. Thus, if a set of fixes exists for a set of repair sites, φ★ must fall within the corresponding repair bounds at the root of φ. This validates the procedure for determining repair sites.
Step 2: Algorithm 3 returns the correct fixes. Given a lower bound φ⊥ and an upper bound φ⊤ for φ with respect to S such that φ★ ∈ [φ⊥, φ⊤], by Lemma 5.4 we can derive a set of fixes F for S through Algorithm 3. □

C.4 Proof of Lemma 5.2

Proof. We use induction over the structure of φ to prove the minimality of the fix f for a single repair site s in φ.
Base case. The base case is when s is φ itself (i.e., the entire φ is the repair site). In this case, f is a minimal DNF of φ★ returned by the minimization procedure, so removing any clause from f causes f ⇎ φ★. Induction step. When s is not φ, there are three possible cases. Case 1. φ is of the form f1 ∧ ... ∧ fn, where f_i is the repair site; the target bounds for f_i are derived from the repair bounds of the remaining conjuncts as described in Section 5.

(Induction step of the proof of Lemma 5.3.) There are two cases depending on Θ. If Θ is ∧ or ∨: since ∀c ∈ Children(x): l_c ⇒ c′ ⇒ u_c, we have Θ_{c ∈ Children(x)} l_c ⇒ Θ_{c ∈ Children(x)} c′ ⇒ Θ_{c ∈ Children(x)} u_c, which means φ′ is within the bound returned by line 6 of CreateBounds. If Θ is ¬: since l_c ⇒ c′ ⇒ u_c, we have ¬u_c ⇒ ¬c′ ⇒ ¬l_c, which means φ′ is within the bound returned by line 10 of CreateBounds. □

C.6 Proof of Lemma 5.4

Proof of Lemma 5.4. We use induction over the structure of φ to prove that for any subtree x in φ for which DeriveFixes(x, S, l★, u★) is invoked:
• (H1) l★ ⇒ u★, and the bound [l★, u★] implies (is equivalent to or tighter than) the bound returned by CreateBounds(x, S).
• (H2) The repair returned by DeriveFixes(x, S, l★, u★) yields some x′ ∈ [l★, u★].
Note that applying (H2) to the root of φ proves Lemma 5.4.
Proving (H1) top-down. The base case is when x is the root of φ; we only invoke DeriveFixes if φ★ ∈ CreateBounds(φ, S), so [l★, u★] clearly implies CreateBounds(φ, S). For the induction step, assuming that (H1) holds for x, we show that (H1) holds for each child of x for which DeriveFixes is invoked. There are three cases.
For the inductive step, assuming that (H2) holds for each child of x, we now show that (H2) holds for x. There are three cases.
Case 1. x has the form ¬c. By the inductive hypothesis, DeriveFixes on c returns a repair that results in some c′ such that ¬u★ ⇒ c′ ⇒ ¬l★. Clearly, the same repair, which is returned by DeriveFixes(x, S, l★, u★), changes x to ¬c′, which satisfies l★ ⇒ ¬c′ ⇒ u★. □

D GROUP BY STAGE SUPPLEMENT

We show that it is necessary to fix GROUP BY.
Lemma D.1. Consider two single-block SQL queries Q1 and Q2, where Q1 has no GROUP BY or aggregation, while Q2 has GROUP BY and/or aggregation but no HAVING. Q1 and Q2 cannot be equivalent under bag semantics, assuming that no database constraints are present and that there exists some database instance for which Q1 or Q2 returns a non-empty result.
Proof of Lemma D.1. Suppose that for some instance D, both Q1 and Q2 return the same non-empty result. Pick any T ∈ Tables(Q1). Create a new instance D′ by duplicating the contents of T in the same table (i.e., doubling the multiplicity of each tuple in T) while keeping all other tables unchanged. Since Q1 has no GROUP BY or aggregation, the multiplicity of each tuple in Q1(D′) increases by a factor of 2^k compared with that in Q1(D), where k ≥ 1 is the count of T in Tables(Q1) (a multiset). Hence, the size of Q1(D′) is strictly larger than that of Q1(D). On the other hand, consider Q2, which has GROUP BY and/or aggregation. Between Q2(D′) and Q2(D), the grouping of intermediate join result tuples remains the same, except for the number of duplicates within each group. Hence, the size of Q2(D′) remains the same as that of Q2(D), namely the total number of groups. Therefore, Q1(D′) ≠ Q2(D′), contradicting equivalence. □
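The duplication argument is easy to observe concretely with SQLite (the schema and data below are illustrative): doubling every tuple of R doubles the bag size of a projection query, while the grouped query still returns one row per group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
rows = [(1, 10), (1, 20), (2, 30)]
cur.executemany("INSERT INTO R VALUES (?, ?)", rows)

q1 = "SELECT a FROM R"                      # Q1: no GROUP BY / aggregation
q2 = "SELECT a, SUM(b) FROM R GROUP BY a"   # Q2: grouped

before = (len(cur.execute(q1).fetchall()),  # |Q1(D)| = 3
          len(cur.execute(q2).fetchall()))  # |Q2(D)| = 2 groups

# Build D': double the multiplicity of every tuple of R.
cur.executemany("INSERT INTO R VALUES (?, ?)", rows)

after = (len(cur.execute(q1).fetchall()),
         len(cur.execute(q2).fetchall()))

assert after[0] == 2 * before[0]  # |Q1(D')| strictly grows (k = 1 here)
assert after[1] == before[1]      # |Q2(D')| stays at the number of groups
```

Since the two sizes diverge on D′, no instance-independent equivalence between the two query shapes is possible, which is exactly the lemma's claim.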

E HAVING STAGE SUPPLEMENT
We use the following base context as the default for testing satisfiability for HAVING. For brevity, we assume all values are integers; if there are columns and literals of different domains, additional constraints are added analogously. We note that these constraints are not intended to define the aggregate functions precisely; rather, they encode only a subset of their properties that allows SMT solvers to deduce useful equivalences reasonably efficiently.

Algorithm 1: RepairWhere(φ, φ★, k)
Input: a wrong predicate φ, a correct predicate φ★, and a cap k on the number of repair sites
Output: a repair (S, F) with minimum cost
1 let S• = ∅, F• = ∅;
2 let C• denote the minimum cost so far, ∞ initially;
3 foreach set S of ≤ k disjoint subtrees in φ, in ascending |S| order do
4   if Cost(S, ·) ≥ C• then // cost due to # sites alone is already too big
5     return (S•, F•); // safe to stop now

Example 6.1. Consider two queries over tables R(A, B) and S(C, D):
SELECT B FROM R, S WHERE B=C GROUP BY B, D;   -- Q★
SELECT C FROM R, S WHERE B=C GROUP BY C+D, C; -- Q

Example 10. Consider two queries over R(A, B) and S(C, D):
SELECT A FROM R, S WHERE A=C AND A>4 GROUP BY A, B HAVING A > B + 3 AND 2*SUM(D) > 10;      -- Q★
SELECT A FROM R, S WHERE A=C GROUP BY A, B, C HAVING C > B + 3 AND SUM(D * 2) > 10 AND A>4; -- Q
The two queries are equivalent because A=C holds in WHERE, because 2* distributes over SUM, and because A>4 can appear in either WHERE or HAVING.
E.g., for SUM(D*2) we introduce an array-valued variable D2 to represent the D*2 values, and we relate it to the array-valued D representing the D values by asserting ∀i ∈ N: D2[i] = D[i] × 2. Such assertions, along with the WHERE condition and additional inference rules for aggregate functions, go into a context as discussed in Section 3 and illustrated in Example 3. Example 11. For Example 10, the HAVING formulae for Q★ and Q are:

Figure 4: Cost of repairs found during the course of execution

predicate from TPC-H Query 5. For each query, we then introduced errors into two atomic predicates to create the wrong query, which remained conjunctive. Thus, each pair of wrong and reference queries has 6-13 unique atomic predicates. Furthermore, to test cases beyond conjunctive WHERE conditions, we chose TPC-H Query 7, whose WHERE contains multiple nested ANDs and ORs, and created 5 wrong queries by injecting 1-5 errors through changing atomic predicates or logical operators. For a fair comparison, we ensured that the number of unique atomic predicates is always 10 between the reference query and each wrong query.
Induction step. Suppose φ is not atomic and is not itself a repair site. Let Θ = op(φ) denote the logical operator at the root of φ. Every repair of φ (with the given S[φ]) is obtained by (potentially) repairing each child of φ, but without changing Θ. In other words, every repair results in φ′ = Θ_{c ∈ Children(φ)} c′, where c′ is the result of some repair of c at sites S[c]. By the inductive hypothesis, ∀c ∈ Children(φ): c′ ∈ [l_c, u_c] = CreateBounds(c, S[c]).

Table Mappings. To facilitate analysis in subsequent stages, Qr-Hint needs a way to "unify" table and column references in Q and Q★ so that all of them use the same set of table aliases.

Definition 1. Given queries Q★ and Q over the same schema where Tables(Q★) and Tables(Q) are equal as multisets, a table mapping from Q★ to Q is a bijective function m: Aliases(Q★) → Aliases(Q) with the property that two corresponding aliases are always associated with the same table, i.e., ∀a ∈ Aliases(Q★), a and m(a) are aliases of the same table.

Example 4. Continuing with Example 1, the following are signatures (one per column) for S1 and S2 in Q★ and s1 and s2 in Q.
5 … the only child of x;
6 let Θ = op(x); // either ∧ or ∨ at this point
7 foreach c ∈ Children(x) do
8   let [l_c, u_c] = CreateBounds(c, S[c]);
9 let R = Children(x) ∩ S; // children of x being repaired
10 if R = ∅ then let r = ∅ and C = Children(x);
11 else // treat all children being repaired as one
12   let r = Θ_{c ∈ R} c and [l_r, u_r] = [false, true];
13   let C = Children(x) \ R ∪ {r};
14 let F = ∅; // result set of (s, f) pairs to be computed
15 foreach c ∈ C do // combine bounds from all other children

Here, t1 and t2 denote intermediate result tuples, which are known to satisfy φ; we use E[t] to denote evaluating expression E over tuple t. This approach underlies our algorithm FixGrouping (Algorithm 4). Example 9. Consider the two queries in Example 6.1. The table mapping is trivial, and we simply use column names to name variables. We have: φ is B = C, the GROUP BY expression list of Q★ is [B, D], and that of Q is [C + D, C]. The logical statement that establishes the equivalence of grouping follows.

(Fragment of the signature table of Example 4: for WHERE and HAVING, bar interacts with {Frequents} under =, and beer with {Likes, Serves}.)

We first replace each repair site with a unique Boolean variable, forming a new Boolean predicate φ′ at the LCA, and the goal is to make φ′ ≡ f. Because two Boolean formulas are equivalent if and only if they share the same truth table, we want to observe under which value assignments f and φ′ are consistent (i.e., evaluate to the same truth value) before determining how to construct each individual fix. Therefore, Algorithm 7 first generates a "consistency table" (line 4) whose inputs are all unique Boolean variables and atomic predicates from f and φ′; the formulas being evaluated are l★, u★, f, and φ′, where l★ and u★ are present for the derivation of f, which follows the same procedure as MinFix (Algorithm 6).

Algorithm 7: MinFixMult(φ, S, l★, u★)
Input: a formula φ, a set S of disjoint subtrees (repair sites) of φ, and a target bound [l★, u★] for φ to achieve by fixes
Output: a repair (S, F), where F maps each site in S to a formula
1 let U denote the set of atomic formulas in φ that belong to none of the subtrees in S;
2 let M = MapAtomPreds(U ∪ {l★, u★});
3 let B_l = M(l★) and B_u = M(u★);
4 let [e1, e2, ...] be the list of subexpressions in S, denoted by the list of Boolean variables s̄ = [s1, s2, ...];
6 let B_φ be a Boolean function over variables ā ∥ s̄, obtained from φ by replacing each subexpression e_i with variable s_i and each atomic formula p ∈ U with M(p);
7 let ℭ = InitFeasibility(ā, s̄, B_φ, …);
// Fix one site at a time, and incrementally update feasibility:
8 let F = empty mapping, and I = [1 .. dim(s̄)];
…
13 let ℭ = UpdateFeasibility(ā, s̄, ℭ, i, F(s_i));
14 let I = I \ {i};
15 return (S, F);

Figure 11: Constraint table for Example 15

(2) Iterate over each row in the last column of the constraint table and give s_i the most flexible assignment. (3) Update the available options in the last column of the constraint table based on the truth assignment of s_i.

(Continuation of the proof of Lemma 5.2, Case 1:) … as it does not contain any repair site. Running Quine-McCluskey's method over the target bounds of f_i yields a formula f that is guaranteed to be in minimal DNF. Since φ★ is equivalent to the conjunction of f with the remaining conjuncts, and f is in minimal DNF, removing any clause in f would yield a formula that implies φ★ but is no longer implied by it. Case 2. φ is of the form f1 ∨ ... ∨ fn, where f_i is the repair site, and the target bounds for f_i are derived symmetrically. Running Quine-McCluskey's method over the target bounds of f_i again yields a formula f in minimal DNF; since φ★ is equivalent to the disjunction of f with the remaining disjuncts, removing any clause in f breaks the equivalence. Case 3. φ is of the form ¬f. Here φ★ ⇔ ¬f; since f is in minimal DNF, removing a clause results in φ★ ⇎ ¬f. □

Proof of Lemma 5.3. We use induction over the structure of φ to prove that for any subtree x in φ (for which CreateBounds is invoked), CreateBounds(x, S[x]) returns a correct bound for x: i.e., applying any repair to S[x] in x results in a formula x′ bounded by CreateBounds(x, S[x]). Base case. Suppose x is an atomic formula; CreateBounds returns [x, x], which bounds x. When x is a repair site, CreateBounds returns [false, true], which certainly bounds x or any Boolean expression with which we may replace it.

Table 4: Student Query Statistics

Table 5: List of Semantic Errors categorized by Qr-Hint