research-article
Open access

Better Together: Unifying Datalog and Equality Saturation

Published: 06 June 2023

Abstract

We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, egglog supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, egglog supports term rewriting, efficient congruence closure, and extraction of optimized terms.
We identify two recent applications (a unification-based pointer analysis in Datalog and an EqSat-based floating-point term rewriter) that have been hampered by features missing from Datalog but found in EqSat, or vice versa. We evaluate our system by reimplementing those projects in egglog. The resulting egglog systems are faster and simpler, and they fix bugs found in the original systems.

      Published In

      Proceedings of the ACM on Programming Languages  Volume 7, Issue PLDI
      June 2023
      2020 pages
      EISSN:2475-1421
      DOI:10.1145/3554310
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Datalog
      2. Equality saturation
      3. Program optimization
      4. Rewrite systems

      Qualifiers

      • Research-article

      Cited By

      • (2024) Fast and Optimal Extraction for Sparse Equality Graphs. Proceedings of the ACM on Programming Languages 8, OOPSLA2, 2551-2577. https://doi.org/10.1145/3689801 Online publication date: 8-Oct-2024
      • (2024) A Typed Multi-level Datalog IR and Its Compiler Framework. Proceedings of the ACM on Programming Languages 8, OOPSLA2, 1586-1614. https://doi.org/10.1145/3689767 Online publication date: 8-Oct-2024
      • (2024) PolyJuice: Detecting Mis-compilation Bugs in Tensor Compilers with Equality Saturation Based Rewriting. Proceedings of the ACM on Programming Languages 8, OOPSLA2, 1309-1335. https://doi.org/10.1145/3689757 Online publication date: 8-Oct-2024
      • (2024) Modeling Erlang Compiler IR as SMT Formulas. Proceedings of the 23rd ACM SIGPLAN International Workshop on Erlang, 45-54. https://doi.org/10.1145/3677995.3678193 Online publication date: 28-Aug-2024
      • (2024) SpEQ: Translation of Sparse Codes using Equivalences. Proceedings of the ACM on Programming Languages 8, PLDI, 1680-1703. https://doi.org/10.1145/3656445 Online publication date: 20-Jun-2024
      • (2024) Transforming Optimization Problems into Disciplined Convex Programming Form. Intelligent Computer Mathematics, 183-202. https://doi.org/10.1007/978-3-031-66997-2_11 Online publication date: 5-Aug-2024
      • (2023) Bring Your Own Data Structures to Datalog. Proceedings of the ACM on Programming Languages 7, OOPSLA2, 1198-1223. https://doi.org/10.1145/3622840 Online publication date: 16-Oct-2023
