skip to main content
research-article
Open Access
Artifacts Available
Artifacts Evaluated & Reusable

Provenance-guided synthesis of Datalog programs

Published:20 December 2019Publication History
Skip Abstract Section

Abstract

We propose a new approach to synthesize Datalog programs from input-output specifications. Our approach leverages query provenance to scale the counterexample-guided inductive synthesis (CEGIS) procedure for program synthesis. In each iteration of the procedure, a SAT solver proposes a candidate Datalog program, and a Datalog solver evaluates the proposed program to determine whether it meets the desired specification. Failure to satisfy the specification results in additional constraints to the SAT solver. We propose efficient algorithms to learn these constraints based on “why” and “why not” provenance information obtained from the Datalog solver. We have implemented our approach in a tool called ProSynth and present experimental results that demonstrate significant improvements over the state-of-the-art, including in synthesizing invented predicates, reducing running times, and in decreasing variances in synthesis performance. On a suite of 40 synthesis tasks from three different domains, ProSynth is able to synthesize the desired program in 10 seconds on average per task—an order of magnitude faster than baseline approaches—and takes only under a second each for 28 of them.

Skip Supplemental Material Section

Supplemental Material

a62-raghothaman.webm

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu. 1994. Foundations of Databases: The Logical Level (1st ed.). Pearson.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. 2017. Constraint-Based Synthesis of Datalog Programs. In Principles and Practice of Constraint Programming (CP 2017). Springer, 689–706.Google ScholarGoogle ScholarCross RefCross Ref
  3. Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo Martin, Mukund Raghothaman, Sanjit Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-Guided Synthesis. In Formal Methods in Computer-Aided Design (FMCAD 2013) . IEEE, 1–8. Google ScholarGoogle ScholarCross RefCross Ref
  4. Lars Ole Andersen. 1994. Program Analysis and Specialization for the C Programming Language. Ph.D. Dissertation. DIKU, University of Copenhagen.Google ScholarGoogle Scholar
  5. Molham Aref, Balder ten Cate, Todd Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd Veldhuizen, and Geoffrey Washburn. 2015. Design and Implementation of the LogicBlox System. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2015) . ACM, 1371–1382.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. 2001. Why and Where: A Characterization of Data Provenance. In Proceedings of the International Conference on Database Theory (ICDT 2001) . Springer, 316–330.Google ScholarGoogle ScholarCross RefCross Ref
  7. Donald Chamberlin and Raymond Boyce. 1974. SEQUEL: A Structured English Query Language. In Proceedings of the 1974 ACM SIGFIDET Workshop on Data Description, Access and Control (SIGFIDET 1974) . ACM, 249–264.Google ScholarGoogle Scholar
  8. James Cheney, Laura Chiticariu, and Wang-Chiew Tan. 2009. Provenance in Databases: Why, How, and Where. Foundations and Trends in Databases 1, 4 (2009), 379–474.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Laura Chiticariu and Wang-Chiew Tan. 2006. Debugging Schema Mappings with Routes. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006) . VLDB Endowment, 79–90.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Andrew Cropper and Stephen Muggleton. 2015. Logical Minimisation of Meta-Rules Within Meta-Interpretive Learning. In Inductive Logic Programming . Springer, 62–75.Google ScholarGoogle Scholar
  11. Jacek Czerniak and Hubert Zarzycki. 2003. Application of Rough Sets in the Presumptive Diagnosis of Urinary System Diseases. In Artificial Intelligence and Security in Computing Systems. Springer, 41–51.Google ScholarGoogle Scholar
  12. Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2008) . Springer, 337–340.Google ScholarGoogle Scholar
  13. Daniel Deutch, Amir Gilad, and Yuval Moskovitch. 2015. Selective Provenance for Datalog Programs Using Topk Queries. Proceedings of the VLDB Endowment 8, 12 (Aug. 2015), 1394–1405. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Daniel Deutch, Tova Milo, Sudeepa Roy, and Val Tannen. 2014. Circuits for Datalog Provenance. In Proceedings of the 17th International Conference on Database Theory (ICDT 2014) . OpenProceedings.org, 201–212. Google ScholarGoogle ScholarCross RefCross Ref
  15. Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017) . ACM, 422–436.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Pierre Flener and Serap Yilmaz. 1999. Inductive Synthesis of Recursive Logic Programs: Achievements and Prospects. The Journal of Logic Programming 41, 2 (1999), 141–195. Google ScholarGoogle ScholarCross RefCross Ref
  17. Todd Green, Grigoris Karvounarakis, Zachary Ives, and Val Tannen. 2007b. Update Exchange with Mappings and Provenance. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007). VLDB Endowment, 675–686. http://dl.acm.org/citation.cfm?id=1325851.1325929Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Todd Green, Grigoris Karvounarakis, and Val Tannen. 2007a. Provenance Semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2007) . ACM, 31–40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. In Proceedings of the International Conference on Computer Aided Verification (CAV 2016) . Springer, 422–430.Google ScholarGoogle ScholarCross RefCross Ref
  20. Grigoris Karvounarakis, Zachary Ives, and Val Tannen. 2010. Querying Data Provenance. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2010) . ACM, 951–962.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Emanuel Kitzelmann. 2010. Inductive Programming: A Survey of Program Synthesis Techniques. In Approaches and Applications of Inductive Programming . Springer, 50–73.Google ScholarGoogle Scholar
  22. Sven Köhler, Bertram Ludäscher, and Yannis Smaragdakis. 2012. Declarative Datalog Debugging for Mere Mortals. In Datalog in Academia and Industry . Springer, 111–122.Google ScholarGoogle Scholar
  23. Seokki Lee, Bertram Ludäscher, and Boris Glavic. 2019. PUG: A Framework and Practical Implementation for Why and Why-not Provenance. The VLDB Journal 28, 1 (Feb. 2019), 47–71.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Ana Milanova, Atanas Rountev, and Barbara Ryder. 2002. Parameterized Object Sensitivity for Points-to and Side-effect Analyses for Java. In Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2002) . ACM, 1–11.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Stephen Muggleton. 1995. Inverse Entailment and Progol. New Generation Computing 13, 3 (Dec. 1995), 245–286. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Stephen Muggleton. 1999. Scientific Knowledge Discovery Using Inductive Logic Programming. Commun. ACM 42, 11 (Nov. 1999), 42–46.Google ScholarGoogle Scholar
  27. Stephen Muggleton, Dianhuan Lin, and Alireza Tamaddoni-Nezhad. 2015. Meta-interpretive Learning of Higher-order Dyadic Datalog: Predicate Invention Revisited. Machine Learning 100, 1 (01 July 2015), 49–73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2005. Abstract DPLL and Abstract DPLL Modulo Theories. In International Conference on Logic for Programming Artificial Intelligence and Reasoning . Springer, 36–50.Google ScholarGoogle Scholar
  29. J. Ross Quinlan and Mike Cameron-Jones. 1995. Induction of Logic Programs: FOIL and Related Systems. New Generation Computing 13, 3 (Dec. 1995), 287–312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Luc De Raedt. 2008. Logical and Relational Learning. Springer.Google ScholarGoogle Scholar
  31. Luc De Raedt and Kristian Kersting. 2008. Probabilistic Inductive Logic Programming. Springer, 1–27. Google ScholarGoogle ScholarCross RefCross Ref
  32. Anish Das Sarma, Martin Theobald, and Jennifer Widom. 2008. Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE 2008). IEEE, 1023–1032.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. 2018. Syntax-guided Synthesis of Datalog Programs. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018) . ACM, 515–527.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Xujie Si, Mukund Raghothaman, Kihong Heo, and Mayur Naik. 2019. Synthesizing Datalog Programs Using Numerical Relaxation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). AAAI Press, 6117–6124.Google ScholarGoogle ScholarCross RefCross Ref
  35. Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-sensitivity. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2011) . ACM, 17–30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial Sketching for Finite Programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII) . ACM, 404–415. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing Highly Expressive SQL Queries from Input-output Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017) . ACM, 452–466.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2018. Speeding Up Symbolic Reasoning for Relational Queries. Proceedings of the ACM on Programming Languages 2, OOPSLA, Article 157 (Oct. 2018), 25 pages.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. John Whaley and Monica Lam. 2004. Cloning-based Context-sensitive Pointer Alias Analysis Using Binary Decision Diagrams. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2004) . ACM, 131–144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Insu Yun, Changwoo Min, Xujie Si, Yeongjin Jang, Taesoo Kim, and Mayur Naik. 2016. APISan: Sanitizing API Usages Through Semantic Cross-Checking. In Proceedings of the 25th USENIX Security Symposium. USENIX Association, 363–378. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/yunGoogle ScholarGoogle Scholar
  41. Andreas Zeller. 1999. Yesterday, My Program Worked. Today, It Does Not. Why?. In Proceedings of the Joint 7th European Software Engineering Conference and the 7th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE-7) . Springer, 253–267.Google ScholarGoogle ScholarCross RefCross Ref
  42. Qiang Zeng, Jignesh Patel, and David Page. 2014. QuickFOIL: Scalable Inductive Logic Programming. Proceedings of the VLDB Endowment 8, 3 (Nov. 2014), 197–208.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Xin Zhang, Ravi Mangal, Radu Grigore, Mayur Naik, and Hongseok Yang. 2014. On Abstraction Refinement for Program Analyses in Datalog. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2014) . ACM, 239–248.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. David Zhao, Pavle Subotić, and Bernhard Scholz. 2019. Provenance for Large-scale Datalog. arXiv: 1907.05045 In submission.Google ScholarGoogle Scholar
  45. Moshé Zloof. 1975. Query by Example. In Proceedings of the National Computer Conference and Exposition (AFIPS 1975). ACM, 431–438.Google ScholarGoogle Scholar

Index Terms

  1. Provenance-guided synthesis of Datalog programs

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader
            About Cookies On This Site

            We use cookies to ensure that we give you the best experience on our website.

            Learn more

            Got it!