Abstract
We propose a new approach to synthesizing Datalog programs from input-output specifications. Our approach leverages query provenance to scale the counterexample-guided inductive synthesis (CEGIS) procedure for program synthesis. In each iteration of the procedure, a SAT solver proposes a candidate Datalog program, and a Datalog solver evaluates the proposed program to determine whether it meets the desired specification. Failure to satisfy the specification yields additional constraints for the SAT solver. We propose efficient algorithms to learn these constraints based on “why” and “why not” provenance information obtained from the Datalog solver. We have implemented our approach in a tool called ProSynth and present experimental results that demonstrate significant improvements over the state of the art, including synthesizing invented predicates, reducing running times, and decreasing variance in synthesis performance. On a suite of 40 synthesis tasks from three different domains, ProSynth synthesizes the desired program in 10 seconds on average per task (an order of magnitude faster than baseline approaches) and takes under a second on each of 28 of them.
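The loop described in the abstract can be illustrated with a toy sketch. This is not ProSynth's implementation: the four candidate rules, the `EDGES` example, and all function names are hypothetical, a brute-force enumeration stands in for the SAT solver, a naive fixpoint stands in for the Datalog solver, and the "why not" constraint is the coarsest sound one (enable at least one currently disabled rule).

```python
from itertools import product

# Hypothetical toy task: learn path/2 from edge/2.
EDGES = {(1, 2), (2, 3)}
POSITIVE = {(1, 2), (2, 3), (1, 3)}   # tuples the program must derive
NEGATIVE = {(2, 1), (3, 2), (3, 1)}   # tuples the program must not derive

# Each candidate rule returns (head, path-premises) pairs.
def r0(edge, path):  # path(x, y) :- edge(x, y).
    return [((x, y), []) for (x, y) in edge]

def r1(edge, path):  # path(x, y) :- edge(y, x).
    return [((y, x), []) for (x, y) in edge]

def r2(edge, path):  # path(x, z) :- path(x, y), edge(y, z).
    return [((x, z), [(x, y)]) for (x, y) in path for (y2, z) in edge if y == y2]

def r3(edge, path):  # path(y, x) :- path(x, y).
    return [((y, x), [(x, y)]) for (x, y) in path]

RULES = [r0, r1, r2, r3]

def evaluate(selected):
    """Naive fixpoint. Maps each derived tuple to the rule indices
    used in one of its derivations (a simple form of 'why' provenance)."""
    why = {}
    changed = True
    while changed:
        changed = False
        for i in selected:
            for head, premises in RULES[i](EDGES, set(why)):
                if head not in why:
                    why[head] = frozenset({i}).union(*(why[p] for p in premises))
                    changed = True
    return why

def propose(clauses):
    """Stand-in for a SAT solver: first rule selection satisfying every clause.
    A clause is a list of (rule index, required truth value) literals."""
    for bits in product([False, True], repeat=len(RULES)):
        if all(any(bits[i] == pol for i, pol in clause) for clause in clauses):
            return bits
    return None

def synthesize():
    clauses, iterations = [], 0
    while True:
        bits = propose(clauses)
        if bits is None:
            return None, iterations          # no subset of rules fits the spec
        iterations += 1
        selected = [i for i, b in enumerate(bits) if b]
        why = evaluate(selected)
        bad = [t for t in NEGATIVE if t in why]
        missing = [t for t in POSITIVE if t not in why]
        if not bad and not missing:
            return selected, iterations
        # "Why": some rule in the derivation of an undesired tuple must go.
        for t in bad:
            clauses.append([(i, False) for i in why[t]])
        # "Why not" (coarse): some currently disabled rule must be enabled.
        if missing:
            clauses.append([(i, True) for i in range(len(RULES)) if i not in selected])

if __name__ == "__main__":
    print(synthesize())
```

Both kinds of clause rely on Datalog being monotone: adding rules never removes derived tuples, so a derivation that produces an undesired tuple survives in every superset of its rules, and a tuple missing from a program is also missing from every subset of it. Each failed candidate violates at least one clause it generates, so the loop always makes progress.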
Supplemental Material
Appendix, containing one additional table of experimental data, and a detailed log file for the working example.
Index Terms
Provenance-guided synthesis of Datalog programs