Abstract
Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in the input.
We propose library learning modulo theory (LLMT), a new library learning algorithm that additionally takes as input an equational theory for a given problem domain. LLMT uses e-graphs and equality saturation to compactly represent the space of programs equivalent modulo the theory, and uses a novel e-graph anti-unification technique to find common patterns in the corpus more directly and efficiently.
We implemented LLMT in a tool named babble. Our evaluation shows that babble achieves better compression orders of magnitude faster than the state of the art. We also provide a qualitative evaluation showing that babble learns reusable functions on inputs previously out of reach for library learning.
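To make the anti-unification idea concrete, the following is a minimal sketch of plain syntactic anti-unification on two terms. It is not the e-graph anti-unification technique the abstract describes, and it is not taken from babble. The term encoding (nested tuples), the function name `anti_unify`, and the `?x` variable naming are assumptions made for illustration only.

```python
from itertools import count


def anti_unify(s, t, table=None, fresh=None):
    """Return the least general generalization of two terms.

    A term is either an atom (string or number) or a tuple
    (operator, child, child, ...). This encoding is hypothetical.
    """
    if table is None:
        table = {}       # maps a mismatching pair (s, t) to its pattern variable
    if fresh is None:
        fresh = count()  # supplies fresh variable indices

    # Identical subterms generalize to themselves.
    if s == t:
        return s

    # Same operator and arity: keep the operator, generalize the children.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        children = tuple(anti_unify(a, b, table, fresh)
                         for a, b in zip(s[1:], t[1:]))
        return (s[0],) + children

    # Mismatch: abstract it into a variable, reusing the same variable
    # whenever the same pair of subterms has been seen before.
    if (s, t) not in table:
        table[(s, t)] = f"?x{next(fresh)}"
    return table[(s, t)]


p1 = ("scale", 2, ("circle", 1))
p2 = ("scale", 3, ("circle", 1))
pattern = anti_unify(p1, p2)
print(pattern)
```

The resulting pattern keeps the structure the two programs share and replaces the differing argument with a variable, which is the shape a candidate library function would take. A purely syntactic procedure like this one only finds structure that is literally shared, which is why the abstract pairs anti-unification with an equational theory.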
babble: Learning Better Abstractions with E-Graphs and Anti-unification