Abstract
We present a method for synthesizing regular expressions for introductory automata assignments. Given a set of positive and negative examples, the method automatically synthesizes the simplest regular expression that accepts all the positive examples and rejects all the negative examples. The key novelty is a search-based synthesis algorithm that uses over- and under-approximations to prune a large search space effectively. We have implemented our technique in a tool and evaluated it on non-trivial benchmark problems that students often struggle with. The results show that our system synthesizes the desired regular expressions in 6.7 seconds on average, so students can use it interactively to enhance their understanding of regular expressions.
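The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple-based regex representation, the node-count cost, the `max_size` bound, and every function name below are assumptions made for illustration. A partial expression contains holes. Filling every hole with "any string" over-approximates all its completions, so it can be discarded if that version still misses a positive example. Filling every hole with the empty language under-approximates them, so it can be discarded if that version already accepts a negative example.

```python
import heapq
from itertools import count

# Hypothetical representation: ('sym', c), ('cat', l, r), ('or', l, r),
# ('star', e), plus HOLE for an unfinished part. ANY and NONE stand for
# "all strings" and "no strings" and are used only for approximation.
HOLE, ANY, NONE = ('hole',), ('any',), ('none',)

def ends(e, s, i):
    """Set of positions j such that e matches s[i:j]."""
    kind = e[0]
    if kind == 'sym':
        return {i + 1} if i < len(s) and s[i] == e[1] else set()
    if kind == 'any':
        return set(range(i, len(s) + 1))
    if kind == 'cat':
        return {k for j in ends(e[1], s, i) for k in ends(e[2], s, j)}
    if kind == 'or':
        return ends(e[1], s, i) | ends(e[2], s, i)
    if kind == 'star':
        seen, frontier = {i}, {i}
        while frontier:
            frontier = {k for j in frontier for k in ends(e[1], s, j)} - seen
            seen |= frontier
        return seen
    return set()  # 'none' (and an unfilled hole) match nothing

def matches(e, s):
    return len(s) in ends(e, s, 0)

def children(e):
    return [c for c in e[1:] if isinstance(c, tuple)]

def has_hole(e):
    return e == HOLE or any(has_hole(c) for c in children(e))

def size(e):
    return 1 + sum(size(c) for c in children(e))

def fill(e, filler):
    """Replace every hole in e with filler."""
    if e == HOLE:
        return filler
    return (e[0],) + tuple(fill(c, filler) if isinstance(c, tuple) else c
                           for c in e[1:])

def expand(e, options):
    """Yield copies of e with its leftmost hole replaced by each option."""
    if e == HOLE:
        yield from options
    elif e[0] in ('cat', 'or'):
        if has_hole(e[1]):
            for left in expand(e[1], options):
                yield (e[0], left, e[2])
        else:
            for right in expand(e[2], options):
                yield (e[0], e[1], right)
    elif e[0] == 'star':
        for inner in expand(e[1], options):
            yield ('star', inner)

def synthesize(pos, neg, alphabet='01', max_size=9):
    """Best-first search by node count; returns a smallest consistent regex
    within max_size nodes, or None if there is none."""
    options = [('sym', a) for a in alphabet]
    options += [('cat', HOLE, HOLE), ('or', HOLE, HOLE), ('star', HOLE)]
    tie = count()  # tie-breaker so the heap never compares expressions
    queue = [(1, next(tie), HOLE)]
    while queue:
        _, _, e = heapq.heappop(queue)
        # Over-approximation: no completion can accept a missed positive.
        if not all(matches(fill(e, ANY), p) for p in pos):
            continue
        # Under-approximation: every completion accepts this negative.
        if any(matches(fill(e, NONE), n) for n in neg):
            continue
        if not has_hole(e):
            return e  # closed, so both checks above were exact
        for child in expand(e, options):
            if size(child) <= max_size:
                heapq.heappush(queue, (size(child), next(tie), child))
    return None

if __name__ == '__main__':
    print(synthesize(['0', '00', '000'], ['', '1', '01', '10']))
```

Both checks are sound in this sketch because concatenation, union, and star are monotone with respect to language inclusion, so a closed expression passes them exactly when it is consistent with the examples.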
Synthesizing regular expressions from examples for introductory automata assignments
GPCE 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences