skip to main content
article

Synthesizing regular expressions from examples for introductory automata assignments

Published:20 October 2016Publication History
Skip Abstract Section

Abstract

We present a method for synthesizing regular expressions for introductory automata assignments. Given a set of positive and negative examples, the method automatically synthesizes the simplest possible regular expression that accepts all the positive examples while rejecting all the negative examples. The key novelty is the search-based synthesis algorithm that leverages ideas from over- and under-approximations to effectively prune out a large search space. We have implemented our technique in a tool and evaluated it with non-trivial benchmark problems that students often struggle with. The results show that our system can synthesize desired regular expressions in 6.7 seconds on the average, so that it can be interactively used by students to enhance their understanding of regular expressions.

References

  1. Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive program synthesis. In Proceedings of the 25th International Conference on Computer Aided Verification - Volume 8044, CAV 2013, pages 934–950, New York, NY, USA, 2013. Springer-Verlag New York, Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, November 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Daniel W. Barowy, Sumit Gulwani, Ted Hart, and Benjamin Zorn. Flashrelate: Extracting relational data from semistructured spreadsheets using examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’15, pages 218–228, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. David F. Barrero, Mar´ıa D. R-Moreno, and David Camacho. Adapting searchy to extract data using evolved wrappers. Expert Syst. Appl., 39(3):3061–3070, February 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Alberto Bartoli, Giorgio Davanzo, Andrea De Lorenzo, Marco Mauri, Eric Medvet, and Enrico Sorio. Automatic generation of regular expressions from examples with genetic programming. In Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’12, pages 1477–1478, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Alberto Bartoli, Giorgio Davanzo, Andrea De Lorenzo, Eric Medvet, and Enrico Sorio. Automatic synthesis of regular expressions from examples. Computer, 47(12):72–80, December 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, and Fabiano Tarlao. Playing regex golf with genetic programming. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO ’14, pages 1063–1070, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, and Fabiano Tarlao. Inference of regular expressions for text extraction from examples. IEEE Trans. on Knowl. and Data Eng., 28(5):1217–1230, May 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Josh Bongard and Hod Lipson. Active coevolutionary learning of deterministic finite automata. J. Mach. Learn. Res., 6:1651–1678, December 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Falk Brauer, Robert Rieger, Adrian Mocan, and Wojciech M. Barczynski. Enabling information extraction by inference of regular expressions from sample entities. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages 1285–1294, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Duy Duc An Bui and Qing Zeng-Treitler. Learning regular expressions for clinical text classification. Journal of the American Medical Informatics Association, 21(5):850–857, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  12. Ahmet Cetinkaya. Regular expression generation through grammatical evolution. In Proceedings of the 9th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’07, pages 2643–2646, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. D. Dunay, F. E. Petry, and B. P. Buckles. Regular language induction with genetic programming. In Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence., Proceedings of the First IEEE Conference on, pages 396–400 vol.1, Jun 1994.Google ScholarGoogle ScholarCross RefCross Ref
  14. Henning Fernau. Algorithms for learning regular expressions from positive data. Inf. Comput., 207(4):521–541, April 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data structure transformations from input-output examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’15, pages 229–239, New York, NY, USA, 2015. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’11, pages 317–330, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. John E. Hopcroft. Introduction to Automata Theory, Languages, and Computation. Pearson Addison Wesley, 3rd edition, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Efim Kinber. Learning regular expressions from representative examples and membership queries. In Proceedings of the 10th International Colloquium Conference on Grammatical Inference: Theoretical Results and Applications, ICGI’10, pages 94–108, Berlin, Heidelberg, 2010. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Peter Linz. An Introduction to Formal Language and Automata. Jones and Bartlett Publishers, Inc., USA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. R. Meyer and L. J. Stockmeyer. The equivalence problem for regular expressions with squaring requires exponential space. In Proceedings of the 13th Annual Symposium on Switching and Automata Theory (Swat 1972), SWAT ’72, pages 125–129, Washington, DC, USA, 1972. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Rajesh Parekh and Vasant Honavar. An incremental interactive algorithm for regular grammar inference. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2, AAAI’96, pages 1397–1397. AAAI Press, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Rajesh Parekh and Vasant G. Honavar. Learning dfa from simple examples. Mach. Learn., 44(1-2):9–35, July 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Paul Prasse, Christoph Sawade, Niels Landwehr, and Tobias Scheffer. Learning to identify concise regular expressions that describe email campaigns. J. Mach. Learn. Res., 16(1):3687– 3720, January 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Rishabh Singh and Sumit Gulwani. Synthesizing number transformations from input-output examples. In Proceedings of the 24th International Conference on Computer Aided Verification, CAV’12, pages 634–651, Berlin, Heidelberg, 2012. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Michael Sipser. Introduction to the Theory of Computation. Cengage Learning, 3rd edition, 2012.Google ScholarGoogle Scholar
  27. Borge Svingen. Learning regular languages using genetic programming. In John R. Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max H. Garzon, David E. Goldberg, Hitoshi Iba, and Rick Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 374–376, University of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann.Google ScholarGoogle Scholar
  28. Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. Spamming botnets: Signatures and characteristics. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication, SIGCOMM ’08, pages 171–182, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Synthesizing regular expressions from examples for introductory automata assignments

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 52, Issue 3
        GPCE '16
        March 2017
        212 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/3093335
        Issue’s Table of Contents
        • cover image ACM Conferences
          GPCE 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences
          October 2016
          212 pages
          ISBN:9781450344463
          DOI:10.1145/2993236

        Copyright © 2016 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 October 2016

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!