research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

Automatic repair of regular expressions

Published: 10 October 2019

Abstract

We introduce RFixer, a tool for repairing complex regular expressions using examples. We consider only regular expressions without non-regular operators (e.g., negative lookahead). Given an incorrect regular expression and sets of positive and negative examples, RFixer synthesizes the regular expression closest to the original that is consistent with the examples. Automatically repairing regular expressions requires exploring a large search space because practical regular expressions (i) are large, (ii) operate over very large alphabets (e.g., UTF-16 and ASCII), and (iii) employ complex constructs (e.g., character classes and numerical quantifiers). RFixer's repair algorithm achieves scalability by exploiting structural properties of regular expressions to prune the search space, and it employs satisfiability modulo theories (SMT) solvers to explore the sets of possible character classes and numerical quantifiers efficiently and symbolically. RFixer successfully computed minimal repairs for regular expressions collected from a variety of sources, whereas existing tools either failed to produce any repair or produced overly complex repairs.
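To make the problem statement concrete, the following Python sketch illustrates what "consistent with the examples" and "closest to the original" can mean for a single numerical quantifier. It is not RFixer's algorithm: the function names (`is_consistent`, `repair_bounds`), the `LO`/`HI` placeholder convention, and the distance measure (sum of absolute changes to the two bounds) are all assumptions made for illustration, and the brute-force enumeration merely stands in for the symbolic, solver-based search the abstract describes.

```python
import re
from itertools import product


def is_consistent(pattern, positives, negatives):
    """Return True if `pattern` fully matches every positive example
    and none of the negative examples."""
    regex = re.compile(pattern)
    accepts_all = all(regex.fullmatch(s) is not None for s in positives)
    rejects_all = all(regex.fullmatch(s) is None for s in negatives)
    return accepts_all and rejects_all


def repair_bounds(template, original, positives, negatives, max_bound=10):
    """Enumerate quantifier bounds (lo, hi) and return the consistent
    pattern whose bounds are closest to `original`, or None.

    `template` is a hypothetical format in which the tokens LO and HI
    mark the bounds to repair, e.g. "[0-9]{LO,HI}".
    """
    best = None  # (distance, pattern) of the best candidate so far
    for lo, hi in product(range(max_bound + 1), repeat=2):
        if lo > hi:
            continue  # not a valid quantifier range
        candidate = template.replace("LO", str(lo)).replace("HI", str(hi))
        if not is_consistent(candidate, positives, negatives):
            continue
        # Illustrative distance: how far the bounds moved from the original.
        distance = abs(lo - original[0]) + abs(hi - original[1])
        if best is None or distance < best[0]:
            best = (distance, candidate)
    return None if best is None else best[1]


# The (incorrect) starting point is "[0-9]{1,2}"; the examples ask for
# strings of exactly three or four digits.
positives = ["123", "4567"]
negatives = ["12", "12345"]
repaired = repair_bounds("[0-9]{LO,HI}", (1, 2), positives, negatives)
```

For these examples the only consistent bounds are 3 and 4, so `repaired` is `[0-9]{3,4}`. Enumeration like this scales poorly once several quantifiers and character classes must be chosen jointly, which is the motivation the abstract gives for exploring those choices symbolically.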

