Abstract
We introduce RFixer, a tool for repairing complex regular expressions using examples. We consider only regular expressions without non-regular operators (e.g., negative lookahead). Given an incorrect regular expression and sets of positive and negative examples, RFixer synthesizes the regular expression closest to the original that is consistent with the examples. Automatically repairing regular expressions requires exploring a large search space because practical regular expressions: i) are large, ii) operate over very large alphabets (e.g., UTF-16 and ASCII), and iii) employ complex constructs (e.g., character classes and numerical quantifiers). RFixer's repair algorithm achieves scalability by taking advantage of structural properties of regular expressions to effectively prune the search space, and it employs satisfiability modulo theories (SMT) solvers to efficiently and symbolically explore the sets of possible character classes and numerical quantifiers. RFixer successfully computed minimal repairs for regular expressions collected from a variety of sources, whereas existing tools either failed to produce any repair or produced overly complex repairs.
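To make the repair problem concrete, the following minimal Python sketch shows only the consistency condition that a repaired expression must satisfy: it accepts every positive example and rejects every negative one. It is not RFixer's repair algorithm. The function name `is_consistent`, the two patterns, and the example strings are hypothetical and chosen purely for illustration.

```python
import re


def is_consistent(pattern, positives, negatives):
    """Return True if `pattern` fully matches every positive example
    and fully matches none of the negative examples."""
    regex = re.compile(pattern)
    accepts_all = all(regex.fullmatch(s) is not None for s in positives)
    rejects_all = all(regex.fullmatch(s) is None for s in negatives)
    return accepts_all and rejects_all


# Hypothetical inputs: an incorrect expression and a candidate repair that
# differs from it only in one numerical quantifier.
original = r"[0-9]{2}-[0-9]{2}"
candidate = r"[0-9]{2}-[0-9]{2,4}"
positives = ["12-34", "12-2018"]
negatives = ["1-34", "12-3"]

original_ok = is_consistent(original, positives, negatives)
candidate_ok = is_consistent(candidate, positives, negatives)
```

Here the original expression is inconsistent because it rejects the positive example `12-2018`, while the candidate, which only widens one quantifier, is consistent with all four examples. A repair tool additionally has to find such a candidate and prefer the one closest to the original.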