Abstract
Most modern software languages enjoy a relatively free and relaxed concrete syntax, with significant flexibility in how the program/model/sheet text is formatted. Yet, in the dark legacy corners of software engineering there are still languages with a strict fixed column-based structure: compromises of times long gone that attempt to combine some human readability with some ease of machine processing. In this paper, we consider an industrial case study on the retirement of a legacy domain-specific language, completed under extreme circumstances: an absolute lack of documentation, varying line structure, hierarchical blocks within one file, scalability demands for millions of lines of code, performance demands for manipulating tens of thousands of multi-megabyte files, etc. However, the regularity of the language allowed us to infer its structure automatically from the available examples and to produce highly efficient parsers for it.
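To make the idea of inferring a fixed-column layout from examples concrete, here is a minimal sketch. It is not the algorithm from the paper, which is not described in this abstract. It assumes the simplest possible case: every record has the same layout, and a column position that is blank in all sample lines separates two fields. The sample records and the names `infer_fields` and `parse_line` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open column range [start, end)


def infer_fields(samples: List[str]) -> List[Span]:
    """Infer field spans from a non-empty list of fixed-width sample lines.

    Assumption (for illustration only): a column that is blank in every
    sample is a separator; maximal runs of other columns form one field.
    """
    width = max(len(s) for s in samples)
    padded = [s.ljust(width) for s in samples]
    blank = [all(line[i] == " " for line in padded) for i in range(width)]

    spans: List[Span] = []
    start = None
    for i, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = i                  # a field begins here
        elif is_blank and start is not None:
            spans.append((start, i))   # the field ended at the previous column
            start = None
    if start is not None:
        spans.append((start, width))   # last field runs to the end of the line
    return spans


def parse_line(line: str, spans: List[Span]) -> List[str]:
    """Slice one line according to the inferred spans and trim the padding."""
    return [line[a:b].strip() for a, b in spans]


# Hypothetical sample records, invented for this sketch.
samples = [
    "REC001 ALPHA   0042",
    "REC002 BETA    0007",
    "REC003 GAMMA   1300",
]
spans = infer_fields(samples)
fields = parse_line(samples[1], spans)
```

Once the spans are known, parsing a line is plain slicing, which suggests why parsers for such regular formats can be made fast. A realistic setting with varying line structure and nested blocks, as the abstract describes, would need considerably more than this.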
Parser generation by example for legacy pattern languages
GPCE 2017: Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences