
Parser generation by example for legacy pattern languages

Published: 23 October 2017

Abstract

Most modern software languages enjoy a relatively free and relaxed concrete syntax, with significant flexibility in the formatting of the program/model/sheet text. Yet in the dark legacy corners of software engineering there are still languages with a strict, fixed, column-based structure: compromises of times long gone, attempting to combine some human readability with some ease of machine processing. In this paper, we consider an industrial case study of the retirement of a legacy domain-specific language, completed under extreme circumstances: absolute lack of documentation, varying line structure, hierarchical blocks within one file, scalability demands for millions of lines of code, performance demands for manipulating tens of thousands of multi-megabyte files, etc. However, the regularity of the language allowed us to infer its structure automatically from the available examples and to produce highly efficient parsers for it.

