Abstract
It is a neat result from functional programming that libraries of parser combinators can support rapid construction of decoders for quite a range of formats. With a little more work, the same combinator program can denote both a decoder and an encoder. Unfortunately, the real world is full of gnarly formats, such as the packet formats that make up the standard Internet protocol stack. Most past parser-combinator approaches cannot handle these formats, and the few exceptions require redundancy: one part of the natural grammar needs to be hand-translated into hints in multiple parts of a parser program. We show how to recover very natural and nonredundant format specifications, covering all popular network packet formats and generating both decoders and encoders automatically. The catch is that we use the Coq proof assistant to derive both kinds of artifacts using tactics, automatically, in a way that guarantees that they form inverses of each other. We used our approach to reimplement packet processing for a full Internet protocol stack, inserting our replacement into the OCaml-based MirageOS unikernel, resulting in minimal performance degradation.
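To make the idea of one combinator program denoting both a decoder and an encoder concrete, here is a minimal Python sketch. It is not the paper's Coq development, and the names `Format`, `word`, `pair`, `length_prefixed`, and `round_trips` are hypothetical. Each format bundles an encoder with a decoder, and the length-prefixed field illustrates a dependency between fields that is written only once: the encoder computes the length from the payload and the decoder consumes it. Unlike the approach described above, the inverse property is only tested on examples here, not guaranteed by construction.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


# Hypothetical combinator type: a format pairs an encoder with a decoder.
# decode returns (value, remaining_bytes), or None if the input is malformed.
@dataclass
class Format(Generic[A]):
    encode: Callable[[A], bytes]
    decode: Callable[[bytes], Optional[Tuple[A, bytes]]]


def word(width: int) -> Format[int]:
    """Unsigned big-endian integer occupying exactly `width` bytes."""
    def enc(v: int) -> bytes:
        # int.to_bytes raises OverflowError if v is negative or too large.
        return v.to_bytes(width, "big")

    def dec(bs: bytes) -> Optional[Tuple[int, bytes]]:
        if len(bs) < width:
            return None
        return int.from_bytes(bs[:width], "big"), bs[width:]

    return Format(enc, dec)


def pair(fa: Format[A], fb: Format[B]) -> Format[Tuple[A, B]]:
    """Sequence two formats: encode both halves, decode left then right."""
    def enc(v: Tuple[A, B]) -> bytes:
        return fa.encode(v[0]) + fb.encode(v[1])

    def dec(bs: bytes) -> Optional[Tuple[Tuple[A, B], bytes]]:
        ra = fa.decode(bs)
        if ra is None:
            return None
        a, rest = ra
        rb = fb.decode(rest)
        if rb is None:
            return None
        b, rest = rb
        return (a, b), rest

    return Format(enc, dec)


def length_prefixed(width: int) -> Format[bytes]:
    """Payload preceded by its byte length.

    The length is stated once: the encoder derives it from the payload and
    the decoder uses it to know how many bytes to read.
    """
    prefix = word(width)

    def enc(payload: bytes) -> bytes:
        return prefix.encode(len(payload)) + payload

    def dec(bs: bytes) -> Optional[Tuple[bytes, bytes]]:
        r = prefix.decode(bs)
        if r is None:
            return None
        n, rest = r
        if len(rest) < n:
            return None
        return rest[:n], rest[n:]

    return Format(enc, dec)


def round_trips(fmt: Format[A], value: A) -> bool:
    """Test on one value that decoding its encoding returns the value
    and consumes every byte."""
    return fmt.decode(fmt.encode(value)) == (value, b"")


# Toy two-field format (not a real protocol header): 16-bit number, then a label.
packet = pair(word(2), length_prefixed(1))
wire = packet.encode((53, b"example"))  # == b"\x00\x35\x07example"
assert packet.decode(wire) == ((53, b"example"), b"")
```

Because the length field never appears in the value being encoded, a caller cannot supply a length that disagrees with the payload, which is the kind of redundancy the abstract describes hand-written parsers needing.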
Narcissus: correct-by-construction derivation of decoders and encoders from binary formats