research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

Narcissus: correct-by-construction derivation of decoders and encoders from binary formats

Published: 26 July 2019

Abstract

It is a neat result from functional programming that libraries of parser combinators can support rapid construction of decoders for quite a range of formats. With a little more work, the same combinator program can denote both a decoder and an encoder. Unfortunately, the real world is full of gnarly formats, as with the packet formats that make up the standard Internet protocol stack. Most past parser-combinator approaches cannot handle these formats, and the few exceptions require redundancy – one part of the natural grammar needs to be hand-translated into hints in multiple parts of a parser program. We show how to recover very natural and nonredundant format specifications, covering all popular network packet formats and generating both decoders and encoders automatically. The catch is that we use the Coq proof assistant to derive both kinds of artifacts using tactics, automatically, in a way that guarantees that they form inverses of each other. We used our approach to reimplement packet processing for a full Internet protocol stack, inserting our replacement into the OCaml-based MirageOS unikernel, resulting in minimal performance degradation.
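The idea that one combinator program can denote both a decoder and an encoder can be illustrated with a small Python sketch. This is not the Coq-based derivation the abstract describes, and it proves nothing: it only checks the round-trip property at runtime. The names `Format`, `uint`, `length_prefixed`, `seq`, and the toy `packet` layout are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Format:
    """A pair of functions that are meant to be inverses of each other."""
    encode: Callable[[Any], bytes]
    decode: Callable[[bytes], Tuple[Any, bytes]]  # (value, unconsumed rest)


def uint(width: int) -> Format:
    """Big-endian unsigned integer occupying `width` bytes."""
    def encode(value: int) -> bytes:
        # Raises OverflowError if the value does not fit in `width` bytes.
        return value.to_bytes(width, "big")

    def decode(data: bytes) -> Tuple[int, bytes]:
        if len(data) < width:
            raise ValueError("truncated integer")
        return int.from_bytes(data[:width], "big"), data[width:]

    return Format(encode, decode)


def length_prefixed(prefix: Format) -> Format:
    """Byte string preceded by its length.

    The length dependency is stated once, in this combinator, instead of
    being repeated by hand in a separate encoder and decoder.
    """
    def encode(payload: bytes) -> bytes:
        return prefix.encode(len(payload)) + payload

    def decode(data: bytes) -> Tuple[bytes, bytes]:
        n, rest = prefix.decode(data)
        if len(rest) < n:
            raise ValueError("truncated payload")
        return rest[:n], rest[n:]

    return Format(encode, decode)


def seq(first: Format, second: Format) -> Format:
    """Sequence two formats; the value is a pair."""
    def encode(pair: Tuple[Any, Any]) -> bytes:
        a, b = pair
        return first.encode(a) + second.encode(b)

    def decode(data: bytes) -> Tuple[Tuple[Any, Any], bytes]:
        a, rest = first.decode(data)
        b, rest = second.decode(rest)
        return (a, b), rest

    return Format(encode, decode)


# Hypothetical toy packet: a 2-byte port, then a payload with a 1-byte length.
packet = seq(uint(2), length_prefixed(uint(1)))


def round_trips(fmt: Format, value: Any) -> bool:
    """Runtime check that decoding an encoded value recovers it exactly."""
    decoded, rest = fmt.decode(fmt.encode(value))
    return decoded == value and rest == b""
```

Calling `round_trips(packet, (8080, b"hi"))` exercises both directions of the same specification. The difference from the approach in the abstract is that such a check only covers the values it is run on, whereas a machine-checked derivation is meant to guarantee the inverse property for every value.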


Supplemental Material

a82-pit-claudel.webm

