Abstract
The x86-64 ISA sits at the bottom of the software stack of most desktop and server software. Because of its importance, many software analysis and verification tools depend, either explicitly or implicitly, on correct modeling of the semantics of x86-64 instructions. However, formal semantics for the x86-64 ISA are difficult to obtain and often written manually through great effort. We describe an automatically synthesized formal semantics of the input/output behavior for a large fraction of the x86-64 Haswell ISA’s many thousands of instruction variants. The key to our results is stratified synthesis, where we use a set of instructions whose semantics are known to synthesize the semantics of additional instructions whose semantics are unknown. As the set of formally described instructions increases, the synthesis vocabulary expands, making it possible to synthesize the semantics of increasingly complex instructions. Using this technique we automatically synthesized formal semantics for 1,795 instruction variants of the x86-64 Haswell ISA. We evaluate the learned semantics against manually written semantics (where available) and find that they are formally equivalent with the exception of 50 instructions, where the manually written semantics contain an error. We further find the learned formulas to be largely as precise as manually written ones and of similar size.
- N. Amit, D. Tsafrir, A. Schuster, A. Ayoub, and E. Shlomo. Virtual cpu validation. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP ’15, pages 311–327, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3834-9. doi: 10.1145/2815400.2815420. URL http://doi. acm.org/10.1145/2815400.2815420. Google Scholar
Digital Library
- G. Balakrishnan, R. Gruian, T. Reps, and T. Teitelbaum. Codesurfer/x86—a platform for analyzing x86 executables. In Compiler Construction, pages 250–254. Springer, 2005. Google Scholar
Digital Library
- G. Balakrishnan, R. Gruian, T. W. Reps, and T. Teitelbaum. Codesurfer/x86-a platform for analyzing x86 executables. In Compiler Construction, 14th International Conference, CC 2005, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2005, Edinburgh, UK, April 4-8, 2005, Proceedings, pages 250–254, 2005. Google Scholar
Digital Library
- S. Bansal and A. Aiken. Automatic generation of peephole superoptimizers. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA, October 21-25, 2006, pages 394–403, 2006. Google Scholar
Digital Library
- S. Bansal and A. Aiken. Automatic generation of peephole superoptimizers. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, pages 394– 403, New York, NY, USA, 2006. ACM. ISBN 1-59593- 451-0. doi: 10.1145/1168857.1168906. URL http: //doi.acm.org/10.1145/1168857.1168906. Google Scholar
Digital Library
- C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, and C. Tinelli. CVC4. In Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings, pages 171–177, 2011. Google Scholar
Digital Library
- D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz. BAP: A binary analysis platform. In Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings, pages 463–469, 2011. Google Scholar
Digital Library
- M. Charney. Personal communication, February 2016.Google Scholar
- M. Christodorescu, N. Kidd, and W.-H. Goh. String analysis for x86 binaries. In Proceedings of the Workshop on Program Analysis for Software Tools and Engineering, volume 31, pages 88–95, 2005. Google Scholar
Digital Library
- E. Darulova and V. Kuncak. Sound compilation of reals. In The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, San Diego, CA, USA, January 20-21, 2014, pages 235–248, 2014. Google Scholar
Digital Library
- J. K. Feser, S. Chaudhuri, and I. Dillig. Synthesizing data structure transformations from input-output examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 229–239, 2015. Google Scholar
Digital Library
- P. Godefroid and A. Taly. Automated synthesis of symbolic instruction encodings from i/o samples. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, pages 441–452, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1205-9. doi: 10.1145/2254064.2254116. URL http://doi. acm.org/10.1145/2254064.2254116. Google Scholar
Digital Library
- S. Gulwani, S. Jha, A. Tiwari, and R. Venkatesan. Synthesis of loop-free programs. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011, pages 62–73, 2011. Google Scholar
Digital Library
- Intel. Intel 64 and IA-32 Architectures Software Developer Manuals, Revision 325462-057US, December 2015. URL http://www.intel. com/content/www/us/en/processors/ architectures-software-developer-manuals. html.Google Scholar
- S. Jha, S. Gulwani, S. A. Seshia, and A. Tiwari. Oracleguided component-based program synthesis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE 2010, Cape Town, South Africa, 1-8 May 2010, pages 215–224, 2010. Google Scholar
Digital Library
- J. Kinder and H. Veith. Jakstab: A static analysis platform for binaries. In Computer Aided Verification, 20th International Conference, CAV 2008, Princeton, NJ, USA, July 7-14, 2008, Proceedings, pages 423–427, 2008. Google Scholar
Digital Library
- X. Leroy. The CompCert C Verified Compiler, 2012.Google Scholar
- J. Lim and T. W. Reps. TSL: A system for generating abstract interpreters and its application to machine-code analysis. ACM Trans. Program. Lang. Syst., 35(1):4, 2013. Google Scholar
Digital Library
- A. V. Nori, S. Ozair, S. K. Rajamani, and D. Vijaykeerthy. Efficient synthesis of probabilistic programs. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 208–217, 2015. Google Scholar
Digital Library
- P. Osera and S. Zdancewic. Type-and-example-directed program synthesis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 619–630, 2015. Google Scholar
Digital Library
- D. A. Ramos and D. R. Engler. Practical, Low-Effort Equivalence Verification of Real Code. In Computer Aided Verification, 2011. Google Scholar
Digital Library
- doi: 10.1007/ 978-3-642-22110-1_55. URL http://dx.doi. org/10.1007/978-3-642-22110-1_55.Google Scholar
- J. Regehr and U. Duongsaa. Deriving abstract transfer functions for analyzing embedded software. In Proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’06), Ottawa, Ontario, Canada, June 14-16, 2006, pages 34–43, 2006. Google Scholar
Digital Library
- J. Regehr and A. Reid. HOIST: a system for automatically deriving static analyzers for embedded systems. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2004, Boston, MA, USA, October 7-13, 2004, pages 133–143, 2004. Google Scholar
Digital Library
- T. Reps and G. Balakrishnan. Improved memory-access analysis for x86 executables. In Compiler Construction, pages 16–35. Springer, 2008. Google Scholar
Digital Library
- T. W. Reps, S. Sagiv, and G. Yorsh. Symbolic implementation of the best transformer. In Verification, Model Checking, and Abstract Interpretation, 5th International Conference, VMCAI 2004, Venice, January 11-13, 2004, Proceedings, pages 252– 266, 2004.Google Scholar
- E. Schkufza, R. Sharma, and A. Aiken. Stochastic superoptimization. In Architectural Support for Programming Languages and Operating Systems, ASPLOS ’13, Houston, TX, USA - March 16 - 20, 2013, pages 305–316, 2013. Google Scholar
Digital Library
- E. Schkufza, R. Sharma, and A. Aiken. Stochastic optimization of floating-point programs with tunable precision. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, page 9, 2014. Google Scholar
Digital Library
- A. Solar-Lezama, R. M. Rabbah, R. Bod´ık, and K. Ebcioglu. Programming by sketching for bit-streaming programs. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005, pages 281–294, 2005. Google Scholar
Digital Library
- D. X. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena. Bitblaze: A new approach to computer security via binary analysis. In Information Systems Security, 4th International Conference, ICISS 2008, Hyderabad, India, December 16-20, 2008. Proceedings, pages 1–25, 2008. Google Scholar
Digital Library
- V. Srinivasan and T. W. Reps. Synthesis of machine code from semantics. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 596–607, 2015. Google Scholar
Digital Library
- A. V. Thakur, J. Lim, A. Lal, A. Burton, E. Driscoll, M. Elder, T. Andersen, and T. W. Reps. Directed proof generation for machine code. In Computer Aided Verification, 22nd International Conference, CAV 2010, Edinburgh, UK, July 15-19, 2010. Proceedings, pages 288–305, 2010. Google Scholar
Digital Library
- C. M. Wintersteiger, Y. Hamadi, and L. M. de Moura. Efficiently solving quantified bit-vector formulas. Formal Methods in System Design, 42(1):3–23, 2013. Google Scholar
Digital Library
Index Terms
Stratified synthesis: automatically learning the x86-64 instruction set
Recommendations
A complete formal semantics of x86-64 user-level instruction set architecture
PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and ImplementationWe present the most complete and thoroughly tested formal semantics of x86-64 to date. Our semantics faithfully formalizes all the non-deprecated, sequential user-level instructions of the x86-64 Haswell instruction set architecture. This totals 3155 ...
Stratified synthesis: automatically learning the x86-64 instruction set
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and ImplementationThe x86-64 ISA sits at the bottom of the software stack of most desktop and server software. Because of its importance, many software analysis and verification tools depend, either explicitly or implicitly, on correct modeling of the semantics of x86-...
Formally verified big step semantics out of x86-64 binaries
CPP 2019: Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and ProofsThis paper presents a methodology for generating formally proven equivalence theorems between decompiled x86-64 machine code and big step semantics. These proofs are built on top of two additional contributions. First, a robust and tested formal x86-64 ...







Comments