Abstract
Symbolic execution is a key component of precise binary program analysis tools. We discuss how to automatically bootstrap the construction of a symbolic execution engine for a processor instruction set such as x86, x64 or ARM. We show how to automatically synthesize symbolic representations of individual processor instructions from input/output examples and express them as bit-vector constraints. We present and compare various synthesis algorithms and instruction sampling strategies. We introduce a new synthesis algorithm based on smart sampling, which we show is one to two orders of magnitude faster than previous synthesis algorithms in our context. With this new algorithm, we can automatically synthesize bit-vector circuits for over 500 x86 instructions (8/16/32-bit variants; outputs and EFLAGS) using only 6 synthesis templates, in less than two hours, with the Z3 SMT solver on a regular machine. During this work, we also discovered several inconsistencies across x86 processors, errors in the Intel x86 specification, and several bugs in x86 instruction handlers previously written by hand.
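The following toy Python sketch makes the idea of synthesizing an instruction's semantics from input/output examples concrete. It is not the paper's smart-sampling algorithm, and it uses no SMT solver. Several pieces are assumptions for illustration only:

- The "template" is a hypothetical table of six 8-bit operations.
- The oracle is a plain Python function standing in for executing the real instruction.
- Exhaustive enumeration over all 8-bit input pairs replaces a Z3 query for a distinguishing input.

The loop keeps only the candidates consistent with the samples seen so far. While two survivors still disagree on some input, it queries the oracle on that input and repeats.

```python
import itertools
import random

MASK = 0xFF  # assumption: 8-bit operands and 8-bit result

# Hypothetical template: out = OP(x, y), with OP drawn from a few 8-bit operations.
TEMPLATE = {
    "add": lambda x, y: (x + y) & MASK,
    "sub": lambda x, y: (x - y) & MASK,
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
    "shl": lambda x, y: (x << (y & 7)) & MASK,
}

# Every 8-bit input pair; small enough to search exhaustively in place of an SMT query.
ALL_INPUTS = [(x, y) for x in range(256) for y in range(256)]


def distinguishing_input(f, g):
    """Return an input pair on which f and g differ, or None if they are equivalent."""
    return next((xy for xy in ALL_INPUTS if f(*xy) != g(*xy)), None)


def synthesize(oracle, num_random=2, seed=0):
    """Toy oracle-guided synthesis over TEMPLATE (hypothetical, not the paper's algorithm).

    Returns (candidate_name or None, number_of_oracle_samples_used).
    """
    rng = random.Random(seed)
    samples = {}
    # Seed the sample set with a few random I/O observations.
    for _ in range(num_random):
        xy = (rng.randrange(256), rng.randrange(256))
        samples[xy] = oracle(*xy)
    while True:
        # Keep only candidates that reproduce every observed output.
        alive = [name for name, f in TEMPLATE.items()
                 if all(f(*xy) == out for xy, out in samples.items())]
        if not alive:
            return None, len(samples)  # the template cannot express the oracle
        # Find two survivors that still disagree somewhere and ask the oracle there.
        for a, b in itertools.combinations(alive, 2):
            xy = distinguishing_input(TEMPLATE[a], TEMPLATE[b])
            if xy is not None:
                samples[xy] = oracle(*xy)  # one more I/O query; eliminates a or b
                break
        else:
            return alive[0], len(samples)  # all survivors are equivalent
```

For example, `synthesize(lambda x, y: x ^ y)` returns `"xor"` as the first element of its result. Each distinguishing query eliminates at least one survivor, so the number of oracle samples stays small. Candidates are compared only with each other and with sampled outputs, so the result is only as trustworthy as the template. If the real instruction cannot be expressed in it, the loop may return `None` or a candidate that merely fits the samples.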
Automated synthesis of symbolic instruction encodings from I/O samples
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation