Abstract
Binary rewriters are tools that modify the functionality of binaries for which source code is unavailable. They can be used for a variety of purposes, including optimization, hardening, and extraction of executable components. To rewrite a binary based on semantic criteria, an essential primitive is a machine-code synthesizer: a tool that synthesizes an instruction sequence from a specification of the desired behavior, often given as a formula in quantifier-free bit-vector logic (QFBV). However, state-of-the-art machine-code synthesizers such as McSynth++ employ naive search strategies: McSynth++ merely enumerates candidates of increasing length without performing any form of prioritization. This inefficiency is compounded by the huge number of unique instruction schemas in instruction sets (e.g., around 43,000 in Intel's IA-32) and the exponential cost inherent in enumeration. The result is slow synthesis: even for relatively small specifications, McSynth++ might take several minutes or a few hours to find an implementation.
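To make the exponential cost concrete, the following is a minimal sketch of unprioritized enumeration by increasing length. It is not McSynth++'s actual algorithm; the function names `enumerate_by_length` and `search_space_size` are hypothetical, and the count is a simple upper bound that ignores any pruning a real synthesizer might apply.

```python
from itertools import product


def enumerate_by_length(schemas, max_len):
    """Yield every instruction sequence of length 1..max_len, shortest first.

    No candidate is preferred over another of the same length; this is the
    kind of unprioritized enumeration the abstract describes.
    """
    for length in range(1, max_len + 1):
        for seq in product(schemas, repeat=length):
            yield seq


def search_space_size(num_schemas, max_len):
    """Number of candidate sequences of length 1..max_len (upper bound)."""
    return sum(num_schemas ** k for k in range(1, max_len + 1))


# Tiny illustrative alphabet (hypothetical schema names).
toy_candidates = list(enumerate_by_length(["a", "b"], 2))

# With roughly 43,000 schemas, the count grows by a factor of about
# 43,000 for every additional instruction in the sequence.
size_len_2 = search_space_size(43_000, 2)
size_len_3 = search_space_size(43_000, 3)
```

Under these assumptions, sequences of up to two instructions already number in the billions, which suggests why ordering the candidates matters more than enumerating them faster.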
In this paper, we describe how we use machine learning to make the search in McSynth++ smarter and potentially faster. We converted the linear search in McSynth++ into a best-first search over the space of instruction sequences. The cost heuristic for the best-first search comes from two models, used together, built from a corpus of 〈QFBV-formula, instruction-sequence〉 pairs: (i) a language model that favors useful instruction sequences, and (ii) a regression model that correlates features of instruction sequences with features of QFBV formulas, and favors instruction sequences that are more likely to implement the input formula. Our experiments for IA-32 showed that our model-assisted synthesizer enables synthesis of code for 6 out of 50 formulas on which McSynth++ times out, speeding up synthesis by at least 549X, and for the remaining formulas, it speeds up synthesis by 4.55X.
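The best-first search described above can be sketched on a toy domain. Everything here is an assumption made for illustration: the three-instruction "machine" over integers, the unigram probabilities standing in for the language model, the mismatch count standing in for the regression model, and the check on sample inputs standing in for a solver-based equivalence check. None of it is the paper's implementation.

```python
import heapq
import math
from itertools import count

# Hypothetical toy instruction set: each instruction transforms one integer.
SEMANTICS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: 2 * x,
    "neg": lambda x: -x,
}
# Stand-in for a language model: assumed unigram probabilities.
UNIGRAM_PROB = {"inc": 0.5, "dbl": 0.3, "neg": 0.2}
# Stand-in for a formal equivalence check: compare on sample inputs only.
TEST_INPUTS = [-3, 0, 1, 5]


def run(seq, x):
    """Execute an instruction sequence on input x."""
    for ins in seq:
        x = SEMANTICS[ins](x)
    return x


def implements(seq, spec):
    """True if seq agrees with spec on every sample input."""
    return all(run(seq, x) == spec(x) for x in TEST_INPUTS)


def cost(seq, spec):
    """Combine the two stand-in scores; lower cost is explored first."""
    lm_cost = sum(-math.log(UNIGRAM_PROB[ins]) for ins in seq)
    mismatch_cost = sum(run(seq, x) != spec(x) for x in TEST_INPUTS)
    return lm_cost + mismatch_cost


def best_first_synthesize(spec, max_len=3, max_expansions=10_000):
    """Pop the cheapest candidate, test it, then push its one-step extensions."""
    tie = count()  # tie-breaker so the heap never compares tuples of names
    frontier = [(0.0, next(tie), ())]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, seq = heapq.heappop(frontier)
        if seq and implements(seq, spec):
            return list(seq)
        if len(seq) < max_len:
            for ins in SEMANTICS:
                cand = seq + (ins,)
                heapq.heappush(frontier, (cost(cand, spec), next(tie), cand))
    return None


# Specification: f(x) = 2 * (x + 1).
result = best_first_synthesize(lambda x: 2 * (x + 1))
```

In this sketch the priority queue replaces the fixed shortest-first order, so a promising two-instruction candidate can be tested before the remaining one-instruction candidates are expanded. How the two model scores are actually weighted and combined in the paper is not stated in the abstract; the plain sum used here is an assumption.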
Index Terms
Model-assisted machine-code synthesis