Open Access

Model-assisted machine-code synthesis

Published: 12 October 2017

Abstract

Binary rewriters are tools used to modify the functionality of binaries that lack source code. They can rewrite binaries for a variety of purposes, including optimization, hardening, and extraction of executable components. To rewrite a binary based on semantic criteria, an essential primitive is a machine-code synthesizer: a tool that synthesizes an instruction sequence from a specification of the desired behavior, often given as a formula in quantifier-free bit-vector logic (QFBV). However, state-of-the-art machine-code synthesizers such as McSynth++ employ naive search strategies: McSynth++ merely enumerates candidates of increasing length without any form of prioritization. This inefficiency is compounded by the huge number of unique instruction schemas in instruction sets (e.g., around 43,000 in Intel's IA-32) and by the exponential cost inherent in enumeration. The result is slow synthesis: even for relatively small specifications, McSynth++ might take several minutes or a few hours to find an implementation.
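To make the cost of length-ordered enumeration concrete, here is a minimal Python sketch of that search strategy. It is not McSynth++: the two-register "instruction set", its 16 schemas, and the names `schemas`, `execute`, and `enumerate_synthesize` are hypothetical. Candidates are also only checked on a few concrete inputs rather than proved equivalent to a QFBV formula with an SMT solver, so a match here is a plausible candidate, not a verified one.

```python
from itertools import product

MASK = 0xFFFFFFFF  # 32-bit wraparound, as in IA-32 registers
REGS = ("eax", "ebx")

def schemas():
    # Hypothetical toy ISA: 16 instruction schemas over two registers.
    for d in REGS:
        yield ("neg", d)
        yield ("shl1", d)
        for s in REGS:
            yield ("mov", d, s)
            yield ("add", d, s)
            yield ("xor", d, s)

def execute(seq, state):
    """Run an instruction sequence on a register state (dict of 32-bit values)."""
    st = dict(state)
    for ins in seq:
        op, d = ins[0], ins[1]
        if op == "neg":
            st[d] = (-st[d]) & MASK
        elif op == "shl1":
            st[d] = (st[d] << 1) & MASK
        elif op == "mov":
            st[d] = st[ins[2]]
        elif op == "add":
            st[d] = (st[d] + st[ins[2]]) & MASK
        else:  # xor
            st[d] ^= st[ins[2]]
    return st

# Concrete test inputs standing in for an SMT equivalence check.
TESTS = [{"eax": a, "ebx": b} for a, b in [(0, 0), (1, 2), (7, 3), (MASK, 5)]]

def enumerate_synthesize(spec, max_len=3):
    """Naive search: try every sequence of length 1, 2, ... with no prioritization."""
    all_schemas = list(schemas())
    tried = 0
    for length in range(1, max_len + 1):
        for seq in product(all_schemas, repeat=length):
            tried += 1
            if all(execute(seq, t) == spec(t) for t in TESTS):
                return list(seq), tried
    return None, tried

def spec(s):
    # Target behavior: eax := 2 * (eax + ebx), ebx unchanged.
    return {"eax": (2 * (s["eax"] + s["ebx"])) & MASK, "ebx": s["ebx"]}

seq, tried = enumerate_synthesize(spec)
print(seq, tried)  # prints [('add', 'eax', 'ebx'), ('shl1', 'eax')] 114
```

Even with only 16 schemas, the search checks every length-1 candidate and a large share of the length-2 candidates before succeeding. With k schemas there are k^n sequences of length n, so an instruction set with tens of thousands of schemas makes this order of exploration very expensive.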

In this paper, we describe how we use machine learning to make the search in McSynth++ smarter and potentially faster. We converted the linear search in McSynth++ into a best-first search over the space of instruction sequences. The cost heuristic for the best-first search comes from two models, used together, that are built from a corpus of 〈QFBV-formula, instruction-sequence〉 pairs: (i) a language model that favors useful instruction sequences, and (ii) a regression model that correlates features of instruction sequences with features of QFBV formulas and favors instruction sequences that are more likely to implement the input formula. Our experiments for IA-32 showed that our model-assisted synthesizer enables synthesis of code for 6 out of 50 formulas on which McSynth++ times out, speeding up synthesis by at least 549X, and speeds up synthesis by 4.55X on the remaining formulas.
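The best-first search described above can be sketched in Python as a priority queue of partial instruction sequences ordered by a model-derived cost. This is a minimal sketch under stated assumptions, not the paper's implementation: the toy two-register instruction set, the three-sequence training corpus, and all function names are hypothetical. The language model is an add-one-smoothed bigram model over mnemonics, and in place of the paper's regression model a hand-written `feature_penalty` counts formula operators (assumed to be already mapped to mnemonics) that the sequence does not yet use. Candidates are checked on concrete test inputs rather than with an SMT solver.

```python
import heapq
import math
from collections import Counter

MASK = 0xFFFFFFFF  # 32-bit wraparound
REGS = ("eax", "ebx")

def schemas():
    # Hypothetical toy ISA: 16 instruction schemas over two registers.
    for d in REGS:
        yield ("neg", d)
        yield ("shl1", d)
        for s in REGS:
            yield ("mov", d, s)
            yield ("add", d, s)
            yield ("xor", d, s)

def execute(seq, state):
    st = dict(state)
    for ins in seq:
        op, d = ins[0], ins[1]
        if op == "neg":
            st[d] = (-st[d]) & MASK
        elif op == "shl1":
            st[d] = (st[d] << 1) & MASK
        elif op == "mov":
            st[d] = st[ins[2]]
        elif op == "add":
            st[d] = (st[d] + st[ins[2]]) & MASK
        else:  # xor
            st[d] ^= st[ins[2]]
    return st

TESTS = [{"eax": a, "ebx": b} for a, b in [(0, 0), (1, 2), (7, 3), (MASK, 5)]]

class BigramLM:
    """Add-one-smoothed bigram model over mnemonics (stand-in language model)."""
    def __init__(self, corpus, vocab):
        self.vocab_size = len(vocab)
        self.pairs, self.ctx = Counter(), Counter()
        for seq in corpus:
            toks = ["<s>"] + [ins[0] for ins in seq]
            for prev, cur in zip(toks, toks[1:]):
                self.pairs[(prev, cur)] += 1
                self.ctx[prev] += 1

    def cost(self, prev, cur):
        # Negative log-probability: lower means a more "typical" next mnemonic.
        p = (self.pairs[(prev, cur)] + 1) / (self.ctx[prev] + self.vocab_size)
        return -math.log(p)

def feature_penalty(seq, formula_ops):
    # Hypothetical stand-in for the regression model: count formula
    # operators that no instruction in the sequence provides.
    return len(formula_ops - {ins[0] for ins in seq})

def best_first_synthesize(spec, formula_ops, lm, max_len=3, weight=2.0):
    all_schemas = list(schemas())
    tie = 0  # tie-breaker so heapq never has to compare sequences
    frontier = [(0.0, tie, 0.0, ())]  # (priority, tie, lm_cost, seq)
    expanded = 0
    while frontier:
        _, _, lm_cost, seq = heapq.heappop(frontier)
        expanded += 1
        if seq and all(execute(seq, t) == spec(t) for t in TESTS):
            return list(seq), expanded
        if len(seq) == max_len:
            continue
        prev = seq[-1][0] if seq else "<s>"
        for ins in all_schemas:
            new_seq = seq + (ins,)
            new_lm = lm_cost + lm.cost(prev, ins[0])
            priority = new_lm + weight * feature_penalty(new_seq, formula_ops)
            tie += 1
            heapq.heappush(frontier, (priority, tie, new_lm, new_seq))
    return None, expanded

# Hypothetical corpus; target: eax := 2 * (eax + ebx), ebx unchanged.
corpus = [
    (("add", "eax", "ebx"), ("shl1", "eax")),
    (("mov", "eax", "ebx"), ("add", "eax", "eax")),
    (("add", "ebx", "eax"), ("shl1", "ebx")),
]
lm = BigramLM(corpus, vocab={ins[0] for ins in schemas()})

def spec(s):
    return {"eax": (2 * (s["eax"] + s["ebx"])) & MASK, "ebx": s["ebx"]}

seq, expanded = best_first_synthesize(spec, {"add", "shl1"}, lm)
print(seq, expanded)  # prints [('add', 'eax', 'ebx'), ('shl1', 'eax')] 6
```

Because the combined cost is a heuristic rather than an admissible bound, this is greedy best-first search, not A*: it may return a sequence that is not the shortest, but it examines the candidates the models rate as likely before the rest. On this toy specification it pops six nodes (including the empty sequence) before reaching a match.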

