Statistical similarity of binaries

Published: 02 June 2016

Abstract

We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled using different compilers or has been modified. The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures. We have implemented our approach in a tool called Esh and applied it to find various prominent vulnerabilities across compilers and versions, including Heartbleed, Shellshock, and Venom. We show that Esh produces high-accuracy results with few to no false positives, a crucial factor in the scenario of vulnerability search in stripped binaries.
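
The three steps of similarity by composition (decompose, compare fragments, lift statistically) can be illustrated with a toy sketch. This is not the Esh implementation: the function names are hypothetical, the token-overlap (Jaccard) measure is only a stand-in for a real semantic comparison of fragments, and the log-ratio scoring is one plausible way to express "a match counts more when it is rare in the rest of the corpus". Procedures are assumed to be already decomposed into lists of fragment strings.

```python
import math

def fragment_similarity(a, b):
    """Jaccard overlap of token sets (assumed stand-in for semantic similarity)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def best_match(fragment, procedure):
    """Highest similarity between one fragment and any fragment of a procedure."""
    return max((fragment_similarity(fragment, f) for f in procedure), default=0.0)

def procedure_similarity(query, target, corpus, eps=1e-6):
    """Lift fragment similarity to procedure similarity.

    For each query fragment, compare its best match in the target with its
    average best match over the whole corpus, and sum the log ratios.
    Fragments that match everywhere contribute little; fragments that match
    the target but few other procedures contribute a lot.
    """
    score = 0.0
    for frag in query:
        in_target = best_match(frag, target)
        background = sum(best_match(frag, p) for p in corpus) / len(corpus)
        score += math.log((in_target + eps) / (background + eps))
    return score

# Hypothetical, hand-written fragments purely for illustration.
query = ["mov eax ebx", "add eax 1", "cmp eax 16"]
variant = ["mov eax ebx", "add eax 1", "cmp eax 32"]   # lightly modified copy
unrelated = ["push ebp", "call foo", "ret"]
corpus = [variant, unrelated]

score_variant = procedure_similarity(query, variant, corpus)
score_unrelated = procedure_similarity(query, unrelated, corpus)
```

In this sketch the lightly modified procedure receives a positive score and the unrelated one a strongly negative score, so ranking candidates by `procedure_similarity` puts the variant first. The `eps` term is only there to avoid taking the logarithm of zero.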
