Similarity of binaries through re-optimization

Published: 14 June 2017

Abstract

We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge, while avoiding false positives, is invaluable to the process of reverse engineering and the process of locating vulnerable code.

We present a technique that is scalable and precise, as it alleviates the need for heavyweight semantic comparison by performing out-of-context re-optimization of procedure fragments. It works by decomposing binary procedures into comparable fragments and transforming them to a canonical, normalized form using the compiler optimizer, which enables finding equivalent fragments through simple syntactic comparison. We use a statistical framework, built by analyzing samples collected "in the wild", to generate a global context that quantifies the significance of each pair of fragments, and use it to lift pairwise fragment equivalence to whole-procedure similarity.
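The pipeline described above (decompose, canonicalize, compare syntactically, weight by global context) can be illustrated with a toy sketch. This is not GitZ's implementation: the fragment format, the function names, and the log-based weighting are all assumptions made for illustration, and a simple register-renaming pass stands in for the compiler optimizer that the technique actually uses for canonicalization.

```python
import math
import re
from collections import Counter

# Hypothetical fragment format: a list of three-address statements such as
# "r3 = add r1, 4". Registers/temporaries are assumed to look like r<N> or t<N>.
_NAME = re.compile(r"\b[rt]\d+\b")


def canonicalize(fragment):
    """Rename registers by order of first appearance.

    Stand-in for re-optimization: fragments that differ only in register
    allocation end up as identical text.
    """
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    return "\n".join(_NAME.sub(rename, line.strip()) for line in fragment)


def signature(fragments):
    """Represent a procedure as the set of its canonical fragments."""
    return {canonicalize(f) for f in fragments}


def build_context(corpus):
    """Count in how many sampled procedures each canonical fragment occurs."""
    counts = Counter()
    for sig in corpus:
        counts.update(sig)
    return counts, len(corpus)


def similarity(query, target, context):
    """Sum the weights of shared fragments; rarer fragments weigh more.

    The smoothed log weighting is an assumption, not the paper's formula.
    """
    counts, total = context
    shared = query & target  # plain syntactic equality after canonicalization
    return sum(math.log((total + 1) / (counts[f] + 1)) for f in shared)


# Two compilations of the "same" fragment with different register allocation.
rare_a = ["r3 = add r1, 4", "r5 = mul r3, r3"]
rare_b = ["t9 = add t2, 4", "t7 = mul t9, t9"]
common = ["r0 = xor r0, r0"]   # ubiquitous idiom, carries little evidence
other = ["r1 = sub r2, 1"]

query = signature([rare_a, common])
target = signature([rare_b, common])
decoy = signature([common, other])
context = build_context([query, target, decoy, signature([common])])

print(similarity(query, target, context))
print(similarity(query, decoy, context))
```

In this sketch the decoy shares only the ubiquitous fragment with the query, so it contributes no evidence, while the target shares a fragment that is rare in the sampled corpus and therefore scores higher.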

We have implemented our technique in a tool called GitZ and performed an extensive evaluation. We show that GitZ is able to perform millions of comparisons efficiently, and find similarity with high accuracy.

Published in

ACM SIGPLAN Notices, Volume 52, Issue 6 (PLDI '17), June 2017, 708 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3140587

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2017, 708 pages
ISBN: 9781450349888
DOI: 10.1145/3062341

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States