research-article
Open Access

Neural reverse engineering of stripped binaries using augmented control flow graphs

Published: 13 November 2020

Abstract

We address the problem of reverse engineering stripped executables, which contain no debug information. The problem is challenging because stripped executables carry little syntactic information, and compiler optimizations produce highly diverse assembly code patterns. We present a novel approach for predicting procedure names in stripped executables that combines static analysis with neural models. The main idea is to use static analysis to obtain augmented representations of call sites; encode the structure of these call sites using the control-flow graph (CFG); and finally, generate a target name while attending to these call sites. We use our representation to drive graph-based, LSTM-based, and Transformer-based architectures. Our evaluation shows that our models produce predictions that would be difficult and time-consuming for a human to make, while improving over existing methods by 28% and over state-of-the-art neural textual models that do not use any static analysis by 100%. Code and data for this evaluation are available at https://github.com/tech-srl/Nero.
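To illustrate the last step of the pipeline described above, the sketch below shows dot-product attention over call-site encodings: a decoder state (the "query") weights each encoded call site, and the weighted mix forms the context from which a name subtoken is predicted. This is a minimal, dependency-free illustration; the call-site names, vectors, and dimensionality are hypothetical, and the paper's actual encoders (graph-based, LSTM-based, Transformer-based) are far richer.

```python
import math

# Hypothetical call-site encodings (4-dim toy vectors). In the paper, these
# would come from static analysis of augmented call sites plus a CFG-aware
# neural encoder; here they are made up for illustration.
call_sites = {
    "open@0x40120f": [0.9, 0.1, 0.0, 0.2],
    "read@0x401233": [0.7, 0.3, 0.1, 0.0],
    "close@0x401250": [0.1, 0.0, 0.8, 0.4],
}

def attend(query, encodings):
    """Dot-product attention: score each call-site encoding against the
    decoder query, softmax the scores into weights, and mix the encodings
    into a single context vector."""
    scores = {k: sum(q * e for q, e in zip(query, v)) for k, v in encodings.items()}
    z = sum(math.exp(s) for s in scores.values())
    weights = {k: math.exp(s) / z for k, s in scores.items()}
    dim = len(query)
    context = [sum(weights[k] * encodings[k][i] for k in encodings) for i in range(dim)]
    return weights, context

# Hypothetical decoder state while emitting one subtoken of the target name.
query = [1.0, 0.5, 0.0, 0.0]
weights, context = attend(query, call_sites)
```

Here the query aligns most strongly with the `open`-like call site, so that encoding dominates the context vector; in the full model, such context vectors condition the generation of each subtoken of the predicted procedure name.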


Supplemental Material

Auxiliary Presentation Video

This is a presentation video of our talk at OOPSLA'20, summarizing the paper's approach and evaluation.


Published in

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA, November 2020. 3108 pages. EISSN: 2475-1421. DOI: 10.1145/3436718.

          Copyright © 2020 Owner/Author

          Publisher

Association for Computing Machinery, New York, NY, United States

