Abstract
Reconstructing the meaning of a program from its binary executable is known as reverse engineering; it has a wide range of applications, including software security, exposing piracy, and legacy systems. Since reversing is ultimately a search for meaning, there is much interest in inferring a type (a meaning) for the elements of a binary in a consistent way. Unfortunately, existing approaches do not guarantee any semantic relevance for their reconstructed types. This paper presents a new, semantically founded approach that provides strong guarantees for the reconstructed types. Key to our approach is the derivation of a witness program in a high-level language alongside the reconstructed types. This witness has the same semantics as the binary, is type correct by construction, and induces a (justifiable) type assignment on the binary. Moreover, the approach effectively yields a type-directed decompiler. We formalise and implement the approach for reversing MinX, an abstraction of x86, to MinC, a type-safe dialect of C with recursive datatypes. Our evaluation compiles a range of textbook C algorithms to MinX and then recovers the original structures.
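To make the idea of a typed witness concrete, the following sketch contrasts an untyped, word-level list traversal with a typed counterpart over a recursive datatype. It is written in ordinary C and is only an illustration: it uses neither MinX nor MinC syntax, and the names `sum_untyped`, `sum_typed`, `struct list` and `demo` are hypothetical, not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary-level view (illustrative only, not MinX): memory is a sequence of
 * untyped machine words. A "cell" is two consecutive words: word 0 holds a
 * number, word 1 holds the address of the next cell (or a null address). */
intptr_t sum_untyped(const intptr_t *cell) {
    intptr_t total = 0;
    while (cell != NULL) {
        total += cell[0];                          /* load word at offset 0 */
        cell = (const intptr_t *)(void *)cell[1];  /* load word at offset 1 */
    }
    return total;
}

/* Source-level view: the recursive datatype one would hope to recover. */
struct list {
    intptr_t value;
    struct list *next;
};

/* Typed counterpart: same behaviour, but well typed over struct list. */
intptr_t sum_typed(const struct list *l) {
    intptr_t total = 0;
    for (; l != NULL; l = l->next) {
        total += l->value;
    }
    return total;
}

/* Build the list 1 -> 2 -> 3 in both representations and compare. */
intptr_t demo(void) {
    intptr_t heap[6];
    heap[0] = 1; heap[1] = (intptr_t)(void *)&heap[2];
    heap[2] = 2; heap[3] = (intptr_t)(void *)&heap[4];
    heap[4] = 3; heap[5] = (intptr_t)(void *)NULL;

    struct list c = {3, NULL};
    struct list b = {2, &c};
    struct list a = {1, &b};

    intptr_t low = sum_untyped(heap);
    intptr_t high = sum_typed(&a);
    assert(low == high);  /* both views compute the same result */
    return high;
}
```

Calling `demo()` from a `main` function exercises both traversals on the same three-element list. The sketch only checks agreement on one input; the paper's guarantee of semantic equivalence between binary and witness is a formal property that a test like this does not establish.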
From MinX to MinC: semantics-driven decompilation of recursive datatypes. In POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages.