Abstract
Translating low-level machine instructions into a higher-level intermediate language (IL) is a central step in many binary analysis and instrumentation systems. Existing systems build such translators by hand, so supporting a new architecture takes a great deal of effort. Even for widely deployed architectures, the full instruction set may not be modeled: mature systems such as Valgrind still lack support for AVX, FMA4, and SSE4.1 on x86 processors. To overcome these difficulties, we propose a novel approach that leverages the knowledge of instruction set semantics already embedded in modern compilers such as GCC. In particular, we present a learning-based approach for automating the translation of assembly instructions to a compiler's architecture-neutral IL. Our experimental evaluation demonstrates that the approach can readily support many architectures (x86, ARM, and AVR), including their advanced instruction sets. Our implementation is available as open-source software.
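To make the idea of learning a translation from examples concrete, the following is a toy sketch only, not the algorithm used in the paper. It assumes that pairs of assembly instructions and IL snippets are available as training data, generalizes them by replacing concrete registers with placeholders, and applies the resulting rules to unseen instructions. The IL notation, the register naming scheme, and the function names `generalize`, `learn`, and `translate` are all hypothetical.

```python
import re

# Toy (assembly, IL) training pairs. The IL strings use a made-up,
# Lisp-like notation chosen for illustration; they are not real compiler output.
PAIRS = [
    ("add r1, r2", "(set r2 (plus r2 r1))"),
    ("add r3, r4", "(set r4 (plus r4 r3))"),
    ("mov r1, r2", "(set r2 r1)"),
]

REG = re.compile(r"\br\d+\b")  # assumed register syntax: r0, r1, ...


def slots(asm):
    """Map each distinct register in asm to a positional placeholder ($0, $1, ...)."""
    mapping = {}
    for reg in REG.findall(asm):
        mapping.setdefault(reg, f"${len(mapping)}")
    return mapping


def generalize(asm, il):
    """Abstract the concrete registers of one training pair into placeholders."""
    mapping = slots(asm)
    sub = lambda m: mapping.get(m.group(0), m.group(0))
    return REG.sub(sub, asm), REG.sub(sub, il)


def learn(pairs):
    """Build a rule table from generalized assembly patterns to IL templates."""
    return dict(generalize(asm, il) for asm, il in pairs)


def translate(rules, asm):
    """Translate one instruction, or return None if no learned rule matches."""
    mapping = slots(asm)
    pattern = REG.sub(lambda m: mapping[m.group(0)], asm)
    template = rules.get(pattern)
    if template is None:
        return None
    back = {placeholder: reg for reg, placeholder in mapping.items()}
    return re.sub(r"\$\d+", lambda m: back[m.group(0)], template)


rules = learn(PAIRS)
print(translate(rules, "add r7, r9"))
print(translate(rules, "sub r1, r2"))
```

In this sketch, an instruction whose shape was seen during training is translated even when its operands are new, while an unseen mnemonic yields `None`. A practical system would need a far richer generalization step than operand substitution.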
Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems