DOI: 10.1145/3338503.3357721
research-article

SATURN - Software Deobfuscation Framework Based On LLVM

Published: 15 November 2019

Abstract

The strength of obfuscated software has increased in recent years. Compiler-based obfuscation has become the de facto standard in the industry, and recent papers also show that obfuscation techniques are injected at the compiler level. In this paper we discuss a generic approach for deobfuscation and recompilation of obfuscated code based on the compiler framework LLVM. We show how binary code can be lifted back into the compiler intermediate language LLVM-IR and explain how we recover the control flow graph of an obfuscated binary function with an iterative control flow graph construction algorithm based on compiler optimizations and satisfiability modulo theories (SMT) solving. Our approach does not make any assumptions about the obfuscated code, but instead uses strong compiler optimizations available in LLVM and the Souper optimizer to simplify away the obfuscation. Our experimental results show that this approach can weaken or even remove applied obfuscation techniques such as constant unfolding, certain arithmetic-based opaque expressions, dead code insertion, bogus control flow, and integer encoding found in public and commercial obfuscators. The recovered LLVM-IR can be further processed by custom deobfuscation passes, which are then applied at the same level as the injected obfuscation techniques, or recompiled with one of the available LLVM backends. The presented work is implemented in a deobfuscation tool called SATURN.
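To make the idea of arithmetic-based opaque expressions and bogus control flow concrete, the following is a minimal sketch and not SATURN's implementation. It assumes 8-bit values so that exhaustive enumeration can stand in for an SMT validity query; the function names (`always_true`, `opaque`, `encoded_add`, `reachable_successors`) and the block labels are hypothetical and chosen only for illustration.

```python
from itertools import product

MASK = 0xFF  # assumption: model 8-bit registers so exhaustive checking stays cheap


def always_true(pred, arity):
    """Return True if pred holds for every 8-bit input tuple.

    Exhaustive enumeration is used here as a stand-in for an SMT validity query.
    """
    return all(pred(*args) for args in product(range(256), repeat=arity))


def opaque(x):
    """Arithmetic opaque predicate: x * (x + 1) is always even."""
    return ((x * (x + 1)) & MASK) % 2 == 0


def encoded_add(x, y):
    """Mixed Boolean-arithmetic encoding of x + y."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def plain_add(x, y):
    """The simplified expression an optimizer would ideally recover."""
    return (x + y) & MASK


def reachable_successors(pred, arity, taken, fallthrough):
    """Keep only the branch successors that at least one input can reach."""
    outcomes = {bool(pred(*args)) for args in product(range(256), repeat=arity)}
    successors = []
    if True in outcomes:
        successors.append(taken)
    if False in outcomes:
        successors.append(fallthrough)
    return successors


if __name__ == "__main__":
    print(always_true(opaque, 1))
    print(always_true(lambda x, y: encoded_add(x, y) == plain_add(x, y), 2))
    print(reachable_successors(opaque, 1, "real_block", "bogus_block"))
```

In this toy setting, a branch guarded by `opaque` keeps only its taken successor, which mirrors how proving a predicate constant lets a control flow graph construction drop a bogus edge. A real tool would pose the same question to an SMT solver over full-width bit-vectors instead of enumerating inputs.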

Published In

SPRO'19: Proceedings of the 3rd ACM Workshop on Software Protection
November 2019
87 pages
ISBN:9781450368353
DOI:10.1145/3338503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. binary recompilation
  2. binary rewriting
  3. code lifting
  4. deobfuscation
  5. llvm
  6. obfuscation
  7. reverse engineering
  8. static software analysis

Qualifiers

  • Research-article

Conference

CCS '19

Acceptance Rates

Overall Acceptance Rate 8 of 14 submissions, 57%

Article Metrics

  • Downloads (last 12 months): 63
  • Downloads (last 6 weeks): 5
Reflects downloads up to 23 Sep 2024.

Cited By

  • (2024) A Method to Quantitative Compare Obfuscating Transformations. Informatics and Automation 23:3 (684-726). DOI: 10.15622/ia.23.3.3. Online publication date: 28-May-2024
  • (2024) Two-Level Software Obfuscation with Cooperative Co-Evolutionary Algorithms. 2024 IEEE Congress on Evolutionary Computation (CEC) (1-8). DOI: 10.1109/CEC60901.2024.10612116. Online publication date: 30-Jun-2024
  • (2023) On Simplifying Expressions with Mixed Boolean-Arithmetic. Modeling and Analysis of Information Systems 30:2 (140-159). DOI: 10.18255/1818-1015-2023-2-140-159. Online publication date: 14-Jun-2023
  • (2023) Of Ahead Time: Evaluating Disassembly of Android Apps Compiled to Binary OATs Through the ART. Proceedings of the 16th European Workshop on System Security (21-29). DOI: 10.1145/3578357.3591219. Online publication date: 8-May-2023
  • (2023) Simplifying Mixed Boolean-Arithmetic Obfuscation by Program Synthesis and Term Rewriting. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (2351-2365). DOI: 10.1145/3576915.3623186. Online publication date: 15-Nov-2023
  • (2022) DFSGraph: Data Flow Semantic Model for Intermediate Representation Programs Based on Graph Network. Electronics 11:19 (3230). DOI: 10.3390/electronics11193230. Online publication date: 8-Oct-2022
  • (2022) A Survey of Obfuscation and Deobfuscation Techniques in Android Code Protection. 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC) (40-47). DOI: 10.1109/DSC55868.2022.00013. Online publication date: Jul-2022
  • (2021) Dynamic Taint Analysis versus Obfuscated Self-Checking. Proceedings of the 37th Annual Computer Security Applications Conference (182-193). DOI: 10.1145/3485832.3485926. Online publication date: 6-Dec-2021
  • (2021) SoK: All You Ever Wanted to Know About x86/x64 Binary Disassembly But Were Afraid to Ask. 2021 IEEE Symposium on Security and Privacy (SP) (833-851). DOI: 10.1109/SP40001.2021.00012. Online publication date: May-2021
  • (2021) Profiling HPC Applications with Low Overhead and High Accuracy. 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (1311-1319). DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00180. Online publication date: Sep-2021
