DOI: 10.1145/3422337.3447848
research-article

SE-PAC: A Self-Evolving PAcker Classifier against rapid packers evolution

Published: 26 April 2021

Abstract

Packers are widespread tools used by malware authors to hinder static malware detection and analysis. Identifying the packer used to pack a malware sample is essential to properly unpack and analyze it, be it manually or automatically. While many well-known packers are in use, there is a growing trend toward new custom packers that make malware analysis and detection harder. Prior research has been very effective in identifying known packers or their variants, using signature-based, supervised machine learning, or similarity-based techniques. However, identifying new packer classes remains an open problem.
This paper presents a self-evolving packer classifier that provides an effective, incremental, and robust solution to cope with the rapid evolution of packers. We propose a composite pairwise distance metric combining different types of packer features. We derive an incremental clustering approach able to identify both (variants of) known packer classes and new ones, as well as to update clusters automatically and efficiently. Our system thus continuously enhances, integrates, adapts, and evolves packer knowledge. Moreover, to reduce post-clustering packer processing costs, we introduce a new post-clustering strategy for selecting small subsets of relevant samples from the clusters. The effectiveness and time-resilience of our approach are assessed with: 1) a real-world malware feed dataset composed of 16k packed binaries, comprising 29 unique packers, and 2) a synthetic dataset composed of 19k manually crafted packed binaries, comprising 31 unique packers (including custom ones).
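To make the idea of a composite pairwise distance more concrete, here is a minimal Python sketch. It is not the metric defined in the paper, which the abstract does not specify. The feature groups (`entropy`, `section_names`, `entry_bytes`), the per-type distances, and the equal default weights are all assumptions chosen for illustration. The sketch only shows the general pattern: compute one normalized distance per feature type, then combine them as a weighted average.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackedSample:
    # Hypothetical feature groups, not the paper's actual feature set.
    entropy: float             # numeric feature, assumed to lie in [0, 8]
    section_names: frozenset   # set-valued feature
    entry_bytes: bytes         # sequence-valued feature


def numeric_distance(a, b, value_range):
    """Absolute difference scaled to [0, 1] by the assumed value range."""
    return min(abs(a - b) / value_range, 1.0)


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def hamming_distance(a, b):
    """Fraction of mismatching positions; a length difference counts as mismatches."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / longest


def composite_distance(s, t, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the per-type distances; the result lies in [0, 1]."""
    parts = (
        numeric_distance(s.entropy, t.entropy, 8.0),
        jaccard_distance(s.section_names, t.section_names),
        hamming_distance(s.entry_bytes, t.entry_bytes),
    )
    return sum(w * d for w, d in zip(weights, parts)) / sum(weights)


if __name__ == "__main__":
    a = PackedSample(7.5, frozenset({"UPX0", "UPX1"}), b"\x60\xbe")
    c = PackedSample(3.5, frozenset({".text"}), b"\x55\x8b")
    print(composite_distance(a, a), composite_distance(a, c))
```

Because every component is normalized to [0, 1] before weighting, no single feature type dominates merely because of its scale. How the weights should be chosen is left open here.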

Supplementary Material

MP4 File (SE-PAC_CODASPY'21.mp4)
Packers evolve as rapidly as malware, constantly bringing new classes or new variants of existing ones. To build an effective malware analysis and detection system, it is thus essential to keep the packer classification system updated. In this presentation we introduce SE-PAC: a new Self-Evolving PAcker Classifier framework that relies on incremental clustering to cope with the rapid evolution of packers. We evaluated our solution on two datasets: a malware feed dataset and a synthetic one. The results show that our classifier is effective and robust over time in identifying both known and new packer families. Indeed, our approach constantly enhances, integrates, adapts, and evolves packer knowledge, keeping our classifier effective for longer.
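The incremental behavior described above, where a new sample either joins a known packer family or founds a new one, can be illustrated with a deliberately simplified sketch. This is a plain nearest-member threshold scheme, not the SE-PAC clustering algorithm. The class name `IncrementalClusterer`, the threshold `eps`, and the toy one-dimensional distance are assumptions made purely for illustration.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class IncrementalClusterer(Generic[T]):
    """Toy incremental clustering: one sample at a time, no full re-clustering."""

    def __init__(self, distance: Callable[[T, T], float], eps: float) -> None:
        self.distance = distance      # any pairwise distance function
        self.eps = eps                # assumed join threshold (hypothetical)
        self.clusters: List[List[T]] = []

    def add(self, sample: T) -> Tuple[int, bool]:
        """Insert a sample; return (cluster index, whether the cluster is new)."""
        best_idx, best_dist = None, float("inf")
        for idx, members in enumerate(self.clusters):
            # Distance to a cluster = distance to its nearest member.
            d = min(self.distance(sample, m) for m in members)
            if d < best_dist:
                best_idx, best_dist = idx, d
        if best_idx is not None and best_dist <= self.eps:
            # Close enough to a known class: treat it as a variant of that class.
            self.clusters[best_idx].append(sample)
            return best_idx, False
        # Too far from everything seen so far: open a new class.
        self.clusters.append([sample])
        return len(self.clusters) - 1, True


if __name__ == "__main__":
    clusterer = IncrementalClusterer(lambda a, b: abs(a - b), eps=1.0)
    for value in (0.0, 0.5, 10.0, 10.4, 1.2):
        print(value, clusterer.add(value))
```

In this toy run the values around 0 end up in one cluster and the values around 10 in another, with the second cluster created the first time a sufficiently distant sample arrives. A realistic system would also need to handle noise and cluster merging, which this sketch ignores.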


Cited By

  • (2024) Identifying Malware Packers through Multilayer Feature Engineering in Static Analysis. Information 15:2 (102). DOI: 10.3390/info15020102. Online publication date: 9-Feb-2024
  • (2023) A survey on run-time packers and mitigation techniques. International Journal of Information Security 23:2 (887-913). DOI: 10.1007/s10207-023-00759-y. Online publication date: 1-Nov-2023


Information & Contributors

Information

Published In

CODASPY '21: Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy
April 2021
348 pages
ISBN: 9781450381437
DOI: 10.1145/3422337
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. clustering
  3. features combination
  4. incremental learning
  5. malware obfuscation
  6. novelty detection
  7. packers

Qualifiers

  • Research-article

Conference

CODASPY '21
Acceptance Rates

Overall Acceptance Rate 149 of 789 submissions, 19%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 29
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 23 Sep 2024
