DOI: 10.1145/2799979.2800015
research-article

Information theoretic method for classification of packed and encoded files

Published: 08 September 2015

Abstract

Malware authors use anti-reverse-engineering and obfuscation techniques such as packing and encoding to conceal their malicious payloads. These techniques have succeeded in evading traditional signature-based AV scanners, which cannot easily analyse packed or encoded samples directly. Such samples must therefore first be unpacked or decoded before the malicious code can be analysed efficiently. This paper presents a static information-theoretic method for classifying packed and encoded files. The proposed method extracts fixed-size fragments from the files and calculates an entropy score for each fragment. These entropy scores are then used to compute the Similarity Distance Matrix for the fragments in a file pair. The proposed system classifies all the encoded and packed samples correctly, thereby improving detection. It can also distinguish the type of packer used for the packing or encoding process.
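The fragment-entropy step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the fragment size, the entropy formula, or how the Similarity Distance Matrix is defined, so everything below is an assumption made for illustration: byte-level Shannon entropy, a hypothetical fragment size of 256 bytes, and the absolute difference between entropy scores as the distance. The function names are hypothetical as well.

```python
import math
from collections import Counter


def shannon_entropy(fragment: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (range 0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fragment_entropies(data: bytes, fragment_size: int = 256) -> list:
    """Split data into fixed-size fragments and score each one.

    The fragment size is an assumed value; the last fragment may be shorter.
    """
    return [
        shannon_entropy(data[i:i + fragment_size])
        for i in range(0, len(data), fragment_size)
    ]


def distance_matrix(scores_a, scores_b) -> list:
    """Pairwise distances between the fragment scores of two files.

    Assumption: distance is the absolute difference of entropy scores.
    """
    return [[abs(a - b) for b in scores_b] for a in scores_a]


# Two synthetic "files": one with no byte variety, one with maximal variety.
low = bytes(512)                # all zero bytes, entropy 0.0 per fragment
high = bytes(range(256)) * 2    # every byte value once per fragment, entropy 8.0
matrix = distance_matrix(fragment_entropies(low), fragment_entropies(high))
# matrix is 2 x 2 and every entry equals 8.0
```

Packed or encoded regions tend to score closer to the upper end of this range than plain code or text, which is why fragment-level entropy is a common static feature. How the paper turns the matrix into a classification decision is not described in the abstract and is not modelled here.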

      Published In

      SIN '15: Proceedings of the 8th International Conference on Security of Information and Networks
      September 2015
      350 pages
      ISBN:9781450334532
      DOI:10.1145/2799979
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. classification
      2. encoders
      3. entropy
      4. malware
      5. obfuscation
      6. packers

      Qualifiers

      • Research-article

      Conference

      SIN '15

      Acceptance Rates

      SIN '15 Paper Acceptance Rate 34 of 92 submissions, 37%;
      Overall Acceptance Rate 102 of 289 submissions, 35%

      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 23 Sep 2024

      Cited By

      • (2024) A Machine Learning-Based PE Header Analysis for Malware Detection. International Journal of Innovative Science and Research Technology (IJISRT), pp. 1671-1676. DOI: 10.38124/ijisrt/IJISRT24MAR615. Online publication date: 1-Apr-2024
      • (2023) Algebraic Structures Induced by the Insertion and Detection of Malware. Computation, 11:7 (140). DOI: 10.3390/computation11070140. Online publication date: 11-Jul-2023
      • (2022) Measurement of Malware Family Classification on a Large-Scale Real-World Dataset. 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 1390-1397. DOI: 10.1109/TrustCom56396.2022.00196. Online publication date: Dec-2022
      • (2022) Towards A Framework for Preprocessing Analysis of Adversarial Windows Malware. 2022 10th International Symposium on Digital Forensics and Security (ISDFS), pp. 1-6. DOI: 10.1109/ISDFS55398.2022.9800812. Online publication date: 6-Jun-2022
      • (2022) Packer classification based on association rule mining. Applied Soft Computing, 127:C. DOI: 10.1016/j.asoc.2022.109373. Online publication date: 1-Sep-2022
      • (2019) Effective and Light-Weight Deobfuscation and Semantic-Aware Attack Detection for PowerShell Scripts. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1831-1847. DOI: 10.1145/3319535.3363187. Online publication date: 6-Nov-2019
      • (2018) Packer identification method based on byte sequences. Concurrency and Computation: Practice and Experience, 32:8. DOI: 10.1002/cpe.5082. Online publication date: 18-Nov-2018
