DOI: 10.1145/3331076.3331111
research-article

Transfer learning for malware multi-classification

Published: 10 June 2019

ABSTRACT

In this paper, we build on the MalConv neural network architecture, which was originally designed for binary malware/benign classification. We evaluate transfer learning of MalConv for multi-class malware classification by extending its contribution in several directions: (1) we assess MalConv's performance on a multi-class problem using a new dataset composed solely of malware samples belonging to different malware families; (2) we evaluate MalConv on raw byte data as well as on opcodes extracted from the disassembled samples, and compare the results; (3) we validate the MalConv findings about regularization; and (4) we study MalConv's performance with a medium-sized dataset and limited computational resources, including GPU. The results show that MalConv performs equally well for multi-class classification and that its performance on raw byte sequences is comparable to its performance on opcode sequences. DeCov regularization improves accuracy more than the other regularization techniques evaluated.
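As an illustration of the DeCov regularizer mentioned above, the following is a minimal sketch of the penalty as it is usually defined: half the sum of squared off-diagonal entries of the covariance matrix of a layer's activations over a batch. The function name `decov_loss` and the plain-list input format are assumptions made for this example; the abstract does not describe how the authors implemented or weighted the term.

```python
def decov_loss(activations):
    """Return the DeCov penalty for a batch of hidden activations.

    `activations` is a list of rows, one per sample, each row holding the
    values of the same d hidden units (hypothetical input format).
    """
    n = len(activations)
    d = len(activations[0])

    # Per-unit mean over the batch.
    means = [sum(row[i] for row in activations) / n for i in range(d)]

    # Center every activation around its unit's batch mean.
    centered = [[row[i] - means[i] for i in range(d)] for row in activations]

    # Sum the squared covariances between distinct units (off-diagonal only).
    loss = 0.0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            cov_ij = sum(r[i] * r[j] for r in centered) / n
            loss += cov_ij ** 2

    return 0.5 * loss


# Two perfectly correlated units are penalized; two uncorrelated units are not.
correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
uncorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(decov_loss(correlated))
print(decov_loss(uncorrelated))
```

In training, a term like this would typically be scaled by a small coefficient and added to the classification loss, so that hidden units are pushed toward decorrelated, less redundant features.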

