Abstract
Malware detection is a crucial aspect of software security. Current malware detectors work by checking for signatures, which attempt to capture the syntactic characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes current detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter the syntactic properties of the malware byte sequence without significantly affecting its execution behavior.
This paper takes the position that the key to malware identification lies in its semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behavior of malware as well as that of the program being checked for infection, and uses abstract interpretation to “hide” irrelevant aspects of these behaviors. As a concrete application of our approach, we show that (1) standard signature matching detection schemes are generally sound but not complete, (2) the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers, and (3) the malware detection scheme proposed by Kinder et al., which is based on standard model-checking techniques, is sound in general and complete on some, but not all, of the obfuscations handled by the semantics-aware malware detector.
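The contrast between syntactic and semantic matching can be illustrated with a minimal Python sketch. It is not the paper's formal framework: the three-instruction toy language, the sample programs, and the names `run`, `abstract`, `contains`, and `semantic_match` are all assumptions made for illustration. The "abstraction" here only hides steps that leave the memory state unchanged, which is enough to see through nop insertion and nothing more.

```python
from typing import Dict, List

# Hypothetical toy instruction set (not from the paper):
#   ("set", var, const), ("add", var, const), ("nop",)
MALWARE = [("set", "x", 1), ("add", "x", 41)]
OBFUSCATED = [("set", "x", 1), ("nop",), ("add", "x", 41)]  # nop inserted


def run(program: List[tuple]) -> List[tuple]:
    """Return the trace of memory states visited by a straight-line program."""
    env: Dict[str, int] = {}
    trace = [tuple(sorted(env.items()))]
    for instr in program:
        op = instr[0]
        if op == "set":
            env[instr[1]] = instr[2]
        elif op == "add":
            env[instr[1]] = env.get(instr[1], 0) + instr[2]
        elif op != "nop":
            raise ValueError(f"unknown instruction: {instr!r}")
        trace.append(tuple(sorted(env.items())))
    return trace


def abstract(trace: List[tuple]) -> List[tuple]:
    """Hide irrelevant steps: collapse consecutive identical states."""
    out: List[tuple] = []
    for state in trace:
        if not out or out[-1] != state:
            out.append(state)
    return out


def contains(haystack: list, needle: list) -> bool:
    """True if needle occurs as a contiguous subsequence of haystack."""
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


def semantic_match(program: List[tuple], malware: List[tuple]) -> bool:
    """Compare abstracted traces instead of instruction sequences."""
    return contains(abstract(run(program)), abstract(run(malware)))


print(contains(OBFUSCATED, MALWARE))        # syntactic signature match
print(semantic_match(OBFUSCATED, MALWARE))  # abstracted-trace match
```

In this toy setting the instruction-level signature no longer matches once a nop is inserted, while the abstracted traces still coincide. Obfuscations that change intermediate states, such as register renaming, would need a coarser abstraction than the one sketched here.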
References
- Adleman, L. M. 1988. An abstract theory of computer viruses. In Proceedings of Advances in Cryptology (CRYPTO'88). Lecture Notes in Computer Science, vol. 403. Springer, Berlin, Germany.
- Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sahai, A., Vadhan, S., and Yang, K. 2001. On the (im)possibility of obfuscating programs. In Proceedings of the Advances in Cryptology (CRYPTO'01). Lecture Notes in Computer Science, vol. 2139. Springer, 1--18.
- Bergeron, J., Debbabi, M., Desharnais, J., Erhioui, M. M., Lavoie, Y., and Tawbi, N. 2001. Static detection of malicious code in executable programs. Symposium on Requirements Engineering for Information Security. http://www.sreis.org/old/2001/index.html.
- Briesemeister, L., Porras, P. A., and Tiwari, A. 2005. Model checking of worm quarantine and counter-quarantine under a group defense. Tech. rep. SRI-CSL-05-03, Computer Science Laboratory. SRI International.
- Chess, D. and White, S. 2000. An undetectable computer virus. In Proceedings of the Virus Bulletin Conference (VB2000). Virus Bulletin, Orlando, FL.
- Chow, S., Gu, Y., Johnson, H., and Zakharov, V. 2001. An approach to the obfuscation of control-flow of sequential computer programs. In Proceedings of the 4th International Information Security Conference (ISC'01), G. Davida and Y. Frankel, Eds. Lecture Notes in Computer Science, vol. 2200. Springer, 144--155.
- Christodorescu, M. and Jha, S. 2003. Static analysis of executables to detect malicious patterns. In Proceedings of the 12th USENIX Security Symposium (Security'03). USENIX Association, Berkeley, CA, 169--186.
- Christodorescu, M., Jha, S., and Kruegel, C. 2007. Mining specifications of malicious behavior. In Proceedings of the 6th Joint Meeting European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE'07).
- Christodorescu, M., Jha, S., Seshia, S. A., Song, D., and Bryant, R. E. 2005. Semantics-aware malware detection. In Proceedings of the 2005 IEEE Symposium on Security and Privacy (S&P'05). IEEE Computer Society, Los Alamitos, CA, 32--46.
- Christodorescu, M., Kinder, J., Jha, S., Katzenbeisser, S., and Veith, H. 2005. Malware normalization. Tech. rep. 1539, University of Wisconsin, Madison, WI.
- Clarke Jr., E. M., Grumberg, O., and Peled, D. A. 2001. Model Checking. MIT Press, Cambridge, MA.
- Cohen, F. 1985. Computer viruses. Ph.D. thesis, University of Southern California.
- Cohen, F. 1989. Computational aspects of computer viruses. Comput. Secur. 8, 4, 325.
- Cohen, F. B. 1987. Computer viruses: Theory and experiments. Comput. Secur. 6, 22--35.
- Collberg, C., Thomborson, C., and Low, D. 1997. A taxonomy of obfuscating transformations. Tech. rep. 148, Department of Computer Sciences, University of Auckland.
- Collberg, C., Thomborson, C., and Low, D. 1998. Manufacturing cheap, resilient, and stealthy opaque constructs. In Proceedings of the 25th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'98). ACM Press, 184--196.
- Cousot, P. and Cousot, R. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction of approximation of fixed points. In Proceedings of the 4th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'77). ACM Press, 238--252.
- Cousot, P. and Cousot, R. 1979. Systematic design of program analysis frameworks. In Proceedings of the 6th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'79). ACM Press, 269--282.
- Cousot, P. and Cousot, R. 1992. Abstract interpretation frameworks. J. Logic Comput. 2, 4 (Aug.), 511--547.
- Cousot, P. and Cousot, R. 2002. Systematic design of program transformation frameworks by abstract interpretation. In Proceedings of the 29th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages (POPL'02). ACM Press, 178--190.
- Dalla Preda, M., Christodorescu, M., Jha, S., and Debray, S. 2007. A semantics-based approach to malware detection. In Proceedings of the 34th ACM Symp. on Principles of Programming Languages (POPL'07). ACM Press, 377--388.
- Dalla Preda, M. and Giacobazzi, R. 2005. Control code obfuscation by abstract interpretation. In Proceedings of the 3rd IEEE International Conference on Software Engineering and Formal Methods (SEFM'05). IEEE Computer Society, Los Alamitos, CA, USA, 301--310.
- Dalla Preda, M. and Giacobazzi, R. 2005. Semantics-based code obfuscation by abstract interpretation. In Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP'05). Lecture Notes in Computer Science, vol. 3580. Springer, 1325--1336.
- Detristan, T., Ulenspiegel, T., Malcom, Y., and von Underduk, M. S. 2003. Polymorphic shellcode engine using spectrum analysis. Phrack 11, 61. http://www.phrack.org.
- Goldwasser, S. and Kalai, Y. T. 2005. On the impossibility of obfuscation with auxiliary input. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05). IEEE Computer Society, 553--562.
- Gupta, A. and Sekar, R. 2003. An approach for detecting self-propagating email using anomaly detection. In Proceedings of the 6th International Symposium on Recent Advances in Intrusion Detection (RAID'03), G. Vigna, E. Jonsson, and C. Kruegel, Eds. Lecture Notes in Computer Science, vol. 2820. Springer, 55--72.
- Intel Corporation. 2001. IA-32 Intel Architecture Software Developer's Manual. Intel Corporation.
- Jordan, M. 2002. Dealing with metamorphism. Virus Bull. 10, 4--6.
- Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H. 2005. Detecting malicious code by model checking. In Proceedings of the 2nd International Conference on Intrusion and Malware Detection and Vulnerability Assessment (DIMVA'05), K. Julisch and C. Krügel, Eds. Lecture Notes in Computer Science, vol. 3548. Springer, 174--187.
- Kolter, J. Z. and Maloof, M. A. 2004. Learning to detect malicious executables in the wild. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). ACM Press, 470--478.
- Lakhotia, A. and Mohammed, M. 2004. Imposing Order on Program Statements to Assist Anti-Virus Scanners. In Proceedings of the 11th Working Conference on Reverse Engineering (WCRE'04). IEEE Computer Society, 161--170.
- Lakhotia, A. and Singh, P. K. 2000. Challenges in getting “formal” with viruses. In Virus Bull.
- Lee, W., Nimbalkar, R. A., Yee, K. K., Patil, S. B., Desai, P. H., Tran, T. T., and Stolfo, S. J. 2000. A data mining and CIDF based approach for detecting novel and distributed intrusions. In Proceedings of the 3rd International Workshop on Recent Advances in Intrusion Detection (RAID 2000). Lecture Notes in Computer Science, vol. 1907. Springer, 49--65.
- Lee, W. and Stolfo, S. 1998. Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium. USENIX Association, 79--93.
- Lee, W., Stolfo, S., and Mok, K. W. 1999. A data mining framework for building intrusion detection models. In Proceedings of the IEEE Symposium on Security and Privacy (S&P'99). IEEE Computer Society, Los Alamitos, CA, USA, 120--132.
- Li, W.-J., Wang, K., Stolfo, S. J., and Herzog, B. 2005. Fileprints: Identifying file types by n-gram analysis. In Proceedings of the 6th Annual IEEE Systems, Man, and Cybernetics (SMC) Workshop on Information Assurance (IAW'05). IEEE Computer Society, 64--71.
- Linn, C. and Debray, S. 2003. Obfuscation of executable code to improve resistance to static disassembly. In Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS'03). ACM Press, 290--299.
- Lo, R. W., Levitt, K. N., and Olsson, R. A. 1995. MCF: A malicious code filter. Comput. Secur. 14, 541--566.
- McHugh, J. 2001. Intrusion and intrusion detection. Int. J. Inform. Secur. 1, 1, 14--35.
- Morley, P. 2001. Processing virus collections. In Proceedings of the Virus Bulletin Conference (VB2001). Virus Bulletin, 129--134.
- Nachenberg, C. 1997. Computer virus-antivirus coevolution. Comm. ACM 40, 1, 46--51.
- Rajaat. 1999. Polymorphism. 29A Mag. 1, 3, 1--2.
- Singh, P. and Lakhotia, A. 2003. Static verification of worm and virus behaviour in binary executables using model checking. In Proceedings of the 4th IEEE Information Assurance Workshop. IEEE Computer Society, Los Alamitos, CA, USA.
- Symantec Corporation. 2006. Symantec Internet Security Threat Report: Trends for January 06--June 06. Vol. X. Symantec Corporation, Cupertino, CA.
- Ször, P. 2005. The Art of Computer Virus Research and Defense. Addison-Wesley Professional, Boston, MA.
- Ször, P. and Ferrie, P. 2001. Hunting for metamorphic. In Proceedings of the Virus Bulletin Conference (VB2001). Virus Bulletin, 123--144.
- Walenstein, A., Mathur, R., Chouchane, M. R., and Lakhotia, A. 2006. Normalizing Metamorphic Malware Using Term Rewriting. In Proceedings of the 6th International Workshop on Source Code Analysis and Manipulation (SCAM'06). IEEE Computer Society Press, 75--84.
- Wee, H. 2005. On obfuscating point functions. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC'05). ACM Press, 523--532.
- zombie. 2001a. Automated reverse engineering: Mistfall engine. Published online at http://www.madchat.org//vxdevl/papers/vxers/Z0mbie/autorev.txt (last accessed on Sep. 29, 2006).
- zombie. 2001b. Real Permutating [sic] Engine. Published online at http://vx.netlux.org/vx.php?id=er05.