Research Article | Public Access

Adaptive Cyber Defense Against Multi-Stage Attacks Using Learning-Based POMDP

Published: 08 November 2020

Abstract

The growing number of multi-stage attacks in computer networks imposes significant security risks and necessitates effective defense schemes that can autonomously respond to intrusions during vulnerability windows. However, the defender faces several real-world challenges, e.g., unknown likelihoods and unknown impacts of successful exploits. In this article, we leverage reinforcement learning to develop an adaptive cyber defense that maximizes cost-effectiveness subject to these challenges. In particular, we use Bayesian attack graphs to model the interactions between the attacker and the network. We then formulate the defense problem of interest as a partially observable Markov decision process (POMDP) in which the defender maintains belief states to estimate system states, leverages Thompson sampling to estimate transition probabilities, and utilizes reinforcement learning to choose optimal defense actions from measured utility values. The algorithm's performance is verified via numerical simulations based on real-world attacks.
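The defense loop sketched in the abstract can be illustrated in miniature. The following is a toy sketch, not the paper's algorithm: it assumes a single exploitable node (rather than a Bayesian attack graph), a Beta posterior over one unknown exploit probability for Thompson sampling, an invented noisy alert sensor (DETECT_TPR, DETECT_FPR), and a simple belief-threshold policy in place of the learned one. All constants are illustrative assumptions.

```python
import random

random.seed(1)

TRUE_P = 0.7          # ground-truth exploit success probability (hidden from defender)
DETECT_TPR = 0.9      # assumed sensor true-positive rate (noisy observation)
DETECT_FPR = 0.1      # assumed sensor false-positive rate

# Beta(alpha, beta) posterior over the unknown exploit probability.
alpha, beta = 1.0, 1.0

def thompson_sample():
    """Draw one plausible transition probability from the current posterior."""
    return random.betavariate(alpha, beta)

def belief_update(belief, observed_alert, p):
    """One Bayes filter step for P(node compromised) given a noisy alert."""
    # Predict: an uncompromised node may be newly exploited with probability p.
    prior = belief + (1.0 - belief) * p
    # Correct with the alert likelihoods.
    if observed_alert:
        num = DETECT_TPR * prior
        den = num + DETECT_FPR * (1.0 - prior)
    else:
        num = (1.0 - DETECT_TPR) * prior
        den = num + (1.0 - DETECT_FPR) * (1.0 - prior)
    return num / den

belief = 0.0          # defender's belief that the node is compromised
compromised = False   # true (hidden) state
defenses = 0

for t in range(2000):
    # Attacker attempts the exploit whenever the node is uncompromised.
    if not compromised:
        succeeded = random.random() < TRUE_P
        compromised = succeeded
        # Posterior update from the observed attempt outcome (e.g., forensics).
        if succeeded:
            alpha += 1
        else:
            beta += 1
    # The defender only sees a noisy alert, not the true state.
    alert = random.random() < (DETECT_TPR if compromised else DETECT_FPR)
    p_hat = thompson_sample()
    belief = belief_update(belief, alert, p_hat)
    # Defense action: reimage the node when the belief crosses a threshold.
    if belief > 0.5:
        compromised = False
        belief = 0.0
        defenses += 1

print(f"posterior mean p ~ {alpha / (alpha + beta):.2f}, defenses taken: {defenses}")
```

With enough observed attempts, the posterior mean converges near the hidden exploit probability, which is the role Thompson sampling plays in the paper's transition-probability estimation; the full method replaces the threshold rule with reinforcement learning over belief states on a Bayesian attack graph.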



Published in

ACM Transactions on Privacy and Security, Volume 24, Issue 1 (February 2021), 191 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3426975

Copyright © 2020 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 8 November 2020
• Accepted: 1 August 2020
• Revised: 1 June 2020
• Received: 1 October 2019

Qualifiers: research-article, refereed
