Research Article | Public Access

Adaptive Cyber Defense Against Multi-Stage Attacks Using Learning-Based POMDP

Published: 08 November 2020

Abstract

Growing multi-stage attacks in computer networks pose significant security risks and necessitate effective defense schemes that can autonomously respond to intrusions during vulnerability windows. The defender, however, faces several real-world challenges, such as unknown likelihoods and unknown impacts of successful exploits. In this article, we leverage reinforcement learning to develop an adaptive cyber defense that maximizes cost-effectiveness despite these challenges. In particular, we use Bayesian attack graphs to model the interactions between the attacker and the network. We then formulate the defense problem as a partially observable Markov decision process (POMDP) in which the defender maintains belief states to estimate the system state, leverages Thompson sampling to estimate transition probabilities, and uses reinforcement learning to choose optimal defense actions based on measured utility values. The performance of the algorithm is verified via numerical simulations based on real-world attacks.
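
To make the learning loop described in the abstract concrete, the following minimal Python sketch combines the three ingredients on a toy example: factored belief tracking over a hypothetical three-node Bayesian attack graph, Thompson sampling of the unknown exploit-success probabilities from Beta posteriors, and a greedy one-step lookahead used as a stand-in for the paper's reinforcement-learning policy. The graph structure, cost values, alert noise model, and posterior-update rule below are assumptions made purely for illustration; they are not the authors' algorithm or parameters.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-node attack graph: a node becomes exploitable once all of
# its parent nodes are compromised. Structure and numbers are illustrative.
PARENTS = {0: [], 1: [0], 2: [1]}
N = 3
TRUE_P = np.array([0.6, 0.5, 0.4])     # exploit-success probabilities, unknown to the defender
LOSS, PATCH_COST = 2.0, 1.0            # assumed per-step loss per compromised node / patch cost
FP, FN = 0.1, 0.2                      # assumed alert false-positive / false-negative rates

alpha, beta_ = np.ones(N), np.ones(N)  # Beta posteriors over exploit probabilities
state = np.zeros(N, dtype=int)         # ground-truth compromise flags (hidden from the defender)
belief = np.zeros(N)                   # defender's belief P(node i is compromised)

def step_env(s, patch):
    """Ground-truth dynamics, used only to simulate the environment."""
    s = s.copy()
    if patch is not None:
        s[patch] = 0                                   # patching/reimaging cleans the node
    for i in range(N):
        if not s[i] and all(s[p] for p in PARENTS[i]) and rng.random() < TRUE_P[i]:
            s[i] = 1                                   # attacker's exploit succeeds
    return s

def observe(s):
    """IDS-style noisy alerts: each flag is flipped with probability FP or FN."""
    flip = rng.random(N) < np.where(s == 1, FN, FP)
    return np.where(flip, 1 - s, s)

def predict_belief(b, patch, p):
    """Factored one-step belief propagation under sampled exploit probabilities p."""
    b = b.copy()
    if patch is not None:
        b[patch] = 0.0
    for i in range(N):
        ready = np.prod([b[q] for q in PARENTS[i]]) if PARENTS[i] else 1.0
        b[i] = b[i] + (1.0 - b[i]) * ready * p[i]
    return b

for t in range(50):
    p_hat = rng.beta(alpha, beta_)                     # Thompson sample of the unknown model
    # Greedy one-step lookahead under the sampled model (a stand-in for the
    # paper's reinforcement-learning policy over belief states).
    actions = [None] + list(range(N))                  # no-op or patch one node
    costs = [predict_belief(belief, a, p_hat).sum() * LOSS
             + (0.0 if a is None else PATCH_COST) for a in actions]
    action = actions[int(np.argmin(costs))]

    state = step_env(state, action)
    obs = observe(state)

    # Bayes correction of the predicted belief using the noisy alert vector.
    pred = predict_belief(belief, action, p_hat)
    like1 = np.where(obs == 1, 1 - FN, FN)             # P(obs | node compromised)
    like0 = np.where(obs == 1, FP, 1 - FP)             # P(obs | node clean)
    belief = like1 * pred / (like1 * pred + like0 * (1 - pred))

    # Crude credit assignment for the Beta posteriors: count an exploit attempt
    # on nodes whose parents we believe are compromised (an approximation, not
    # the paper's estimator).
    for i in range(N):
        ready = np.prod([belief[q] for q in PARENTS[i]]) if PARENTS[i] else 1.0
        if ready > 0.5:
            alpha[i] += belief[i]
            beta_[i] += 1.0 - belief[i]

In the article's setting, the action choice would instead come from a learned value function over belief states and measured utility values, and the belief and posterior updates would follow the paper's POMDP formulation; the sketch only shows how the pieces fit together.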




      Published In

      ACM Transactions on Privacy and Security, Volume 24, Issue 1
      February 2021
      191 pages
      ISSN: 2471-2566
      EISSN: 2471-2574
      DOI: 10.1145/3426975

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 November 2020
      Accepted: 01 August 2020
      Revised: 01 June 2020
      Received: 01 October 2019
      Published in TOPS Volume 24, Issue 1


      Author Tags

      1. Reinforcement learning
      2. Thompson sampling
      3. Adaptive cyber defense

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Cited By

      • (2024) Optimal Detection for Bayesian Attack Graphs Under Uncertainty in Monitoring and Reimaging. 2024 American Control Conference (ACC), 3927-3934. DOI: 10.23919/ACC60939.2024.10644873. Online publication date: 10-Jul-2024.
      • (2024) Learning Near-Optimal Intrusion Responses Against Dynamic Attackers. IEEE Transactions on Network and Service Management 21, 1 (2024), 1158-1177. DOI: 10.1109/TNSM.2023.3293413. Online publication date: Mar-2024.
      • (2024) A Robust and Efficient Risk Assessment Framework for Multi-Step Attacks. 2024 7th International Conference on Information and Computer Technologies (ICICT), 309-314. DOI: 10.1109/ICICT62343.2024.00056. Online publication date: 15-Mar-2024.
      • (2024) A Novel Two Step Computer Network Attack and Defense Strategy. 2024 International Conference on Inventive Computation Technologies (ICICT), 1360-1367. DOI: 10.1109/ICICT60155.2024.10544975. Online publication date: 24-Apr-2024.
      • (2023) ProMD: A Proactive Intrusion Response System for Enterprise Network with Multi-Domain. 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 402-409. DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00085. Online publication date: 21-Dec-2023.
      • (2023) Towards an Uncertainty-aware Decision Engine for Proactive Self-Protecting Software. 2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), 21-23. DOI: 10.1109/ACSOS-C58168.2023.00027. Online publication date: 25-Sep-2023.
      • (2023) A survey: When moving target defense meets game theory. Computer Science Review 48 (2023), 100544. DOI: 10.1016/j.cosrev.2023.100544. Online publication date: May-2023.
      • (2022) Research and Challenges of Reinforcement Learning in Cyber Defense Decision-Making for Intranet Security. Algorithms 15, 4 (2022), 134. DOI: 10.3390/a15040134. Online publication date: 18-Apr-2022.
      • (2022) Intrusion Prevention Through Optimal Stopping. IEEE Transactions on Network and Service Management 19, 3 (2022), 2333-2348. DOI: 10.1109/TNSM.2022.3176781. Online publication date: Sep-2022.
      • (2022) An Experimental Platform for Autonomous Intelligent Cyber-Defense Agents: Towards a collaborative community approach (WIPP). 2022 Resilience Week (RWS), 1-7. DOI: 10.1109/RWS55399.2022.9984037. Online publication date: 26-Sep-2022.
