Abstract
The increasing use of Machine Learning (ML) components embedded in autonomous systems, so-called Learning-Enabled Systems (LESs), has created a pressing need to assure their functional safety. As with traditional functional safety, the emerging consensus within both industry and academia is to use assurance cases for this purpose. Typically, assurance cases support claims of reliability in support of safety, and they can be viewed as a structured way of organising the arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LESs pose new challenges in safety-critical applications due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LESs, with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets into component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model's assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM, and we propose solutions for practical use. Based on the RAM, we also develop probabilistic safety argument templates at the lower ML component level. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic/benchmark datasets but also scope our methods with case studies on simulated Autonomous Underwater Vehicles and physical Unmanned Ground Vehicles.
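The abstract's core quantitative idea, combining an operational profile with robustness verification evidence to estimate a classifier's reliability, can be illustrated with a minimal sketch. This is not the paper's implementation: the region partitioning, probabilities, and unrobustness values below are hypothetical placeholders, standing in for an operational profile estimated from field data and per-region misclassification probabilities obtained from robustness verification.

```python
# Illustrative sketch (hypothetical numbers): estimating a classifier's
# probability of misclassification per input by weighting each input-space
# region's unrobustness by its operational-profile probability.

# Each region of the input space carries:
#   op:        operational-profile probability of inputs landing in it
#   unrobust:  estimated probability of misclassification within it
#              (e.g. derived from robustness verification evidence)
regions = [
    {"op": 0.70, "unrobust": 1e-5},
    {"op": 0.25, "unrobust": 4e-4},
    {"op": 0.05, "unrobust": 2e-3},
]

def estimate_pmi(regions):
    """Operational-profile-weighted sum of per-region unrobustness."""
    # The operational profile must be a probability distribution.
    assert abs(sum(r["op"] for r in regions) - 1.0) < 1e-9
    return sum(r["op"] * r["unrobust"] for r in regions)

print(f"Estimated probability of misclassification: {estimate_pmi(regions):.2e}")
```

The weighted sum reflects the intuition that unrobust regions only degrade operational reliability in proportion to how often the system actually encounters them.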
Reliability Assessment and Safety Arguments for Machine Learning Components in System Assurance