Research Article

Reliability Assessment and Safety Arguments for Machine Learning Components in System Assurance

Published: 20 April 2023

Abstract

The increasing use of Machine Learning (ML) components embedded in autonomous systems (so-called Learning-Enabled Systems, or LESs) has created a pressing need to assure their functional safety. As with traditional functional safety, the emerging consensus within both industry and academia is to use assurance cases for this purpose. Typically, assurance cases support claims of reliability in support of safety, and can be viewed as a structured way of organising arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LESs pose new challenges in safety-critical applications due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LESs, with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets into component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel, model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model's assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM, and propose solutions for practical use. Probabilistic safety argument templates at the lower ML component level are also developed based on the RAM. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic and benchmark datasets but also apply our methods in case studies on simulated Autonomous Underwater Vehicles and physical Unmanned Ground Vehicles.

