Abstract
Adversarial attacks against supervised learning algorithms necessitate the application of logging while using supervised learning algorithms in software projects. Logging enables practitioners to conduct postmortem analysis, which can help diagnose any conducted attacks. We conduct an empirical study to identify and characterize log-related coding patterns, i.e., recurring coding patterns that can be leveraged to conduct adversarial attacks and need to be logged. A list of log-related coding patterns can guide practitioners on what to log while using supervised learning algorithms in software projects.
We apply qualitative analysis to 3,004 Python files used to implement 103 supervised learning-based software projects. We identify 54 log-related coding patterns that map to six attacks related to supervised learning algorithms. Using the Log Assistant to conduct Postmortems for Supervised Learning (LOPSUL), we quantify the frequency of the identified log-related coding patterns across 278 open-source software projects that use supervised learning. We observe log-related coding patterns in 22% of the analyzed files, with training data forensics being the most frequently occurring category.
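To make the idea of detecting log-related coding patterns concrete, the following is a minimal sketch of the kind of static check a tool such as LOPSUL might perform over a Python file's abstract syntax tree. It is not LOPSUL's actual implementation: the pattern sets (`DATA_LOAD_CALLS`, `LOGGING_CALLS`) and the single heuristic shown here are illustrative assumptions, standing in for the 54 patterns catalogued in the study.

```python
import ast

# Illustrative calls whose use may warrant logging for training data
# forensics; the real patterns are catalogued by LOPSUL, not listed here.
DATA_LOAD_CALLS = {"read_csv", "load", "loadtxt"}
LOGGING_CALLS = {"info", "warning", "error", "debug"}

def called_names(tree):
    """Collect the names of all functions/methods invoked in a module AST."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                names.add(func.attr)
            elif isinstance(func, ast.Name):
                names.add(func.id)
    return names

def flag_unlogged_data_loads(source):
    """Return True if the file loads training data but never calls a logger."""
    names = called_names(ast.parse(source))
    return bool(names & DATA_LOAD_CALLS) and not (names & LOGGING_CALLS)

snippet = "import pandas as pd\ndf = pd.read_csv('train.csv')\n"
print(flag_unlogged_data_loads(snippet))  # True: data load without logging
```

A file flagged this way loads training data without recording any provenance, so a poisoning attack on that data would leave no trace for postmortem analysis.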
Log-related Coding Patterns to Conduct Postmortems of Attacks in Supervised Learning-based Projects