research-article
Open Access

Adversarial examples for models of code

Published: 13 November 2020

Abstract

Neural models of code have shown impressive results on tasks such as predicting method names and identifying certain kinds of bugs. We show that these models are vulnerable to adversarial examples, and introduce a novel approach for attacking trained models of code using adversarial examples. The main idea of our approach is to force a given trained model to make an incorrect prediction, as specified by the adversary, by introducing small perturbations that do not change the program’s semantics, thereby creating an adversarial example. To find such perturbations, we present a new technique for Discrete Adversarial Manipulation of Programs (DAMP). DAMP works by computing the gradient of the desired prediction with respect to the model’s inputs, while holding the model weights constant, and following the gradients to slightly modify the input code.
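The core gradient step of DAMP can be sketched on a toy model. Everything below (the linear classifier, the vocabulary size, and the `damp_step` helper) is a hypothetical illustration of the idea, not the paper's actual code2vec or GNN setup: with the weights frozen, we take the gradient of the adversary-chosen class score with respect to a one-hot encoded identifier, and replace the identifier with the vocabulary entry the gradient favors most.

```python
import numpy as np

# Toy setting (assumed for illustration): a frozen linear classifier over
# a single one-hot encoded "variable name" token.
rng = np.random.default_rng(0)
VOCAB, CLASSES = 8, 3
W = rng.normal(size=(VOCAB, CLASSES))   # frozen model weights


def predict(onehot):
    """Model prediction for a one-hot token."""
    return int(np.argmax(onehot @ W))


def damp_step(onehot, target):
    """One discrete gradient step toward the adversary's target class.

    The gradient of the target-class logit (onehot @ W)[target] with
    respect to the one-hot input is simply the column W[:, target];
    the discrete step replaces the token with the vocabulary entry
    that maximizes that logit.
    """
    grad = W[:, target]
    best = int(np.argmax(grad))
    new = np.zeros(VOCAB)
    new[best] = 1.0
    return new


x = np.zeros(VOCAB)
x[2] = 1.0          # original identifier
target = 1          # adversary-chosen class
x_adv = damp_step(x, target)
```

By construction, `x_adv` attains the maximum possible logit for the target class in this toy model; in the paper's setting the same gradient signal instead guides semantics-preserving edits such as variable renaming.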

We show that our DAMP attack is effective across three neural architectures: code2vec, GGNN, and GNN-FiLM, in both Java and C#. Our evaluations demonstrate that DAMP achieves a success rate of up to 89% in changing a prediction to the adversary’s choice (a targeted attack) and a success rate of up to 94% in changing a given prediction to any incorrect prediction (a non-targeted attack). To defend a model against such attacks, we empirically examine a variety of possible defenses and discuss their trade-offs. We show that some of these defenses can dramatically reduce the attacker’s success rate, at a minor cost of 2% relative degradation in accuracy when the model is not under attack.
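One way to picture the defense trade-off mentioned above (a hypothetical sketch in the spirit of identifier-masking defenses, not taken from the paper): replacing rare or out-of-vocabulary variable names with an `UNK` token before prediction blunts name-based perturbations, at the cost of discarding information carried by legitimate but uncommon names.

```python
# Hypothetical defense sketch: neutralize suspicious identifiers by
# mapping any name outside a trusted vocabulary to an UNK token before
# the code reaches the model. COMMON_NAMES is an assumed toy vocabulary.
COMMON_NAMES = {"i", "j", "count", "result", "value", "name"}


def sanitize(tokens, vocab=COMMON_NAMES, unk="UNK"):
    """Replace out-of-vocabulary identifiers with a placeholder."""
    return [t if t in vocab else unk for t in tokens]


print(sanitize(["result", "zxqv42", "i"]))  # ['result', 'UNK', 'i']
```

An adversarially crafted name like `zxqv42` is masked before it can steer the model, but a meaningful rare name would be masked too, which is the accuracy penalty such defenses pay.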

Our code, data, and trained models are available at https://github.com/tech-srl/adversarial-examples.


Supplemental Material

Auxiliary Presentation Video

This is a presentation video of my talk at OOPSLA 2020.



Published in

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
November 2020, 3108 pages
EISSN: 2475-1421
DOI: 10.1145/3436718

Copyright © 2020 Owner/Author

Publisher: Association for Computing Machinery, New York, NY, United States
