Abstract
Neural models of code have shown impressive results on tasks such as predicting method names and identifying certain kinds of bugs. We show that these models are vulnerable to adversarial examples and introduce a novel approach for attacking trained models of code. The main idea of our approach is to force a given trained model to make an incorrect prediction, as specified by the adversary, by introducing small perturbations that do not change the program's semantics, thereby creating an adversarial example. To find such perturbations, we present a new technique for Discrete Adversarial Manipulation of Programs (DAMP). DAMP works by computing the gradient of the desired prediction with respect to the model's inputs while holding the model weights constant, and following the gradients to slightly modify the input code.
We show that our DAMP attack is effective across three neural architectures: code2vec, GGNN, and GNN-FiLM, in both Java and C#. Our evaluations demonstrate that DAMP achieves a success rate of up to 89% in changing a prediction to the adversary's choice (a targeted attack) and of up to 94% in changing a given prediction to any incorrect prediction (a non-targeted attack). To defend a model against such attacks, we empirically examine a variety of possible defenses and discuss their trade-offs. We show that some of these defenses can drastically reduce the attacker's success rate, at the cost of only a 2% relative degradation in accuracy when not under attack.
Our code, data, and trained models are available at https://github.com/tech-srl/adversarial-examples.
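To make the gradient-based idea concrete, below is a minimal sketch of a single targeted DAMP-style step, assuming a variable name is represented as one-hot rows over the vocabulary and renamed consistently across its occurrences. This is not the authors' released implementation; `model`, `var_onehot`, `target_label`, and `damp_targeted_step` are illustrative names introduced here.

```python
# Hypothetical sketch of one targeted, gradient-guided rename (DAMP-style).
# The trained weights stay frozen; only the gradient w.r.t. the input is used.
import torch
import torch.nn.functional as F

def damp_targeted_step(model, var_onehot, target_label):
    """Pick the rename that most decreases the loss toward the target.

    var_onehot: (num_occurrences, vocab_size) one-hot rows, one per
                occurrence of the variable being renamed.
    target_label: the adversary's desired prediction, as a class-index tensor.
    """
    x = var_onehot.clone().requires_grad_(True)
    logits = model(x)                              # forward pass, weights fixed
    loss = F.cross_entropy(logits, target_label)   # adversary's objective
    loss.backward()                                # gradient w.r.t. input only
    # Summing the gradient over occurrences gives one score per vocabulary
    # token; the most negative entry is the first-order best replacement name.
    token_scores = x.grad.sum(dim=0)
    best_token = torch.argmin(token_scores)
    new_onehot = torch.zeros_like(var_onehot)
    new_onehot[:, best_token] = 1.0                # rename consistently
    return new_onehot
```

A non-targeted variant would instead ascend the gradient of the loss on the original label. In both cases only the input changes: renaming a variable consistently is a perturbation that preserves the program's semantics.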
References
- Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In International Conference on Learning Representations. https://openreview.net/forum?id=BJOFETxR-
- Miltiadis Allamanis, Hao Peng, and Charles A. Sutton. 2016. A Convolutional Attention Network for Extreme Summarization of Source Code. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. 2091-2100. http://jmlr.org/proceedings/papers/v48/allamanis16.html
- Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019a. code2seq: Generating Sequences from Structured Representations of Code. In International Conference on Learning Representations. https://openreview.net/forum?id=H1gKYo09tX
- Uri Alon, Roy Sadaka, Omer Levy, and Eran Yahav. 2019b. Structural Language Models of Code. arXiv preprint arXiv:1910.00577 (2019).
- Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2018. A General Path-based Representation for Predicting Program Properties. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM, New York, NY, USA, 404-419. https://doi.org/10.1145/3192366.3192412
- Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019c. Code2Vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 3, POPL, Article 40 (Jan. 2019), 29 pages. https://doi.org/10.1145/3290353
- Moustafa Alzantot, Bharathan Balaji, and Mani Srivastava. 2018a. Did you hear that? Adversarial examples against automatic speech recognition. arXiv preprint arXiv:1801.00554 (2018).
- Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018b. Generating natural language adversarial examples. arXiv preprint arXiv:1804.07998 (2018).
- Daniel Arp, Michael Spreitzenbarth, Malte Hübner, Hugo Gascon, and Konrad Rieck. 2014. DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket. In Network and Distributed System Security Symposium (NDSS 2014).
- Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, and Martin Vechev. 2019. Certifying geometric robustness of neural networks. In Advances in Neural Information Processing Systems. 15313-15323.
- Rohan Bavishi, Michael Pradel, and Koushik Sen. 2018. Context2Name: A deep learning-based approach to infer natural variable names from usage contexts. arXiv preprint arXiv:1809.05193 (2018).
- Yonatan Belinkov and Yonatan Bisk. 2017. Synthetic and natural noise both break neural machine translation. arXiv preprint arXiv:1711.02173 (2017).
- Pavol Bielik, Veselin Raychev, and Martin T. Vechev. 2016. PHOG: Probabilistic Model for Code. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. 2933-2942. http://jmlr.org/proceedings/papers/v48/bielik16.html
- Pavol Bielik and Martin T. Vechev. 2020. Adversarial Robustness for Code. arXiv preprint arXiv:2002.04694 (2020).
- Marc Brockschmidt. 2019. GNN-FiLM: Graph neural networks with feature-wise linear modulation. arXiv preprint arXiv:1906.12192 (2019).
- Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. 2019. Generative Code Modeling with Graphs. In International Conference on Learning Representations. https://openreview.net/forum?id=Bke4KsA5FX
- Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. 2017. Adversarial patch. arXiv preprint arXiv:1712.09665 (2017).
- Jose Cambronero, Hongyu Li, Seohyun Kim, Koushik Sen, and Satish Chandra. 2019. When Deep Learning Met Code Search. arXiv preprint arXiv:1905.03813 (2019).
- Nicholas Carlini and David Wagner. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 1-7.
- Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
- Yaniv David, Uri Alon, and Eran Yahav. 2019. Neural Reverse Engineering of Stripped Binaries. arXiv preprint arXiv:1902.09122 (2019).
- Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2017. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751 (2017).
- Patrick Fernandes, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Structured Neural Summarization. In International Conference on Learning Representations. https://openreview.net/forum?id=H1ersoRqtm
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014a. Generative adversarial nets. In Advances in neural information processing systems. 2672-2680.
- Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014b. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
- Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. 2016. Adversarial perturbations against deep neural networks for malware classification. arXiv preprint arXiv:1606.04435 (2016).
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770-778.
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735-1780.
- Hossein Hosseini, Baicen Xiao, Mayoore Jaiswal, and Radha Poovendran. 2017. On the limitation of convolutional neural networks in recognizing negative images. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 352-358.
- Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing Source Code using a Neural Attention Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. http://aclweb.org/anthology/P/P16/P16-1195.pdf
- Henry J Kelley. 1960. Gradient theory of optimal flight paths. ARS Journal 30, 10 (1960), 947-954.
- Bojan Kolosnjaji, Ambra Demontis, Battista Biggio, Davide Maiorca, Giorgio Giacinto, Claudia Eckert, and Fabio Roli. 2018. Adversarial malware binaries: Evading deep learning for malware detection in executables. In 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 533-537.
- Felix Kreuk, Assi Barak, Shir Aviv-Reuven, Moran Baruch, Benny Pinkas, and Joseph Keshet. 2018. Deceiving end-to-end deep learning malware detectors using adversarial examples. arXiv preprint arXiv:1802.04528 (2018).
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 1097-1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
- Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).
- Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In ICLR.
- Jason Liu, Seohyun Kim, Vijayaraghavan Murali, Swarat Chaudhuri, and Satish Chandra. 2019. Neural Query Expansion for Code Search. In Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2019). ACM, New York, NY, USA, 29-37. https://doi.org/10.1145/3315508.3329975
- Yanxin Lu, Swarat Chaudhuri, Chris Jermaine, and David Melski. 2017. Data-Driven Program Completion. CoRR abs/1705.09042 (2017). http://arxiv.org/abs/1705.09042
- Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
- Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. 2016. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2574-2582.
- Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine. 2017. Bayesian Sketch Learning for Program Synthesis. CoRR abs/1703.05698 (2017). http://arxiv.org/abs/1703.05698
- Yurii Nesterov. 2013. Introductory lectures on convex optimization: A basic course. Vol. 87. Springer Science & Business Media.
- Anh Nguyen, Jason Yosinski, and Jeff Clune. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition. 427-436.
- Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security. ACM, 506-519.
- Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. 2016. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 372-387.
- Michael Pradel and Koushik Sen. 2018. DeepBugs: A Learning Approach to Name-based Bug Detection. Proc. ACM Program. Lang. 2, OOPSLA, Article 147 (Oct. 2018), 25 pages. https://doi.org/10.1145/3276517
- Danish Pruthi, Bhuwan Dhingra, and Zachary C Lipton. 2019. Combating Adversarial Misspellings with Robust Word Recognition. ACL (2019).
- Md Rafiqul Islam Rabin, Ke Wang, and Mohammad Amin Alipour. 2019. Testing Neural Program Analyzers. ASE (Late Breaking Results) (2019).
- Goutham Ramakrishnan, Jordan Henkel, Zi Wang, Aws Albarghouthi, Somesh Jha, and Thomas Reps. 2020. Semantic Robustness of Models of Source Code. arXiv preprint arXiv:2002.03043 (2020).
- Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting Program Properties from "Big Code". In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '15). ACM, New York, NY, USA, 111-124. https://doi.org/10.1145/2676726.2677009
- Andrew Rice, Edward Aftandilian, Ciera Jaspan, Emily Johnston, Michael Pradel, and Yulissa Arroyo-Paredes. 2017. Detecting argument selection defects. Proceedings of the ACM on Programming Languages 1, OOPSLA (2017), 104.
- Ishai Rosenberg, Asaf Shabtai, Lior Rokach, and Yuval Elovici. 2018. Generic black-box end-to-end attack against state of the art API call based malware classifiers. In International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 490-510.
- Saksham Sachdev, Hongyu Li, Sifei Luan, Seohyun Kim, Koushik Sen, and Satish Chandra. 2018. Retrieval on source code: a neural code search. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL@PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. 31-41. https://doi.org/10.1145/3211346.3211353
- Joshua Saxe and Konstantin Berlin. 2015. Deep neural network based malware detection using two dimensional binary program features. In 2015 10th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 11-20.
- Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2008), 61-80.
- Andrew Scott, Johannes Bader, and Satish Chandra. 2019. Getafix: Learning to fix bugs automatically. arXiv preprint arXiv:1902.06111 (2019).
- Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
- Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. 2019. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages 3, POPL (2019), 1-30.
- Octavian Suciu, Scott E Coull, and Jeffrey Johns. 2019. Exploring adversarial examples in malware detection. In 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 8-14.
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1-9.
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
- Rohan Taori, Amog Kamsetty, Brenton Chu, and Nikita Vemuri. 2019. Targeted adversarial examples for black box audio systems. In 2019 IEEE Security and Privacy Workshops (SPW). IEEE, 15-20.
- Eric Wallace, Mitchell Stern, and Dawn Song. 2020. Imitation Attacks and Defenses for Black-box Machine Translation Systems. arXiv preprint arXiv:2004.15015 (2020).
- Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G Ororbia II, Xinyu Xing, Xue Liu, and C Lee Giles. 2017. Adversary resistant deep neural networks with an application to malware detection. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1145-1153.
- Wei Yang, Deguang Kong, Tao Xie, and Carl A Gunter. 2017. Malware detection in adversarial settings: Exploiting feature evolutions and confusions in android apps. In Proceedings of the 33rd Annual Computer Security Applications Conference. 288-302.