
GA-SCS: Graph-Augmented Source Code Summarization

Published: 21 February 2023

Abstract

Automatic source code summarization aims to generate a useful natural language description of a program, which can facilitate software development and maintenance, code categorization, and code retrieval. However, previous sequence-based approaches did not simultaneously consider the long-distance dependencies and the highly structured nature of source code. In this article, we present GA-SCS, a Transformer-based Graph-Augmented Source Code Summarization model that effectively incorporates the inherent structural and textual features of source code to generate effective code descriptions. Specifically, we develop a graph-based structural feature extraction scheme that leverages the abstract syntax tree (AST) and graph attention networks to mine global syntactic information. Then, to take full advantage of both the lexical and syntactic information of code snippets, we extend the original attention in our encoder to a syntax-informed self-attention mechanism. During training, we also adopt a reinforcement learning strategy to enhance the readability and informativeness of the generated summaries. We evaluate the performance of different models on a Java dataset and a Python dataset. Experimental results demonstrate that GA-SCS outperforms all competing methods on BLEU, METEOR, ROUGE, and human evaluation.
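To make the graph-augmented encoding concrete, the sketch below shows a minimal single-head graph attention layer over AST nodes in PyTorch. It is only an illustrative assumption of how "graph-based structure feature extraction" with graph attention networks could be realized, not the authors' actual implementation; the class name, tensor shapes, and the binary adjacency input are hypothetical.

```python
# Minimal sketch (assumed, not the paper's exact model) of a single-head
# graph attention layer that aggregates AST node features over AST edges.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASTGraphAttentionLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        # Scoring vector applied to each concatenated (source, target) node pair.
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, in_dim) AST node embeddings.
        # adj: (N, N) binary AST adjacency; assumed to include self-loops
        # so every node attends to at least itself.
        h = self.proj(node_feats)                                   # (N, out_dim)
        n = h.size(0)
        # Build all (i, j) node-pair representations.
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )                                                           # (N, N, 2*out_dim)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))         # (N, N)
        # Restrict attention to AST neighbours before normalising.
        scores = scores.masked_fill(adj == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ h                                          # (N, out_dim)
```

In a full model along the lines the abstract describes, such structure-aware node embeddings would then be fused with the token-level representations in the encoder, for example through the syntax-informed self-attention mechanism mentioned above.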



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 2
February 2023, 624 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3572719


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 21 February 2023
• Online AM: 4 August 2022
• Accepted: 29 July 2022
• Received: 30 August 2021
