Abstract
Automatic source code summarization aims to generate a useful natural-language description of a program, which can facilitate software development and maintenance, code categorization, and code retrieval. However, previous sequence-based research did not jointly model the long-distance dependencies and the highly structured nature of source code. In this article, we present GA-SCS, a Transformer-based Graph-Augmented Source Code Summarization model, which effectively incorporates the inherent structural and textual features of source code to generate accurate code descriptions. Specifically, we develop a graph-based structural feature extraction scheme that leverages the abstract syntax tree and graph attention networks to mine global syntactic information. Then, to take full advantage of both the lexical and the syntactic information in code snippets, we extend the original attention in our encoder to a syntax-informed self-attention mechanism. During training, we also adopt a reinforcement learning strategy to enhance the readability and informativeness of the generated code summaries. We evaluate the models on a Java dataset and a Python dataset. Experimental results demonstrate that our GA-SCS model outperforms all competitive methods on BLEU, METEOR, ROUGE, and human evaluation.
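The pipeline described above (parse code into an abstract syntax tree, treat the tree as a graph, and bias self-attention toward syntactic neighbors) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's implementation: the helper names, the single unparameterized attention head, and the random node embeddings are all hypothetical.

```python
import ast
import numpy as np

def ast_adjacency(code):
    """Parse a snippet into its AST and return node labels plus a
    symmetric parent-child adjacency matrix (illustrative helper)."""
    labels, parents = [], []

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        parents.append(parent)
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(code), -1)
    n = len(labels)
    adj = np.zeros((n, n))
    for idx, parent in enumerate(parents):
        if parent >= 0:
            adj[idx, parent] = adj[parent, idx] = 1.0
    return labels, adj

def syntax_informed_attention(X, adj):
    """One self-attention head whose scores are masked to syntactic
    neighbors (plus self-loops): a simplified stand-in for the
    syntax-informed self-attention sketched in the abstract."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    scores = np.where(adj + np.eye(n) > 0, scores, -1e9)  # hide non-neighbors
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # rows sum to 1
    return weights @ X, weights

labels, adj = ast_adjacency("def add(a, b):\n    return a + b")
X = np.random.default_rng(0).standard_normal((len(labels), 8))
out, w = syntax_informed_attention(X, adj)
print(labels[:3])  # ['Module', 'FunctionDef', 'arguments']
```

In the actual model the attention bias would be learned and combined with ordinary token-level attention rather than a hard mask, but the sketch shows the core idea: each node attends only where the syntax graph says it may.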
Index Terms
GA-SCS: Graph-Augmented Source Code Summarization