StructCoder: Structure-Aware Transformer for Code Generation

There has been a recent surge of interest in automating software engineering tasks using deep learning. This paper addresses the problem of code generation, where the goal is to generate target code given source code in a different language or a natural language description. Most state-of-the-art deep learning models for code generation use training strategies primarily designed for natural language. However, understanding and generating code requires a more rigorous comprehension of the code syntax and semantics. With this motivation, we develop an encoder-decoder Transformer model where both the encoder and decoder are explicitly trained to recognize the syntax and data flow in the source and target codes, respectively. We not only make the encoder structure-aware by leveraging the source code's syntax tree and data flow graph, but we also support the decoder in preserving the syntax and data flow of the target code by introducing two novel auxiliary tasks: AST (Abstract Syntax Tree) paths prediction and data flow prediction. To the best of our knowledge, this is the first work to introduce a structure-aware Transformer decoder that models both syntax and data flow to enhance the quality of generated code. The proposed StructCoder model achieves state-of-the-art performance on code translation and text-to-code generation tasks in the CodeXGLUE benchmark, and improves over baselines of similar size on the APPS code generation benchmark. Our code is publicly available at https://github.com/reddy-lab-code-research/StructCoder/.


INTRODUCTION
Code generation is the problem of generating code in a specified target language given source code that is either imperfect or in a different language, or generating code from a natural language description. In this paper, we consider the problem of generating target code given source code in a different language (code translation) or a natural language description (text-to-code generation). Code translation has applications in migrating legacy codebases to contemporary programming languages and porting existing software to various other platforms [1,27,35]. Text-to-code generation models can potentially increase programmers' productivity by simplifying and speeding up the software development process, as developers often write code to solve a problem or implement logic that is stated in natural language [1]. Transformer-based deep learning methods have recently gathered significant attention in this domain. However, these existing models do not effectively utilize code structure, especially during the decoding of target code. To address this limitation, we propose StructCoder, which models the syntax and data flow in both source and target codes with a structure-aware encoder and decoder.
Traditional code translation tools have been designed using hand-crafted rules based on the Abstract Syntax Tree (AST) [27]. One such popular tool is Babel, which converts modern JavaScript code to older versions for backward compatibility; other notable source-to-source translators follow similar rule-based designs. Our main contributions are as follows. (1) We propose StructCoder, an encoder-decoder Transformer in which both the encoder and the decoder are structure-aware: the encoder leverages the source code's AST and DFG, while the decoder is trained to preserve the syntax and data flow of the target code through two novel auxiliary tasks, AST paths prediction and data flow prediction. (2) We pretrain StructCoder using a structure-based DAE objective where the input code as well as its AST and DFG are partially corrupted and the model is trained to generate the original input code and also perform the auxiliary tasks. (3) Our experiments demonstrate that the proposed model achieves state-of-the-art performance on the code translation and text-to-code generation tasks in the CodeXGLUE [18] benchmark, and outperforms similarly sized baselines on the APPS code generation benchmark.
The subsequent sections of this paper are organized as follows. Section 2 discusses existing methods for modeling code structure and developing pretrained Transformers for code. Section 3 provides a detailed description of our proposed methodology. In Section 4, we present experimental results, comparing our model against the baselines on code translation and text-to-code generation datasets. We also conduct an ablation study and discuss more aspects of StructCoder's performance. Finally, Section 5 concludes the paper.

Leveraging Structure to Generate Code
To leverage code structure in deep models, many approaches have utilized ASTs. Some approaches modeled code completion as a language modeling task by ordering the code tokens using a depth-first traversal of the AST. Li et al. [15] used an LSTM appended with parent-child attention, while Alon et al. [2] encoded each root-to-leaf path with an LSTM. Kim et al. [13] used the Transformer to encode the sequenced AST by encoding AST paths into self-attention. For text-to-code generation, Rabinovich et al. [23] proposed a modular decoder to recursively generate the target AST. Brockschmidt et al. [3], Sun et al. [28], and Yin and Neubig [34] construct ASTs by generating production rules based on a grammar. Jiang et al. [11] proposed an LSTM decoder equipped with AST-enhanced attention to generate a sequence of production rules by attending to previously generated rules and one future rule. To go beyond the standard preorder traversal for AST node generation, Jiang et al. [12] used a Reinforcement Learning framework for dynamically selecting the branch to expand at an intermediate AST node, and Xie et al. [32] used two separate models for preorder and breadth-first traversals that are jointly trained via mutual distillation. Unlike these methods, we keep the conventional Transformer decoder architecture intact and introduce auxiliary structure-related components on top of the decoder's final hidden representations, so that StructCoder is trained to preserve target code structure while not requiring the generation of such structures (AST/DFG) during inference. Building on top of conventional Transformer architectures not only allows us to utilize existing pretrained models for better initialization but also makes advances in the area of Transformers more easily applicable to our model.

Pretrained Transformers for Code
The recent state-of-the-art results on most natural language generation tasks are obtained by pretraining huge deep learning models on large datasets with carefully designed pretext tasks. Since code generation is very similar to text generation and there is abundant unsupervised code data available through open source code repositories, pretraining code generation models using similar pretext tasks has been successful. Most recent state-of-the-art pretrained models for code utilize the Transformer [29] architecture and are discussed below.
CodeBERT [5] performs encoder-only pretraining using Masked Language Modeling and Replaced Token Detection as pretext tasks on the CodeSearchNet dataset. Transcoder [27] is an unsupervised translation model which pretrains both encoder and decoder using Denoising Autoencoding and Back-Translation with only monolingual datasets. PLBART [1] is pretrained with a DAE objective using 680M Java and Python functions. DOBF [14] attempts to understand code structure with a deobfuscation pretext task where every occurrence of a sampled identifier is replaced by an uninformative token. Code Transformer [36] modifies the attention computations in the encoder according to AST-based distances. CodeT5 [30] pretrains a T5 model [25] with code data in 8 programming languages. In contrast to PLBART, which treats code data as plain sequences, CodeT5 includes identifier-aware objectives in the training, which helps maintain the correctness of the code. However, CodeT5 does not include any structural information of the code in training. Zhu et al. [35] improve code translation performance by introducing a fine-grained snippet-level translation task during pretraining. GraphCodeBERT [7] utilizes code structure in the form of the Data Flow Graph (DFG), which contains semantic information as opposed to the syntactic information in the AST. However, the decoder is completely unaware of the code structure in all of the above methods.
Our model advances the domain of code generation by being the first to train a structure-aware Transformer encoder and decoder that model both syntax and data flow. A summary of the pretext tasks and code structures used by the above Transformer-based methods, along with our approach, is provided in Table 1.

STRUCTCODER
StructCoder is a Transformer-based encoder-decoder model where both the encoder and decoder are structure-aware. We build our model using the T5 architecture and add the relevant components for modeling code structure. For code inputs, the encoder (refer to Section 3.2) takes as input the tokenized source code sequence along with its AST and DFG and employs structure-aware self-attention. The structure-aware decoder (refer to Section 3.3) simultaneously learns to generate the target code sequence and to perform target AST and DFG related tasks. The notations used to describe our methodology in this section are summarized in Table 2.

Preliminaries
A code can be a function or a program, and is represented as a sequence of tokens $c = (c_1, \dots, c_{|c|})$. The AST of a code is a tree $\mathcal{T}$ whose leaves correspond to spans of code tokens, and each node $a \in \mathcal{T}$ has a type denoted by $a.type$. We use the tree-sitter library to parse codes and generate syntax trees according to a context-free grammar for each programming language. A code $c$ also has a corresponding DFG represented as $\mathcal{G} = (V, D, L)$, where $V = \{v_1, v_2, \dots, v_{|V|}\}$ is the set of variables extracted from code $c$, $D \in \{0,1\}^{|V| \times |V|}$ is the adjacency matrix with $D_{ij} = 1$ if and only if the value of $v_i$ is directly obtained from $v_j$, and $L \in \{0,1\}^{|V| \times |c|}$ is the code-variable linking matrix with $L_{ik} = 1$ if and only if variable $v_i$ is derived from token $c_k$. For extracting the DFG, we utilize the implementation of Ren et al. [26], where the tree-sitter generated AST is traversed to recursively identify the variables and the data flow relations between them using a language-specific deterministic function.
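As a concrete illustration of this preprocessing, the sketch below parses a code string with tree-sitter and collects the root-to-leaf paths of node types. It assumes the pre-0.22 py-tree-sitter API and a language library built beforehand with Language.build_library; the toy DFG at the end merely illustrates the (variables, edges) output format, whereas the actual extraction in [26] applies language-specific rules during the AST traversal.

```python
from tree_sitter import Language, Parser

# Assumes a language library built beforehand, e.g. via
# Language.build_library('build/langs.so', ['tree-sitter-java']).
JAVA = Language('build/langs.so', 'java')

parser = Parser()
parser.set_language(JAVA)


def parse_ast(code: str):
    """Parse code with tree-sitter and collect (leaf text, root-to-leaf path of node types)."""
    tree = parser.parse(code.encode('utf8'))
    leaves = []

    def walk(node, path):
        path = path + [node.type]          # node.type is the grammar symbol, e.g. 'identifier'
        if len(node.children) == 0:        # leaf node: corresponds to a code token span
            leaves.append((code[node.start_byte:node.end_byte], path))
        for child in node.children:
            walk(child, path)

    walk(tree.root_node, [])
    return leaves


def toy_dfg(leaves):
    """Very rough stand-in for DFG extraction: link identifier leaves that share a name,
    treating each occurrence as a variable whose value comes from the previous occurrence."""
    variables, edges, last_seen = [], set(), {}
    for idx, (text, path) in enumerate(leaves):
        if path[-1] == 'identifier':
            variables.append((text, idx))
            if text in last_seen:
                edges.add((len(variables) - 1, last_seen[text]))
            last_seen[text] = len(variables) - 1
    return variables, edges
```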
The goal of code translation is to transform a code $c = (c_1, \dots, c_{|c|})$ in a source language into a code $t = (t_1, \dots, t_{|t|})$ in a different target language such that the translated code $t$ is semantically equivalent to the input code $c$. In text-to-code generation, the goal is to generate the target code $t$ from a natural language description.

Fig. 1. Encoder with structure-aware self-attention: The input sequence to the encoder consists of the source code concatenated with the AST leaves and DFG variables, where the AST leaves are embedded using the root-leaf paths in the AST. The modified structure-aware self-attention mechanism of this Transformer encoder utilizes the code-AST/DFG linking information, the leaf-leaf similarities in the AST (based on the paths from the root to each pair of leaves), and the (asymmetric) DFG adjacency matrix to compute the attention matrix.

Structure-Aware Encoder
Given the source code $c$, its corresponding AST $\mathcal{T}$, and DFG $\mathcal{G}$, the input sequence to the encoder is $(\langle s\rangle, c_1, \dots, c_{|c|}, \langle/s\rangle, l_1, \dots, l_m, v_1, \dots, v_{|V|})$, which consists of the code tokens, the special tokens $\langle s\rangle$ and $\langle/s\rangle$, the AST leaves $l_1, \dots, l_m$, and the DFG variables. For text input, the leaves and variables are simply absent from the input. The encoder architecture is illustrated in Fig. 1 and is described in detail below.

Input Embedding.
As StructCoder consists of a Transformer encoder, each token in the input sequence has to be embedded in $\mathbb{R}^d$. We embed the code tokens along with the special tokens by using a lookup table, and use a single shared embedding for all DFG variables, since the DFG information is instead used by the encoder in the structure-aware self-attention. We compute the embedding of a leaf $l$ in an AST as a function of the path from the root to $l$. Let $(a_1, a_2, \dots, a_{|A_l|})$ be the nodes on the path from the root $a_1$ to the leaf $l = a_{|A_l|}$. We utilize a node-type embedding $E_{type}(\cdot) \in \mathbb{R}^d$ to encode a node's syntax along with a node-height embedding $E_{height}(\cdot) \in \mathbb{R}^d$ to encode the order of nodes on this path. The leaf embedding $E(l)$ is computed by aggregating the element-wise products of the node-type and node-height embeddings along this path,

$$E(l) = \frac{1}{|A_l|} \sum_{i=1}^{|A_l|} E_{type}(a_i.type) \odot E_{height}(|A_l| - i),$$

where $\odot$ denotes element-wise multiplication.

Structure-Aware Self-Attention.
The encoder employs structure-aware self-attention, which computes attention scores between tokens based on the structural relations between them.

Code-code: Following T5, we compute the attention scores (before softmax) between code tokens by adding the query-key dot product, with projection weights $W_Q, W_K \in \mathbb{R}^{d_k \times d}$, to a lookup embedding $E_{rel}: \mathbb{Z}_{\geq 0} \to \mathbb{R}$ of the relative position. Denoting the embedding of $c_i$ by $x_i$, we have

$$A(c_i, c_j) = (W_Q x_i)^\top (W_K x_j) + E_{rel}(|i - j|). \quad (2)$$

Leaf-leaf: To calculate the attention scores between leaves, we introduce a similarity-based transformation to replace the relative positional embedding in equation (2). Let $(a^i_1, \dots, a^i_{|A_i|})$ be the nodes on the path from the root to leaf $l_i$. We define the similarity between two leaves $l_i$ and $l_j$ as

$$sim(l_i, l_j) = \log\big(|\{a^i_1, \dots, a^i_{|A_i|}\} \cap \{a^j_1, \dots, a^j_{|A_j|}\}|\big),$$

which is based on the number of common nodes on the paths from the root to leaves $l_i$ and $l_j$. The log transformation is used to reduce the skewness of the distribution of similarity values. The attention scores between leaves are then computed as follows:

$$A(l_i, l_j) = (W_Q x_{l_i})^\top (W_K x_{l_j}) + w_1 \, sim(l_i, l_j) + w_0,$$
where $w_0, w_1 \in \mathbb{R}$ are learnable parameters.

Variable-variable: Following Guo et al. [7], the attention scores between DFG nodes are computed using only the query-key dot product, and are set to $-\infty$ if the corresponding edges are absent in the DFG.
Code-leaf/variable: For interactions between code tokens and AST leaves (or DFG variables), we only compute the query-key dot product and do not use any positional information. Inspired by the work of Guo et al. [7], we set the attention score to $-\infty$ for cases where the leaf (or variable) is not linked to the code token. We show the equations only for interactions between code tokens and leaves, as those for interactions between code tokens and variables are similar.
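To make the above attention computation concrete, the following sketch assembles the additive bias/mask matrix for one encoder attention head. It is a minimal PyTorch illustration under our own naming (leaf_similarity, encoder_attention_bias, and leaving the leaf-variable blocks unmasked are our assumptions, not the authors' released implementation); the query-key dot products and T5's relative-position bias between code tokens are assumed to be added by the surrounding model.

```python
import math
import torch

NEG_INF = float('-inf')


def leaf_similarity(paths):
    """sim(l_i, l_j) = log(number of common nodes on the two root-leaf paths).

    paths[i] is the list of node ids on the path from the root to leaf i.
    """
    num_leaves = len(paths)
    sim = torch.zeros(num_leaves, num_leaves)
    for i in range(num_leaves):
        for j in range(num_leaves):
            common = len(set(paths[i]) & set(paths[j]))
            sim[i, j] = math.log(common) if common > 0 else 0.0
    return sim


def encoder_attention_bias(n_code, leaf_paths, dfg_adj, leaf_link, var_link, w0, w1):
    """Additive attention bias over the concatenated input (code tokens | AST leaves | DFG variables).

    dfg_adj  : (V, V) 0/1 tensor, asymmetric DFG adjacency
    leaf_link: (n_code, L) 0/1 tensor linking code tokens to AST leaves
    var_link : (n_code, V) 0/1 tensor linking code tokens to DFG variables
    """
    L, V = leaf_link.shape[1], var_link.shape[1]
    n = n_code + L + V
    bias = torch.zeros(n, n)

    # Leaf-leaf: a learned affine transform of path similarity replaces the positional bias.
    bias[n_code:n_code + L, n_code:n_code + L] = w1 * leaf_similarity(leaf_paths) + w0

    # Variable-variable: attention allowed only along DFG edges.
    vv = torch.full((V, V), NEG_INF)
    vv[dfg_adj.bool()] = 0.0
    bias[n_code + L:, n_code + L:] = vv

    # Code <-> leaf and code <-> variable: attention allowed only where linked.
    cl = torch.full(leaf_link.shape, NEG_INF)
    cl[leaf_link.bool()] = 0.0
    bias[:n_code, n_code:n_code + L] = cl
    bias[n_code:n_code + L, :n_code] = cl.T

    cv = torch.full(var_link.shape, NEG_INF)
    cv[var_link.bool()] = 0.0
    bias[:n_code, n_code + L:] = cv
    bias[n_code + L:, :n_code] = cv.T

    return bias
```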

Structure-Aware Decoder
The decoder in StructCoder constitutes the original T5 decoder with additional layers at the end for the AST paths prediction and data flow prediction tasks introduced in this section. Fig. 2 illustrates the structure-aware decoder, which predicts the next target code token along with the AST root-leaf path to this token and the data flow relations between this token and all past tokens. The addition of these auxiliary tasks does not increase the number of generated tokens, which is important since decoding is done in an autoregressive manner. Let $h_1, h_2, \dots, h_{|t|}$ be the hidden states generated by the Transformer decoder. Decoders of existing Transformer models, including T5, employ a linear layer with weights $W_{vocab} \in \mathbb{R}^{|\mathcal{V}| \times d}$ followed by a softmax transformation to extract a probability distribution $p_k$ over the token vocabulary $\mathcal{V}$ for the $k$-th position.
The sequence generation task is trained using the language modeling loss, shown below for one sample:

$$\mathcal{L}_{lm} = -\sum_{k=1}^{|t|} \log p_k(t_k),$$

where $p_k(t_k)$ refers to the predicted probability of the true target token $t_k$ at the $k$-th position.
In addition to sequence generation, StructCoder also learns the target syntax using an AST paths prediction task, and learns to match the target data flow using a data flow prediction task.

AST Paths Prediction (APP).
In this task, the goal is to encourage the decoder to be aware of all root-leaf paths in the target AST. Since the type attribute of a node captures important syntactic information, we predict the type of each ancestor on each root-leaf path.
Let $l_k$ be the leaf node containing the $k$-th target token $t_k$, and let $(a^k_1, \dots, a^k_{|A_k|})$ be the nodes on the root-to-$l_k$ path. To predict the type of node $a^k_i$ (which is at height $|A_k| - i$ in the tree), we apply to $h_k$ a linear layer with weights $W_{app}(|A_k| - i) \in \mathbb{R}^{|\mathcal{Y}| \times d}$ followed by a softmax transformation to predict a probability distribution $p^{ast}_{ki}$ over the set of node types $\mathcal{Y}$.
The APP cross-entropy loss for a sample is given by

$$\mathcal{L}_{app} = -\frac{1}{|t|} \sum_{k=1}^{|t|} \frac{1}{|A_k|} \sum_{i=1}^{|A_k|} \log p^{ast}_{ki}(a^k_i.type).$$

3.3.2 Data Flow Prediction (DFP). In this task, the decoder learns to predict all the data flow edges in the target code. The probability of a data flow edge from the $k$-th to the $j$-th position of the target sequence is computed from the decoder hidden states as

$$p^{dfp}_{kj} = \sigma\big(h_k^\top W_{dfp}\, h_j + u_1^\top h_k + u_2^\top h_j + b\big),$$

where $\sigma(\cdot)$ denotes the sigmoid function. Suppose $\mathcal{G}_t = (V_t, D_t, L_t)$ is the true target DFG. There is a data flow from the $k$-th to the $j$-th position in the target sequence, denoted by $d_{kj} = 1$, if and only if the target DFG contains variables $v_{k'}, v_{j'}$ such that variable $v_{k'}$ is derived from $t_k$, variable $v_{j'}$ is derived from $t_j$, and the value of variable $v_{k'}$ is derived from $v_{j'}$. Thus, the DFP loss for a sample can be written as the binary cross-entropy

$$\mathcal{L}_{dfp} = -\frac{1}{|t|^2} \sum_{k=1}^{|t|} \sum_{j=1}^{|t|} \Big[ d_{kj} \log p^{dfp}_{kj} + (1 - d_{kj}) \log\big(1 - p^{dfp}_{kj}\big) \Big].$$

The overall loss function for training StructCoder is a combination of the language modeling objective and the APP and DFP losses with weights $\beta_1$ and $\beta_2$, i.e., $\mathcal{L} = \mathcal{L}_{lm} + \beta_1 \mathcal{L}_{app} + \beta_2 \mathcal{L}_{dfp}$.
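The sketch below shows how the three loss terms could be combined in practice. The tensor shapes, padding convention, and mean-reduction choices are our assumptions for illustration; the projection layers producing the AST-type logits and data-flow scores are assumed to exist upstream.

```python
import torch
import torch.nn.functional as F


def structcoder_loss(lm_logits,    # (T, |V|)     decoder vocabulary logits
                     target_ids,   # (T,)         ground-truth target token ids
                     ast_logits,   # (T, H, |Y|)  node-type logits for each height on the root-leaf path
                     ast_types,    # (T, H)       ground-truth node types, -100 where the path is shorter than H
                     dfp_scores,   # (T, T)       pre-sigmoid data-flow scores between target positions
                     dfp_labels,   # (T, T)       1 if data flows from position k to position j, else 0
                     beta1=0.1, beta2=0.1):
    # Language modeling loss over the target sequence.
    lm_loss = F.cross_entropy(lm_logits, target_ids)

    # AST paths prediction: cross-entropy over node types, ignoring padded path entries.
    app_loss = F.cross_entropy(ast_logits.reshape(-1, ast_logits.size(-1)),
                               ast_types.reshape(-1), ignore_index=-100)

    # Data flow prediction: binary cross-entropy over all position pairs.
    dfp_loss = F.binary_cross_entropy_with_logits(dfp_scores, dfp_labels.float())

    return lm_loss + beta1 * app_loss + beta2 * dfp_loss
```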

Pretraining
We pretrain StructCoder on the CodeSearchNet [9] dataset containing about 2M code-comment pairs, with a structure-based DAE task along with NL-PL bimodal dual generation to generate code from text and vice versa. For the denoising task, we corrupt random spans in the code sequence by replacing them with a ⟨MASK⟩ token or a random token, or by deleting them. The span lengths are sampled from a Poisson distribution with a mean of 12 tokens, and we corrupt 35% of the code tokens in total, similar to [1]. To improve the understanding of code structure, we also randomly drop 35% of the DFG variables and AST leaves, and 35% of the ancestors of each leaf, from the input to StructCoder. The model is then trained to predict the uncorrupted code along with the AST root-leaf paths and data flow edges. We initialize our model for pretraining with CodeT5's weights (for faster pretraining), except for the AST- and DFG-related weights, which are randomly initialized.
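A rough sketch of this corruption procedure is given below, operating on token lists. Whether an entire span collapses into a single mask token and how overlapping spans are handled are our assumptions rather than details stated above.

```python
import random
import numpy as np


def corrupt_code(tokens, vocab, mask_token='<MASK>', corrupt_frac=0.35, mean_span=12):
    """Corrupt random spans of a code token list: each sampled span is replaced by a mask
    token, replaced by a random token, or deleted."""
    tokens = list(tokens)
    target_corrupted = int(corrupt_frac * len(tokens))
    corrupted = 0
    while corrupted < target_corrupted and tokens:
        span = max(1, min(int(np.random.poisson(mean_span)), len(tokens)))
        start = random.randrange(0, len(tokens) - span + 1)
        action = random.choice(['mask', 'random', 'delete'])
        if action == 'mask':
            tokens[start:start + span] = [mask_token]
        elif action == 'random':
            tokens[start:start + span] = [random.choice(vocab)]
        else:  # delete the span entirely
            tokens[start:start + span] = []
        corrupted += span
    return tokens


def drop_items(items, drop_frac=0.35):
    """Randomly drop a fraction of DFG variables, AST leaves, or per-leaf ancestors
    from the encoder input (structure corruption)."""
    return [x for x in items if random.random() > drop_frac]
```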

Implementation Details
We implement StructCoder by extending the CodeT5-base architecture containing 12 T5 blocks with hidden dimension 768 and 12 attention heads in each block. StructCoder comprises a total of 224M trainable parameters, while CodeT5-base contains 223M. We employ the AdamW [17] optimizer with a learning rate of 2e-4 for pretraining and 1e-5 for finetuning. We ran the pretraining for 175K batches with a batch size of 20 code-comment pairs. For finetuning, we used batch sizes of 25, 32, and 20 for the CodeXGLUE translation, CONCODE, and APPS datasets, respectively, and ran finetuning for 50K, 300K, and 40K batches on the three datasets, respectively. The loss weights of the auxiliary tasks, $\beta_1$ and $\beta_2$, are both set to 0.1. To facilitate minibatch training with the available resources, we set the maximum number of DFG variables in the input to 65, the maximum number of AST leaves to 250, and the maximum root-leaf path length to 17 (by trimming paths from the root's side). We set the maximum source length (number of code/text tokens) to 400 for pretraining, 320 for translation, and 320 and 600 for text-to-code generation on CONCODE and APPS, respectively. We set the maximum target length to 400 for pretraining, 256 for translation, and 150 and 512 for text-to-code generation on CONCODE and APPS, respectively. We implement our model using the PyTorch [21] and Hugging Face [31] libraries. Additional implementation and experimental setup details are provided in the Appendix.
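For quick reference, the finetuning settings above can be summarized as a configuration sketch; the key names are ours, not from the released code.

```python
FINETUNE_CONFIG = {
    'codexglue_translation': {'batch_size': 25, 'train_batches': 50_000,
                              'max_source_len': 320, 'max_target_len': 256},
    'concode': {'batch_size': 32, 'train_batches': 300_000,
                'max_source_len': 320, 'max_target_len': 150},
    'apps': {'batch_size': 20, 'train_batches': 40_000,
             'max_source_len': 600, 'max_target_len': 512},
    'common': {'optimizer': 'AdamW', 'finetune_lr': 1e-5,
               'app_loss_weight': 0.1, 'dfp_loss_weight': 0.1,
               'max_dfg_variables': 65, 'max_ast_leaves': 250,
               'max_root_leaf_path_len': 17},
}
```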

EXPERIMENTS
We evaluate StructCoder on the code translation and text-to-code generation tasks from the CodeXGLUE [18] benchmark, and on the text-to-code generation task from the APPS benchmark [8], and compare with previously published results on these tasks. For the CodeXGLUE tasks, we use the metrics from the CodeXGLUE leaderboard, which include (i) BLEU [20], which measures n-gram overlap, (ii) exact match (xMatch), which checks if the prediction is the same as the ground truth, and (iii) CodeBLEU [26], which combines the BLEU score with a keywords-based weighted n-gram match as well as syntax and semantic matches based on the AST and DFG. APPS evaluates generated codes based on test cases, where the evaluation metrics include (i) 'test case average', which is the average percentage of test cases passed, and (ii) 'strict accuracy', which is the percentage of generated codes that pass all test cases.
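The two APPS metrics follow directly from these definitions; the small sketch below computes them from per-problem test-case outcomes (the function name and input format are ours).

```python
def apps_metrics(results):
    """results: list of per-problem lists of booleans, one boolean per test case
    (True if the generated code passed that test case).
    Returns (test case average, strict accuracy) in percent."""
    per_problem_pass_rate = [sum(r) / len(r) for r in results]
    test_case_average = 100.0 * sum(per_problem_pass_rate) / len(results)
    strict_accuracy = 100.0 * sum(all(r) for r in results) / len(results)
    return test_case_average, strict_accuracy


# Example: two problems with 3 test cases each; only the first passes all of them.
print(apps_metrics([[True, True, True], [True, False, True]]))  # (83.33..., 50.0)
```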

Code Translation
The code translation dataset from CodeXGLUE consists of two tasks for translating between Java and C# functions in either direction, and contains 10K training samples, 500 validation samples, and 1000 test samples. Table 3 presents the results of StructCoder alongside the baselines on the two code translation tasks. The Naive Copy baseline simply copies the source code to the target, and the Transformer model does not include any pretraining. RoBERTa (code) [18], CodeBERT, and GraphCodeBERT involve encoder-only pretraining, while PLBART and CodeT5 incorporate encoder-decoder pretraining like StructCoder. StructCoder achieves the best results on the two translation tasks, which can be attributed to the structure-aware encoder-decoder design of our model. From Table 3, we observe that the encoder-decoder pretraining of PLBART, CodeT5, and StructCoder is very beneficial for code translation. Also, the encoder-only pretrained models improve over the Transformer by a huge margin. GraphCodeBERT, which utilizes the DFG, offers minor improvements over CodeBERT; similarly, in our ablation study we observed that the DFG-related components contribute less to StructCoder's performance gains than the AST-related components.

Text-to-Code Generation
The text-to-code generation task from CodeXGLUE uses the CONCODE [10] dataset, and the goal here is to generate a Java function given a natural language description. This dataset contains 100K training samples, 2K validation samples, and 2K test samples. Table 4 presents the results of our model alongside the baselines on the text-to-code generation task. Among the baselines, GPT-2 [24] is pretrained on natural language to predict the next token, CodeGPT [18] is pretrained from scratch like GPT-2 but using code data, CodeGPT-adapted [18] is pretrained from the GPT-2 initialization using code data, and CoTexT [22] pretrains the T5 model further on code data using the MSP objective. The decoder-only baselines, which include the GPT-2 based models, are outperformed by the rest, which are all encoder-decoder models. StructCoder again achieves the best performance on all metrics for the text-to-code generation task.

APPS [8] is a text-to-code generation benchmark in Python which evaluates generated codes based on test cases. The inputs here contain detailed problem descriptions and possibly some starter code as well. The dataset contains 10K problems equally divided into train and test splits; the test set contains 1K introductory level, 3K interview level, and 1K competition level problems. Table 5 shows the results of StructCoder, CodeT5, and GPT-2 [8] models of two sizes. These GPT-2 models were pretrained exclusively on Python code from GitHub, which gives them an edge on this particular task. The 'strict accuracy' metric is more important than the 'test case average' as it does not give partial credit to a generated code that does not pass all test cases. StructCoder achieves the best 'strict accuracy' on all subsets, notably outperforming the bigger GPT-2 model, which is about 7 times the size of StructCoder.

Model Analysis
4.3.1 Ablation Study. To emphasize the importance of the novel structure-based components introduced in this work, we conducted an ablation study on the two code translation tasks from CodeXGLUE. For this experiment, we used a smaller T5 architecture with hidden dimension 256, 5 encoder and decoder layers, and 8 heads in each multi-head attention layer. The ablated models tested here include the smaller T5 model (i) without any of the proposed structure-based components (No structure (baseline)); (ii) enabling DFG in the encoder (DFG(enc)); (iii) enabling the Data Flow Prediction task in the decoder (DFG(dec)); (iv) enabling AST in the encoder (AST(enc)); (v) enabling the AST Paths Prediction task in the decoder (AST(dec)); (vi) enabling all proposed structure-based components/tasks; and (vii) adding structure-based DAE pretraining to (vi). We report the CodeBLEU metric along with its different components for each of these models in Table 6. Among the different components of the CodeBLEU metric, weighted BLEU gives more weight to programming language keywords, AST match computes the percentage of subtrees in the ground-truth target AST that occur in the generated code, and DFG match computes the percentage of DFG edges in the ground truth that occur in the generated code.

Enabling each of the four [(ii)-(v)] structure-based components individually results in an increase in the AST match and data flow match metrics over the baseline [(i)] in most cases. The DFG components in the model [(ii), (iii)], however, do not always increase the BLEU and weighted BLEU scores. Among the four components [(ii)-(v)], enabling the AST paths prediction task [(v)] yields the best BLEU and weighted BLEU, and modeling the AST in the input [(iv)] yields the best AST match. Enabling all the components [(vi)] gives the best results on AST match, data flow match, and overall CodeBLEU. We also observed that structure-based DAE pretraining [(vii)] led to significant performance gains on both tasks.

Table 6. CodeBLEU and its different components on Java-C# and C#-Java translation validation sets, obtained by adding the proposed structure-based components to a smaller T5 model. ('enc' and 'dec' indicate whether the proposed structure-based components/tasks were included in the encoder and decoder, respectively. AST stands for Abstract Syntax Tree, DF for Data Flow, and 'wBLEU' for weighted BLEU.)

4.3.2 Auxiliary Tasks. We measure the performance of StructCoder on the auxiliary tasks of APP (AST Paths Prediction) and DFP (Data Flow Prediction) as follows. When predicting the next target token, we use the ground truth for the target sequence up to the previous step as input to the decoder. The decoder then predicts the next token as well as the DFG edges incident on this token and the types of nodes on the path from the root to the leaf node containing this token in the AST. On Java-C# translation, StructCoder achieves 94% accuracy on the APP task and 94.7% average precision on the DFP task, where the positive class prevalence is just 0.8%. On C#-Java translation, StructCoder achieves 96.3% accuracy on the APP task and 82.9% average precision on the DFP task, where the positive class prevalence is just 0.5%. For both translation tasks, the APP task involves 298 node-type classes.

4.3.3 Case Study. Fig. 3 shows an example from the Java-C# translation task with predictions from StructCoder and the best baseline, CodeT5. We observe that our structure-aware encoder-decoder architecture is able to generate better target code than CodeT5. Referring to Fig. 3, CodeT5 generates both 'for' loops with variable 'i', leaving variable 'c' undefined. It also misses the first 'if' statement and creates a syntax error from unbalanced braces. CodeT5 also translates the type of argument 'remap' as an integer instead of an integer array. On the other hand, StructCoder generates the 'for' loops by defining variable 'c', and the model predicts (with a probability greater than the 97th percentile) most of the DFG edges incident on the variable 'c' inside these 'for' loops and also in the first 'if' statement. The only error in StructCoder's output is the treatment of '@in.cells' as an array of 'Cell' objects instead of a Dictionary with values of type 'Cell'. Such errors motivate the design of better models that align the variables and functions between source and target for code translation. Also, for the token '[]' in args, StructCoder correctly predicts the parent node type 'array rank specifier'. More examples are included in the Appendix.

Inference Time.
To analyze the impact of adding the proposed structure-based components on the overall computational cost, we measured the inference times on the CodeXGLUE translation tasks while including/excluding the different proposed structure-based components. We report the results by running inference on a GPU (NVIDIA Tesla P40 with 12288 MiB memory) for 200 samples from the test set, using the maximum batch size that fits on the GPU with a beam size of 10. The batch sizes used are 6 when the AST is included in the encoder, 8 when the DFG but not the AST is included in the encoder, and 10 when only the code tokens are fed to the encoder. We run decoding until the maximum target length is reached so that the model's decoded sequence lengths do not impact the inference times. We did not include preprocessing (tokenization, AST, and DFG construction) time while measuring the inference time because preprocessing took negligible time compared to the forward pass: for all 200 samples combined, tokenization took 0.27s (0.30s), and AST and DFG construction took 0.62s (0.88s) for codes in Java (C#). Fig. 4 shows the average inference time per sample and the average input length per batch for model versions including and excluding the AST- and DFG-related components in the encoder. Since the decoder's structure-based components are inactive during inference, they do not impact inference time and hence are not considered here. Note that excluding both AST and DFG is equivalent to CodeT5. Adding both AST and DFG to the encoder increased the inference time by 28%-29% compared to using no structures in the encoder, while the input sequence length increased by 75%-83%. (The increase in inference time being much smaller than the increase in input length may be due to efficient matrix operations on GPUs and implementation-specific details of PyTorch/Hugging Face, which are out of scope for this work.) In our implementation, we compute the full square attention matrix, whose side equals the total input length (number of code tokens + number of DFG variables + number of AST leaves), and then mask the attention scores that we want to be zero. However, the attention between code tokens and DFG variables/AST leaves, and among the DFG variables, is sparse, which motivates more efficient implementations of our method.

Performance on Code Summarization
While the primary focus of this work is on code generation, we have also tested the performance of StructCoder on three languages in the CodeXGLUE summarization benchmark, which is a code-to-text generation task. The results are shown in Table 7. StructCoder outperforms CodeT5 by a substantial margin in the case of Go, but not in the case of the other languages.

CONCLUDING DISCUSSION
This work proposes a structure-aware Transformer encoder-decoder model called StructCoder for code generation. Our encoder modifies the traditional input embeddings and employs a structure-aware self-attention mechanism to model the AST and DFG relations in the source code, while the decoder is trained to recognize the target syntax and data flow using two novel auxiliary tasks that predict the node types on all root-leaf AST paths and the data flow edges of the target code. We also pretrained our model using a structure-based DAE task to improve its performance. Experiments on code translation and text-to-code generation tasks demonstrate the performance gains of StructCoder over state-of-the-art baselines. We believe that this work will encourage future research in this field to give careful consideration to code structure when building models for code generation. While automated code generation holds the potential to benefit software development and migration, it comes with inherent risks. Current models cannot account for constraints like security, efficiency, and modularization when generating code, which makes their deployment and maintenance challenging. Also, the performance improvements in code generation models largely rely on scaling up both the model and the training, which requires significant computational resources. Thus, future research in this area can look into designing more efficient models, and models that generate code conforming to certain preset standards.

Fig. 2. The structure-aware decoder generates the next token in the target code, and also predicts the node types on the root-leaf path to the leaf containing this token in the target AST as well as the DFG edges incident on this token.


Fig. 3. Case study: an example from the Java-C# translation task comparing the outputs from StructCoder and CodeT5. StructCoder makes only one error, by assuming that 'cells' is an array of 'Cell' objects instead of a dictionary with values of type 'Cell'. CodeT5, however, misses the first 'if' statement, produces an unbalanced '}', and does not define variable 'c'. The blue arrows in the StructCoder output show the correctly predicted (probability > 97th percentile) data flow edges incident on variable 'c'.

Fig. 4. (a) Inference time (in seconds) per sample averaged over 200 samples, and (b) average input length per batch for the 200 samples in the CodeXGLUE translation tasks, for model versions including/excluding the AST/DFG related components in the encoder. Since the decoder's structure-based components are not active during inference, they are not considered in this plot.

Table 2. Notations used in this paper.
$p_k \in [0,1]^{|\mathcal{V}|}$ : predicted probability of each token in the vocabulary $\mathcal{V}$ at the $k$-th position
$p^{ast}_{ki} \in [0,1]^{|\mathcal{Y}|}$ : predicted probability of each node type for the $i$-th AST node on the root-$l_k$ path
$p^{dfp}_{kj} \in [0,1]$ : predicted probability of data flow from $t_k$ to $t_j$
$d_{kj} \in \{0,1\}$ : ground-truth data flow indicator from $t_k$ to $t_j$
$\mathcal{L}_{lm}$ : language modeling loss
$\mathcal{L}_{app}$ : AST Paths Prediction (APP) loss
$\mathcal{L}_{dfp}$ : Data Flow Prediction (DFP) loss
$\beta_1, \beta_2$ : APP loss weight, DFP loss weight

Table 3. Results on the code translation tasks from the CodeXGLUE benchmark. (*Since CodeT5 is a competitive baseline and did not report CodeBLEU in their paper, we tested this model using their finetuned checkpoint and provide the results.)

Table 4. Results on the text-to-code generation task from the CodeXGLUE benchmark.

Table 5. Results on the APPS dataset along with model sizes in billions of parameters. The results for the GPT-2 models were obtained from [8].