Neural Exception Handling Recommender

Practical code reuse often involves incorporating code fragments from developer forums into applications. However, these fragments are often incomplete and frequently lack exception handling. Adding exception handling to a codebase is not straightforward: developers must understand and remember which API methods may throw exceptions and which exceptions should be handled. To address this, we introduce EHBlock, a learning-based exception handling recommender for Java code snippets. EHBlock analyzes a given code snippet and predicts whether a try-catch block is necessary. It employs a Relational Graph Convolutional Network (R-GCN) to learn exception handling from complete code. Because the R-GCN considers program dependencies in the surrounding context, EHBlock can learn the identities of APIs and their relations with the exception types that need to be handled. Our empirical evaluation shows that EHBlock achieves a 12.3% relative improvement in F-score over the state-of-the-art approach in determining the need for try-catch blocks.


INTRODUCTION
Online forums, such as StackOverflow (S/O), play a crucial role in helping developers learn software libraries. While the code snippets in S/O answers serve as valuable starting points, they are often incomplete, with missing details and ambiguous references. Zhang et al. [8] conducted an empirical study of how developers manually adapt S/O code snippets when integrating them into their GitHub repositories. One common adaptation is wrapping the snippet in a try-catch block and listing the handled exceptions in a catch clause, a task that existing tools do not automate.
Among the automated approaches to exception handling recommendation, the most advanced category is based on information retrieval (IR). XRank [3] recommends a ranked list of the API method calls in the code that are likely to be involved in exceptions and should therefore be wrapped in a try-catch block. XHand [3] suggests the exception handling code for the catch block of a given code fragment. Both use a fuzzy set technique to compute associations between API calls (e.g., newBufferedReader) and exceptions (e.g., IOException). Despite their success, IR approaches have several limitations. First, pre-defining a threshold for feature matching when retrieving an exception type or an API element is non-trivial, and the effectiveness of IR techniques depends on choosing this threshold correctly. Second, they compute only the association scores between an individual API element in the code snippet and the potential exception type(s), discarding the surrounding context of other API elements. Third, they rely on the lexical values of code tokens and API elements, whose names can be ambiguous in incomplete code snippets.
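To make the IR-style association concrete, the following is a minimal sketch of a fuzzy-set style association score between API calls and exception types, computed from co-occurrence counts in a mined corpus. The Jaccard-like formula, the function name `association_scores`, and the toy corpus are assumptions for illustration; they are not necessarily the exact formulation used by XRank or XHand.

```python
from collections import Counter
from itertools import product


def association_scores(samples):
    """Fuzzy-set style association between API calls and exception types.

    `samples` is a list of (api_calls, handled_exceptions) pairs, one per mined
    code fragment. For an API call a and an exception e, the score is
        n(a, e) / (n(a) + n(e) - n(a, e)),
    i.e. the fraction of fragments mentioning a or e that mention both.
    """
    n_api, n_exc, n_pair = Counter(), Counter(), Counter()
    for apis, excs in samples:
        apis, excs = set(apis), set(excs)
        n_api.update(apis)
        n_exc.update(excs)
        n_pair.update(product(apis, excs))  # every co-occurring (a, e) pair
    return {(a, e): n / (n_api[a] + n_exc[e] - n) for (a, e), n in n_pair.items()}


# Toy corpus (hypothetical): API calls in each fragment and the exceptions it catches.
corpus = [
    (["newBufferedReader", "readLine"], ["IOException"]),
    (["newBufferedReader"], ["IOException"]),
    (["toString"], []),
    (["toString", "readLine"], ["IOException"]),
]
scores = association_scores(corpus)
print(scores[("newBufferedReader", "IOException")])  # 0.6666666666666666
print(scores[("toString", "IOException")])  # 0.25
```

Each score is computed for one (API call, exception) pair in isolation, so the other API calls in the snippet play no role, which is the context limitation described above.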
This paper introduces EHBlock, a learning-based exception handling recommender that assesses a given Java code snippet and recommends whether the snippet needs a try-catch block. Motivated by studies reporting that exception handling for API elements is frequently repeated across different projects [3], we leverage the Relational Graph Convolutional Network (R-GCN) [5] to represent the program dependence graph (PDG) and capture control and data dependencies among API elements in the context. To evaluate EHBlock, we conducted an experiment on a dataset of 5,726 GitHub projects with 19,379 code snippets containing try-catch blocks. Empirical results show that EHBlock achieves a 12.3% relative improvement in F-score over the state-of-the-art approach, XRank [3], in checking whether a try-catch block is necessary. EHBlock leverages the complete code in the training corpus as context, which provides parsable code and the identities of API elements.

ICSE-Companion '24, April 14-20, 2024, Lisbon, Portugal Yi Li, Tien N. Nguyen, Yuchen Cai, Aashish Yadavally, Abhishek Mishra, and Genesis Montejo

Code Features
For code feature vectors, we aim to capture the lexical and structural features of each statement, while the Program Dependence Graph (PDG) captures the program dependencies among statements. At the lexical level, a statement is represented as a sequence of sub-tokens, obtained by splitting identifiers according to the CamelCase or Hungarian naming convention. Only the names of variables, methods, fields, and classes are retained, and one-character sub-tokens are removed to reduce noise. Word embedding [4] and a Gated Recurrent Unit (GRU) [1] are used to build feature vectors from the sub-tokens. At the syntactic level, we aim to capture code structure via the Abstract Syntax Tree (AST). EHBlock parses the code, extracts the AST subtree for the given statement, and feeds it to a Tree-LSTM model [6] to obtain a feature vector capturing the structure of the statement. If the code is incomplete, PPA [2], a partial program analysis tool, is used to produce the AST on a best-effort basis.
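The sub-tokenization step can be sketched as follows. This minimal version assumes the identifier names (variables, methods, fields, classes) have already been extracted from the parsed statement; the regular expression, the lower-casing, and the function name `subtokens` are assumptions for illustration rather than EHBlock's actual tokenizer.

```python
import re

# Splits camelCase / PascalCase runs, keeping acronyms together:
# "newBufferedReader" -> new, Buffered, Reader; "HTTPClient" -> HTTP, Client.
_CAMEL_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def subtokens(identifiers):
    """Split identifier names into lower-cased sub-tokens.

    Underscores and case changes act as boundaries; one-character sub-tokens
    (e.g. the Hungarian-style prefix "m" in "mBuffer") are dropped as noise.
    """
    result = []
    for name in identifiers:
        for chunk in name.split("_"):
            for part in _CAMEL_PART.findall(chunk):
                if len(part) > 1:
                    result.append(part.lower())
    return result


print(subtokens(["newBufferedReader", "mBuffer", "x"]))
# ['new', 'buffered', 'reader', 'buffer']
```

The resulting sub-token sequence is what would then be embedded and fed to the GRU.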

Exception Handling Prediction
To detect whether a code snippet requires a try-catch block, we employ the R-GCN model [5]. The code is first processed by DeepPDA [7], which parses (in)complete code to build the PDG. Similar to a CNN in image processing, the R-GCN slides a window over the graph nodes, where the window for a node consists of the node and its neighbors in the PDG. For each window, the model computes a feature representation for the central node. From the representation vectors of all statements (nodes), the R-GCN produces its outputs at the output layer. These outputs are fed to a fully connected layer, which transforms the resulting matrix into a single vector representing the given code snippet. EHBlock then performs classification by applying a softmax function to this vector to determine whether the snippet needs a try-catch block.
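The neighborhood aggregation described above follows the standard R-GCN propagation rule [5]: h_i' = ReLU(W_0 h_i + sum over relations r of sum over j in N_i^r of (1/|N_i^r|) W_r h_j), where the relations are assumed here to be the PDG's data and control dependencies. The following is a minimal, dependency-free sketch of one such layer followed by a readout and softmax, not EHBlock's implementation: the mean-pooling readout, the ReLU activation, the function names, and the toy weights are assumptions, and for brevity only incoming edges are aggregated (the original R-GCN also adds inverse relations).

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_self, W_rel):
    """One R-GCN layer over a PDG.

    h:      dict node -> feature vector (one node per statement)
    edges:  list of (src, relation, dst) PDG edges, e.g. relation "data" or "control"
    W_self: weight matrix for the node's own features (self-loop)
    W_rel:  dict relation -> weight matrix
    Each node sums transformed in-neighbor features per relation, normalized by
    the number of such neighbors, then applies ReLU.
    """
    out = {}
    for i, h_i in h.items():
        acc = matvec(W_self, h_i)
        for rel, W in W_rel.items():
            nbrs = [src for src, r, dst in edges if dst == i and r == rel]
            for j in nbrs:
                msg = matvec(W, h[j])
                acc = [a + m / len(nbrs) for a, m in zip(acc, msg)]
        out[i] = [max(0.0, a) for a in acc]
    return out


def classify(h, W_fc, b_fc):
    """Mean-pool node vectors, apply a fully connected layer, then softmax."""
    dim = len(next(iter(h.values())))
    pooled = [sum(v[k] for v in h.values()) / len(h) for k in range(dim)]
    logits = [z + b for z, b in zip(matvec(W_fc, pooled), b_fc)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]  # [P(no try-catch), P(try-catch)]


# Toy PDG with three statements and 2-dimensional features (hypothetical values).
eye = [[1.0, 0.0], [0.0, 1.0]]
h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
pdg_edges = [(0, "data", 1), (1, "control", 2)]
h1 = rgcn_layer(h0, pdg_edges, W_self=eye, W_rel={"data": eye, "control": eye})
print(h1)  # {0: [1.0, 0.0], 1: [1.0, 1.0], 2: [1.0, 2.0]}
probs = classify(h1, W_fc=[[0.0, 0.0], [1.0, 1.0]], b_fc=[0.0, 0.0])
print(probs[1] > 0.5)  # True
```

Keeping a separate weight matrix per relation is what lets the model treat a data dependency differently from a control dependency, which a plain GCN with a single shared matrix cannot do.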

PRELIMINARY EMPIRICAL EVALUATION
We conducted an experiment to evaluate EHBlock. We collected a dataset from GitHub comprising the 5,726 highest-rated Java projects that use libraries such as jodatime, JDK, Android, xtream, GWT, and Hibernate. The dataset contains 19,379 code snippets, each containing at least one try-catch block, as positive samples, and an equal number of randomly selected code snippets without any such block as negative samples.
Baselines. We compared EHBlock against XRank, a component of the FuzzyCatch tool [3]. XRank computes an exception risk score for each API call and considers the call to require a try-catch block if its score exceeds a threshold.
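The sensitivity of such a baseline to its threshold can be illustrated with a minimal sketch of a threshold-based decision. The snippet-level rule (flag the snippet if any call exceeds the threshold), the function name `needs_try_catch`, and the risk scores are assumptions for illustration and are not taken from the FuzzyCatch implementation.

```python
def needs_try_catch(api_calls, risk, threshold):
    """Flag a snippet if any of its API calls has a risk score above `threshold`.

    `risk` maps an API method *name* to a precomputed exception risk score;
    calls missing from the map are treated as risk-free (score 0.0).
    """
    return any(risk.get(call, 0.0) > threshold for call in api_calls)


# Hypothetical risk scores keyed only by method name, not by package or class.
risk = {"newBufferedReader": 0.67, "readLine": 0.55, "toString": 0.25}
snippet = ["toString", "readLine"]
print(needs_try_catch(snippet, risk, threshold=0.5))  # True
print(needs_try_catch(snippet, risk, threshold=0.6))  # False
```

The same snippet flips from needing a try-catch block to not needing one when the threshold moves from 0.5 to 0.6, and because the dictionary is keyed by method name, a toString from any package receives the same score.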
Procedure. Using the GitHub dataset, we randomly partitioned both the positive and the negative sets into 80%, 10%, and 10% for training, tuning, and testing, respectively.
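The balanced dataset and per-class split can be sketched as follows. Splitting positives and negatives separately keeps every partition balanced. The integer rounding, the fixed seeds, and the function names are assumptions for illustration; the exact partitioning code used for EHBlock is not specified here.

```python
import random


def split_80_10_10(items, seed):
    """Shuffle a copy of `items` and split it into train/tune/test (80/10/10)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_tune = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_tune],
            items[n_train + n_tune:])


def make_splits(positives, negatives, seed=0):
    """Split each class separately, then merge with labels (1 = has try-catch)."""
    pos = split_80_10_10(positives, seed)
    neg = split_80_10_10(negatives, seed + 1)
    return tuple([(x, 1) for x in p] + [(x, 0) for x in n]
                 for p, n in zip(pos, neg))


train, tune, test = make_splits(range(100), range(100, 200))
print(len(train), len(tune), len(test))  # 160 20 20
```

Under this sketch's rounding, 19,379 samples per class would yield 15,503 / 1,937 / 1,939 samples per class for training, tuning, and testing.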
Results. As shown in Table 1, EHBlock achieves high performance. With a precision of 68%, about 2 out of 3 of its predictions that a try-catch block is needed are correct. With a recall of 79%, EHBlock identifies about 4 out of 5 of the cases that actually require a try-catch block. In contrast, XRank relies on pre-defined thresholds on association scores to decide whether a try-catch block is necessary and which exception types to handle. These thresholds might not suit all libraries, especially for incomplete code snippets in which similar API method names appear in different packages or libraries. Moreover, as an IR approach, XRank struggles to distinguish identically named API methods from different libraries, because it uses a single dictionary entry for each such name (e.g., toString and getText are popular API method names in various JDK packages).
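The F-score implied by the reported precision and recall can be checked with the standard harmonic-mean formula, where precision is TP / (TP + FP) and recall is TP / (TP + FN). This is a minimal sketch; because its inputs are the rounded percentages from the text, the result only approximates the value reported in Table 1.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the text for EHBlock.
print(round(f1_from(0.68, 0.79), 3))  # 0.731
```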

CONCLUSION
In conclusion, EHBlock is the first neural-network model for automated exception handling recommendation. It is designed to overcome key limitations of the state-of-the-art IR approaches: as a learning-based approach, it does not rely on a pre-defined threshold for explicit feature matching. Our empirical evaluation shows that EHBlock achieves a 12.3% relative improvement in F-score over the state-of-the-art approach in determining the need for try-catch blocks.
Instead of deterministically deriving the exceptions to be handled in a code snippet, EHBlock employs a deep learning (DL) model to analyze the snippet and determine whether a try-catch block is necessary. By learning from try-catch blocks in complete code from open-source projects during training, our DL model predicts whether a try-catch block is needed. In contrast to XRank [3], which focuses on learning associations between individual API elements and exception types, EHBlock also takes into account the program dependencies among API elements in the surrounding context.

2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)

Table 1: Try-Catch Block Prediction Comparison