Poirot: Deep Learning for API Misuse Detection

API misuses refer to incorrect usages that violate the usage constraints of API elements, potentially leading to issues such as runtime errors, exceptions, program crashes, and security vulnerabilities. Existing mining-based approaches for API misuse detection face challenges in accuracy, particularly in distinguishing infrequent from invalid usage. This limitation stems from the necessity to set predefined thresholds for frequent API usage patterns, resulting in potential misclassification of alternative usages. This paper introduces Poirot, a learning-based approach that mitigates the need for predefined thresholds. Leveraging Labeled, Graph-based Convolutional Networks, Poirot learns embeddings for API usages, capturing key features and enhancing API misuse detection. Preliminary evaluation on an API misuse benchmark demonstrates that Poirot achieves a relative improvement of 1.37x to 10.36x in F-score compared to state-of-the-art API misuse detection techniques.


INTRODUCTION
Software libraries serve as a valuable mechanism for software reuse, providing functionality via Application Programming Interfaces (APIs). A common challenge in library-based programming is API misuse, where incorrect usages violate API constraints, leading to various issues, e.g., run-time errors, program crashes, null-pointer exceptions, and security vulnerabilities [6]. To address API misuses, researchers have proposed API misuse detectors that analyze code snippets for potential misuses. API misuse detection approaches can be broadly classified into two categories. The first category mines API specifications from documentation and checks code against them [4,9]. However, these approaches are limited because API documentation often contains insufficient specifications, which also hinders users' understanding and correct usage. The second category relies on pattern mining, where frequent API usage patterns are mined from a large code corpus and considered correct. Given code is then checked for deviations from these patterns to identify potential misuses. However, these mining-based detectors often struggle to differentiate infrequent but valid usage from invalid usage [1]. This limitation arises from the need for pre-defined thresholds on the frequency of usage patterns, which poses an inherent challenge for mining approaches.
This paper introduces Poirot, a learning-based API misuse detection approach. Poirot leverages two key insights to overcome the issue inherent in mining-based methods. First, we capitalize on the principle of regularity in API usages [5], recognizing that API elements are not randomly placed in source code. The intended usage patterns designed by library developers exhibit a certain regularity in large code corpora. This regularity provides a foundation for a deep learning (DL) model to effectively identify API misuses. Second, we use a graph representation, the Augmented Usage Graph (AUG) [6], to depict the dependencies among API elements and other program elements within an API usage. We then use the Labeled, Graph-based Convolutional Network (Label-GCN) [2] to model the API usage by encoding the graph.
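The threshold problem described above can be illustrated with a minimal sketch. It is not taken from any of the cited detectors: the toy corpus, the `min_support` parameter, and the exact-match deviation check are all assumptions made for illustration. With a high threshold, a rare but valid alternative is reported alongside the real misuse; with a low threshold, the real misuse is accepted as a pattern.

```python
from collections import Counter

# Hypothetical corpus: each entry is the ordered API-call sequence of one method.
COMMON = ("FileInputStream.init", "InputStream.read", "InputStream.close")
RARE_VALID = ("Files.newInputStream", "InputStream.read", "InputStream.close")
MISSING_CLOSE = ("FileInputStream.init", "InputStream.read")  # stream never closed

corpus = [COMMON] * 8 + [RARE_VALID] + [MISSING_CLOSE]


def mine_patterns(usages, min_support):
    """Return the usages that occur at least min_support times in the corpus."""
    counts = Counter(usages)
    return {usage for usage, n in counts.items() if n >= min_support}


def is_flagged(usage, patterns):
    """A usage is reported as a misuse when it matches no frequent pattern."""
    return usage not in patterns


strict = mine_patterns(corpus, min_support=5)
lenient = mine_patterns(corpus, min_support=1)

print(is_flagged(RARE_VALID, strict))      # True: valid but infrequent usage is reported
print(is_flagged(MISSING_CLOSE, strict))   # True: the real misuse is reported
print(is_flagged(MISSING_CLOSE, lenient))  # False: the misuse now counts as a pattern
```

No single value of `min_support` separates the two cases in this toy corpus, which is the situation a learned classifier is meant to avoid.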

API MISUSE DETECTION MODEL

API Usage Graphs
Let us first present the API usage graph representation (AUG) [6].
Definition 1 (API elements). An API element is a class, a method, or a field that a library provides to give access to its functions. Client code uses it via a variable declaration with an API class, a method call to an API method, or a field access to an API field.
Definition 2 (API Usage). An API usage consists of a set of API elements and control structures (i.e., conditions and repetitions), together with other program elements (e.g., variables and parameters), in specific combinations and orders to perform a programming task.
We adapted the API usage graph from MuDetect [6], which is illustrated in Figure 1. First, the rectangle nodes represent the API elements. For example, String, FileInputStream, and FileNotFoundException represent the API classes and types. The nodes FileInputStream.init, InputStream.read, and System.out.println represent the API method calls. Second, the edges among the nodes represent the data or control dependencies. For example, an edge marked with para from String to FileInputStream.init means that a string is used as a parameter for a constructor call to FileInputStream. There is another edge from FileInputStream to InputStream.read, representing that a FileInputStream variable is the receiving object for the call to InputStream.read.
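As a rough illustration of this representation, the two edges described above can be stored in a small labeled graph. The class names, the `kind` field, and the `recv` edge label are assumptions introduced for this sketch; only the node labels and the `para` label come from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # e.g. "FileInputStream.init"
    kind: str   # "data" for classes/types, "action" for method calls (assumed names)


@dataclass
class UsageGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (source index, target index, edge label)

    def add_node(self, label, kind):
        """Append a node and return its index."""
        self.nodes.append(Node(label, kind))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, label):
        """Add a labeled dependency edge from src to dst."""
        self.edges.append((src, dst, label))

    def successors(self, src, label=None):
        """Labels of nodes reachable from src, optionally filtered by edge label."""
        return [
            self.nodes[d].label
            for s, d, l in self.edges
            if s == src and (label is None or l == label)
        ]


g = UsageGraph()
string = g.add_node("String", "data")
ctor = g.add_node("FileInputStream.init", "action")
stream = g.add_node("FileInputStream", "data")
read = g.add_node("InputStream.read", "action")

g.add_edge(string, ctor, "para")  # the string is a parameter of the constructor call
g.add_edge(stream, read, "recv")  # the stream is the receiver of read(); label is assumed

print(g.successors(string, "para"))  # ['FileInputStream.init']
```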

Graph-based Convolutional Network for API Misuse Detection
To begin, we construct the API usage graph (AUG) as previously outlined. Then, we generate feature vector representations for all nodes in the AUG, and each node is replaced by its corresponding vector representation. This augmented graph serves as input to the third module, a Labeled Graph-based Convolutional Network (Label-GCN) [2], designed to learn the inherent graph structure among nodes. The Label-GCN model processes the representation vectors for all nodes, generating outputs at the output layer. These outputs are then connected to a fully connected layer, which transforms the output matrix into a vector that serves as the contextualized embedding representing the given API usage. The next step is classification: Poirot applies a softmax function to the contextualized embedding to decide whether the given usage contains a misuse. During training, the model leverages labels associated with API misuses and benign usages.
Notably, our model can also identify the misused API elements. The output nodes of the Label-GCN model signify decisions regarding misuses for the corresponding API elements. For training purposes, misused API elements in each method of the training data are used as labels. During prediction, a softmax function is applied to each output node of Label-GCN, classifying the node (i.e., the API element) as either a misuse or not.
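The pipeline above (node vectors, graph convolution, a graph-level embedding, then softmax) can be sketched in plain Python. This is a generic, simplified graph convolution with random weights, not the Label-GCN architecture of [2] and not Poirot's trained model: edge labels are ignored, edges are treated as undirected, and mean pooling followed by one dense layer stands in for the fully connected layer. All function names and dimensions are assumptions.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gcn_layer(features, edges, W):
    """One graph convolution: average each node with its neighbours, then project."""
    n = len(features)
    neighbours = {i: {i} for i in range(n)}  # self-loop keeps the node's own features
    for s, d in edges:
        neighbours[s].add(d)
        neighbours[d].add(s)                 # undirected edges (a simplification)
    out = []
    for i in range(n):
        group = [features[j] for j in sorted(neighbours[i])]
        mean = [sum(col) / len(group) for col in zip(*group)]
        out.append(relu(matvec(W, mean)))
    return out


def classify_usage(features, edges, W_gcn, W_out):
    """Return [P(correct), P(misuse)] for one usage graph."""
    hidden = gcn_layer(features, edges, W_gcn)
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]  # graph-level embedding
    return softmax(matvec(W_out, pooled))


rng = random.Random(0)
features = random_matrix(4, 8, rng)  # 4 nodes, 8-dimensional node vectors
edges = [(0, 1), (1, 2), (2, 3)]
probs = classify_usage(features, edges, random_matrix(8, 8, rng), random_matrix(2, 8, rng))
print(probs)
```

With untrained weights the two probabilities carry no meaning; the sketch only shows how data flows from node vectors to a single usage-level decision.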
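The node-level decision can be sketched in the same spirit. The example assumes each output node yields two scores, ordered as [correct, misuse]; that layout, the function name, and the scores themselves are hypothetical and serve only to show a per-node softmax followed by picking the more probable class.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flag_misused_nodes(node_labels, node_logits):
    """Return labels of nodes whose misuse probability exceeds the correct one."""
    flagged = []
    for label, logits in zip(node_labels, node_logits):
        p_correct, p_misuse = softmax(logits)
        if p_misuse > p_correct:
            flagged.append(label)
    return flagged


labels = ["FileInputStream.init", "InputStream.read", "System.out.println"]
logits = [[2.0, -1.0], [-0.5, 1.5], [3.0, 0.0]]  # hypothetical [correct, misuse] scores

print(flag_misused_nodes(labels, logits))  # ['InputStream.read']
```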

EMPIRICAL EVALUATION
We conducted an experiment to evaluate Poirot on MuBench+, a real-world benchmark of API misuses [6]. The comparison results for API misuse detection at the method level are presented in Table 1. Poirot improves over all baselines on all metrics. For Precision, the relative improvements are 11.7x, 3.4x, 3.7x, 4.0x, and 1.38x over GrouMiner, Jadet, Tikanga, DMMC, and MuDetect, respectively. For Recall, they are 9.16x, 4.23x, 3.74x, 2.65x, and 1.36x over the same baselines. For F-score, they are 10.36x, 3.86x, 3.72x, 3.32x, and 1.37x.
In an evaluation with an unbalanced dataset (9 to 1 ratio between correct usages and misuses), using all 5,317 correct usages and 591 randomly selected misuses, Poirot achieves a Precision of 38.6%, a Recall of 16.7%, and an F-score of 23.3%. Thus, Poirot attains higher Precision but lower Recall compared to the balanced dataset case.
In conclusion, we present Poirot, an ML approach that mitigates the need for predefined thresholds in mining-based API misuse detection approaches. Leveraging Labeled GCN, it learns embeddings for API usages, capturing key features to enhance the detection.
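The reported figures for the unbalanced setting are consistent with the standard F-score, the harmonic mean of Precision and Recall. The short check below assumes that definition (the text does not state the formula) and uses only the numbers given above.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f = f_score(0.386, 0.167)
print(round(100 * f, 1))     # 23.3
print(round(5317 / 591, 2))  # 9.0, the stated ratio of correct usages to misuses
```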
2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)

Figure 1: An API Usage and Corresponding API-Usage Graph

Table 1: Comparison on Method-level API Misuse Detection