VulnHunt-GPT: a Smart Contract vulnerabilities detector based on OpenAI chatGPT

Smart contracts are self-executing programs that run on a blockchain. Since they are immutable after their deployment on the blockchain, it is crucial to ensure their correctness. For this reason, various approaches for the static analysis of smart contracts have been proposed, but they may be imprecise on the one hand or difficult to train on the other. In this paper, we propose a novel approach for detecting smart contract vulnerabilities using OpenAI's Generative Pre-trained Transformer 3 (GPT-3) language model. Our approach, called VulnHunt-GPT, uses GPT-3 to examine Ethereum smart contracts in order to identify the most popular vulnerabilities according to OWASP. We train VulnHunt-GPT on a dataset of smart contract functions and vulnerabilities to improve its accuracy. Our experiments show that VulnHunt-GPT outperforms almost all existing state-of-the-art approaches in detecting a variety of vulnerabilities, including reentrancy attacks, integer overflow, and uninitialized storage. In addition, we conduct a case study to demonstrate the effectiveness of VulnHunt-GPT in detecting real-world smart contract vulnerabilities. We show that VulnHunt-GPT can identify previously unknown vulnerabilities in popular smart contracts, highlighting its potential for improving smart contract security. Our approach provides a promising direction for using natural language processing techniques to improve smart contract security and reduce the risk of smart contract exploits.


INTRODUCTION
Blockchain aims at enhancing cybersecurity with a system able to increase, among other properties, immutability and resilience to attacks. Over the years, stakeholders have proposed multiple implementations of this technology, resulting in myriads of heterogeneous projects. Ethereum [20] implements this paradigm in a generalized manner, offering the possibility to run a complete program directly on the blockchain. Such programs take the name of Smart Contracts, whose number is growing in conjunction with the possible use cases. The execution of these programs requires a fee, which is the commission for the computational power offered by the nodes of the network. Although some smart contracts seem to be secure, a simple error in the design phase can cost millions of dollars [13]. The problem resides in the nature of blockchain, which is by definition distributed and immutable, meaning that once a contract is deployed, its code cannot be modified anymore.
We have recently witnessed an increase in attacks that are not caused by simple programming errors but, in certain cases, are triggered by rogue nodes that exploit weaknesses common to the blockchain structure. Considering that smart contracts are gaining popularity in multiple sensitive contexts, such as the Internet of Medical Things (IoMT) [15], finance [14], and automotive [5], security analysis and remediation have become requirements that must be addressed during the design process. Smart contract vulnerability assessment is a process that aims at detecting potential vulnerabilities and defects within the source code. In this process, experts analyze the code and, if some vulnerability is found, suggest recommendations for improving security.
The development of security analysis tools has shortened the time required to detect code flaws. Although the performance of these tools is acceptable and provides significant assistance to programmers, not all vulnerabilities can be detected by these tools in all instances. The advent of Machine Learning (ML) has created a new field that uses intelligent models to prevent threats in a system. Such models have also been extended to smart contract vulnerability assessment and are simplifying the role of the expert. In this paper, we want to exploit GPT (Generative Pre-trained Transformer) models, which have been gaining popularity over the last few months among non-experts for the creation of text and for answering generic questions. GPT models are not limited to text creation or prediction but can be extended to all Natural Language Processing (NLP) tasks. Smart contract vulnerability assessment can be treated as a typical NLP problem, with the aim of processing the source code in order to detect possible vulnerabilities. Our study aims to evaluate the quality of the GPT-3 model for vulnerability detection in comparison with ML models already deployed within this context. A use case based on Ethereum smart contracts is taken into consideration, since Ethereum is the most used platform for smart contract deployment. The document is structured in five sections:
• The second section discusses the other approaches used for vulnerability identification within the context of smart contracts and gives an overview of the possible applications of ChatGPT;
• The third section discusses smart contract vulnerabilities according to the OWASP Top 10 in order to highlight which categories of vulnerabilities are considered in the design phase;
• In the fourth section, the proposed approach is discussed, with an explanation of all the components of the system;
• In the fifth section, results are discussed by considering the number of detected vulnerabilities and the execution time;
• In the last section, the conclusion and future developments are presented.

RELATED WORKS
The increasing usage of blockchain within software solutions as a means for enhancing security has paved the way for new threats. Although blockchain provides a distributed technique for storing data in a secure way, it also has a drawback: once a smart contract has been deployed, it is not possible to modify it. General approaches to smart contract vulnerability detection are described in this section, together with an overview of the recent and expanding applications of Large Language Models (LLMs).

Smart contract vulnerabilities detection
Over the last few years, the number of applications ranging from financial services to life sciences and healthcare that leverage smart contracts to take advantage of blockchain has been increasing [7]. Threats to and flaws in smart contracts have expanded along with this exponential growth. The main methods for assessing smart contract vulnerabilities were only concerned with financial risks. Torres et al. [18] investigate the problem of honeypots: smart contracts that pretend to leak their funds to an arbitrary user, while only the honeypot creator is able to retrieve them. Their tool, HoneyBadger, is based on symbolic analysis and pattern-based detection. In the evaluation, it was able to identify more than 90 honeypot smart contracts as well as 240 victims in the wild, with an accumulated profit of more than $90,000 for the honeypot creators. However, the potential of this tool is limited to honeypot detection. A relatively trivial approach is proposed in [10] with the detection of three types of bugs: prodigal, suicidal, and greedy. The Maian tool processes the bytecode of smart contracts and tries to create a chain of transactions to find and confirm bugs. Such a tool, similarly to HoneyBadger, has a reduced potential due to the limited detection categories available.
Another interesting tool from the same authors as HoneyBadger is Osiris [17]. It builds on Oyente and acts as an integer bug detector, able to assess vulnerabilities such as arithmetic problems or time manipulation. Osiris, in contrast to Oyente, is able to combine symbolic execution and taint analysis, exceeding Oyente's performance in this way.
Generally, static code analysis is widely used in multiple contexts, and its adoption for the analysis of smart contracts is preventing a huge number of threats. Tikhomirov et al. [16] propose SmartCheck, which performs static analysis of smart contracts by translating the code into an XML-based intermediate representation and checking it against XPath patterns. The limitations of such a solution are related to the complexity and the performance of the framework. On the same line of research, Feist et al. [2] propose Slither, a static analyzer for Ethereum smart contracts: it is able to automate vulnerability detection as well as to identify code optimization opportunities. This tool outperforms SmartCheck by changing the approach: it no longer leverages XML but a proprietary intermediate representation. Such solutions are examples of how static code analysis can be used for vulnerability detection, but they have limitations in terms of the categories of vulnerabilities they can detect. Neither SmartCheck nor Slither offers support for the Bad Randomness and Front Running threats, which DASP includes in its top 10 vulnerabilities.
Mossberg et al. [8] presented an evolution of these tools with Manticore, which performs symbolic execution for the analysis of smart contracts and binaries. It can generate inputs as well as detect errors, extending vulnerability detection to bad randomness vulnerabilities, but without solving the problem of front-running ones.
A very similar approach, able to cover almost all the vulnerabilities, is taken by Mythril, which combines symbolic execution, SMT solving, and taint analysis to detect a variety of security vulnerabilities.

Table 1: Considered Tools

The application of LLMs is not limited to code analysis but can also be extended to secure hardware generation. As investigated by Nair et al. [9], with a good prompt engineering process it is possible to make ChatGPT suggest techniques for implementing secure solutions. The authors also underline possible vulnerabilities introduced by the model if the assistant behavior is not correctly programmed.
Code generation is another interesting application of ChatGPT; its efficacy in solving programming problems, examining both the correctness and the efficiency of its solutions in terms of time and memory complexity, has been evaluated in [12] [11]. The research reveals a commendable overall success rate of 71.875%, making it a good tool for assisting code development and debugging.
While LLMs are constantly updated with new threats, static approaches for detecting vulnerabilities are more limited since they are based on well-known patterns. The aim of the current research is to address this gap by extending ChatGPT applications to the context of smart contract vulnerability assessment.

SMART CONTRACT VULNERABILITIES
Smart contracts are usually designed to manipulate and store funds, which makes them a perfect target for cyber attacks. Two points favor attackers: contracts are publicly available, and even though the source code can be obfuscated by publishing only the bytecode, the bytecode can still be decompiled. Errors in the code can be critical since blockchains do not permit modification after deployment. In what follows, we introduce the top 10 vulnerabilities defined by the Decentralized Application Security Project (DASP) [4] in order to take them into consideration for the development of our model.

Reentrancy
Description: A reentrancy attack is a type of vulnerability that takes advantage of the way smart contracts handle external function calls. In a reentrancy attack, an attacker exploits a situation where a smart contract calls an external contract during its execution. The attacker deploys a malicious contract and interacts with the vulnerable contract to exploit the flaw.
Attack Vector: A smart contract tracks the balances of a series of external addresses and allows them to withdraw funds with a public function withdraw(). A malicious smart contract can implement a fallback function that calls withdraw() again before the vulnerable contract has updated its balance, repeatedly draining funds.
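The mechanics of this attack can be sketched with a minimal Python simulation; the class and function names below are hypothetical stand-ins for Solidity code, not the paper's example contract:

```python
class VulnerableBank:
    """Simulates a contract that sends funds BEFORE updating the balance."""
    def __init__(self):
        self.balances = {}

    def deposit(self, addr, amount):
        self.balances[addr] = self.balances.get(addr, 0) + amount

    def withdraw(self, addr, receive_hook):
        amount = self.balances.get(addr, 0)
        if amount > 0:
            receive_hook(amount)          # external call happens first...
            self.balances[addr] = 0       # ...so the hook can re-enter before this

class Attacker:
    """Simulates a malicious contract whose fallback re-enters withdraw()."""
    def __init__(self, bank, addr, depth=3):
        self.bank, self.addr, self.depth, self.stolen = bank, addr, depth, 0

    def receive(self, amount):
        self.stolen += amount
        if self.depth > 0:                # re-enter before the balance is zeroed
            self.depth -= 1
            self.bank.withdraw(self.addr, self.receive)

bank = VulnerableBank()
bank.deposit("attacker", 10)
atk = Attacker(bank, "attacker")
bank.withdraw("attacker", atk.receive)
print(atk.stolen)  # 40: a 10-unit deposit is withdrawn four times
```

Updating the balance before the external call (the checks-effects-interactions pattern) removes the vulnerability.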

Access Control
Description: Access control vulnerabilities are common to all applications, not only to smart contracts. While a wrong visibility property offers attackers direct access to private values or functions, access control problems can be more sophisticated, such as using tx.origin for the validation of callers or delegating authorization logic to proxy contracts.
Attack Vector: A very popular scheme for granting root privileges is to define the address that calls the initialization function as the owner of the contract. However, if the initialization function can be called by anyone, even after it has already been invoked, anyone can become the contract owner.
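A minimal Python sketch of this unprotected-initializer pattern (the Wallet class is a hypothetical illustration, not a contract from the paper):

```python
class Wallet:
    """Simulates a contract whose init function is callable by anyone, any time."""
    def __init__(self):
        self.owner = None

    def init_wallet(self, caller):
        # BUG: no check that the contract is still uninitialized
        self.owner = caller

    def withdraw_all(self, caller):
        if caller != self.owner:
            raise PermissionError("not the owner")
        return "funds sent to " + caller

w = Wallet()
w.init_wallet("deployer")          # legitimate initialization
w.init_wallet("attacker")          # re-initialization takes over ownership
print(w.withdraw_all("attacker"))  # funds sent to attacker
```

Guarding the initializer with a `require(owner == address(0))`-style check, or using an audited ownership helper, prevents the takeover.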

Arithmetic Problems
Description: Integer overflow or underflow is not a new threat, but it can be dangerous in a smart contract context, where unsigned integers are very popular. If an arithmetic problem occurs, code flows that seem to be well-written can become attack vectors, causing a possible denial of service or fund losses.
Attack Vector: A function withdraw() is responsible for giving users the amount of Ether they previously deposited in the contract. An attacker may try to withdraw an amount higher than their current balance: an integer underflow in the balance check of withdraw() can make the check always pass.
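The underflow can be reproduced in Python by emulating the EVM's 256-bit wrap-around arithmetic (a simplified sketch, not Solidity semantics in full):

```python
UINT256_MAX = 2**256 - 1

def sub_uint256(a, b):
    """Wrap-around subtraction as performed on EVM unsigned integers
    (pre-Solidity-0.8, or inside an unchecked block)."""
    return (a - b) % (2**256)

balance = 5
withdraw_request = 6
# A naive check such as `balance - withdraw_request >= 0` is always true
# for unsigned integers: the subtraction underflows to a huge value
# instead of going negative.
remaining = sub_uint256(balance, withdraw_request)
print(remaining == UINT256_MAX)  # True: 5 - 6 wraps to 2**256 - 1
```

Checked arithmetic (`require(balance >= withdraw_request)` or Solidity >= 0.8 defaults) closes this hole.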

Unchecked Low-Level Calls
Description: One of the major Solidity characteristics is its low-level calls, such as call(), callcode(), delegatecall(), and send(). They do not behave like the other calls, which raise errors that halt execution; instead, they return a boolean value set to false, and the following code is executed anyway if no proper check is in place.
Attack Vector: When programmers do not check the return value of call(), used to send Ether to a smart contract that is not able to accept it, the EVM returns false. Since this value is not checked, the modifications done by the smart contract are applied and cannot be reverted.
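A hedged Python sketch of the ignored-return-value pattern (the functions below model send()-like behavior; they are illustrative, not actual EVM semantics):

```python
def low_level_send(recipient_accepts_ether):
    """Mimics Solidity's send()/call(): returns False on failure instead of raising."""
    return bool(recipient_accepts_ether)

def unsafe_payout(ledger, addr, recipient_accepts_ether):
    ledger[addr] = 0                             # state change applied first
    low_level_send(recipient_accepts_ether)      # BUG: return value ignored
    # Even if the send failed, the zeroed balance is never restored.

def safe_payout(ledger, addr, recipient_accepts_ether):
    if not low_level_send(recipient_accepts_ether):
        raise RuntimeError("send failed, reverting")
    ledger[addr] = 0

ledger = {"alice": 100}
unsafe_payout(ledger, "alice", recipient_accepts_ether=False)
print(ledger["alice"])  # 0: Ether was never delivered, yet the balance is gone
```

The safe variant checks the boolean and reverts, mirroring the recommended Solidity idiom `require(success)`.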

Denial of Service (DoS)
Description: This category of vulnerability includes multiple problems: reaching the gas limit, unhandled exception throws, access control violations, and other problems able to cause service interruption. In the smart contract context, in particular on the Ethereum blockchain, a smart contract can be put offline forever, unlike traditional applications, which can be restarted or restored.
Attack Vector: Each block has an upper bound for the gas it can consume, namely the block gas limit. If such a limit is exceeded, the transaction will be rejected. This is particularly dangerous if the attacker is able to inflate the gas a transaction requires.
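A sketch of this gas-limit denial of service in Python; the gas constants and the payout function are hypothetical, chosen only to show how an attacker-extendable loop can push a transaction past the block limit forever:

```python
BLOCK_GAS_LIMIT = 30_000_000   # illustrative value
GAS_PER_ITEM = 5_000           # illustrative per-iteration cost

def payout_all(creditors):
    """Loops over an unbounded, attacker-extendable list in one transaction."""
    gas = len(creditors) * GAS_PER_ITEM
    if gas > BLOCK_GAS_LIMIT:
        raise RuntimeError("out of gas: transaction rejected, payouts blocked")
    return "paid"

print(payout_all(["addr"] * 100))      # paid: small list fits in one block
# An attacker registers enough tiny claims to push the loop past the limit,
# after which payout_all() can never succeed again:
try:
    payout_all(["addr"] * 10_000)
except RuntimeError as e:
    print(e)
```

The usual mitigation is a pull-payment pattern, where each creditor withdraws individually instead of one loop paying everyone.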

Bad Randomness
Description: Randomness is difficult to obtain in Ethereum. Although Solidity offers sources of pseudo-randomness, the returned values can be predictable.
Attack Vector: A smart contract uses the block number as a seed for randomness in a game; the attacker leverages this to win the game.
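Why block-derived "randomness" is exploitable can be shown in a few lines of Python (a toy model; the modulo-10 game rule is invented for illustration):

```python
def winning_number(block_number):
    """Deterministic 'randomness': derived entirely from public chain state."""
    return block_number % 10

def make_bet(block_number, guess):
    return guess == winning_number(block_number)

# The attacker reads the pending block number from the public chain
# and always submits the winning guess.
pending_block = 1234567
print(make_bet(pending_block, pending_block % 10))  # True: the attacker always wins
```

Any input an attacker can observe or influence (block number, timestamp, blockhash) is unsafe as a seed, which is why the remediation sections later in the paper point to verifiable randomness oracles.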

Front running
Description: Since the Ethereum blockchain is public, everyone can see the content of not-yet-confirmed transactions; this means that if a user reveals the solution to a complex cryptographic puzzle or other sensitive information in a transaction, everyone can see it.
Attack Vector: A public smart contract publishes an RSA number; with a call to submitSolution(), the caller that provides the solution is rewarded. Consider the case in which someone submits the solution but someone else sees the pending transaction and resends it with a higher fee, causing the second transaction to be accepted first.

Time manipulation
Description: A smart contract may use the current time at different points of its execution, typically through block.timestamp. Considering that miners can manipulate the reported mining time, such a timestamp should not be used for random number generation.
Attack vector: A game pays the first player of the day. A malicious miner can set the timestamp to midnight: since the current time is really close to midnight (even if it is before), the other nodes accept the block.

Short Address
Description: These attacks are a side effect of the EVM accepting arguments containing padding. Attackers can leverage this aspect by using special addresses with the aim of creating errors when clients encode arguments without validating their length.
Attack vector: Consider the case in which an API uses a function that accepts a 20-byte address and an amount, and transfers the amount to the address passed to the function. The attacker supplies a shortened address that, once padded with zero bytes to fill the 32-byte argument slots, shifts the encoding of the amount. The smart contract then wrongly computes the receiving address and the amount, sending money to the attacker.
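The shifting effect can be demonstrated with a deliberately simplified Python model of the encoding. Note the simplification: real ABI encoding left-pads each argument to 32 bytes; here the address is kept at its raw 20 bytes so the shift is easy to follow:

```python
def client_encode(address_bytes, amount):
    """Naive client-side encoding: concatenates raw bytes without validating
    that the address is exactly 20 bytes long."""
    return address_bytes + amount.to_bytes(32, "big")

def evm_decode(payload):
    """Decoder expecting a 20-byte address followed by a 32-byte amount,
    padding any missing trailing bytes with zeros (as the EVM does)."""
    payload = payload.ljust(52, b"\x00")
    return payload[:20], int.from_bytes(payload[20:52], "big")

# Honest call: a full 20-byte address (ending in a zero byte) and 20 tokens.
addr = bytes.fromhex("aa" * 19 + "00")
_, amount = evm_decode(client_encode(addr, 20))
assert amount == 20

# Attack: the trailing zero byte of the address is omitted; every following
# byte shifts left by one, the decoder pads a zero at the end, and the
# decoded amount is multiplied by 256.
short_addr = bytes.fromhex("aa" * 19)
_, amount = evm_decode(client_encode(short_addr, 20))
print(amount)  # 5120 == 20 * 256
```

Validating argument lengths on the client side before encoding defeats the attack.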

DESIGN
Unlike the approaches previously proposed in the context of smart contract vulnerability assessment, VulnHunt-GPT revolves around the development of a highly effective and versatile smart contract vulnerability detector, powered by OpenAI ChatGPT. The system encompasses several key components and methodologies to achieve its objectives, as detailed below.

Model Setup
For the implementation, we adopted the gpt-3.5-turbo model, which is able to handle a large amount of context. Despite this ability, the model is not able out of the box to give detailed information about a specific context, including vulnerability assessment and the format in which we want the response. For this purpose, two main activities are performed: Prompt Engineering and Context Enrichment. The first is performed using the OpenAI API, while the second leverages freely available tools that use the indexes and embeddings offered by the GPT model.

Prompt Engineering.
Prompt engineering is the process of strategically designing and refining prompts or questions to elicit desired responses from language models. It involves formulating input instructions or queries in a way that maximizes the chances of obtaining accurate and relevant outputs from the model.
In the context of language models like GPT-3.5, prompt engineering can be used to influence the generated text by providing specific instructions or context.By carefully crafting the input prompts, users can guide the model toward producing more desirable or informative responses.
In our case, the desired output is a text split between:
• Vulnerabilities: the vulnerabilities detected in the smart contract;
• Remediation: the actions to perform to solve the detected vulnerabilities.
It is possible to engineer the prompts by using the GPT-3.5 Turbo Application Programming Interface (API). This API is used to define the roles of the chat and provides three roles: System, User, and Assistant. For each role, a message that specifies the behavior of the actor can be sent. Figure 2 shows the definition of the behavior the assistant has to assume for our proposal.
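The role-based message structure can be sketched in Python as follows. The wording of the system prompt below is a hypothetical reconstruction; the exact prompt used by VulnHunt-GPT is the one shown in Figure 2:

```python
# Hypothetical system prompt enforcing the two-section output format
# (Vulnerabilities / Remediation) described above.
SYSTEM_PROMPT = (
    "You are a smart contract security auditor. Analyze the Solidity code "
    "provided by the user and answer in two sections: 'Vulnerabilities', "
    "listing each detected issue, and 'Remediation', listing the fix for each."
)

def build_messages(contract_source: str) -> list:
    """Builds the role-based message list expected by the OpenAI chat API."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": contract_source},
    ]

messages = build_messages("contract Bank { function withdraw() public { } }")
print([m["role"] for m in messages])  # ['system', 'user']
```

This message list would then be passed to the gpt-3.5-turbo chat completion endpoint; the Assistant role carries the model's reply.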

Context Enrichment.
As introduced at the beginning of this section, the context must be set to include information related to smart contract vulnerabilities. GPT-3.5 Turbo leverages the concepts of embeddings and indexes for understanding the input and the sequence of words. More in particular, embeddings refer to dense numerical representations of words or tokens within a language model. Essentially, embeddings encode words into numerical vectors that can be understood and manipulated by the model. These vectors represent words in a multi-dimensional space, such that similar words are represented by vectors that are close to each other. GPT-3.5 uses embeddings to encode input words and generate output words. On the other hand, an index can refer to the position of a word or token within a sequence. In the context of language models, indexes are often used to refer to specific positions of words or tokens within a sentence or text. For example, when analyzing a text, one can refer to the index of a word to access specific information about it.
Two main tools, available as Python libraries, are used to automate the creation of embeddings and indexes: LangChain and LlamaIndex. LangChain offers a software development framework designed to simplify the creation of applications using large language models (LLMs); it streamlines the development process by providing pre-built components, libraries, and APIs that facilitate working with LLMs. In our case, this framework is used for the generation of embeddings. LlamaIndex, on the other hand, offers data management capabilities to efficiently organize and preprocess smart contracts for LLM applications. Once the data is processed and organized by LlamaIndex, it is possible to leverage LangChain's higher-level interface to interact with the LLM for tasks like text generation starting from smart contract source code.
In the proposed model setup phase, the set of top 10 vulnerabilities described in the previous section is taken as textual input by the LlamaIndex library, which elaborates the text and returns a vector of indexes used for vulnerability detection. This approach is highly flexible and can easily be extended to newer vulnerabilities by indexing their textual descriptions.
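The retrieval idea behind this indexing step can be illustrated with a toy Python sketch. The three-dimensional vectors below are invented for clarity; a real deployment would obtain high-dimensional vectors for each vulnerability description from an embeddings endpoint via LlamaIndex and compare them the same way:

```python
import math

# Toy "embeddings" of indexed vulnerability descriptions (illustrative values).
VULN_INDEX = {
    "reentrancy":     [0.9, 0.1, 0.0],
    "bad randomness": [0.1, 0.8, 0.1],
    "access control": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest_vulnerability(query_embedding):
    """Returns the indexed vulnerability whose vector is closest to the query."""
    return max(VULN_INDEX, key=lambda name: cosine(VULN_INDEX[name], query_embedding))

# A code fragment whose (hypothetical) embedding lies near "reentrancy":
print(nearest_vulnerability([0.85, 0.15, 0.05]))  # reentrancy
```

Adding a new vulnerability category then amounts to embedding its textual description and inserting one more entry into the index, which is the flexibility claimed above.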

Architecture
After the model has been created and deployed with the needed contextual information, an appropriate architecture for communicating with the model is deployed. The proposed architecture is depicted in Figure 3: the main component of our system is a Python script responsible for preprocessing the smart contract taken as input and then interacting with the OpenAI API, which invokes the gpt-3.5 model for the analysis. More precisely, the proposed architecture is composed of three main phases before obtaining the results describing possible vulnerabilities:
(1) Preprocessing. In this phase, the textual representation of the smart contract code is taken as input from the GUI. Preprocessing is needed because the user may insert additional information that is not relevant for the vulnerability detector (e.g., "I have this smart contract deployed") and must be removed. Moreover, in this phase, the text is formatted according to the OpenAI API in order to avoid communication errors.
(2) API Communication. Once the text has been correctly preprocessed, the request is created and sent to the OpenAI servers using the available API. In this phase, a token key is used for two reasons: first, the previous model setup phase is restored, without the need to recreate the environment each time; second, it guarantees the privacy of users and prevents access from malicious nodes.
(3) Analysis. When an OpenAI server receives the request from the Python script, it automatically forwards it to the gpt-3.5 model for analysis. At this point, the vulnerability analysis is performed on the smart contract and, once completed, the response is given back to the user through the GUI.
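The preprocessing of phase (1) can be sketched as a small Python heuristic. The actual cleaning rules of VulnHunt-GPT are not specified in detail by the paper, so the regex below is only one plausible way to strip conversational text such as "I have this smart contract deployed":

```python
import re

def preprocess(user_input: str) -> str:
    """Strips conversational text around the contract, keeping only the code.

    Heuristic sketch: keeps everything from the first Solidity construct
    (a pragma directive or a contract declaration) onward.
    """
    match = re.search(r"pragma\s+solidity|contract\s+\w+\s*\{", user_input)
    return user_input[match.start():].strip() if match else user_input.strip()

raw = "I have this smart contract deployed: pragma solidity ^0.8.0; contract A {}"
print(preprocess(raw))  # pragma solidity ^0.8.0; contract A {}
```

The cleaned text is then wrapped in the role-based message format and sent to the OpenAI servers in phase (2).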
The response generated from the model follows the definition declared in the model setup phase and will be composed of both detected vulnerabilities and suggested remediations.

Running Example
To better understand the use case of the proposed system, a running example has been executed. Two smart contracts have been considered: in the first one, access control and bad randomness vulnerabilities were introduced, while the second one does not contain vulnerabilities.
B. Boi et al.
The response related to the first smart contract is depicted in Figure 4 and, as reported, it detects two vulnerabilities. The first vulnerability arises from the function makeBet(), which suffers from a typical bad randomness problem: the function leverages the block number to determine whether the current caller is a winner. As remediation, the tool suggests using a verifiable source of randomness such as Chainlink VRF or Oraclize. The second vulnerability arises from the function destroy(), a typical function inserted in smart contracts to self-destruct the contract and withdraw all the funds to the owner. An access control vulnerability can happen since the check on the sender address is done statically, against a fixed address, without ensuring who the current owner of the smart contract is. The remediation actions are in line with the typical countermeasures, and the tool also suggests some practical mechanisms for preventing attacks, such as OpenZeppelin's Ownable contract.
Figure 5 depicts the response to the second smart contract, which was expected to contain no vulnerabilities. As expected, the tool does not detect any vulnerability but gives the user a suggestion for improving the storage usage of the structure owners used in the deleteOwner() function. This last example highlights another relevant aspect of LLMs: they are able to give suggestions derived from the existing knowledge of the model.

RESULTS
To begin with the results analysis, it is necessary to define the parameters used for the selection of the tools. SmartBugs [3] is a framework for the analysis of smart contracts; it contains 19 tools for automation and two datasets. This represents a good starting point for the comparison, but we limit the selection to only 9 tools, chosen on the basis of the following criteria, which reflect the same parameters as our proposal:
• Availability: tools must be publicly available and must support a Command Line Interface (CLI);
• Input: tools must accept Solidity smart contracts; tools that require EVM bytecode are excluded;
• Vulnerabilities Identification: tools described as analysis tools that are limited to building artifacts such as control flow graphs are excluded.
These tools are evaluated against the SmartBugs Curated dataset, which contains 69 smart contracts split into ten categories referring to the top 10 OWASP vulnerabilities. Similarly to what is done in [1], the performance of the proposed approach is measured by considering:
• Vulnerabilities detected: the number of vulnerabilities identified using the tool;
• Execution time: the time needed for the analysis of a smart contract, until the response is returned.
In order to perform the evaluation, an iMac with a 3.33 GHz Intel Core i5 6 core processor and 16 GB 2667 MHz DDR4 RAM has been used.

Vulnerabilities Detected
An analysis of the number of vulnerabilities found is possible after the execution of all the tools under consideration. Since a smart contract may contain several vulnerabilities, the number of identified vulnerabilities is greater than the number of smart contracts taken into consideration. Table 2 reports the number of vulnerabilities detected in the smart contracts. From the initial investigation, it can be seen that extremely specialized tools like HoneyBadger and Maian perform poorly in categories unrelated to their intended use, as expected. Static analysis tools like Oyente, Osiris, Slither, and SmartCheck outperform the specialized ones, indicating that a generalized approach can be used. In any case, none of the eight vulnerabilities related to bad randomness was discovered, showing the limitations of static analysis.
The symbolic analysis introduced by Manticore performs similarly to the other tools, with an increased number of detected vulnerabilities in the access control group. Finally, Mythril can be considered the most complete tool, able to detect almost all the vulnerabilities, since it combines symbolic analysis with SMT solving and taint analysis.
The proposed approach provides a thorough analysis across all categories, making it the most comprehensive tool. Although it does not detect as large a number of vulnerabilities as Mythril, it detects at least one from each category, unlike all of the other tools analyzed.

Execution Time
The execution time has been measured as the total and average execution time over all the smart contracts contained in the SmartBugs Curated dataset. Table 4 reports these two metrics for all the considered tools. HoneyBadger and, in particular, Maian have a relatively high execution time, which can be explained by the approach used by these tools: the first is interested only in finding honeypots using pattern-based detection, while the second spends a lot of time creating the entire trace of transactions. Manticore is the worst tool in terms of execution time, due to symbolic execution. Static analysis tools like Slither and SmartCheck have reduced execution times thanks to the strategies put in place, based on the translation of code and analysis using static tools. Oyente, Osiris, and HoneyBadger have good detection times, but their detection rate is very low compared with the others due to the limitations described in the previous subsection.
Differently from the other tools in this context, VulnHunt-GPT's execution time depends on the OpenAI servers, which are able to detect vulnerabilities in a very short time compared with the others.

CONCLUSION
The usage of LLMs is not limited to content generation: in particular, LLMs are gaining popularity within the context of cybersecurity. The possibility of analyzing and interpreting code snippets, together with their knowledge representation capability, makes these models well suited for such tasks. Moreover, they are increasingly user-friendly, making them available to a huge audience. The efficiency of such models highly depends on the training phase, so the more recent the model, the more performant the predictions will be.
Conventional methods for evaluating the vulnerability of smart contracts suffer from a lack of completeness in detection, as no single tool can identify every threat listed in the OWASP Top 10. From this perspective, the proposed method performs better than the others, since it finds at least one vulnerability in each of the ten categories. Despite this benefit, it is crucial to underline that, in order to get better outcomes, GPT-based models should be viewed as a component of a more comprehensive testing set. They can assist in identifying vulnerabilities, but they cannot offer a complete fix for the issue. Future developments of this research will focus on assessing the existing methodology against more recent LLMs, such as Bard AI, and on the evaluation of false positives detected by the models.

Figure 2: Definition of system and user roles

Figure 4: Bad Randomness and Access Control detection.

Table 3: Vulnerability detection taxonomy in considered tools