Smart Contract and DeFi Security Tools: Do They Meet the Needs of Practitioners?

The growth of the decentralized finance (DeFi) ecosystem built on blockchain technology and smart contracts has led to an increased demand for secure and reliable smart contract development. However, attacks targeting smart contracts are increasing, causing an estimated $6.45 billion in financial losses. Researchers have proposed various automated security tools to detect vulnerabilities, but their real-world impact remains uncertain. In this paper, we aim to shed light on the effectiveness of automated security tools in identifying vulnerabilities that can lead to high-profile attacks, and their overall usage within the industry. Our comprehensive study encompasses an evaluation of five SoTA automated security tools, an analysis of 127 high-impact real-world attacks resulting in $2.3 billion in losses, and a survey of 49 developers and auditors working in leading DeFi protocols. Our findings reveal a stark reality: the tools could have prevented a mere 8% of the attacks in our dataset, amounting to $149 million out of the $2.3 billion in losses. Notably, all preventable attacks were related to reentrancy vulnerabilities. Furthermore, practitioners distinguish logic-related bugs and protocol layer vulnerabilities as significant threats that are not adequately addressed by existing security tools. Our results emphasize the need to develop specialized tools catering to the distinct demands and expectations of developers and auditors. Further, our study highlights the necessity for continuous advancements in security tools to effectively tackle the ever-evolving challenges confronting the DeFi ecosystem.


INTRODUCTION
The emergence of Ethereum and blockchains with smart contract capabilities led to the development of decentralized applications (dapps), opening up new possibilities for innovation. The Decentralized Finance (DeFi) ecosystem, which is built on these technologies, has experienced significant growth since 2020, with the total value locked (TVL) reaching an all-time high of 180 billion USD in December 2021 [2]. Unfortunately, this massive amount of value locked in DeFi has also made it an attractive attack target. Despite efforts to write secure dapps, attackers have successfully exploited vulnerable smart contracts, causing losses of 6.45 billion dollars [2] and underscoring the need for effective security measures.
Although there has been significant research and focus on smart contract security, it remains unclear how effective automated security tools are against real-world exploits, what impact these tools have on the industry, and how they are utilized in developing and auditing smart contracts. In this paper, we aim to answer the following research questions.
RQ1: Which vulnerability types can be detected by automated security tools? How frequently do these vulnerabilities occur in real-world attacks? What is the severity level of the vulnerabilities that could have been detected by automated security tools in real-world attacks? Finally, what types of vulnerabilities cannot be detected by current automated security tools? (Section 4.1)
RQ2: To what extent can security tools be used to prevent real-world high-profile attacks? What is the effectiveness of automated security tools against each vulnerability category? Which high-profile attacks could have been potentially avoided by using semi-automated security tools that require user input? (Section 4.1)
RQ3: What is the landscape of security tools used by developers and auditors? To what extent do developers prefer open-source tools? How prevalent are academic tools in practice? What percentage of practitioners use semi-automated tools that can prevent specific vulnerability types that are out-of-scope for automated security tools? How much time do auditors typically spend using security tools during audits? (Section 4.2)
RQ4: What are the key characteristics of security tools that are prioritized by auditors and developers? Do practitioners weigh the trade-off between false positives and false negatives when selecting security tools, and how? Additionally, are ease of use, documentation, and report quality important factors when selecting security tools for both developers and auditors? (Section 4.3)
RQ5: How effectively do security tools address various classes of errors according to auditors and developers? Specifically, which types of errors are inadequately covered by current security tools? Additionally, what is the perception of auditors regarding the usefulness of security tools? (Section 4.3)
Methodology. To address RQ1-RQ2, we conducted an extensive empirical evaluation of five automated security tools using a dataset of 127 high-impact real-world attacks. In Section 3.1, we describe the dataset, the selection criteria we followed for the tools, and our benchmarking process. To answer RQ3-RQ5, we conducted surveys with 49 developers and auditors working in top DeFi protocols. Our methodology for performing the surveys is presented in Section 3.2.
Findings. Through our extensive analysis, we have obtained the following findings regarding the current state of security tools' effectiveness and usage in the industry.
RQ1: Our empirical analysis revealed that the selected automated security tools can identify 14 different types of vulnerabilities. Among the attacks in our dataset, a total of 32 out of 127 exploits were associated with vulnerabilities in these 14 categories. These 32 vulnerabilities resulted in a total damage of approximately 271.5 million USD. Notably, the top two vulnerability types in the attack dataset, absence of coding logic or sanity checks and on-chain oracle manipulation, cannot be detected by current automated security tools.
RQ2: The evaluation indicates that automated security tools could have potentially prevented 11 of the 32 in-scope attacks, which together caused losses of 149 million USD. However, security tools tend to generate numerous insignificant reports, leading to a potentially overwhelming number of false positives. All detected vulnerabilities were related to reentrancy, highlighting the effectiveness of security tools against this type of vulnerability but also their ineffectiveness against other types. Furthermore, our analysis indicates that existing security tools neglect protocol layer vulnerabilities. Interestingly, semi-automated tools could potentially prevent 47 attacks involving absence of code logic or sanity checks and logic errors.
RQ3: Our survey results show that developers tend to use lightweight tools that can be easily integrated into the development life cycle, such as linters, while auditors use more sophisticated tools with greater bug-finding capabilities (e.g., static analyzers). The majority of developers (92%) prefer open-source tools, while over half of the participants reported using in-house security tools. We found that academic tools used in research evaluations and benchmark studies are not commonly used in practice. Furthermore, about 59% of developers and 48% of auditors use tools that can detect logic-related bugs, which are often the root cause of high-impact attacks. The majority of auditors (76%) reported using security tools for up to 20% of their audit time.
RQ4: The results of our survey indicate that developers prefer security tools with low false negative rates, while auditors prefer tools with low false positive rates since they are responsible for triaging reports. In addition, auditors place a greater emphasis on the tool's setup process and its bug-finding capabilities, while developers prioritize tools that can be easily integrated into their development workflows. Both auditors and developers consider ease of use, documentation, and report quality to be important factors when selecting security tools.
RQ5: Our findings reveal that both developers and auditors consider logic-related bugs and oracle manipulation vulnerabilities as significant threats that are inadequately addressed by security tools. They express the need for better support for these types of vulnerabilities. While over half (52.4%) of auditors find security tools helpful for auditing, a notable proportion (38.1%) do not find them useful, highlighting the need for further improvement in the development and use of security tools for auditing purposes.
Availability: All the data and analysis from this study are accessible at https://github.com/StefanosChaliasos/sc-defi-security/.

BACKGROUND
The literature on the evaluation of automated security tools for smart contracts has been primarily focused on assessing their effectiveness by constructing various benchmarks (see Figure 1). Ferreira et al. [18] developed SmartBugs, an extendable evaluation framework that facilitates the integration and comparison of multiple security tools that analyze EVM and Solidity smart contracts. In [14], the authors evaluated 9 automated analysis tools on two datasets: one consisting of 47K contracts for consistency evaluation, and another of 69 annotated vulnerable contracts for precision evaluation. Ren et al. [42] proposed a comprehensive 4-step evaluation process for minimizing bias in the assessment of automated tools.
Contrary to previous work, our study aims to evaluate the real-world impact of automated security tools. Perez and Livshits [37] surveyed 23K smart contracts reported as vulnerable in 6 academic papers and found that only 1.98% of them had been exploited since deployment, highlighting a potentially high number of false positives in existing techniques. In contrast, we focus on assessing automated tools' false negatives and gaining a deeper understanding of their limitations. Zhang et al. [60] performed a systematic investigation of 462 defects reported in CodeArena audits and 54 exploits to study the extent to which existing tools could detect them. Our work takes a different approach by actually running the tools against exploits and reporting both cases where the tools have false negatives and cases where the tools lacked appropriate oracles. Wan et al. [49] surveyed 156 practitioners to understand their perceptions and practices on smart contract security. Our study, on the other hand, focuses on surveying dapp developers and auditors to investigate how they use smart contract security tools.
In contrast to previous studies, this paper presents a mixed-methods investigation into the effectiveness and usage of security tools. The aim is to provide a comprehensive overview of the current status and offer valuable insights for researchers and practitioners to advance the state-of-the-art in smart contract and DeFi security.

METHODOLOGY
We provide an overview of the methods we employed to evaluate the capability of current security tools to find real-world vulnerabilities and understand practitioners' experience when using such tools. Specifically, we describe the dataset containing real-world exploits, the tool selection criteria, and the benchmarking process. Further, we focus on the design of the surveys, participant demographics, and how we analyzed the results.

Empirical Evaluation on Attacks
Dataset. We use the dataset of DeFi attacks presented by Zhou et al. [64] as a basis for our analysis. The dataset includes a comprehensive analysis and classification of 181 real-world, high-impact DeFi attacks. Attack details involve underlying vulnerabilities in smart contracts, corresponding exploits, and monetary losses. The vulnerabilities are categorized into five layers: Network, Consensus, Smart Contract, DeFi Protocol, and Auxiliary Service. Our work focuses on the Smart Contract and the DeFi Protocol layers, because these are typically the layers where developer errors occur and where security tools focus their analyses. Hence, we filtered out all vulnerabilities related to other layers. This resulted in a dataset of 127 attacks. Figure 2 presents the vulnerability types as reported in [64], while Figure 3 depicts the total impact of the corresponding attacks. Additionally, we downloaded the source code and bytecode of the smart contracts that were attack targets. We chose this dataset because it reflects the real-world attacks that have occurred in the smart contract and DeFi ecosystem. While other related works [14,42] have employed datasets of known vulnerable contracts or contracts with induced vulnerabilities, we believe that our selection of real-world attacks provides a more representative sample of the types of vulnerabilities smart contract developers and auditors should be aware of, because they have led to major losses in deployed protocols. Furthermore, the contracts in the dataset have greater complexity than minimal examples, making reasoning about them more challenging.
[Figure 2. Columns: Vulnerability, Layer, #, Tools (Solhint [39], Slither [16], Mythril [11], ConFuzzius [47], Oyente [33]).]
Tools Selection. To select the security tools for our study, we first conducted an advanced keyword search on Google Scholar and followed references to identify additional tools. We also searched for security tools in GitHub repositories. The above process resulted in 75 tools.
Next, we applied a number of criteria to narrow down our selection. Specifically, we focused on (1) the availability of source code (51 tools), (2) maintenance (14 tools), (3) ability to run automatically without input (7 tools), (4) popularity among practitioners (e.g., prioritizing tools with more GitHub stars and survey results), and (5) repeated use in academic papers (i.e., higher reference count and usage in evaluations/comparisons). We also included at least one tool based on each of the following techniques: linting, static analysis, fuzzing, and symbolic execution. Notably, focusing on tools that are based on different analysis methods is an important dimension of our study.
Figure 2 depicts the vulnerabilities that each selected tool can identify. Note that the tools cannot detect every programming error related to a vulnerability type. For example, in the case of "Other Inconsistent, improper or unprotected access control", Slither can only detect some of the bugs that can lead to this defect type. In the supplementary material, we provide a comprehensive overview of the tool selection process and a detailed mapping between tool vulnerabilities and the vulnerability categories of Zhou et al. [64].
[Figure 4: Evaluating the effectiveness of security tools. Diagram labels: DeFi attacks dataset [64]; Vulnerable Contract; Maps identified vulnerabilities to the dataset [64] entries.]
Benchmarking. Figure 4 summarizes our benchmarking approach. To obtain results from the selected tools we utilized the SmartBugs framework [18] (see also Section 2). Next, we manually tracked all vulnerability types that each tool could detect and mapped them to the vulnerabilities of Figure 2, i.e., the vulnerabilities coming from the dataset. We used a post-processing script to integrate this information with the output of the SmartBugs framework and fed the data into an SQLite database for further analysis. Adding support for more tools is straightforward, as it only requires including the tool in SmartBugs and providing a CSV file that describes the mapping of the tool's detected vulnerabilities to our toolchain.
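The mapping step described above can be pictured with a minimal Python sketch. It is not the study's actual post-processing script: the table layout, the CSV columns, and the tool, finding, and contract names (tool_a, reentrancy-warning, VaultA, and so on) are hypothetical placeholders chosen for illustration. The sketch loads a mapping CSV and raw tool findings into an in-memory SQLite database and then asks which attacked contracts have a finding whose mapped category matches the root cause recorded for the attack.

```python
import csv
import io
import sqlite3

# Hypothetical mapping file: tool-specific finding -> dataset vulnerability category.
MAPPING_CSV = (
    "tool,finding,category\n"
    "tool_a,reentrancy-warning,Reentrancy\n"
    "tool_b,visibility-warning,Function/State Visibility Error\n"
)

def build_db(mapping_csv, findings, attacks):
    """Load the mapping, raw tool findings, and attack root causes into SQLite."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE mapping (tool TEXT, finding TEXT, category TEXT);
        CREATE TABLE findings (contract TEXT, tool TEXT, finding TEXT);
        CREATE TABLE attacks (contract TEXT, category TEXT);
        """
    )
    rows = csv.DictReader(io.StringIO(mapping_csv))
    con.executemany(
        "INSERT INTO mapping VALUES (?, ?, ?)",
        [(r["tool"], r["finding"], r["category"]) for r in rows],
    )
    con.executemany("INSERT INTO findings VALUES (?, ?, ?)", findings)
    con.executemany("INSERT INTO attacks VALUES (?, ?)", attacks)
    return con

def detected_attacks(con):
    """Return (contract, tool) pairs where a finding maps to the attack's root cause."""
    query = """
        SELECT DISTINCT a.contract, f.tool
        FROM attacks a
        JOIN findings f ON f.contract = a.contract
        JOIN mapping m ON m.tool = f.tool AND m.finding = f.finding
        WHERE m.category = a.category
        ORDER BY a.contract, f.tool
    """
    return con.execute(query).fetchall()

# Illustrative data only: one contract exploited via reentrancy, one via oracle manipulation.
findings = [
    ("VaultA", "tool_a", "reentrancy-warning"),
    ("VaultA", "tool_b", "visibility-warning"),
    ("PoolB", "tool_b", "visibility-warning"),
]
attacks = [
    ("VaultA", "Reentrancy"),
    ("PoolB", "On-chain oracle manipulation"),
]

con = build_db(MAPPING_CSV, findings, attacks)
hits = detected_attacks(con)
print(hits)
```

In this toy data only the reentrancy finding on VaultA matches an attack's root cause; the visibility warnings are reports unrelated to the exploit, which is the kind of output the text counts as potential false alarms.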
After retrieving all results, we performed various sanity checks to verify that the results were consistent. For each reported true positive, two authors independently examined whether the result was correct. Solhint identified a number of defects of the type "Function/State Visibility Error" in five different exploited contracts. However, all cases were false alarms. Finally, we did not try to manually verify the rest of the results (i.e., potential false positives). We argue that most of these reports are likely either false positives or vulnerabilities that cannot be exploited in practice, as the contracts in question held millions of USD in TVL, and hence attackers would have been highly motivated to exploit them.

Surveys
Protocol. To better understand how developers and auditors perceive and use security tools, we conducted a survey campaign. To do so, we followed Kitchenham and Pfleeger's guidelines [28] (also used in similar studies [10,49]). Further, we employed best practices [44] to boost practitioner participation. Our survey was anonymous and we made all questions optional. In addition, we added an "other" option where possible to increase response rates. Questions were divided into three categories:
(1) Demographics, to understand the respondents' background.
(2) Familiarity/usage of security tools during development and auditing, to assess if practitioners are well acquainted with security tools and how they use them.
(3) Experience with security tools, to understand how satisfied practitioners are and how these tools can be improved.
To fine-tune our campaign, we performed the following steps. Two authors independently designed two slightly distinct surveys, one for developers and one for auditors. Then, they converged on the questions that should be included in the first versions of the surveys. The first round of the surveys took place with n = 3 respondents per survey, whom we asked to provide feedback. After that first iteration, we adjusted the questions and repeated the process with n = 5 respondents per survey. We used the feedback and responses to fine-tune the multiple-choice questions.
Respondent selection and demographics. Our aim was to focus our surveys on practitioners with experience working on protocols with high TVL, which in turn, are the main targets of attackers. Instead of focusing on getting as many responses as possible, we focused on obtaining high-quality responses. Although this strategy might bias our results, it was essential to focus on developers of top protocols and auditors who assess such protocols to understand the direct impact of security tools.
To recruit respondents, we first contacted developers from the top 200 DeFi protocols, as reported by Defillama [2]. For auditors, we looked at the auditing companies with the most audit reports for the top 200 protocols and contacted auditors from those companies. We also contacted the top 100 auditors from Code4Arena [1], as independent auditors are also involved in auditing high-profile projects. We received a total of 49 responses: for the developer survey, we received 27 responses to the 266 messages/emails sent, a response rate of 10%. Similarly, for the auditor survey, we received 22 responses to the 132 messages sent, corresponding to a response rate of 16%. Figure 5 presents an overview of the demographics of our survey participants.
Data analysis. We analyzed the results based on question types. For multiple-choice and Likert-scale questions, we reported respondent percentages per option. For open-ended questions, we followed an inductive approach in which two authors separately performed open card sorting and regularly discussed emerging themes until an agreement was reached. In the rest of this work, we report percentages given the total responses to each question.
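Because all questions were optional, the denominator of each percentage is the number of respondents who answered that particular question, not the total number of participants. A minimal Python sketch of this per-question computation follows; the function name, the data layout (one list of selected options per respondent, None for a skipped question), and the sample answers are illustrative assumptions rather than the study's analysis code.

```python
from collections import Counter

def option_percentages(responses):
    """Percentage of answering respondents who selected each option.

    `responses` holds one entry per respondent: a list of selected options,
    or None (or an empty list) if the respondent skipped the question.
    """
    answered = [r for r in responses if r]
    total = len(answered)
    if total == 0:
        return {}
    # Count each option at most once per respondent.
    counts = Counter(option for r in answered for option in set(r))
    return {option: round(100 * n / total, 1) for option, n in counts.items()}

# Illustrative answers to a multiple-choice question; one respondent skipped it.
sample = [
    ["open-source"],
    ["open-source", "in-house"],
    None,
    ["third-party"],
]
result = option_percentages(sample)
print(result)
```

Here three of the four respondents answered, so every percentage is computed out of three.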

RESULTS
In this section, we present the findings of our mixed-method investigation aimed at addressing our research questions.

Effectiveness and Impact of Security Tools on Real-World Exploits
Recently, automated security tools for detecting vulnerabilities in smart contracts have received increased attention. Previous studies [14,37,42] have evaluated their effectiveness by measuring recall and precision on datasets containing contracts sourced from blockchains (e.g., Ethereum) or manually crafted vulnerable contracts. Additionally, Zhang et al. [60] surveyed automated tools to determine their ability to detect various vulnerability categories. However, a key question that remains unanswered is the real-world impact of these tools, particularly in preventing significant exploits.
To address this question, we conducted a comprehensive analysis of vulnerabilities in DeFi protocols that have led to significant exploits and assessed the effectiveness of automated security tools in preventing these exploits. Additionally, we quantified the potential funds that could have been saved by utilizing these tools.
Automated tools scope. Figure 2 illustrates the scope of the selected security tools. We find that the automated security tools have oracles for the vulnerabilities that led to the exploit for only 25% of the 127 attacks studied. These attacks caused a total of 271 million USD in monetary losses, amounting to 12% of the total damage incurred by attacks in the dataset (cf. Figure 3). Notably, the automated security tools do not have oracles for detecting certain critical vulnerabilities, such as absence of code logic or sanity checks and oracle manipulation. Conversely, the tools tend to focus on vulnerabilities that do not appear to be frequently targeted by adversaries in high-profile attacks, such as integer overflows and underflows, as well as unhandled or mishandled exceptions (see Figure 2).
Tool effectiveness on real-world vulnerabilities. Of the 32 attacks whose underlying vulnerabilities the automated security tools can reason about, only 11 could have been detected and potentially prevented if the tools had been used (see Figure 6).⁷ Figure 7 depicts the results of the tools. Slither detects the most vulnerabilities, but it also reports many false positives (FP). This can be detrimental to the usability of security tools, as the number of reports that cannot lead to exploits may overwhelm users. Furthermore, our evaluation indicates that all tools report vulnerabilities that were not used to exploit the assessed contracts, with static analysis and linting tools reporting a greater number of potential false alarms than other methods.
Detecting different vulnerability types. Notably, all of the 11 aforementioned attacks were caused by reentrancy vulnerabilities, suggesting that the focus on reentrancy by academic researchers [15,30,40,55] has led to the development of effective tools for this category. Despite the effectiveness of these tools in detecting reentrancy vulnerabilities, there are still major issues. Of the five selected tools, only three were able to detect at least one vulnerability that led to a significant exploit. Additionally, 10 of the vulnerabilities could only be detected by Slither.
Automated security tools (see Figure 2) are unable to detect "Absence of coding logic or Sanity check" and "Logic errors." Thus, it is crucial to determine how many attacks could have been prevented by tools capable of detecting such errors, such as property-based fuzzers, formal verification, and model-checking tools. Notably, such tools could have potentially prevented 37% (47/127) of the exploits in the dataset, amounting to 1,116,118,649 USD in damage. When combining these tools with automated security tools, the total number of (potentially) preventable exploits in the dataset rises to 75, accounting for 59% of the attacks and 1,359,921,690 USD (58%) of the total damage. Our results complement those of Zhang et al. [60], who found that 79.5% of real-world bugs cannot be detected by automated tools alone. However, their research did not consider the effectiveness of semi-automated tools. Zhang et al. [60] also observed that logical errors often have generalized abstract models, indicating that human involvement could be crucial in constructing testing oracles. This finding is consistent with our preliminary findings. We leave it to future work to evaluate the practical effectiveness of semi-automated tools that can detect logic-related bugs and to assess the difficulty of writing specifications/properties for smart contracts that have been exploited.
Footnote 7: Given that the tools were available at the time of the attack.
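To make the idea of a user-supplied property concrete, the following Python sketch mimics what a property-based fuzzer does: it applies random sequences of calls to a contract model and checks an invariant after each call. Everything here is a hypothetical illustration and not one of the tools or contracts studied: ToyVault is a made-up model whose withdraw function can omit a balance check (an instance of a missing sanity check), and the property "no balance is ever negative" plays the role of the specification a developer or auditor would have to write.

```python
import random

class ToyVault:
    """Hypothetical contract model; `check_balance=False` omits a sanity check."""

    def __init__(self, check_balance):
        self.check_balance = check_balance
        self.balances = {}

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount

    def withdraw(self, user, amount):
        if self.check_balance and amount > self.balances.get(user, 0):
            raise ValueError("insufficient balance")  # models a revert
        self.balances[user] = self.balances.get(user, 0) - amount

def invariant_holds(vault):
    """User-supplied property: no account may hold a negative balance."""
    return all(balance >= 0 for balance in vault.balances.values())

def fuzz(check_balance, runs=200, steps=10, seed=0):
    """Return the first call trace that violates the invariant, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        vault = ToyVault(check_balance)
        trace = []
        for _ in range(steps):
            op = rng.choice(["deposit", "withdraw"])
            user = rng.choice(["alice", "bob"])
            amount = rng.randint(1, 100)
            trace.append((op, user, amount))
            try:
                getattr(vault, op)(user, amount)
            except ValueError:
                continue  # reverted call: state is unchanged
            if not invariant_holds(vault):
                return trace
    return None

buggy_trace = fuzz(check_balance=False)
fixed_trace = fuzz(check_balance=True)
print("counterexample for buggy vault:", buggy_trace)
print("counterexample for fixed vault:", fixed_trace)
```

The fuzzer only finds the bug because someone stated the invariant; without that input the missing check is just ordinary code, which is why such vulnerabilities fall outside the scope of fully automated tools.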
Potential preventable losses. Our analysis shows that the total funds that could have been saved if the selected tools had been employed amount to 149,792,690 USD, highlighting the importance of security tools in protecting smart contracts.
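The proportions behind these results are simple ratios of the counts and amounts quoted in this section. The short Python sketch below recomputes them; the variable names are illustrative, and the total loss uses the rounded 2.3 billion USD figure from the abstract (an assumption, since the exact total is not stated here), so the loss share is approximate.

```python
# Counts and amounts quoted in the text.
TOTAL_ATTACKS = 127
IN_SCOPE_ATTACKS = 32             # attacks whose vulnerability type the tools cover
DETECTED_ATTACKS = 11             # attacks whose root cause the tools flagged
PREVENTABLE_LOSS_USD = 149_792_690
TOTAL_LOSS_USD = 2_300_000_000    # rounded total from the abstract (assumption)

def pct(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole

scope_share = pct(IN_SCOPE_ATTACKS, TOTAL_ATTACKS)
detected_of_in_scope = pct(DETECTED_ATTACKS, IN_SCOPE_ATTACKS)
detected_of_all = pct(DETECTED_ATTACKS, TOTAL_ATTACKS)
loss_share = pct(PREVENTABLE_LOSS_USD, TOTAL_LOSS_USD)

print(f"in-scope attacks:            {scope_share:.1f}% of all attacks")
print(f"detected, of in-scope:       {detected_of_in_scope:.1f}%")
print(f"detected, of all attacks:    {detected_of_all:.1f}%")
print(f"preventable share of losses: {loss_share:.1f}% (approximate)")
```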

Conclusions for RQ 1, RQ 2
• In a subset of 32 attacks that automated security tools could have detected, only 11 of the exploited vulnerabilities were detected, highlighting a significant missed opportunity to enhance the security of smart contracts.
• All of the detected vulnerabilities were related to reentrancy, indicating the effectiveness of the tools in detecting this type of vulnerability but also highlighting the ineffectiveness of automated tools in detecting other vulnerabilities.
• The top two types of vulnerabilities, absence of coding logic or sanity checks and on-chain oracle manipulation, cannot be detected by current automated security tools. Moreover, we observe that the majority of protocol-layer vulnerabilities are out of the scope of security tools.
• Semi-automated tools may be able to prevent 47 attacks that involve absence of code logic or sanity checks and logic errors.
Discussion. Despite almost a decade of research and development, automated security tools are still ineffective at detecting vulnerabilities in real-world contracts with high TVL, while reporting many potentially insignificant issues. Hence, further research is needed to improve the effectiveness and usability of these tools, and to add support for more vulnerability types, in order to better protect against financial losses due to vulnerabilities in smart contracts.

Familiarity and Usage of Security Tools
In this section, we aim to explore the role of security tools in the smart contract development lifecycle and DeFi audits, specifically focusing on how practitioners utilize these tools. To address this question, we survey both developers and auditors. In the following, we present the results of the surveys and analyze their implications for the development of secure dapps and effective DeFi audits.
Tool familiarity and usage for developers and auditors. Figure 8 illustrates the different types of tools that both developers and auditors have used. The category of tools that most practitioners have used is developer toolkits, followed by IDEs. These tools are primarily used for developing, deploying, debugging, and testing smart contracts. Interestingly, we observe that developers tend to favor lightweight tools such as linters, while auditors prefer tools with greater bug-finding capabilities, such as static analyzers and symbolic executors. Furthermore, developers have more experience using runtime monitoring tools, as most audits are performed on contracts before their deployment. Additionally, we found that developers have used an average of 4.5 different types of tools, while auditors have used an average of 5.
Reported tool usage. We further investigated the types of security tools that practitioners use during development to secure decentralized applications (cf. Figure 9). Only two participants reported that they do not use any tool for this purpose. The majority (92%) of participants report that their organization utilizes open-source tools, and many invest effort into developing internal tools or extending existing open-source tools. Additionally, 25% of participants reported that their organization uses third-party services typically provided by auditing firms. The prevalence of open-source tools highlights the importance of collaboration and community-driven efforts to improve security in the decentralized application ecosystem. Open-source tools have the potential to reach a wider audience and have a greater impact, ultimately leading to more secure and reliable decentralized applications.

Utility of specific tools during development and auditing. Next, we explore which specific tools developers and auditors use during the development and auditing of dapps. Figure 10 displays the results of our investigation. The distribution of tool usage closely mirrors that of Figure 8. It is worth noting that many auditors' responses included "other" choices. This is because, in auditing companies, it is common for in-house security analysis tools to be developed and used in audits. Another noteworthy result is that various academic tools, such as Maian, Oyente, and Securify2, which are commonly used in scientific paper evaluations and benchmarking studies [14,42,59], are not used in practice. This highlights the need for academia to adapt its comparisons and benchmarks to tools that are actually used by the community. As we observed in Section 4.1, automated security tools cannot detect logic-related vulnerabilities. Thus, it is crucial to determine how many developers and auditors currently use tools capable of detecting such errors (i.e., formal verification and property-based fuzzing). Our surveys reveal that 59% of developers and 48% of auditors utilize at least one such tool.
Property-based tests and application specifications. As already mentioned, some tools require additional inputs, such as specifications of the smart contracts under test. Hence, it is essential to understand who is responsible for providing these inputs. Of the respondents, 40% indicated developers as responsible for writing specifications/property tests for semi-automated security tools, 29% indicated both auditors and developers, 20% indicated auditors, and 11% were unsure. As the effectiveness of such tools heavily relies on the quality of the provided inputs, we argue that both auditors and developers should participate in that process.
Time spent on tool usage by auditors. Another critical question to consider is how much time auditors spend running, fine-tuning, and validating the results of security tools. The results of our survey indicate that the majority (76%) of auditors spend a small proportion (between 0% and 20%) of their time using such tools, 19% spend between 21% and 40%, and 5% spend between 41% and 60%. This suggests that auditing is still primarily a manual effort. While there is certainly potential for tools to improve and automate certain aspects of the auditing process, we expect it to remain predominantly manual.
• Overall, we observe that developers tend to employ more lightweight tools, including linters, whereas auditors utilize tools with greater bug-finding capabilities (e.g., static analyzers). In addition, developers use runtime monitoring tools more than auditors.
• Academic tools that appear in the context of research evaluations and benchmark studies, such as Oyente, are not used in practice.
• 59% of developers and 48% of auditors utilize tools that can reveal logic-related bugs that are the root cause of many high-impact attacks.
• The majority of auditors (76%) spend only up to 20% of their time using security tools during audits, indicating that the auditing process is mainly a manual effort.
Call to action. To bridge the gap between research and practice, researchers must consider three key factors. First, they should determine whether the tools they create will be incorporated into development processes or employed during audits, and prioritize the relevant features accordingly. Second, emphasizing the detection of vulnerability types that currently cannot be detected by existing security tools is vital. Finally, the evaluation of scientific papers should include benchmarks based on genuine real-world attack scenarios for more accurate and relevant results.
Discussion. As different security tools may use varying techniques, with some being more resource-intensive than others, it is important to match tools appropriately to different stages of the development lifecycle. According to our survey results, developers tend to prefer tooling that can be used during the development process, such as linters and static analyzers, or after deployment (i.e., runtime monitoring tools). Therefore, it is crucial to develop tools that can be easily integrated into developers' daily routines. One such example is Foundry's property-based fuzzer [3], which, despite being a relatively new tool, is already being utilized by a significant number of developers.

What Makes Security Tools Valuable to Practitioners
In this section, we aim to understand the factors that practitioners consider important when using security tools. Specifically, we explore the value that security tools provide in the context of detecting smart contract vulnerabilities and assess auditor satisfaction with the results generated by security tools. By examining these aspects, we can gain insight into what makes security tools valuable to practitioners and how they can be further improved to better serve the needs of the DeFi ecosystem.
Importance of tools' characteristics. Results from a Likert-based question on security tool characteristics are presented in Figure 11. This survey question sought to understand the importance that both auditors and developers place on various aspects of security analysis. The results indicate that both groups consider all of the enumerated characteristics important, but there are some differences in the degree to which each characteristic is prioritized. For developers, low false negatives are perceived as more important, mainly because they want reassurance that their applications are safe, whereas for auditors, low false positives are considered more crucial, since it is their job to triage the reports. Additionally, ease of use is slightly more important for auditors, while some auditors do not place as much importance on report quality.
Participants also reported factors that positively or negatively affect the use of security tools. Many auditors emphasized the importance of tool setup, in addition to ease of use. One participant highlighted the relation between the time needed to configure a tool and the severity of the issues it finds. For developers, easy integration into the development life cycle (e.g., continuous integration), ease of customization, and the social aspect of other people using the tools and detecting important bugs in real-world applications were the most frequently mentioned factors. Overall, these findings highlight the diverse needs and priorities of practitioners when it comes to security tool features and underscore the importance of developing tools that meet a wide range of requirements.
Exploring practitioners' perspectives on challenging vulnerabilities and available tooling for detecting such vulnerabilities. Figure 12a sheds light on the most challenging vulnerabilities faced by both developers and auditors during the development and manual audit process, respectively. Developers identified logic errors, oracle manipulation, and absence of coding logic as the most difficult vulnerabilities to detect during development, which aligns with the most common defects in high-profile real-world attacks (see Figure 2). Auditors identified logic errors as the most challenging vulnerability to detect manually, followed by several vulnerabilities that existing tools have broad support for, such as integer overflows and reentrancy vulnerabilities, indicating that tools are indeed useful for identifying such bugs.
Figure 13: Auditors' satisfaction ratings of security tools used for auditing, on a scale from 1 (not at all satisfied) to 5 (extremely satisfied). Bars: 9.5%, 28.6%, 33.3%, 19%, 9.5%.
Regarding vulnerabilities that cannot be detected by automated security tools, both developers and auditors cited oracle manipulation and logic errors as the most challenging (cf. Figure 12b). Additionally, both groups identified improper asset locks or frozen assets as a vulnerability that requires better support from tools. Overall, these findings emphasize the importance of developing more sophisticated security tools to detect crucial vulnerabilities that current tools may either not support or miss.
How auditors evaluate security tools for auditing smart contracts. Our survey results indicate that a majority of participants found security tools helpful when auditing smart contracts, with 52.4% rating them as 4 or 5 on a 5-point scale (cf. Figure 13). However, a significant portion of respondents (38.1%) did not find security tools to be helpful or found them only somewhat helpful (rated 1-2). This suggests that there is still room for improvement in terms of the effectiveness and usability of security tools. In particular, participants highlighted the need for security tools to address more complex vulnerabilities that pose a greater threat to DeFi applications.
Conclusions for RQ 4 and RQ 5.
• Developers prioritize low false negatives in security tools, while auditors, compared to developers, place more weight on low false positives, as it is their job to triage reports. Furthermore, auditors emphasize the importance of tool setup and bug-finding capabilities, while developers emphasize easy integration into the development lifecycle.
• Both developers and auditors want better tool support for logic-related and oracle manipulation vulnerabilities.
• While 52.4% of auditors find security tools helpful for auditing, a significant portion (38.1%) do not find them useful, highlighting the need for further improvement in the development and use of security tools in auditing.
Call to action. Security tools should detect crucial vulnerabilities, such as logic-related and protocol-layer vulnerabilities (e.g., oracle-manipulation bugs), that can result in significant losses in practice. However, it is equally important for security tools to meet high usability and interoperability standards to be adopted by practitioners.

Implications
Effectiveness, coverage, and need for manual inspection. Our analysis shows the limited effectiveness of automated security tools in detecting DeFi vulnerabilities. Figure 2 reveals that only 11 out of 32 (34%) vulnerability types in our dataset were detected by the tools, emphasizing the insufficiency of current automated tools for comprehensive security assurance in DeFi ecosystems. Additionally, given the limited number of vulnerabilities covered by security tools (32/127), smart contract security relies heavily on manual inspections by designers, developers, or auditors. Our survey data indicates that only 59% of developers and 48% of auditors utilize tools capable of identifying logic-related errors, stressing the need for a holistic auditing approach combining automated tools and manual reviews.
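As a quick check of the coverage figures, the short Python snippet below recomputes two ratios from numbers reported in this paper: the share of vulnerability types detected (11 of 32) and the share of losses attributable to preventable attacks ($149 million of $2.3 billion). The variable names are illustrative only.

```python
# Figures taken from the paper; variable names are illustrative.
detected_types, total_types = 11, 32
preventable_usd, total_loss_usd = 149e6, 2.3e9

type_coverage = detected_types / total_types      # share of vulnerability types detected
loss_coverage = preventable_usd / total_loss_usd  # share of losses that were preventable

print(f"vulnerability types detected: {type_coverage:.1%}")
print(f"losses preventable by tools:  {loss_coverage:.1%}")
```

Measured by value, the preventable share (roughly 6.5% of losses) is slightly lower than the 8% of attacks that the tools could have prevented.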
Emphasizing semi-automated tools for addressing critical vulnerabilities. Our findings point to the necessity of semi-automated security tools capable of detecting critical vulnerabilities in the smart contract ecosystem. Automated tools, while able to detect reentrancy vulnerabilities, fall short in covering logic-related bugs and protocol-layer vulnerabilities, such as oracle manipulation. Semi-automated tools, which incorporate user input to provide oracles for detecting security issues, present a promising solution. We encourage academia to focus on developing advanced tools that can effectively identify and prevent high-impact vulnerabilities, complementing practical tools already developed by the practitioner community [3,20]. For example, Liu and Li [32] have made progress in this direction by utilizing dynamic analysis techniques to identify invariants, which can subsequently be utilized as inputs for semi-automated tools.
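The workflow of inferring invariants from observed executions and then using them as oracles can be sketched as follows. This is a generic, simplified illustration in Python and not the technique of [32]; the lending-pool state fields (`deposits`, `borrowed`, `reserves`), the candidate invariants, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Invariant = Callable[[State], bool]

# Hypothetical candidate invariants over a toy lending-pool state.
CANDIDATES: Dict[str, Invariant] = {
    "reserves_cover_deposits": lambda s: s["reserves"] >= s["deposits"] - s["borrowed"],
    "borrowed_le_deposits": lambda s: s["borrowed"] <= s["deposits"],
    "reserves_are_even": lambda s: s["reserves"] % 2 == 0,
}


def infer_invariants(observed_states: List[State]) -> List[str]:
    """Keep the candidates that hold in every observed (benign) state."""
    return [
        name
        for name, check in CANDIDATES.items()
        if all(check(state) for state in observed_states)
    ]


def violated_invariants(state: State, invariant_names: List[str]) -> List[str]:
    """Use the retained invariants as oracles and report the ones a state breaks."""
    return [name for name in invariant_names if not CANDIDATES[name](state)]


if __name__ == "__main__":
    benign_states = [
        {"deposits": 100, "borrowed": 0, "reserves": 100},
        {"deposits": 150, "borrowed": 40, "reserves": 110},
        {"deposits": 150, "borrowed": 75, "reserves": 75},
    ]
    oracles = infer_invariants(benign_states)
    suspicious_state = {"deposits": 150, "borrowed": 75, "reserves": 20}
    print("inferred invariants:", oracles)
    print("violated by new state:", violated_invariants(suspicious_state, oracles))
```

In a semi-automated setting, a developer or auditor would review the inferred invariants, discard accidental ones, and hand the remainder to a fuzzer or runtime monitor as protocol-specific oracles.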

Threats to Validity
We use a standard methodology [17] to identify validity threats, which we mitigate where possible. This section discusses threats to internal, construct, and external validity for both the empirical analysis and the surveys.

Empirical analysis.
Internal. One potential threat to internal validity is that the tools' results may be unsound. To mitigate this risk, we cross-checked the results and manually verified the essential findings. We also conducted sanity checks to ensure that the processing of the analysis results was correct. Another potential threat is that the dataset from Zhou et al. [64] may contain incorrect data. To address this issue, we manually verified the important findings; in addition, the dataset is open to the public for further verification.
Construct. A potential threat to construct validity is the setup of the tools used in the analysis. To mitigate this risk, we followed the documentation and the settings used in the tool papers, and ran the tools on both source code and bytecode when available. However, we note that ConFuzzius had the highest failure rate per contract, and while we attempted to mitigate any compilation errors, the tool failed in some cases because it could not deploy the targeted contract. ConFuzzius (and fuzzers in general) typically require more fine-tuning per execution, which is out of the scope of this work, as we aimed to measure the off-the-shelf solutions available to practitioners with minimal setup. Another potential threat is the mapping of vulnerabilities from tools to the vulnerability types of Zhou et al. [64]. To address this issue, multiple authors performed the mapping independently, and we iterated over the mapping until reaching an agreement. We further discussed our mapping with the authors of [64].
External. A potential threat to external validity is the size of the DeFi attacks dataset. We used the most extensive dataset with attacks that have fine-grained information. Furthermore, we have automated the whole process, so adding more attacks to the analysis is straightforward. Another potential threat is that we did not include all available tools in the analysis. In this work, we focused on tools most likely to be used by practitioners (cf. Figure 10). For each analysis technique, we selected the most well-established tool. We further included the most frequently used tool in academic paper evaluations (i.e., Oyente). Finally, we ran several unmaintained academic and industry tools (Securify2, SmartCheck, Conkas, Maian) and observed that their results did not change the paper's overall conclusions.

Surveys.
Internal. Our survey responses may be subject to a potential threat to internal validity, as some respondents may not have understood some of the questions well. To reduce this risk, we highlighted in the invitation message that all questions are optional and that respondents can skip any question they do not understand. Additionally, we designed our survey in an iterative fashion, as discussed in Section 3.2.
External. Our goal is to survey developers and practitioners who work on projects with high TVL, which are typically the targets of adversaries. Hence, we meticulously selected whom to invite and did not share our surveys on social media or email lists, focusing on the quality of responses rather than quantity. However, focusing on developers and auditors who work on top protocols and are more experienced with security tools can pose an external validity threat, because our results might not represent the broader ecosystem. Another risk is that a large percentage of the participants might work for the same organizations. To mitigate this risk, we sent up to three invites per organization.

RELATED WORK
Smart Contract Attacks and Security. To detect vulnerabilities in smart contracts, various tools using different techniques have been developed. Static analysis [5,6,16,27,48] is one such approach, where the source code or bytecode is analyzed without execution. In contrast, dynamic analysis examines the smart contract while executing it. Fuzzing [20,25,54] is a testing technique where inputs are automatically generated to test the system's behavior. Symbolic execution [11,19,33,34] and formal verification [38,46] are other well-known and frequently used techniques. However, formal verification typically requires users to provide specifications of intended behavior. In our study, we included one tool from each category that can be executed automatically, providing a comprehensive assessment of available solutions.
DeFi Attacks and Security. DeFi attacks present unique challenges compared to those in traditional financial systems, primarily due to two key factors [41,51]: (i) the transparency in DeFi's application design, bytecode availability, and P2P transaction propagation; and (ii) the composability of DeFi applications. Several studies have examined DeFi attacks, including Zhou et al.'s five-layered framework for incident categorization and evaluation [64]. Other significant works have focused on specific security issues. For instance, the Flash Boys paper [12] was the first to explore the front-running issue, while Zhou et al. pioneered the study of sandwich attacks [63], which take advantage of users' slippage settings in decentralised exchanges. DeFiRanger [53] extracted DeFi actions and identified price oracle manipulation attacks using pattern matching. DeFiPoser [62] employed SMT solvers to compose DeFi protocols, aiming to generate attacks. Collectively, these studies highlight the complexity and unique challenges posed by DeFi attacks. Additionally, our work underscores the limitations of traditional security tools that primarily focus on the smart contract layer, neglecting the protocol layer.
Surveys on smart contract vulnerabilities and security tools. Atzei et al. [4] performed the first survey of smart contract attacks. Chen et al. [8] conducted a more comprehensive survey of 40 vulnerabilities, 29 attacks, and 51 defense locations and underlying causes, while Delmolino et al. [13] categorized bugs based on typical developer pitfalls. Harz et al. [23] investigated 10 smart contract verification tools, exhibiting various aspects of their security characteristics. Hu et al. [24] assessed 39 analysis tools in terms of input type and methodology. Finally, Kushwaha et al. [29] presented a comprehensive survey of 86 analysis tools, the most of any research publication or article, and examined their analysis approaches and tool types. In contrast to these studies, we focus on the real-world impact of security tools by evaluating them against high-profile attacks and surveying practitioners.
In a different spirit, Groce et al. [22] conducted an analysis of 23 audits conducted by a prominent blockchain security company, employing a combination of automated tools and manual reviews. While security tools were utilized in 21 of the audits, it is noteworthy that only 4 out of 246 identified vulnerabilities were explicitly detected by automated tools, specifically Slither. This finding supports the conclusion of our work that automated security tools require improvement to enhance their practical utility. Furthermore, despite the study being three years old, it identifies data validation (equivalent to absence of coding logic or sanity checks) as the most common vulnerability within the audited contracts. This observation underscores the persistent threat posed by this category of bugs to the overall ecosystem.
Surveys of program analysis and security tools. Outside the realm of smart contract security, Christakis et al. [10] empirically investigate what appeals to practitioners the most about a program analyzer, while [45] evaluates the usability of security tools. Johnson et al. [26] and Witschey et al. [52] explored why security tools are underused despite their benefits. In contrast, in this work, we focus on how practitioners use security tools in the DeFi ecosystem. Finally, to the best of our knowledge, we are the first to survey auditors regarding security tool usage.

CONCLUSIONS
In conclusion, our evaluation of automated security tools, combined with surveys of developers and auditors, reveals that existing tools have limited effectiveness in detecting high-impact vulnerabilities, with only 8% of the attacks in our dataset being detected by automated tools. This indicates that smart contract and DeFi security has not been fully addressed yet. While reentrancy vulnerabilities can be detected, the tools do not adequately address logic-related bugs and protocol-layer vulnerabilities. We propose that researchers should prioritize the development of techniques that cover a wider range of vulnerabilities, including logic-related bugs, even if they partially require user input. Additionally, we suggest developing distinct tools for developers and auditors, as they have varying requirements regarding the capabilities of security tools. We hope that our findings can provide valuable insights and guidance for practitioners and researchers working in this dynamic and challenging area.

Figure 3: Overall descriptive statistics of the analysed attacks.

Figure 9: Different tool types used during development.

Figure 10: Security tools used by developers and auditors.

Figure 11: Importance of security tool characteristics.

Figure 12: Practitioner perspective on security tools and vulnerabilities.
Summary of tool results. D (Detected). ODI (Other Detected Issues): other findings including false positives, defects that cannot be exploited (e.g., in protected functions), or exploitable defects not included in the dataset (i.e., not used in the attacks). TA (Total Attacks).