ChatGPT-Resistant Screening Instrument for Identifying Non-Programmers

To ensure the validity of software engineering and IT security studies with professional programmers, it is essential to identify participants without programming skills. Existing screening questions are efficient, robust to cheating, and effective at differentiating programmers from non-programmers. However, the release of ChatGPT raises concerns about their continued effectiveness in identifying non-programmers. In a simulated attack, we showed that ChatGPT can easily solve existing screening questions. Therefore, we designed new ChatGPT-resistant screening questions using visual concepts and code comprehension tasks. We evaluated 28 screening questions in an online study with 121 participants involving programmers and non-programmers. Our results showed that questions using visualizations of well-known programming concepts performed best in differentiating between programmers and non-programmers. Participants prompted to use ChatGPT struggled to solve the tasks. They considered ChatGPT ineffective and changed their strategy after a few screening questions. In total, we present six ChatGPT-resistant screening questions that effectively identify non-programmers. We provide recommendations on setting up a ChatGPT-resistant screening instrument that takes less than three minutes to complete while excluding 99.47% of non-programmers and including 94.83% of programmers.


INTRODUCTION
Rust is an emerging system programming language focusing on memory safety and efficiency [41]. It provides memory-safe guarantees via compile-time checks [18]; consequently, programmers must adhere to various syntactic constraints to satisfy the verification [71]. As a system programming language, it employs several zero-cost abstractions [32] to transform data without sacrificing performance [33] (e.g., generic types). Although Rust has a steep learning curve [18], it has attracted many programmers due to its safety and efficiency [34]. Since 2016, Rust has been the most popular programming language in the open-source community [43][44][45][46][47][48][49], with many projects refactoring code in Rust [36,37,52].
As Rust continues to evolve, knowing how to use Unsafe Rust is essential for Rust security [5]. Safety isolation is one of the revolutionary innovations introduced by Rust [51]. It assigns the portions whose safety the compiler can ensure to Safe Rust and adds the unsafe keyword as the superset [33]. The primary document defines Unsafe Rust as a keyword and a set of operations [50]. Any code with unsafe operations must be wrapped in an unsafe block; otherwise, compilation fails. Without strict compiler checks in the unsafe scope, Rust developers may become insensitive to satisfying safety requirements, which makes such code prone to undefined behavior (UB).
Table 1: A list of APIs with similar side effects and their document slices in Rust 1.70. These APIs accept mutable pointers as input and return a typed owner. In previous research, it has been reported that misuse of these APIs may result in double-free issues. The consistency and clarity of these documents need improvement: Related descriptions are not always located within the Safety section, and the description of the safety requirement (underlined) or side effect (bolded) is insufficient. We expect a specific error type to be defined.
The ownership of buf is effectively transferred to the String which may then deallocate, reallocate or change the contents of memory pointed to by the pointer at will. Ensure that nothing else uses the pointer after calling this function.
How does Rust document safety requirements for unsafe operations? We observed that most safety requirements are specified on a Safety label in Rust std. The Rust standard library [26] provides documents for unsafe APIs that are relatively comprehensive. As shown in Figure 1, we chose one API as the typical example. When calling ManuallyDrop::take [55], it has a safety requirement to be manually reviewed: Users cannot use this container again. Otherwise, it would trigger undefined behavior. Its implementation calls unsafe function ptr::read [59] (line 5), prompting us to think they may have analogous safety descriptions.
Unfortunately, Unsafe Rust does not provide developers with unified safety descriptions or systemic safety requirements. Recent research found that misusing several unsafe APIs may result in memory-safety issues [69], where overlapped owners can be created and cause double free [6,11], such as ManuallyDrop::take and *mut T::read. Table 1 lists them with documents in Rust 1.70. Like ManuallyDrop::take, using a Safety label to start the safety description is intuitive. The majority of the listed APIs adhere to this criterion, such as implementations for String, Vec<T>, CString, and Box<T>. However, the Rc<T> lacks the Safety section, and the related issue caused by read is described in the outer section. Finally, the texts of side effects exhibit differences: Only Box<T> explicitly states the potential double-free that may arise.
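To make the double-free risk behind Table 1 concrete, the following minimal sketch round-trips a String through String::from_raw_parts. The helper name rebuild is hypothetical and not taken from the paper; wrapping the original owner in ManuallyDrop is one common way to keep exactly one owner of the buffer, under the assumption that nothing else uses the pointer afterwards.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: takes a String apart and rebuilds it from raw parts.
fn rebuild(s: String) -> String {
    // Suppress the destructor of the original owner, so that the heap
    // buffer ends up with exactly one owner and is freed only once.
    let mut s = ManuallyDrop::new(s);
    let ptr = s.as_mut_ptr();
    let len = s.len();
    let cap = s.capacity();
    // SAFETY: ptr, len, and cap describe a live String buffer allocated by
    // the global allocator, and the original owner will never be dropped.
    unsafe { String::from_raw_parts(ptr, len, cap) }
}

fn main() {
    let out = rebuild(String::from("hello"));
    assert_eq!(out, "hello");
}
```

Omitting the ManuallyDrop wrapper would leave two owners of the same buffer, which is the overlapped-owner situation described above.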
The unsafe API documents should systematically classify safety requirements for users to comply with. This paper comprehensively categorizes fine-grained safety requirements when crossing unsafe boundaries. In general, this paper seeks to address the following research questions (RQs):
• RQ-1. What finer-grained safety properties (requirements) should be satisfied across the Unsafe Rust boundary? (§3)
• RQ-2. Can those safety properties cover existing vulnerabilities caused by Unsafe Rust? (§4)
• RQ-3. How helpful are those safety properties for real-world Unsafe Rust programming? (§5)
For each RQ, our study introduces several sub-experiments. To answer RQ1, we extracted all public unsafe APIs within the Rust standard library [26] and manually audited the document. We categorized the safety requirements across the unsafe boundary as Safety Properties (SPs). We completed the data labeling for those APIs and then conducted a correlation analysis to find interpretable results. To answer RQ2, we examined all Rust CVEs [19] until 2023-07-08 and filtered through the root causes by misusing unsafe code, categorizing them according to Safety Properties to validate our classification. Then, we collected and analyzed the distribution of unsafe APIs within the crates.io [21] ecosystem. To answer RQ3, we surveyed experienced Rust developers. We provided participants with the definition of each SP and its minimal Proof of Concept (PoC). We studied whether the developers acknowledged our categorization and whether each Safety Property was beneficial for unsafe programming.
Reviewing documents of unsafe APIs in Rust std, we performed an audit on 416 unsafe APIs. As a result, we identified and defined 19 safety properties (SP), categorized into two major categories: precondition and postcondition. Subsequently, we completed the SP labeling for all unsafe APIs, creating two datasets for correlation analysis. The results revealed six crucial SPs that users need to satisfy when dereferencing. Next, we classified the existing Rust CVEs based on safety properties, with 196 of 404 resulting from unsafe code. Notably, 86.73% of these errors were attributable to misuse of the standard library. Therefore, we conducted a statistical analysis of std unsafe API usage for the Rust ecosystem, which included 103,516 libraries on crates.io. Finally, we conducted user surveys targeting developers with over one year of Rust experience, having written over 5,000 lines of code and using over 1,000 lines of unsafe code. The evaluations for each SP were rated on four dimensions: precision, significance, usability, and frequency. We received 50 valid responses and conducted data analysis on them.
Our main contributions are listed as follows: • We performed the first empirical study by learning unsafe API documents from the standard library to classify safety requirements across unsafe Rust boundaries.

BACKGROUND

Working with Unsafe Rust
Rust is subdivided into Safe Rust and Unsafe Rust, with Unsafe Rust being a superset [51]. Safe Rust ensures type and memory safety, preventing undefined behavior [5]. However, it lacks low-level controls over implementation details (e.g., manual memory management). Unsafe Rust is an essential design feature to achieve low-level control at the system level [25]. It is employed when code has performance requirements or needs to interact with operating systems, hardware, or other programming languages.
unsafe Keyword. The unsafe keyword can be used in declarations and code blocks. In a declaration, it indicates that the function cannot be called from safe code and that misuse may trigger undefined behavior. In code blocks, it signifies the scope that may violate safety guarantees without compile-time checks, and the code requires manual auditing to ensure safety. This keyword acts as a railing, separating the safe and unsafe portions: All unsafe parts are encapsulated within this scope. The trust relationship between safe and unsafe parts is asymmetric [25]. When using an unsafe block, careful inspection is required to ensure that the data from the safe portion adheres to the contracts of the unsafe APIs. Conversely, when writing safe code, it is assumed that the unsafe code is correct and would not trigger undefined behavior.
unsafe Operations. Safe Rust and Unsafe Rust are designed for different scenarios. Safe Rust is a safe programming language designed for tasks that do not require low-level interactions. In contrast, Unsafe Rust fully leverages the capabilities of a systems-level programming language. Unlike languages such as C/C++, which are inherently unsafe, Unsafe Rust still requires adherence to certain contracts from the safe portion, such as ownership. The main differences in Unsafe Rust are that you can 1) Dereference raw pointers; 2) Call unsafe functions; 3) Implement unsafe traits; 4) Mutate static variables; and 5) Access fields of unions [25]. These operations provide flexibility but leave users responsible for manually ensuring correctness and safety. Recent empirical research [71] suggests that Rust's safety mechanisms could be more learner-friendly. This study explored the learning challenges of its safety mechanisms by analyzing Stack Overflow comments and conducting user surveys, but it is restricted to Safe Rust. In contrast, learning and utilizing Unsafe Rust is a prerequisite for advanced Rust developers.
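The five unsafe operations listed above can be illustrated with a minimal sketch. All item names here (COUNTER, IntOrFloat, read_through, ZeroSafe, zeroed_value, demo) are hypothetical and chosen only for illustration; the sketch assumes nothing beyond the Rust standard library.

```rust
// Hypothetical items; only the language features themselves are standard.
static mut COUNTER: u32 = 0;

union IntOrFloat {
    int: u32,
    float: f32,
}

/// Unsafe function: the caller must guarantee `ptr` is valid for reads.
unsafe fn read_through(ptr: *const u32) -> u32 {
    unsafe { *ptr }
}

/// Unsafe trait: implementors promise that an all-zero byte pattern
/// is a valid value of the implementing type.
unsafe trait ZeroSafe: Sized {}

// 3) Implement an unsafe trait (sound for u32, since 0 is a valid u32).
unsafe impl ZeroSafe for u32 {}

fn zeroed_value<T: ZeroSafe>() -> T {
    // SAFETY: guaranteed by the contract of the ZeroSafe trait.
    unsafe { std::mem::zeroed() }
}

fn demo() -> (u32, u32, u32, u32, u32) {
    let x = 7u32;
    let p = &x as *const u32; // creating a raw pointer needs no unsafe

    // 1) Dereference a raw pointer.
    let a = unsafe { *p };
    // 2) Call an unsafe function.
    let b = unsafe { read_through(p) };
    // 4) Mutate a static variable.
    let c = unsafe {
        COUNTER += 1;
        COUNTER
    };
    // 5) Access a field of a union (reinterprets the bits of 1.0f32).
    let u = IntOrFloat { float: 1.0 };
    let d = unsafe { u.int };

    (a, b, c, d, zeroed_value::<u32>())
}

fn main() {
    println!("{:?}", demo());
}
```

Each unsafe block marks a point where the compiler stops checking and the programmer takes over the corresponding obligation.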

Undefined Behavior in Rust
Undefined behavior in Rust is limited to the following [23]:
• Dereferencing (using the * operator on) dangling or unaligned raw pointers.
This categorization is based on the side effects introduced by unsafe code. Since no formal model of Rust's semantics defines precisely what is and is not permitted in unsafe code [23], additional behavior may be deemed vulnerable. In this paper, we additionally introduce the following issues as program vulnerabilities if they are triggered by unsafe code:
• Causing a memory leak and exiting without calling destructors.
These undefined behaviors and vulnerabilities serve as the basis for classifying safety requirements. Other errors fall outside the scope (e.g., deadlocks and logic errors).

STUDYING UNSAFE DOCUMENTS IN STD
This section presents how we extract and define systematic safety requirements as Safety Properties (SP) from the existing unsafe documents in the standard library [26].Our classification allows us to clarify the primary conditions and constraints necessary for Unsafe Rust, hence answering RQ1.

Preprocess on Rust Documents
Rustdoc [24] is the document system for the Rust programs, which enables the description of functionalities, requirements, expected results, and sample code snippets for APIs and crates.Intuitively, input requirements and side effects within an unsafe API must be explicitly specified in Rustdoc.We found that the document in the standard library is one of the most comprehensive resources for safety annotations within the Rust ecosystem.We thus audited documents of all public unsafe methods within the standard library as the knowledge base.
3.1.1 Design Goals. Table 1 reveals that even in the standard library: (i) the expression of the same safety requirement is not universally consistent; (ii) the enumeration of the safety requirements and side effects is not always sufficient. Thus, we manually categorize and define a series of finer-grained safety requirements as Safety Properties (SP), which need to satisfy the following design goals:
GOAL 3.1. Generality: SP abstracts safety requirements not specific to one particular API's intricacies.
GOAL 3.2. Unambiguous: SP intends to adopt the existing terminology and explanations as much as feasible in Rust.
GOAL 3.3. Nonoverlapping: SPs do not overlap, although they may be correlated.
GOAL 3.4. Composability: An Unsafe API's safety requirements can comprise several SPs.
GOAL 3.5. Essentiality: Failure to comply with any SP would cause undefined behavior or additional vulnerabilities.
GOAL 3.6. Practicality: SP is valuable and needs to be seriously considered in real-world programming scenarios.
GOAL 3.7. Unilingual: SP disregards the Foreign Function Interface (FFI) and the intrinsic requirements of other programming languages.
By adhering to these principles, the extracted safety properties aim to provide a comprehensive and practical understanding of the safety considerations associated with Unsafe Rust. They maintain compatibility with Rust's existing terminology and avoid unnecessary complexities related to FFI.

Preprocessing.
We noticed redundancy in the standard library, such as std and core having an intersection. Thus we performed the following preprocessing for all unsafe APIs within std/core/alloc in Rust 1.70, including stable and nightly channels:
FILTER 3.1. For the methods exposed by both core and std, we kept only one of them.
FILTER 3.2. For methods belonging to similar numeric types, we kept only one implementation.
FILTER 3.3. For compiler intrinsics, we retained only those with no stable counterpart.
As a result, we obtained a collection of unsafe APIs comprising 416 unsafe methods, with 127 being folded as 11 unique APIs by Filter 3.2 (e.g., unchecked_mul::<u8>/::<u16> [63,64] are merged).By applying the preprocessing step, we aimed to streamline and consolidate an unsafe API collection for further analysis and investigation.

What Safety Properties Should We Satisfy?
A code audit of all API documents within the collection was conducted to determine what safety properties correspond with the design goals.As shown in Table 3, we divided all safety properties into two main categories with 19 subdivisions.

Working Procedure. We simultaneously studied documents, defined safety properties, and labeled APIs. Regarding methodology, we performed two rounds of audits, with double-checking from the first and second authors. During the audit of each API, we focused on five sections: the functionality description, the safety description, the subchapters (including outer sections), the example code snippets, and the source code with its comments.
First Round: Initial Establishment. We labeled each std unsafe API with SPs that previously existed. The initial set of safety properties was empty. A new SP item was established if a safety requirement emerged and was not recorded in the current set. Therefore, any newly identified SP should be introduced for the first time during the first-round audit. If SPs overlapped, we merged them and re-checked related APIs to determine if they could be consolidated.
Second Round: Cross Checking.We observed that the API descriptions exhibited variations, including inconsistency and insufficiency.In the first round, 19 SPs were finally defined.In the second round, we focused on cross-checking and ensured that all unsafe APIs were appropriately labeled.We have paid particular attention to identifying any missing SP labels for each API that were not initially captured in the first round.
In summary, the first round of auditing ensured the completeness of SP categorization, while the second round enhanced the completeness of the required SP labels for each unsafe API.

Categories.
We have divided the safety properties into two categories based on the state of the function execution as in program testing [7], with no overlap between the sub-items.
Precondition Safety Property (PRE-SP). The precondition assumes that if the input values do not satisfy the safety requirements, the function call will trigger undefined behavior or additional vulnerabilities in Section 2.2. Thus, any given API can be regarded as a black box for single-step execution [66] at the call site, regardless of its internal implementation. PRE-SP complies with the initial characteristic of unsafe function (i.e., it cannot ensure safety for arbitrary inputs). In Table 3, we summarize 12 PRE-SPs. For example, swap [60] has the description "Both x and y must be properly aligned.", thus categorized into Aligned.
Table 3: Safety properties learned from unsafe API documents in Rust standard library. Precondition and postcondition are two primary categories, and they have 19 subitems in total. We present the sum of the labeled unsafe API for each safety property and provide a detailed definition. Each safety requirement was also given a typical unsafe API as an example. Note that each SP may contain various sub-scenarios.
Value may be moved, although it ought to be pinned. impl<P: Deref> Pin<P>::new_unchecked
1 Send [53] and Sync [54] are unsafe traits (markers) that are automatically implemented by the compiler when it determines that they are required. Therefore, they lack associated methods.
2 The difference between DualOwner and AliasingMutating is that DualOwner only focuses on objects instead of pointers and references.
Postcondition Safety Property (POS-SP). The previous assumption leads to the deduction that the function can be safely called if the proper inputs are supplied. However, this assurance only concerns the current program point. POS-SP focuses on the potential safety issues that may arise from the subsequent operations, assuming that the input values satisfy all PRE-SPs needed. In Table 3, we summarize 7 POS-SPs. For example, zeroed [56] has the description "There is no guarantee that an all-zero byte-pattern represents a valid value of some type T.", thus categorized into Untyped.
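The two examples quoted above, swap for the Aligned PRE-SP and zeroed for the Untyped POS-SP, can be sketched as follows. The wrapper names swap_aligned and zeroed_u32 are hypothetical; the sketch only shows one way a caller might discharge each obligation, not how the paper's PoCs are written.

```rust
use std::mem;
use std::ptr;

/// Hypothetical wrapper around ptr::swap.
fn swap_aligned(a: &mut u64, b: &mut u64) {
    // PRE-SP (Aligned, among others): both pointers are derived from
    // references, so they are non-null, properly aligned, and valid for
    // reads and writes at the call site.
    unsafe { ptr::swap(a as *mut u64, b as *mut u64) }
}

/// Hypothetical wrapper around mem::zeroed.
fn zeroed_u32() -> u32 {
    // POS-SP (Untyped): the call itself has no input to validate; what
    // matters is whether the returned all-zero bytes form a valid value.
    // They do for u32, but would not for a reference type such as &u32.
    unsafe { mem::zeroed::<u32>() }
}

fn main() {
    let (mut x, mut y) = (1u64, 2u64);
    swap_aligned(&mut x, &mut y);
    assert_eq!((x, y), (2, 1));
    assert_eq!(zeroed_u32(), 0);
}
```

The contrast is that the first obligation is checked against the arguments before the call, while the second concerns how the result is used afterwards.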
Through this separation, the PRE-SP items do not overlap with the POS-SPs. This is consistent with a Rust design decision: creating raw pointers is always safe, but dereferencing them is unsafe [23]. Similarly, we assume that PRE-SPs only affect the safety of function calls, while POS-SPs focus on the subsequent usage of inputs and return values. Furthermore, POS-SPs are only considered under the premise that all PRE-SPs are satisfied. Specifically, we merged Aliasing and Mutating based on the ground truth that all relevant APIs shared the same labels in these items. We empirically inferred that the primary side effect of breaking Aliasing rules is erroneously Mutating immutable data.

Correlation Analysis on Safety Properties
We obtained a valuable dataset after completing the labeling for unsafe API collection.Although one of our design goals focuses on nonoverlapping, it is still necessary to investigate potential correlations between different SPs.This notion is from the empirical intuition that data satisfying Dereferenceable should always meet Allocated.

Methodology.
We conducted a correlation analysis based on two datasets, and their results demonstrate the anticipated differences. We will explain their characteristics first and then discuss the results of both datasets.
Large Dataset. The large dataset directly uses the original collection with labeled data, which includes the entire set of unsafe APIs. The labels of functionally related APIs may be similar. The intent of keeping a large dataset is to emphasize the quantity, as having adequate data can expose potential correlations.
Small Dataset. The small dataset is created by applying an additional filter (Filter 3.4) to the large dataset. This is done to counteract the potential bias from excessive similar APIs. The small dataset intends to eliminate the redundancy of potentially irrelevant data and concentrate on diversity.
FILTER 3.4. APIs must have the same labels and satisfy at least one of the following requirements:
• Implementations of the same method with different mutability.
• Implementations of the same trait for different types, including monomorphizations in the trait or struct definitions.
• Functions with the same name implemented for different types within the same namespace.
• Encapsulation of intrinsic functions.

Correlation Matrix.
As depicted in Figure 2, we built correlation matrices for both the large and small datasets, retaining only the elements with moderate correlation and above (correlation coefficient > 0.20). The large dataset has a higher susceptibility to interference from redundant APIs. For example, there are 30 implementations of the trait SliceIndex<[T]> [62], all of which are labeled with RelativeBound and Allocated only. The correlation between them is thus higher in the large dataset (0.48) than in the small dataset (0.43). In the small dataset, the diverse functionality among APIs is more likely to result in the loss of pertinent data that could affect correlations. For example, the correlation between Aligned and AliasingMutating decreases from 0.51 in the large dataset to 0.39 in the small dataset. Finally, Encoding, Unreachable, SystemIO, and Pinned achieve the best independence, as they have no substantial correlation with any other SPs in both matrices.
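As a rough illustration of how such a matrix entry can be computed, the sketch below treats each SP as a 0/1 column over the API collection and computes the Pearson coefficient between two columns. The paper does not state which coefficient it uses, so Pearson (equivalent to the phi coefficient for binary data) is an assumption here, and the function names pearson and is_moderate as well as the toy label columns are hypothetical.

```rust
/// Pearson correlation between two equally long 0/1 label columns.
/// Returns None for mismatched, empty, or constant columns.
fn pearson(x: &[u8], y: &[u8]) -> Option<f64> {
    if x.len() != y.len() || x.is_empty() {
        return None;
    }
    let n = x.len() as f64;
    let mean = |v: &[u8]| v.iter().map(|&b| f64::from(b)).sum::<f64>() / n;
    let (mx, my) = (mean(x), mean(y));

    let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
    for (&a, &b) in x.iter().zip(y) {
        let (dx, dy) = (f64::from(a) - mx, f64::from(b) - my);
        cov += dx * dy;
        vx += dx * dx;
        vy += dy * dy;
    }
    if vx == 0.0 || vy == 0.0 {
        return None; // a constant column has no defined correlation
    }
    Some(cov / (vx.sqrt() * vy.sqrt()))
}

/// Threshold from the text: keep coefficients above 0.20.
fn is_moderate(cc: f64) -> bool {
    cc > 0.20
}

fn main() {
    // Toy columns: one entry per API, 1 = API carries the label.
    let allocated = [1u8, 1, 0, 0];
    let relative_bound = [1u8, 1, 0, 0];
    let pinned = [1u8, 0, 1, 0];

    println!("{:?}", pearson(&allocated, &relative_bound));
    println!("{:?}", pearson(&allocated, &pinned));
}
```

Duplicating rows that share the same two labels raises the coefficient for that pair, which is the redundancy effect the small dataset is meant to dampen.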
Case Study. Based on two diagrams in Figure 2, we extracted all the pairs with at least a moderate CC, as listed in Table 4. Among the six pairs with no POS-SPs, we empirically found that they are related to dereferencing operations. Although dereferencing was not considered a distinct item in the SP category, such operations are pervasive in the inner code of unsafe methods. We inferred knowledge about safety requirements for dereferencing that was not explicitly categorized: The first-class SP for a valid pointer with the highest priority is Allocated, followed by Dereferencable,

VERIFYING REAL-WORLD UNSAFE CODE
This section presents the practical usability of our classification in real-world scenarios and the frequency of unsafe API usage in the Rust ecosystem. We classify existing CVEs [19] to validate SP coverage and conduct a statistical analysis of unsafe API usage on crates.io [21].

Classifying Existing Unsafe CVEs
4.1.1 Workflow. The workflow consists of two primary steps: Create a database of CVEs caused by misusing unsafe code and classify them into safety properties by manual code review.
CVE Set. We employed the CVE dataset from the program (https://cve.mitre.org) and searched on the CVE list using the keyword "Rust". The results are sorted by CVE ID in chronological order (i.e., submission date). The final CVE dataset ranged from CVE-2017-20004 [12] to CVE-2023-30624 [14]. We initially filtered CVEs based on CVE descriptions, primarily retaining memory-safety issues. Unrelated CVEs were removed, such as leaking sensitive information. We filtered out those CVEs triggered in the panic path because this study does not specifically address panic safety [25]. Additionally, the retained CVEs cannot be located in a deprecated or yanked crate and should have a link to the source code to support a code audit.
CVE Audit. We performed a manual audit of error snippets that led to security issues. The first and second authors double-checked the results. We painstakingly investigated whether misusing unsafe code was the root cause of each CVE. Due to the short descriptions provided on the CVE website, we utilized various sources, including issues, pull requests, contributors (e.g., RUSTSEC [27]), and fixed code, to pinpoint the source code related to each CVE. Any CVEs that did not satisfy the preceding criteria were removed from our dataset. Around 86.73% of the 196 CVEs in the final dataset were attributed to misusing unsafe APIs from the standard library. In contrast, the remaining CVEs were caused by dereferencing raw pointers, using non-std unsafe functions, or outside FFI. Based on the descriptions and code reviews, we further classified each CVE into the SPs it violated.
CVE Example. Figure 3 presents an example of a classified CVE derived from CVE-2021-45709 [2,13]. Based on the explicit error description in its issue, we were able to locate the buggy source code and confirm incorrect usage of from_raw_parts_mut [61]. This API was annotated by the following SPs: ConstBound, RelativeBound, Allocated, Dereferencable, Aligned, ConsistentLayout, AliasingMutating, and Outliving. This CVE violates the requirement of Aligned, leading to undefined behavior.
SP and Time-span Distribution. Figure 4 depicts the classification results. RelativeBound, Initialized, and Thread had a significant number of CVEs (at least 23). Following them, there are fewer CVEs associated with Aligned, Outliving, ConsistentLayout, and the other 6 SPs (ranging from 2 to 14). 7 SPs have no corresponding CVEs. Except for ConsistentLayout, ConstBound, and Allocated, the time span of CVEs for each SP ranges from as early as August 2019 to as late as December 2021. This period contains approximately 91.84% of listed CVEs.
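To illustrate the Aligned requirement of from_raw_parts_mut discussed in the CVE example, the following minimal sketch reinterprets a byte buffer as a slice of u32 only after checking alignment and length. It is not the code from the CVE; the function names as_u32_slice_mut and aligned_demo are hypothetical, and the sketch assumes the caller is content with a None result when the checks fail.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical helper: views `bytes` as `&mut [u32]` if that is sound.
fn as_u32_slice_mut(bytes: &mut [u8]) -> Option<&mut [u32]> {
    let ptr = bytes.as_mut_ptr();
    let misaligned = (ptr as usize) % align_of::<u32>() != 0;
    let ragged = bytes.len() % size_of::<u32>() != 0;
    if misaligned || ragged {
        return None; // would violate Aligned or the length bound
    }
    let len = bytes.len() / size_of::<u32>();
    // SAFETY: ptr is aligned for u32, the region of len * 4 bytes lies
    // inside the exclusively borrowed buffer, and every bit pattern is a
    // valid u32.
    Some(unsafe { slice::from_raw_parts_mut(ptr as *mut u32, len) })
}

/// Runs the helper on an aligned view and on a view shifted by one byte.
fn aligned_demo() -> (Option<usize>, bool) {
    let mut words = [0u32; 2];
    // SAFETY: the byte view covers exactly the 8 bytes owned by `words`.
    let bytes = unsafe { slice::from_raw_parts_mut(words.as_mut_ptr() as *mut u8, 8) };
    let aligned_len = as_u32_slice_mut(bytes).map(|s| s.len());
    let shifted_rejected = as_u32_slice_mut(&mut bytes[1..5]).is_none();
    (aligned_len, shifted_rejected)
}

fn main() {
    println!("{:?}", aligned_demo());
}
```

The standard library's safe-to-call-correctly alternative for this task is slice::align_to_mut, which splits off the unaligned prefix and suffix instead of failing.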
Benchmark on SP.Our classification results provide a set of CVE lists for each SP that can be used as a benchmark.This benchmark can be employed to evaluate the effectiveness of research prototypes or bug-detection tools designed for particular SPs.As far as we know, the Rust community needs a unified, ground-truth-supported benchmark to support effectiveness comparisons based on safety issues.We advocate for setting such a benchmark to serve as a basis for the community.We also list some open-source code detection tools for Rust programs, all of which serve for specific SPs, as shown in Table 5.
Case Study. The largest number of CVEs was caused by Thread (70) violations. This is predominantly the result of user-defined types that unconditionally implement the Send [53]/Sync [54] traits or fail to ensure that Send/Sync implementations have the correct bounds. Such violations may result in data races and memory-safety issues across the thread boundaries. Initialized (37) was the second most common SP, and its typical scenarios are as follows: 1) Create an uninitialized buffer and pass it to the user-defined Read [59] implementation, allowing safe code to read uninitialized memory; 2) Increase buffer length without reserving memory, causing out-of-bound writes or dropping uninitialized memory; 3) Create an uninitialized NonNull pointer.
Discussion. We observed that the statistic in Figure 4 may not accurately reflect the frequency of SP usage in the real world. For example, a significant portion of CVEs on Initialized (85.3%) and Thread (92.9%) were discovered by the sslab-gatech using their static analyzer Rudra [6], which is designed to detect bugs inspired by specific bug patterns. This observation suggests that analyzers designed for bug patterns may effectively identify vulnerabilities that violate specific SPs. On the other hand, many undiscovered vulnerabilities related to specific SP may not have been registered on https://cve.mitre.org.
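The Thread violation pattern described above, an unconditional Send implementation, can be contrasted with a bounded one in a minimal sketch. The type Carrier and the function send_across_thread are hypothetical and do not come from any of the audited crates; the sketch assumes a simple owning wrapper around a raw pointer.

```rust
use std::thread;

/// Hypothetical owning wrapper; the raw pointer makes it !Send by default.
struct Carrier<T> {
    ptr: *mut T,
}

impl<T> Carrier<T> {
    fn new(value: T) -> Self {
        Carrier { ptr: Box::into_raw(Box::new(value)) }
    }

    fn get(&self) -> &T {
        // SAFETY: ptr came from Box::into_raw and is freed only in Drop.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for Carrier<T> {
    fn drop(&mut self) {
        // SAFETY: ptr came from Box::into_raw and is reclaimed exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

// Sound: Carrier<T> may cross threads only when T itself may.
// The unsound variant would be `unsafe impl<T> Send for Carrier<T> {}`,
// which would let types such as Rc<_> be moved to another thread.
unsafe impl<T: Send> Send for Carrier<T> {}

fn send_across_thread(value: u64) -> u64 {
    let carrier = Carrier::new(value);
    thread::spawn(move || *carrier.get() + 1)
        .join()
        .expect("worker thread panicked")
}

fn main() {
    assert_eq!(send_across_thread(41), 42);
}
```

With the bound in place, attempting to spawn a thread that captures a Carrier of a non-Send type is rejected at compile time rather than becoming a data race.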

Statistics on crates.io Ecosystem
The findings from Section 4.1 indicate that 86.73% of the classified CVEs were caused by misusing unsafe APIs in std.This observation prompted us to conduct a statistical analysis of the usage of unsafe APIs within the Rust ecosystem.

4.2.1 Open-source Crates Database. As crates.io is the crate management platform in the Rust community, we used all its repositories to serve as a code database. To evaluate the frequency of unsafe API usage, we matched regular expressions to source code without compilation. Using function name as the criterion, we merged identical unsafe APIs, creating a dictionary of 140 unique unsafe API strings. Then we removed 20 strings that have the same name as other safe functions in the std (e.g., add [57,58]). To improve the accuracy of the statistics, we only included per-file instances if the unsafe keyword was used in the source code.
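The per-file counting rule described above can be sketched as follows. The paper matches regular expressions; this sketch substitutes plain substring matching from the standard library to stay dependency-free, so it is a simplification rather than the authors' tooling, and the function name count_unsafe_api_uses and the sample inputs are hypothetical.

```rust
use std::collections::HashMap;

/// Counts occurrences of each API name across source files, skipping
/// any file that never mentions the `unsafe` keyword.
fn count_unsafe_api_uses<'a>(
    files: &[&str],
    api_names: &[&'a str],
) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for src in files {
        if !src.contains("unsafe") {
            continue; // per-file filter from the text
        }
        for &name in api_names {
            let hits = src.matches(name).count();
            if hits > 0 {
                *counts.entry(name).or_insert(0) += hits;
            }
        }
    }
    counts
}

fn main() {
    let files = [
        "fn f() -> f32 { unsafe { std::mem::transmute::<u32, f32>(1) } }",
        "fn transmute() {} // same name, but no unsafe code in this file",
    ];
    // The second file is skipped only if it lacks the keyword entirely,
    // so keep the word out of it in this toy input.
    let files = [files[0], "fn transmute() {}"];
    let counts = count_unsafe_api_uses(&files, &["transmute", "zeroed"]);
    println!("{:?}", counts);
}
```

Substring matching over-counts names that appear inside longer identifiers, which is one reason a real pipeline would prefer anchored regular expressions.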

Frequency Statistics.
As of 2023-01-30, we mined all the latest crates from crates.io. The statistic indicates that 21,506 crates use Unsafe Rust among the 103,516 crates on crates.io (3,614 are yanked). For each string, we collected a statistical summary, including the number of crates in which the string appears and the total usage count of the string across all crates. The top ten most frequently used strings are listed in Figure 5, which depicts statistical results in two dimensions. We observed that the primary scenarios encompass type conversions (transmute), manual memory management (zeroed, alloc, drop_in_place), unsafe constructors (new_unchecked), deferred initialization (assume_init), unsafe indexing (get_unchecked/mut), unsafe referencing (from_ptr), and unsafe memory copies (copy_nonoverlapping). Note that the results presented above do not account for filtered strings (e.g., read, as_mut, from_raw, etc.).

SURVEYING RUST PROGRAMMERS
In this section, we conducted an online survey on Goldendata [20] to evaluate the Safety Properties in precision, significance, usability, and frequency from the perspective of experienced Rust developers.

Methodology
5.1.1 Recruitment. We require participants to be at least 18 years old with a minimum of 1 year of experience in Rust programming and to have written at least 5,000 lines of Rust code, including over 1,000 lines of unsafe code. We posted our survey on the Rust-related community to recruit volunteers and emailed contributors from Rust-lang and the popular repositories on crates.io.
5.1.2 Procedure. We provide each defined SP with a representative unsafe API for participants on each page. Note that the relationship between API and SP is many-to-many; in the survey, however, each given API targets only one SP. For each API, we further supply a triplet containing a document slice for the current SP, a sound code snippet, and a misused PoC. The sound snippet and the misused PoC are carefully designed to be short and easy to debug. The misused PoC violates the safety requirements of the current SP while ensuring that all other SPs are satisfied. We link the document slice to the online document for referencing, and both code snippets can be redirected to the Rust Playground [22] for online execution. Furthermore, 18 out of 19 misused PoCs will trigger UB that can be captured by Miri [9], making it easier for participants to understand issues. Participants must read the definition and the triplet of each SP. Then we ask them four questions listed below:
• Q1: We asked participants to rank the accuracy of SP definitions. Can the safety requirements of each SP be explained in a concise and precise definition?
• Q2: We asked participants to rank the significance of each SP. Does violating each SP lead to unacceptable issues? Is it necessary to document such requirements explicitly in Rustdoc?
• Q3: We asked participants to rank the usability of each SP. Should users consider the context of SP satisfaction in real-world unsafe programming? Does adhering to each SP help write sound code?
• Q4: We asked participants to rank the frequency of each SP. How frequently do they encounter situations that require careful use of this SP? Is it often considered when crossing unsafe boundaries?
For each of the four questions, we have devised a [−1, 1] scoring scale to represent negative, neutral, and positive responses.It is crucial to note that these four questions have no objectively correct answers.The participants' responses may vary based on their unique perspectives and experiences.
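As a sketch of how such ratings might be aggregated, the code below sums the −1/0/+1 responses for one SP and then summarizes the per-SP scores by mean and standard deviation. The paper does not spell out its aggregation, so summing responses per SP and using the population standard deviation are assumptions, and the function names sp_score and mean_and_std are hypothetical.

```rust
/// Score of one SP for one question: the sum of all participants'
/// responses, each of which is -1 (negative), 0 (neutral), or 1 (positive).
fn sp_score(responses: &[i8]) -> i32 {
    responses.iter().map(|&r| i32::from(r)).sum()
}

/// Mean and population standard deviation over per-SP scores.
/// Returns None when there are no scores.
fn mean_and_std(scores: &[i32]) -> Option<(f64, f64)> {
    if scores.is_empty() {
        return None;
    }
    let n = scores.len() as f64;
    let mean = scores.iter().map(|&s| f64::from(s)).sum::<f64>() / n;
    let variance = scores
        .iter()
        .map(|&s| (f64::from(s) - mean).powi(2))
        .sum::<f64>()
        / n;
    Some((mean, variance.sqrt()))
}

fn main() {
    // Toy data: four participants rating a single SP.
    println!("{}", sp_score(&[1, 1, 0, -1]));
    // Toy data: scores of eight SPs for one question.
    println!("{:?}", mean_and_std(&[2, 4, 4, 4, 5, 5, 7, 9]));
}
```

Under this reading, a score for one SP is bounded by the number of valid responses in either direction.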

Survey Results
We distributed the survey between July 10 and July 25, 2023, and received 90 responses.After review by the first and second authors, it was determined that 50 were valid.The criteria for a valid response included excluding surveys with excessively short completion times (less than 15 minutes), the same pattern throughout the entire survey, and responses inconsistent with Rust facts.
Q1: Precision Ratings. The results of Q1 illustrate participants' comprehension and endorsement of SP definitions, providing an in-depth appraisal of Goals 3.1 to 3.4. The scores range over [6, 25], with a mean of 19.3 and a standard deviation of 4.8; 16 SPs scored greater than 15. We observed that SPs with brief descriptions tended to receive higher scores, such as SystemIO (24) and Leaked (25), whereas SPs with subcategories scored lower due to the higher comprehension threshold, such as ConsistentLayout (17) and AliasingMutating (14). Participants exhibited a comparatively negative attitude toward unusual SPs, specifically Aligned (6). They were also unclear about newly introduced categories such as DualOwner (12). In the optional supplementary comments, 4 participants emphasized the need for the documentation to highlight the side effects caused by violating DualOwner.
Q2: Significance Ratings. The results of Q2 indicate participants' perspectives on the safety issues caused by violating each SP, in accordance with Goal 3.5. The scores range over [15, 34], with a mean of 26.3 and a standard deviation of 5.0; 18 SPs scored greater than 20. We noticed that participants tend to be positive concerning memory safety. Rust developers devote particular attention to memory safety and show a predictable sensitivity to the safety requirements of unsafe code. However, there is one exception, Unreachable (15). The majority of participants viewed Unreachable as an inconsequential problem and instinctively assumed that a panic is always memory-safe, overlooking potential threats to panic safety.
Q3: Usability Ratings. The results of Q3 represent participants' sensitivity to context when crossing unsafe boundaries in real-world unsafe programming, thus addressing Goal 3.6. The scores range over [12, 28], with a mean of 21.4 and a standard deviation of 4.2; 15 SPs scored greater than 20. Notably, there is a significant correlation between the scores in Q2 and Q3. One explanation is that users may be less likely to check the requirements in real-world situations if they perceive an SP as insignificant. Conversely, if their programs are less affected by an SP in practice, they may perceive it as unimportant, which is consistent with their intuition. We observed that 89.4% of the Q2 scores are higher than the corresponding Q3 scores, indicating that participants' awareness of significance may exceed what their real-world programming habits reflect.
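The two comparisons between Q2 and Q3 can be sketched as follows. The example assumes a Pearson correlation coefficient over the per-SP scores, which is an assumption of this sketch rather than a statement of the measure used in the study, and it runs on made-up scores. The function names pearson and share_higher are hypothetical.

```rust
/// Pearson correlation coefficient of two equally long, non-constant samples.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mean_x) * (b - mean_y)).sum();
    let var_x: f64 = x.iter().map(|a| (a - mean_x).powi(2)).sum();
    let var_y: f64 = y.iter().map(|b| (b - mean_y).powi(2)).sum();
    cov / (var_x.sqrt() * var_y.sqrt())
}

/// Fraction of positions where the first score is strictly higher than the second.
fn share_higher(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    let higher = x.iter().zip(y).filter(|(a, b)| a > b).count();
    higher as f64 / x.len() as f64
}

fn main() {
    // Made-up significance (Q2) and usability (Q3) scores for four hypothetical SPs.
    let q2 = [30.0, 26.0, 15.0, 34.0];
    let q3 = [24.0, 22.0, 16.0, 28.0];
    println!("r = {:.3}", pearson(&q2, &q3));
    println!("share of SPs with Q2 > Q3 = {:.2}", share_higher(&q2, &q3));
}
```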
Q4: Frequency Ratings. The results of Q4 reveal the frequency with which Rust developers encounter each SP in unsafe programming. The scores range over [−6, 25], with a mean of 8.6 and a standard deviation of 8.8. These responses further validate the results in Section 4.2, which measured the frequency of unsafe API usage in crates.io. We found significant discrepancies in the score distribution for this question. Allocated (25), Freed (22), Leaked (20), and Initialized (17) are encountered more frequently in Unsafe Rust. We infer that these SPs are tightly connected with common scenarios, including manual memory management and deferred initialization. As for Encoding (20), users may encounter it frequently when interacting with C code in unsafe contexts. For SPs with lower or even negative scores, we suggest that Rust developers and the community pay more attention to avoid misusing them.
The survey results confirmed our classification of safety properties with statistical significance. It is necessary to define a systematic classification, as experienced Rust programmers care deeply about memory safety issues caused by unsafe code. Finally, the survey also reveals a significant variation in the occurrence frequency of different SPs in real-world Rust programs.

THREATS TO VALIDITY
For internal validity threats, using std unsafe documents as our knowledge base might not provide exhaustive coverage when investigating the categories of security requirements. We adopted a validation based on the CVE classification to address this limitation. In addition, survey participants might not be sufficiently representative; some might respond maliciously, misrepresent their programming experience, or send multiple submissions. To mitigate these threats, we implemented several measures. First, we used multiple recruitment channels, such as private email invitations. Second, we clearly outlined the mandatory requirements for programming experience. Third, the first and second authors manually verified all responses. Fourth, we restricted the number of submissions from the same IP address.
For external validity threats, due to the ongoing development of Rust, the programming style and usage of Unsafe Rust may evolve over time. Although we considered both stable and nightly channels, future updates may introduce new unsafe APIs, modify API descriptions and implementations, or even deprecate some unsafe APIs. We also acknowledge the existence of uncommon safety descriptions that cannot be classified based solely on the std unsafe documents or existing CVEs because they have not been documented in either source.

RELATED WORK
Empirical Studies on Rust Security. Researchers have conducted empirical studies to understand how Unsafe Rust is used in real-world Rust programs [5,17,50,51,70] and existing CVEs [69]. They summarize valuable bug patterns and provide insights into different aspects of safety guarantees. However, these studies do not extract safety requirements from the safety descriptions in the API documents. Several empirical studies focus on the Rust learning curve [1,18] and the programming challenges introduced by compiler errors [71]. Researchers also leveraged Rust-related Stack Overflow data to understand real-world development problems [71]. However, we are more concerned with proficient, experienced system engineers than with Rust beginners. They need Unsafe Rust to achieve low-level control and must better understand the safety requirements when crossing unsafe boundaries.
Bug detection methods in Rust. As Rust is a strongly typed programming language, formal verification for Rust has received considerable attention [4,10,15,28,29,31,32,35,42,68]. Existing studies also employ static or dynamic analysis to detect bugs in Rust programs, including symbolic execution [8,40], model checking [6,11,38,65,67], interpreters [9], and fuzzing [16,30]. We found that some of the above prototypes analyze errors based on bug patterns corresponding to our SP categorization. Table 5 illustrates the relationship between the relevant tools and their supported SPs. Finally, we believe this paper will encourage researchers and developers to focus on diverse safety requirements when handling unsafe boundaries. In the future, more testing tools can be developed to aid in the detection of safety violations for different SPs.

CONCLUSION
As Rust is a system programming language, Unsafe Rust is integral to achieving low-level control over implementation details. As more system software adopts Rust, understanding the safety requirements when crossing unsafe boundaries is crucial, particularly with a well-defined categorization. To this end, we conducted the first comprehensive empirical study on safety requirements across the unsafe boundary. We focused on unsafe API documents in the standard library to infer safety properties, then categorized unsafe APIs and existing CVEs. Additionally, we conducted a user survey to gain insights into four aspects of these safety properties from experienced Rust developers. Through these efforts, we aim to promote the standardization of systematic documents for Unsafe Rust in the Rust community.

Figure 1: Example of an unsafe API in Rust std. Listing 1 provides the source code of an unsafe method of struct ManuallyDrop<T>. Listing 2 extracts the document in Rust 1.70. The document introduces the usage, functionality, and safety requirements to comply with by using a section labeled Safety.
Correlation analysis results on the small dataset (filtered).

Figure 2: Correlation matrices for both the large and small datasets. Each figure only includes the sections with weak correlation and above (correlation greater than 0.2).

Figure 3: Example of the CVE classification. The buggy source code and description of CVE-2021-45709 (in RUSTSEC). This CVE violates the safety requirement of Aligned and triggers UB when using the unsafe API slice::from_raw_parts_mut.

Figure 5: Statistics results on unfiltered strings (unsafe APIs) in the crates.io ecosystem. The top ten most frequently used strings across all repositories and their source code occurrences, sorted by the sum of crates. Axis labels: transmute, from_ptr, zeroed, assume_init, new_unchecked, get_unchecked, copy_nonoverlapping, get_unchecked_mut, alloc, drop_in_place.

Figure 6: Survey results on Unsafe Rust programmers. Precision, significance, usability, and frequency are the four dimensions rated for each SP. Positivity, neutrality, and negativity are the options for each dimension.
• We classified 19 safety properties into two categories. All std unsafe APIs were audited and labeled with safety properties. The labeled data were evaluated via correlation analysis, yielding interpretable results.
• We categorized all Rust CVEs based on safety properties, forming a collection of related issues that can serve as a benchmark. Unsafe API usage statistics are collected within crates.io to understand the usage frequency of unsafe APIs.
• We conducted an online survey and confirmed our categorization of safety properties with statistical significance.

Table 2 :
Invalid value for Rust types, alone or as a field of a compound type, will trigger undefined behavior.
Value that has been initialized can be divided into two scenarios: fully initialized and partially initialized. The initialized value must be valid at the given type (a.k.a. typed). | impl<T: ?Sized> *const T::as_ref
Aligned | 67 | Value is properly aligned via a specific allocator or the attribute #[repr], including the alignment and the padding of one Rust type. | impl<T: ?Sized> *mut T::swap
Consistent Layout | 110 | Restriction on Type Layout, including 1) The pointer's type must be compatible with the pointee's type; 2) The contained value must be compatible with the generic parameter for the smart pointer; and 3) Two types are safely transmutable: The bits of one type can be reinterpreted as another type (bitwise move safely of one type into another). | impl<T: ?Sized> *mut T::read
Unreachable | 9 | Specific value will trigger unreachable data flow, such as enumeration index (variance), boolean value, closure, etc. | impl<T> Option<T>::unwrap_unchecked
Exotically Sized Type | 24 | Restrictions on Exotically Sized Types (EST), including Dynamically Sized Types (DST) that lack a statically known size, such as trait objects and slices; Zero Sized Types (ZST) that occupy no space. | trait GlobalAlloc::alloc
System IO | 25 | Variables related to the system IO depend on the target platform, including TCP sockets, handles, and file descriptors. | trait FromRawFd::from_raw_fd
Thread | 3 | Types that can be transferred across threads (Send) or types that can be safe to share references between threads (Sync), respectively. | std::marker::Sync
Postcondition Safety Property
Dual Owner | 31 | Multiple owners (overlapped objects) that share the same memory in the ownership system by retaking the owner or creating a bitwise copy. | impl<T: ?Sized> Box<T>::from_raw
Aliasing & Mutating | 30 | Aliasing and mutating rules may be violated, including 1) The presence of multiple mutable references; 2) The simultaneous presence of mutable and shared references, and the memory the pointer points to cannot get mutated (frozen); 3) Mutating immutable data owned by an immutable binding. | impl CStr::from_ptr
Outliving | 28 | Arbitrary lifetime (unbounded) that becomes as big as context demands or spawned thread, may outlive the pointed memory. | impl<T: ?Sized> *const T::as_uninit_ref
Untyped | 20 | Value may not be in the initialized state, or the byte pattern represents an invalid value of its type. | core::mem::zeroed
Freed | 17 | Value may be manually freed or released by automated drop() instruction. | impl<T: ?Sized> ManuallyDrop<T>::drop
Leaked | 13 | Value may be leaked or escaped from the ownership system. | impl<T: ?Sized> *mut T::write
Pinned | 5

Table 4 :
SP pairs with correlation coefficients (CC) greater than 0.4 both in the large and small dataset correlation matrices.
ConsistentLayout, and Aligned. RelativeBound should be considered if it has pointer arithmetic. Even though Initialized is not stated in Table 4, we still view it as a prerequisite for dereferencing, as Rust's undefined behavior has explicit requirements for valid values of raw pointers. In this paper, we advocate for the safe usage of raw pointers by satisfying these 6 PRE-SPs.
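As a rough illustration of what satisfying preconditions before dereferencing a raw pointer can look like, the sketch below checks the only two properties that are cheap to test at run time, namely that the pointer is non-null and properly aligned. The remaining requirements (for instance Allocated, Initialized, and ConsistentLayout) cannot be verified this way and stay with the caller, which is why the function itself is unsafe. The helper name read_if_plausible and the wrapper type Aligned4 are hypothetical.

```rust
use std::mem::align_of;

/// Reads `*ptr` only if it is non-null and aligned for `T` (hypothetical helper).
///
/// # Safety
/// The caller must still guarantee that `ptr` points to allocated, initialized
/// memory holding a valid `T`; those properties are not checkable here.
unsafe fn read_if_plausible<T: Copy>(ptr: *const T) -> Option<T> {
    if ptr.is_null() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: non-null and aligned were checked above; the rest is the caller's contract.
    Some(unsafe { ptr.read() })
}

/// A byte buffer whose start address is guaranteed to be 4-byte aligned.
#[repr(align(4))]
struct Aligned4([u8; 8]);

fn main() {
    let value: u32 = 7;
    let good = unsafe { read_if_plausible(&value as *const u32) };

    // One byte past a 4-aligned address is never aligned for u32.
    let buffer = Aligned4([0; 8]);
    let misaligned = buffer.0.as_ptr().wrapping_add(1) as *const u32;
    let bad = unsafe { read_if_plausible(misaligned) };

    println!("aligned read: {:?}, misaligned read: {:?}", good, bad);
}
```

The misaligned pointer is only inspected, never dereferenced, so the example itself stays within defined behavior.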

Table 5 :
Open-source code analyzers that can detect specific SPs. Each static bug detection tool may not necessarily support all scenarios related to the corresponding SP.
Results and Benchmark. We conducted a study on 404 CVE descriptions and performed a code review on the remaining 196 CVEs after filtering. We classified them based on SP categorization and analyzed their distribution. It has been manually verified that the causes of these CVEs do not exceed our SP classification. Finally, we generated a benchmark encompassing the various SPs in the identified CVEs.