Broadly Enabling KLEE to Effortlessly Find Unrecoverable Errors in Rust

Rust is a general-purpose programming language designed for performance and safety. Unrecoverable errors (e.g., Divide by Zero) in Rust programs are critical, as they signal bad program states and terminate programs abruptly. Previous work has contributed to utilizing KLEE, a dynamic symbolic test engine, to verify the program would not panic. However, it is difficult for engineers who lack domain expertise to write test code correctly. Besides, the effectiveness of KLEE in finding panics in production Rust code has not been evaluated. We created an approach, called PanicCheck, to hide the complexity of verifying Rust programs with KLEE. Using PanicCheck, engineers only need to annotate the function-to-verify with #[panic_check]. The annotation guides PanicCheck to generate test code, compile the function together with tests, and execute KLEE for verification. After applying PanicCheck to 21 open-source and 2 closed-source projects, we found 61 test inputs that triggered panics; 59 of the 61 panics have been addressed by developers so far. Our research shows promising verification results by KLEE, while revealing technical challenges in using KLEE. Our experience will shed light on future practice and research in program verification.


INTRODUCTION
Rust was created to ensure high performance comparable to that offered by C and C++, while emphasizing the code's safety, the Achilles heel of the other two languages [1]. Rust's error handling offers a robust and expressive mechanism that encourages developers to handle errors gracefully and explicitly.
Rust groups errors into two categories: recoverable and unrecoverable errors [2]. A recoverable error (e.g., File Not Found) is an error that does not cause the program to terminate abruptly. A program can retry the failed operation or specify alternative actions when it encounters a recoverable error [3]. For instance, if a Rust program attempts to open a file that does not exist, it is a recoverable error because the program can then proceed to create the file [4]. An unrecoverable error (e.g., Index Out Of Bounds) causes a program to fail abruptly. A program cannot revert to its normal state if an unrecoverable error occurs. It cannot retry the failed operation or undo the error. In other words, unrecoverable errors are symptoms of bugs and are more dangerous than recoverable ones.
Most languages do not distinguish between these two kinds of errors; they handle both in the same way using mechanisms such as exceptions. Rust does not have exceptions [3]. Instead, it has the type Result<T, E> for recoverable errors and the panic! macro (a Rust macro is like a function) that stops execution when the program encounters an unrecoverable error. By checking whether the return type of a Rust function is Result<T, E>, developers can easily identify recoverable errors, and implement code to eagerly handle those errors before compiling or running their software. However, it is much harder to identify unrecoverable errors. This is because such errors are not signaled by any dedicated return type; developers have to reason about program semantics intensively to reveal errors. For simplicity, this paper uses panics to consistently refer to unrecoverable errors [5].
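The difference between the two error categories can be illustrated with a minimal sketch. The functions parse_port and element_at are hypothetical names introduced only for this illustration; they do not come from any project studied in this paper.

```rust
use std::num::ParseIntError;
use std::panic;

/// Recoverable: the possibility of failure is visible in the return type.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

/// Unrecoverable: nothing in the signature warns that indexing may panic.
fn element_at(data: &[u8], index: usize) -> u8 {
    data[index] // panics when index >= data.len()
}

fn main() {
    match parse_port("80x") {
        Ok(port) => println!("port = {}", port),
        Err(err) => println!("recoverable error: {}", err),
    }
    // catch_unwind is used here only to observe the panic in a demo.
    let outcome = panic::catch_unwind(|| element_at(&[1, 2, 3], 7));
    println!("indexing panicked: {}", outcome.is_err());
}
```

The caller of parse_port is forced by the type system to consider the Err case, whereas the caller of element_at has to read the function body to learn that an out-of-range index aborts execution.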
To explore potential panics in Rust code, prior work leverages symbolic execution to verify Rust programs. Specifically, they tried to compile Rust source code to LLVM bitcode, and used KLEE [6][7][8], a symbolic execution engine for LLVM, to symbolically execute Rust test code and uncover panics. For instance, Rust verification tools (RVT) [8,9] is a collection of tools/libraries to support both random testing and verification of Rust programs. RVT provides libraries to patch the LLVM IR to support KLEE features like symbolic values. It requires users to manually define parametrized unit tests, compiles those tests together with Rust code into bitcode files, and automates the process of invoking KLEE on bitcode files.
However, it is challenging for developers who lack the domain expertise to write a parametrized unit test [10]. This specialized skill set requires an in-depth understanding of the function's intricacies, potential edge cases, and symbolic execution background to call that function in specialized ways [11]. Besides, even with RVT's help, a large-scale evaluation remains infeasible because configuring and writing tests for every project takes too much human effort. This raises questions about the real-world applicability and reliability of symbolic execution when used in Rust-based environments. As a result, many developers may hesitate to adopt symbolic execution until user-friendly tools or case studies emerge that demonstrate its effectiveness in the Rust ecosystem.
To fill these gaps, we developed PanicCheck, a semi-push-button [12] dynamic test verification tool tailored for Rust. PanicCheck only requires engineers to annotate functions with #[panic_check]. It then handles compilation, test generation, symbolic execution, and panic checking automatically. This automation enables large-scale empirical evaluation of PanicCheck on Rust code. As shown in Figure 1, with PanicCheck, we can verify function fname(...) by annotating it with #[panic_check]. Given the annotated program, PanicCheck goes through four phases. Phase I compiles the program to extract the function name and parameters. Phase II determines how to create a symbolic variable for each parameter. Phase III creates and compiles a parametrized unit test to declare symbolic variables and call fname(...) with those variables. Phase IV executes the test with KLEE, to output any test inputs that trigger panics.
We evaluated PanicCheck and KLEE on real-world Rust projects, guided by two goals. First, we aimed to rigorously measure KLEE's effectiveness at finding panics in production Rust code. Second, we sought to identify areas for improvement in the symbolic execution workflow for Rust. We applied PanicCheck to 21 popular open-source Rust programs, and 2 large production-grade closed-source Rust programs that served as key infrastructures at ByteDance. We annotated hundreds of functions with #[panic_check], and passed all annotated functions to PanicCheck for program verification. In total, PanicCheck revealed 61 panics in 6 of the projects. By examining developers' later changes to their projects, we found that 52 of the panics were fixed. Furthermore, we filed bug issues or pull requests for the remaining nine panics; so far, developers have fixed seven. The results provide new insights into KLEE's capabilities and guide future tool development.
We made the following contributions in this paper:
• We created a tool PanicCheck, which wraps the usage of KLEE and streamlines the verification process. Because little manual effort is required, PanicCheck enables us to conduct the large-scale case study, and helps us avoid human errors when creating unit tests.
• ... all 61 panics it revealed in our study, we characterized the strengths and weaknesses presented by the tool.
In the following sections, we will first introduce the technical background of KLEE and RVT (Section 2.1), and describe a running example (Section 2.3). Then we will explain PanicCheck (Section 3) and our experiment in detail (Section 4).

BACKGROUND AND MOTIVATION
In this section, we will first introduce the technical background of KLEE and RVT. Then we will describe a concrete scenario of Rust code verification to motivate our research.

KLEE
KLEE is a dynamic symbolic execution engine built on top of the LLVM compiler infrastructure [13]; it automatically explores paths through a program and determines which inputs cause which parts of the program to execute. Theoretically speaking, it can run any program compiled to LLVM bitcode. In practice, it has been mainly applied to C/C++ programs.
Given a function-to-verify (FTV), KLEE conducts inter-procedural analysis to explore various possible execution paths, and synthesizes the constraints on symbolic variables for explored paths. For each path, KLEE uses the constraint solver STP to solve the path condition, to decide whether that path is feasible. For each feasible path, KLEE generates a concrete input triggering the path, and checks if there are any values that can cause an unrecoverable error. KLEE is known to have the following limitations [14].
• Path explosion: The number of paths through a program can be exponential in the size of the program. Therefore, unless the program under analysis is small, KLEE cannot finish checking all possible paths in a timely manner and users need to set a timeout to terminate its execution.
• Bounded checks for loops: In general KLEE cannot show that a loop will always behave correctly. It only checks some of the possible executions of a loop.
• Long time spent in constraint solving: When some path conditions or constraints are hard to solve, STP may spend an overly long time trying to find satisfying value assignments.
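To make the path explosion limitation concrete, the following sketch counts execution paths by brute force. It is only an illustration of the growth rate, not a model of how KLEE explores paths; trace_path and count_paths are hypothetical helpers.

```rust
use std::collections::HashSet;

/// Runs `n` independent two-way branches and records which side each took.
fn trace_path(input: u32, n: u32) -> Vec<bool> {
    let mut path = Vec::new();
    for bit in 0..n {
        if input & (1 << bit) != 0 {
            path.push(true);
        } else {
            path.push(false);
        }
    }
    path
}

/// Counts distinct paths by enumerating every n-bit input (requires n < 32).
fn count_paths(n: u32) -> usize {
    let mut seen = HashSet::new();
    for input in 0..(1u32 << n) {
        seen.insert(trace_path(input, n));
    }
    seen.len()
}

fn main() {
    for n in [1u32, 4, 10] {
        println!("{} branches -> {} paths", n, count_paths(n));
    }
}
```

Each additional independent branch doubles the number of paths, so 30 such branches already yield more than a billion paths.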
The kinds of bugs KLEE can find are memory errors (e.g., buffer overflows and null-pointer dereference), division/modulo by zero, over shifts, and assertion violations [15].

Rust Verification Tools (RVT)
RVT [8,9] is a collection of tools and libraries to support both random testing and verification of Rust programs. It provides the functionality to compile Rust projects to LLVM bitcode and invoke KLEE to verify the program against the LLVM bitcode. To write a test with RVT, developers need to 1) define a test function to assert certain properties for their program (e.g., no panic will occur), and 2) specify how each parameter should be symbolized using RVT's domain-specific language (DSL) to generate the test inputs. Then, developers invoke RVT to compile the program into LLVM bitcode, synthesize the constraints on symbolic variables for each execution path, and decide whether the property always holds for the function-to-verify.
It can be challenging for developers to manually write test functions with RVT for two reasons. First, the test functions involve the traits or APIs defined by RVT (e.g., abstract_value(...)), requiring that developers have sufficient domain knowledge of RVT and KLEE. Second, when a test function needs to prepare parameter inputs of complex data types (e.g., u8 slice reference), developers have to carefully prepare compound data structures (e.g., vector), by properly composing symbolic variables.
Here, [u8] means u8 slice, a dynamically sized type representing a view into a contiguous sequence of elements of type u8 [18]; &[u8] refers to any reference to a variable of type u8 slice. Once an input is provided, the function decodes the input, and returns a tuple that includes (1) the decoded content and (2) a value of type usize (i.e., the pointer-sized unsigned integer type).
To verify the function together with all functions called by that function (e.g., zigzag_decode(result)) via RVT, developers need to manually craft a test function similar to the one shown by lines 16-28 of Listing 1. The demonstrated test code prepares a value of type &[u8], calls decode_var(...) with that value, and checks whether any panic occurs. Specifically, to prepare the input parameter, the test code first declares a vector of u8, with an initial size set to 30 (see lines 18-19). Next, it defines 30 symbolic variables of type u8, by repetitively calling the function u8::abstract_value() (see lines 20-21). These symbolic variables are important for KLEE to later verify the program via symbolic execution. After declaring 30 symbolic variables and storing them into the vector variable v (see lines 20-23), the test code tentatively makes the call decode_var(&v), where &v is a vector reference and is also of the type u8 slice reference.
In summary, when developers write a test, they need to:
• Identify the types of input parameters, no matter whether they are primitive, compound, or collection types.
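The shape of such a hand-written test can be sketched as follows. This is not the code of Listing 1: the decoder below is a hypothetical varint decoder loosely modeled on the description of decode_var(...), and build_input is a hypothetical stand-in for repeated calls to u8::abstract_value(), which under KLEE would yield symbolic bytes rather than concrete ones.

```rust
use std::panic;

/// Hypothetical stand-in for calling u8::abstract_value() `len` times.
/// Under KLEE each byte would be symbolic; here a closure supplies it.
fn build_input(len: usize, mut next_byte: impl FnMut() -> u8) -> Vec<u8> {
    let mut v = Vec::with_capacity(len);
    for _ in 0..len {
        v.push(next_byte());
    }
    v
}

/// Hypothetical varint decoder (an assumption, not the code under study).
fn decode_var(src: &[u8]) -> (u64, usize) {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    let mut read: usize = 0;
    for &byte in src {
        // checked_shl makes the shift check explicit in every build mode.
        let part = u64::from(byte & 0x7f)
            .checked_shl(shift)
            .expect("shift overflow");
        result |= part;
        shift += 7;
        read += 1;
        if byte & 0x80 == 0 {
            break; // the last byte of a value has its high bit cleared
        }
    }
    (result, read)
}

/// Parametrized test: prepare a 30-byte input, call the function under
/// test, and report whether a panic occurred.
fn test_decode_var(next_byte: impl FnMut() -> u8) -> bool {
    let v = build_input(30, next_byte);
    panic::catch_unwind(|| decode_var(&v)).is_err()
}

fn main() {
    println!("all 0x01 bytes panic: {}", test_decode_var(|| 0x01));
    println!("all 0xff bytes panic: {}", test_decode_var(|| 0xff));
}
```

With concrete inputs, a tester has to guess that a long run of continuation bytes is the interesting case; with symbolic bytes, the engine searches for such an input itself.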

PANICCHECK
As described above, it can be tedious and error-prone for developers to manually write test functions, when they want to verify lots of
Phase IV feeds KLEE with the generated executable version to reveal panics. Given an annotated function, PanicCheck executes all phases by issuing the command "cargo-verify --backend=KLEE --tests". This command performs two tasks: (T1) to build the Rust program as well as all available test functions using the Rust compiler, and (T2) to execute the built code with KLEE. We implemented Phases I-III as an integral macro, which rewrites Rust code by creating and adding in a parametrized test. The macro is then loaded in the compilation process (T1), where it receives the token stream of the annotated function from the compiler for syntax-tree creation, analysis, and manipulation. Phase IV corresponds to the execution process (T2). Because we did not do anything in particular for Phase IV, we will focus our discussion on Phases I-III.

Context Extraction
Given the token stream of an annotated Rust function, Phase I uses a parsing library, syn [19], to parse tokens. It also invokes APIs of syn to traverse the resulting syntax tree in order to locate the function signature, which includes the function's name, parameter list, and parameters' data types.
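The information recovered in this phase can be pictured with a deliberately naive sketch. It does not use syn and is not PanicCheck's implementation: it works on plain text with standard-library string functions, and the names Signature and extract_signature are hypothetical.

```rust
/// Name and parameter list recovered from a function signature.
#[derive(Debug, PartialEq)]
struct Signature {
    name: String,
    params: Vec<(String, String)>, // (parameter name, type as written)
}

/// Naive extraction for signatures of the form `fn name(a: T, b: U) ...`.
/// It does not handle generics, tuple-typed parameters, or `pub`/`async`.
fn extract_signature(src: &str) -> Option<Signature> {
    let after_fn = src.trim_start().strip_prefix("fn ")?;
    let open = after_fn.find('(')?;
    let close = after_fn.find(')')?;
    if close < open {
        return None;
    }
    let name = after_fn[..open].trim().to_string();
    let params = after_fn[open + 1..close]
        .split(',')
        .filter(|p| !p.trim().is_empty())
        .map(|p| {
            let (n, t) = p.split_once(':')?;
            Some((n.trim().to_string(), t.trim().to_string()))
        })
        .collect::<Option<Vec<_>>>()?;
    Some(Signature { name, params })
}

fn main() {
    let sig = extract_signature("fn decode_var(src: &[u8]) -> (u64, usize)");
    println!("{:?}", sig);
}
```

A real syntax-tree parser is needed precisely because of the cases this sketch excludes; the example only shows what "function name, parameter list, and parameter types" amounts to as data.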

Symbolic Variable Creation
For each parameter extracted in Phase I, Phase II determines how to create a corresponding symbolic variable processable by KLEE. So far, PanicCheck can provide full or partial support for the symbolic variable creation of 28 data types. These 28 types include 18 primitive types, 4 compound types, and 6 collection types.

Primitive Types. As shown in Table 1, by calling the RVT APIs data_type::abstract_value(), PanicCheck fully supports variable generation for 16 of the 22 primitive types. It also fully supports the unit type. Because the unit type has only one value "()", we do not need to generate any symbolic variable for the data type, nor does KLEE need to enumerate values. PanicCheck partially supports variable generation for the reference type. It can declare symbolic variables for shared (i.e., immutable) references, but not for exclusive (i.e., mutable) references. Typically, to generate a reference variable of type T (i.e., &T), PanicCheck needs to first create a symbolic variable of type T (e.g., u8), and then use the reference to that variable as the created reference variable (e.g., of type &u8). Due to the time limit, we did not implement PanicCheck to generate syntax trees or code for exclusive references. We plan to address this limitation in the future, by extending our current parser implementation as well as the templates for code generation.
Among the remaining four types, PanicCheck does not support fn or pointer as KLEE does not handle pointers well. This is because the memory address space is huge; KLEE can easily get stuck with the state explosion issue when symbolizing a pointer to enumerate address values. Notice that our treatments for references and pointers are totally different because in Rust, even though references and pointers have the same underlying data (addresses for some memory), they have different constraints and semantics with the compiler [20]. Namely, references have rules enforced by the compiler: (1) they cannot outlive what they refer to (the "referent"); (2) mutable references cannot be aliased. References behave like the variables they point to. They have a type, and developers can interact with that type to read it or (with mutable references) modify it. On the other hand, pointers are semantically more about addresses. When developers interact with pointers, they modify addresses instead of the variables pointed to. When they print pointers without using the unsafe keyword, addresses are printed out.
Additionally, PanicCheck does not support slice or str. Both slice and str are dynamically sized types, i.e., types without a statically known size or alignment [21]. Because Rust must know the size and alignment of things in order to correctly work with them, dynamically sized types can only be used via references (e.g., &str), and parameters of these types must be declared as references.
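The consequence of a type being dynamically sized can be observed directly with the standard library. The helper reference_width below is a hypothetical name introduced for this illustration.

```rust
use std::mem::{size_of, size_of_val};

/// Width in bytes of a shared reference to `T`, where `T` may be unsized.
fn reference_width<T: ?Sized>() -> usize {
    size_of::<&T>()
}

fn main() {
    let bytes: &[u8] = &[1, 2, 3, 4, 5];
    // The size of the referent is only known at run time.
    println!("slice contents: {} bytes", size_of_val(bytes));
    // A reference to a sized type is a single pointer ...
    println!("&u8   width: {} bytes", reference_width::<u8>());
    // ... while a reference to a slice or str also carries a length.
    println!("&[u8] width: {} bytes", reference_width::<[u8]>());
    println!("&str  width: {} bytes", reference_width::<str>());
}
```

Because the length travels with the reference rather than with the type, a test generator has to pick a concrete length before it can build a value behind &[u8] or &str.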

Compound Types. PanicCheck provides partial tool support for four compound types: array, enum, struct, and tuple. Two reasons can explain why the array type is not fully supported. First, developers can declare arrays to have arbitrary lengths. When an array variable contains a very large number of elements (e.g., >30), PanicCheck needs to define many independent symbolic variables, adding them to an array in order to generate a symbolic array variable. When enumerating possible states of all those element symbolic variables, KLEE will encounter the state explosion problem and work ineffectively to reveal panics. Second, when an array has a compound or collection type as its element type, e.g., array of arrays, too many primitive-typed independent variables can be nested into the array level-by-level, making KLEE fail. Based on our experience, KLEE can respond in a timely manner when an array has at most 30 primitive-typed elements, so we built PanicCheck accordingly.
PanicCheck does not fully support enum or struct because both types allow developers to define custom data structures. While custom data structures can be very different from each other, the elements of a custom data structure can also have complex data structures. It can be very challenging to properly generate symbolic variables for such data types. Therefore, currently PanicCheck only supports variable creation for three widely used built-in types: Option, Result, and String. In the future, we will conduct more advanced static program analysis to characterize custom data structures, and extend PanicCheck to generate symbolic variables for those structures.
Rust allows each tuple to have 2-11 elements. However, if a tuple has some compound-typed or collection-typed elements, the total number of independent variables in the tuple can become too large for KLEE to explore. To ensure that KLEE can often respond to PanicCheck in a timely manner, we built PanicCheck to only model tuples that are declared to have primitive-typed elements.

Collection Types. PanicCheck provides partial support for six collection types: Vec, VecDeque, LinkedList, BTreeMap, BTreeSet, and BinaryHeap. This is mainly because each collection can have an arbitrary number of elements. When elements are symbolized as independent variables, there is no way that KLEE can fully support the state enumeration for all variables' value combinations. Consequently, we set the length of Vec, VecDeque, and LinkedList to 30 based on our experimental experience with KLEE. We noticed that KLEE becomes extremely slow and usually produces no output if the length goes beyond 30. We set the length of BTreeMap and BTreeSet to 10. This length is smaller than 30, mainly because the data types leverage B-Tree, a data structure more complex than vectors and lists. We set the length of BinaryHeap to 5 also because of the complexity of the internal data structure.
PanicCheck does not support HashMap or HashSet, because KLEE often wastes time verifying the hashing algorithm used in Rust [22,23] instead of verifying the actual program logic.
During the test generation for FTV, a trait (analogous to a Java interface) named Strategy1 is always declared; it declares a uniform function interface value_gen(...) that is callable by tests to declare symbolic variables. For each parameter type declared by FTV (e.g., bool), PanicCheck defines an implementation (e.g., impl Strategy1 for bool) to implement the declared trait and function; the implemented function invokes the API AbstractValue::abstract_value() as needed to create symbolic variables. Note that RVT declares and defines the trait AbstractValue, so that the type that implements this trait can always generate variables processable by KLEE. Because our software design follows the Strategy design pattern, the test function PanicCheck creates is semantically equivalent instead of fully identical to the one shown in Listing 1.
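A minimal sketch of this Strategy-style design is given below, under several stated assumptions: the AbstractValue trait here is a stand-in that returns fixed concrete values (the RVT trait produces KLEE symbolic values), the zero-argument signature of value_gen is assumed, and count_set is a hypothetical function-to-verify.

```rust
/// Stand-in for RVT's AbstractValue trait (assumption: concrete values
/// are returned here because no symbolic execution engine is attached).
trait AbstractValue: Sized {
    fn abstract_value() -> Self;
}
impl AbstractValue for bool {
    fn abstract_value() -> Self {
        false
    }
}
impl AbstractValue for u8 {
    fn abstract_value() -> Self {
        0
    }
}

/// Uniform interface that a generated test calls for every parameter type.
trait Strategy1: Sized {
    fn value_gen() -> Self;
}
impl Strategy1 for bool {
    fn value_gen() -> Self {
        bool::abstract_value()
    }
}
impl Strategy1 for u8 {
    fn value_gen() -> Self {
        u8::abstract_value()
    }
}
/// Collections are built from a bounded number of element values.
impl<T: Strategy1> Strategy1 for Vec<T> {
    fn value_gen() -> Self {
        (0..30).map(|_| T::value_gen()).collect()
    }
}

/// Hypothetical function-to-verify taking a bool and a u8 slice reference.
fn count_set(flag: bool, data: &[u8]) -> usize {
    if flag {
        data.iter().filter(|b| **b != 0).count()
    } else {
        0
    }
}

/// Shape of a generated test: one value_gen() call per parameter.
fn generated_test() -> usize {
    let flag = <bool as Strategy1>::value_gen();
    let data = <Vec<u8> as Strategy1>::value_gen();
    // A shared reference is obtained by borrowing the owned value.
    count_set(flag, &data)
}

fn main() {
    println!("result = {}", generated_test());
}
```

The benefit of the uniform interface is that the test template never changes: supporting a new parameter type only requires one more impl block.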

LLVM Bitcode Generation.
The cargo-verify command issued by PanicCheck (see the beginning of Section 3) automatically converts the source code of the generated test function into LLVM bitcode. Thanks to the command usage, PanicCheck does not need to implement anything to enable the conversion. In this conversion process, the command also (1) injects value checks to guard critical instructions and (2) inserts panic! macros for any potential failure of value checks. All such insertions are automated by KLEE.
Because the compiled LLVM bitcode is hard to read and explain, to facilitate presentation, we use Rust code in Listing 3 to present the semantics of compiled LLVM bitcode for decode_var(...). In the code, we use "..." to omit less important code details. As shown in the simplified code, the compilation process injects two if-statements separately for the bitwise left shift operator (<<) and the plus equals operator (+=). The first if-statement ensures that the number of bits specified stays below the total number of bits available in a u64 number (i.e., 64). If shift >= 64, a panic is generated. Similarly, the second if-statement ensures that the result of plus equals does not overflow; if the result is larger than usize::MAX, the program panics.
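The two injected guards can be mimicked in source form with the standard checked arithmetic methods. This sketch is not Listing 3; guarded_shl and guarded_add are hypothetical helpers that make explicit what an overflow-checked build does implicitly. Note that u64::checked_shl returns None as soon as the shift amount reaches 64.

```rust
use std::panic;

/// Mirrors the guard on `<<`: the shift amount must be below 64 for a u64.
fn guarded_shl(value: u64, shift: u32) -> u64 {
    match value.checked_shl(shift) {
        Some(v) => v,
        None => panic!("attempt to shift left with overflow"),
    }
}

/// Mirrors the guard on `+=`: the sum must fit in a usize.
fn guarded_add(total: usize, step: usize) -> usize {
    match total.checked_add(step) {
        Some(v) => v,
        None => panic!("attempt to add with overflow"),
    }
}

fn main() {
    println!("1 << 63 = {}", guarded_shl(1, 63));
    let shl = panic::catch_unwind(|| guarded_shl(1, 64));
    println!("1 << 64 panicked: {}", shl.is_err());
    let add = panic::catch_unwind(|| guarded_add(usize::MAX, 1));
    println!("usize::MAX + 1 panicked: {}", add.is_err());
}
```

Each guard turns an arithmetic corner case into a distinct branch, which is what gives a symbolic execution engine a concrete path condition to solve for.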

EVALUATION
We implemented PanicCheck for the Rust version 1.47.0-nightly (2020-08-02). The LLVM release we leveraged is 10.0.0, and the commit of the KLEE version we adopted is c51ffcd377097ee80ec9b0d6f07f8ea583a5aa1d.
To investigate how PanicCheck can help with revealing panics in Rust programs, we applied PanicCheck to both open-source and closed-source projects. Specifically, inside the company ByteDance Ltd., we applied PanicCheck to two internal projects (closed-source; the experiment was conducted when the first four authors were at ByteDance). Additionally, we created a dataset of 21 open-source projects by mining crates.io [25], the Rust community's crate registry, in order to also apply PanicCheck to those projects. In the following part of this section, we will first introduce our open-source dataset and the experiment setup. Then we will describe the experiment results.

The Open-Source Dataset
To build the open-source dataset, we first used the keyword "parser" to search on crates.io for projects with at least 400,000 downloads. We chose this keyword based on the advice of domain experts of formal verification, who mentioned that the Rust libraries related to parsers are less likely to be well tested and thus more likely to suffer from panic issues. Among the popular projects with at least 400,000 downloads, we selected projects using the following criteria:
With the criteria mentioned above, we included 20 popular projects into our dataset. Additionally, we noticed that the open-source project GNU core utilities (coreutils) [26] was once used to effectively evaluate KLEE [13]. Thus, we also included a Rust version of coreutils. Please refer to Table 2 for a full list of the open-source projects in our dataset. All these projects have code publicly available at GitHub.

Experiment Setup
In our evaluation, we did not annotate every single function of subject projects with #[panic_check] for two reasons. First, it is very time-consuming to verify every function, although some functions are more important or more frequently executed than the others. Second, as KLEE conducts inter-procedural analysis, it is quite possible that the symbolic execution of some functions can fully cover that of other functions. To verify the most frequently executed functions without incurring too much effort of redundant verification, we decided to annotate only entry functions in each selected project. Among all functions within a given project, we chose entry functions using the following criteria:
• If the project has proptests [27] already defined for some functions, we treat those functions as entry functions because developers are likely to apply proptest to the most important functions. Proptest is a property testing framework. It randomly generates inputs to test whether certain properties always hold for a given program; whenever a failure is found, it automatically finds the minimal test case to reproduce the problem.
• If the project has no proptest defined but contains a file lib.rs, we treat all application programming interfaces (APIs) listed in that file as entry functions. The lib.rs file of a project implies the project to be a software library, while the APIs listed in that file are accessible by library users. Thus, those APIs are important to verify.
• If the project has no proptest or lib.rs defined but contains a main function, we treat the function as an entry function. We believe that the main function typically executes the most important functionalities.
With the criteria mentioned above, we annotated in total 125 entry functions in open-source projects and more than 40 functions in closed-source projects. The column # of Functions Annotated in Table 2 shows the distribution of the 125 entry functions. In particular, there are 100 subprojects in coreutils, and each subproject defines a main function. Thus, we annotated 100 functions in coreutils. As we conducted all experiments in May 2021-October 2021, all program versions we experimented with were downloaded during that period.
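The three entry-function criteria above form a priority order, which can be written down as a small decision function. The types ProjectFacts and EntryRule are hypothetical names used only for this sketch; in our study the selection was performed by hand.

```rust
/// Which criterion determines the entry functions of a project.
#[derive(Debug, PartialEq)]
enum EntryRule {
    ProptestTargets, // functions that already have proptests
    LibRsApis,       // APIs listed in lib.rs
    MainFunction,    // the main function
    NoneFound,       // no criterion applies
}

/// Facts about a project that the criteria depend on.
struct ProjectFacts {
    has_proptests: bool,
    has_lib_rs: bool,
    has_main: bool,
}

/// Applies the criteria in order; earlier criteria take precedence.
fn select_entry_rule(p: &ProjectFacts) -> EntryRule {
    if p.has_proptests {
        EntryRule::ProptestTargets
    } else if p.has_lib_rs {
        EntryRule::LibRsApis
    } else if p.has_main {
        EntryRule::MainFunction
    } else {
        EntryRule::NoneFound
    }
}

fn main() {
    let binary_only = ProjectFacts {
        has_proptests: false,
        has_lib_rs: false,
        has_main: true,
    };
    println!("{:?}", select_entry_rule(&binary_only));
}
```

For example, a binary crate with neither proptests nor lib.rs falls through to the third criterion, so its main function is the one that receives the annotation.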

Experiment Results
In Table 2, the column Build Time without PanicCheck shows the time cost of purely building each project without involving any step of PanicCheck. Build Time with PanicCheck describes the total time cost of (1) a clean build and (2) the first three steps of PanicCheck. Namely, any time difference between the two columns shows the runtime overhead incurred by PanicCheck's first three steps. By comparing the measured values for these columns, we found PanicCheck to add 6-128 seconds to the build procedure.
Namely, PanicCheck expanded the compilation overhead by 0.4-21.2 times. Such overheads were introduced by macro expansion, PanicCheck compilation, and bitcode generation. Thanks to the Rust conditional compilation, such overheads will not affect the build process in production mode because PanicCheck is only executed in the testing mode. Thus, developers do not need to remove those macros when building the production binary.
Column Verification Time shows the runtime overhead of KLEE execution, corresponding to the fourth step of PanicCheck. For 56 subprojects of coreutils and 5 other projects, KLEE execution finished quickly and spent 1-886 seconds on each project. PanicCheck either reported no panic after exploring all paths or revealed the first panic it encountered. However, for another 44 subprojects of coreutils and 16 other projects, KLEE execution could not finish within the allocated time of 2 hours. In particular, for nine projects, the verification procedure was stuck with the problem of state explosion: there were too many states for KLEE to enumerate. KLEE could not enumerate all states or verify any function. For the remaining (sub)projects, KLEE could not finish its exploration within two hours although it was not stuck with state explosion; its exploration got slowed down by the value enumeration for variables of complex/compound data types or String. We still considered these projects to partially pass formal verification due to a time limit.
Finding 1: Among the 125 functions annotated for 21 open-source projects, PanicCheck revealed 59 panics for 59 functions but failed to verify 11 functions due to state explosion; 3 functions passed complete verification and 52 functions passed partial verification due to the time limit.
We also annotated more than 40 functions in 2 closed-source projects of ByteDance. These two projects belong to the Key Management System (KMS). KMS is an internal key management service that other internal services leverage to perform encryption and decryption. The two projects used in our experiment contain several thousand lines of code in total (no more than 10 thousand LOC). They are real-world crucial Rust projects, instead of toy examples crafted by the paper authors for research purposes. Internally, ByteDance requires KMS to have no unrecoverable error, as panics in this service can lead to serious consequences like data loss or service disruption. In our experiment, we applied PanicCheck to functions related to certificate parsing, encryption, and decryption, in order to check whether those functions have unrecoverable errors. PanicCheck revealed in total two panics in the projects, both of which were later confirmed and fixed by ByteDance developers.
In total, we found 61 panics in 23 projects (21 open-source + 2 closed-source), when we performed the experiment in 2021. To investigate developers' responses to those panics or software bugs, we further examined the more recent versions of these programs as of September 2023 (before submitting this paper) to see whether those bugs were already fixed. If a program's latest version could take in the panic-triggering input and execute smoothly, we concluded that developers recently fixed the bug relevant to that panic. Otherwise, we filed a bug issue for each revealed but unresolved panic and sought developers' feedback. So far, we have observed that 52 panics were already resolved by developers before we filed any issue report. We filed 9 reports for the remaining panics; for 7 panics, developers have confirmed the reported issues and fixed bugs accordingly.
We further inspected the content of 61 panics, and recognized two major root causes. First, 45 of the panics share the same error message "unexpected invalid UTF-8 code point". These panics all occurred in subprojects of coreutils, due to the usage of a library clap [28]. When these subprojects passed invalid UTF-8 strings (e.g., ./expand "È") to a clap API, the API did not properly handle the invalid inputs and thus triggered panics. Recently we observed that the clap developers improved their API implementation, so that it no longer causes panics in the projects invoking that API.
Second, nine panics are about calling unwrap() functions on invalid values. All these panics occurred in subprojects of coreutils. For instance, in the timeout subproject of coreutils, unwrap() was once called on the return value of options.value_of(...) (see Listing 4). Although developers assumed that options.value_of(...) always returns normal values, it turned out that the method call can return an Err-typed value. Calling unwrap() on that value can trigger a panic and halt the program execution. Developers fixed such bugs by conducting value checks before calling unwrap() functions.
To pinpoint the root causes of state explosion, we annotated functions called by the entry point during execution. However, functions with unsupported features (e.g., lifetime scope annotations) were excluded from annotation. The identified root causes are presented in Table 3. Notably, seven projects faced issues due to string enumeration; given the vast search space of strings or bytes, KLEE could not verify these projects within the specified time. For ryu, the parser's pursuit of optimal performance led to the conversion of unsigned integers into raw pointers. This caused KLEE to enumerate the memory space, significantly expanding the state space and consequently triggering state explosion.
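The first root cause can be illustrated with the standard library alone. The sketch below does not reproduce clap's code; decode_arg is a hypothetical helper, and the assumption that the argument "È" reaches the program as the single byte 0xC8 (its Latin-1 encoding) is ours.

```rust
/// Returns the argument as text, or an error message instead of panicking.
fn decode_arg(raw: &[u8]) -> Result<&str, String> {
    std::str::from_utf8(raw)
        .map_err(|e| format!("invalid UTF-8 at byte {}", e.valid_up_to()))
}

fn main() {
    // 0xC8 starts a two-byte UTF-8 sequence, so on its own it is invalid.
    let raw: &[u8] = &[0x61, 0xC8];
    match decode_arg(raw) {
        Ok(text) => println!("argument: {}", text),
        Err(msg) => println!("rejected: {}", msg),
    }
    // A lossy conversion is another non-panicking option.
    println!("lossy: {}", String::from_utf8_lossy(raw));
}
```

Either returning an error or converting lossily keeps an ill-formed command-line argument from terminating the program abruptly.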
Finding 4: PanicCheck cannot handle the string and raw-pointer cases well due to the limited capability of KLEE.
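Returning to the second root cause discussed in this section (calling unwrap() on invalid values), the following sketch contrasts a panicking lookup with a checked one. It is not the coreutils code of Listing 4: value_of here is a hypothetical stand-in that returns an Option, and the option name "duration" is invented for the example.

```rust
use std::panic;

/// Hypothetical stand-in for an option lookup that may find nothing.
fn value_of<'a>(args: &'a [(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    args.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// Panicking variant: assumes the option is always present and numeric.
fn duration_unchecked(args: &[(&str, &str)]) -> u64 {
    value_of(args, "duration").unwrap().parse::<u64>().unwrap()
}

/// Checked variant: reports the problem to the caller instead.
fn duration_checked(args: &[(&str, &str)]) -> Result<u64, String> {
    let raw = value_of(args, "duration").ok_or("missing duration")?;
    raw.parse::<u64>().map_err(|e| format!("bad duration: {}", e))
}

fn main() {
    let empty: &[(&str, &str)] = &[];
    println!("checked: {:?}", duration_checked(empty));
    let outcome = panic::catch_unwind(|| duration_unchecked(&[]));
    println!("unchecked panicked: {}", outcome.is_err());
}
```

The checked variant moves the failure into the return type, which turns an unrecoverable error into a recoverable one that callers can handle.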

THREATS TO VALIDITY
Threats to External Validity. All the empirical observations we made so far are based on our experimental dataset. These observations may not generalize well to other Rust programs. In the future, we would like to include more projects into our evaluation, so that our findings can become more representative.

LESSONS LEARNED
By enabling large-scale usage, PanicCheck reveals both strengths and weaknesses of KLEE when it is applied to Rust programs.

Advantages of Applying KLEE to Rust Programs
Our study confirms that KLEE can generate meaningful test inputs, and reveal unrecoverable errors existing in Rust programs. All errors reported by KLEE are true positives; there is no false alarm (false error) reported by KLEE. Furthermore, it can even effectively identify the unrecoverable errors overlooked by developers or manually developed test suites. One possible reason to explain this phenomenon is that developers may not be good at thoroughly testing Rust programs. When program logic is complex, developers may only focus on the main paths that are frequently executed and mainly check for design errors. Because KLEE systematically explores feasible paths in programs, it is able to capture edge cases. Additionally, KLEE examines software for errors relevant to memory accesses, division/modulo by zero, over shifts, and assertion violations; thus, the errors it finds can complement the design errors that developers focus on.

Limitations of Applying KLEE to Rust Programs
We noticed that KLEE cannot generate test cases for many functions in our dataset. Three major reasons explain KLEE's limited applicability. First, it cannot analyze concurrent programs. Second, it cannot symbolize the size of memory allocation. Third, it provides very limited support for pointers (i.e., memory addresses).
Additionally, even when KLEE is applicable to a function, it may not finish verification within a reasonable period of time (e.g., two hours), for two reasons. First, it supports a very limited set of built-in collection types (e.g., BinaryHeap). In particular, when a collection variable contains many element variables, symbolizing each element variable can make the overall program state space overwhelmingly large, considerably prolonging the verification procedure. Second, KLEE does not support String variables that contain characters from large vocabularies (e.g., ASCII, UTF-8). This is because when a variable can hold strings composed of very diverse characters, generating strings of a certain format is almost infeasible or computationally expensive.
To better verify Rust programs with KLEE, we plan to improve PanicCheck in two ways. First, we will statically analyze programs to learn how developers' customized data types are composed of primitive data types. In this way, PanicCheck can automatically generate symbolic variables for more compound types. Second, when a function-to-verify is called, some of its parameters may require specialized values satisfying certain requirements (e.g., syntax or regular expressions), and such values can be very hard for KLEE to generate even though they are not part of the path conditions that trigger panics. We plan to extend PanicCheck so that developers can provide concrete inputs for those variables, to accelerate KLEE's exploration process and reveal more panics.
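The first planned improvement can be sketched as follows, under stated assumptions. The Symbolic trait, the Source type, and the Packet struct are all hypothetical; Source replays bytes from a slice so that the sketch runs without a verifier, whereas under KLEE each primitive would instead be a fresh symbolic value.

```rust
/// Hypothetical stand-in for a symbolic-value source. Bytes are replayed
/// from a slice here; missing bytes default to 0.
struct Source<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Source<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Source { bytes, pos: 0 }
    }

    fn next_u8(&mut self) -> u8 {
        let b = self.bytes.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        b
    }
}

/// Hypothetical trait: a type that can be built from primitive values.
trait Symbolic: Sized {
    fn symbolic(src: &mut Source) -> Self;
}

impl Symbolic for u8 {
    fn symbolic(src: &mut Source) -> Self {
        src.next_u8()
    }
}

impl Symbolic for u16 {
    fn symbolic(src: &mut Source) -> Self {
        // Two bytes, little-endian.
        u16::from_le_bytes([src.next_u8(), src.next_u8()])
    }
}

/// A developer-defined type composed only of primitive fields.
#[derive(Debug, PartialEq)]
struct Packet {
    kind: u8,
    length: u16,
}

/// What a static analysis of `Packet` could generate automatically:
/// one value per primitive field, in declaration order.
impl Symbolic for Packet {
    fn symbolic(src: &mut Source) -> Self {
        Packet {
            kind: u8::symbolic(src),
            length: u16::symbolic(src),
        }
    }
}
```

The second improvement fits the same shape: a developer-supplied concrete value would simply replace the call to symbolic for the parameter in question.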

RELATED WORK
Our research is related to empirical studies with KLEE and Rust verification tools.

Empirical Studies with KLEE
Researchers have conducted several empirical studies using KLEE [30-38]. Specifically, Wang et al. [31] compared KLEE-based test suites with manually developed test suites. They observed that KLEE-based test suites have advantages in exploring error-handling code and exhausting options, but are less effective at generating valid string inputs and exploring meaningful program behaviors. Such complementarity between KLEE-based tests and human-crafted tests was also observed by Kurian et al. [38], who applied KLEE to generate test cases for safety-critical embedded software.
As a DSE engine, KLEE provides 10 path search approaches. Two of the approaches belong to random search, while eight approaches belong to heuristic search. To investigate which approach performs best, Zhang et al. [37] applied the 10 approaches to 53 GNU coreutils applications. They found that without constraint optimization, one approach of random search (i.e., random path) outperforms the others in terms of the number of completed paths, statement coverage, and branch coverage. Dong et al. [32] did a similar study by analyzing the 33 optimization flags implemented by LLVM and used by KLEE. They observed that on average, applying optimizations makes symbolic execution worse for coreutils applications.
Two studies were conducted to compare alternative implementations of KLEE and its extension [33,34]. In particular, Kapus et al. [34] compared an implementation of KLEE using a partial solver based on the theory of integers, with the standard KLEE implementation using a solver based on the theory of bit vectors. They did not observe significant differences between the two. Liew et al. [33] compared two alternative implementations of a KLEE extension component: floating-point symbolic execution. They observed that the tools complement each other, and neither offers a silver bullet.
Kim et al. [30] applied symbolic execution tools (CREST-BV and KLEE) and a static analyzer (Coverity) to the same program, to compare their results. The researchers detected six bugs through symbolic execution, none of which were detected by Coverity. Busse et al. [36] hypothesized that if a static analyzer (Clang Static Analyzer or Infer) produces (1) a partial program trace, and (2) conditions to trigger a bug, then KLEE can (a) guide its search to prioritize paths following that trace, and (b) prune paths using those conditions. Their experience of implementing the technique highlights two negative results. First, the partial traces are not that useful in guiding search. Second, static analyzers can rarely find non-trivial bugs. Xu et al. [35] developed a dataset of logic bombs and a framework for benchmarking symbolic execution tools automatically.
Our research is different from all the studies mentioned above, as it applies KLEE to verify Rust instead of C code.

Verification Tools for Rust Programs
Various techniques were recently created to verify Rust programs [6,7,39-44]. CBMC [45] is a bounded model checker for C and C++ programs. CRUST [40] and Kani [42] verify Rust programs by translating code into C-like languages and using CBMC. Facebook's experimental MIRAI [43] is an abstract interpreter for the Rust compiler's mid-level intermediate representation (MIR). It explicitly prioritizes a low false-positive rate for bugs rather than a low false-negative rate, and thus does not claim to provide sound verification [42]. Similar to KLEE, Crux-MIR [44] conducts symbolic execution to verify programs written in C/C++ and Rust. However, it models memory usage differently from KLEE.
Prusti [41] is a Rust compiler plugin built on the Viper verification infrastructure [46]. It analyzes information from the Rust compiler and synthesizes a corresponding core proof for the program. To verify correctness properties beyond memory safety, users can annotate Rust programs with specifications at the abstraction level of Rust expressions; the technique weaves all annotations into the core proof to verify modularly whether these specifications hold. SMACK [39] is a software verification toolchain that translates LLVM IR code into the Boogie intermediate verification language [47], which is verified by Boogie verifiers like Corral [48]. SMACK was initially designed to support Clang as a frontend; Baranowski et al. [49] extended SMACK to also verify Rust code.
Lindner et al. [6,7] recently proposed two alternative approaches to verify Rust programs based on KLEE's symbolic execution. One approach is contract-based verification [6]. The researchers demonstrated that by properly implementing contracts (i.e., pre- and post-conditions of Rust functions) in Rust programs, they enabled KLEE to find contradictions between contracts, and thus to explore the composite behaviors of functions with reduced complexity. The other approach is annotation-based verification [7]. The researchers demonstrated the new approach using a safety function (eq) from the PLCopen library. Given the function, the researchers first formulated assertions directly from the overall safety properties of the PLCopen specification; then they verified the overall safety with KLEE.
The techniques mentioned above were proposed to verify Rust programs in various ways. Our work wraps the usage of KLEE in PanicCheck to explore KLEE's effectiveness in verifying Rust programs. With our wrapping logic, KLEE could readily be swapped for the other verification engines mentioned above. Developers can implement the underlying trait to support whichever verification tools they desire.
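How such a swappable engine might look is sketched below. PanicCheck's actual trait is not shown in the text, so VerifierBackend, its methods, and OtherEngine (including the made-up executable name and flag) are assumptions for illustration; the only real invocation used is running klee on a bitcode file.

```rust
/// Hypothetical abstraction over a verification engine.
/// Every name in this trait is an assumption, not PanicCheck's API.
trait VerifierBackend {
    /// Executable to launch.
    fn program(&self) -> &str;
    /// Arguments for verifying one generated test artifact.
    fn args(&self, artifact: &str) -> Vec<String>;
}

/// KLEE consumes LLVM bitcode; `klee <file.bc>` is its basic invocation.
struct Klee;

impl VerifierBackend for Klee {
    fn program(&self) -> &str {
        "klee"
    }
    fn args(&self, artifact: &str) -> Vec<String> {
        vec![artifact.to_string()]
    }
}

/// Placeholder for some other engine; executable and flag are made up.
struct OtherEngine;

impl VerifierBackend for OtherEngine {
    fn program(&self) -> &str {
        "other-verifier"
    }
    fn args(&self, artifact: &str) -> Vec<String> {
        vec!["--input".to_string(), artifact.to_string()]
    }
}

/// The wrapper depends only on the trait, so swapping engines means
/// passing a different backend value.
fn command_line(backend: &dyn VerifierBackend, artifact: &str) -> String {
    let mut parts = vec![backend.program().to_string()];
    parts.extend(backend.args(artifact));
    parts.join(" ")
}
```

In this sketch, test generation stays unchanged and only the value passed to command_line differs between engines.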

CONCLUSION
In this paper, we developed PanicCheck, which relieves programmers of the burden of writing test cases and enables a large-scale study of KLEE on Rust programs. We then used PanicCheck to carry out a case study investigating how effectively KLEE can help developers reveal panics in practice. The major findings of our study include: (1) among the functions we studied, KLEE revealed in total 61 panics that reside in 6 projects; (2) 59 of the 61 panics have been addressed by developers; (3) 54 of the panics occurred because unexpected or invalid values were provided to method APIs; (4) KLEE does not work effectively when the function-to-verify (FTV) involves concurrency or complex data types. In the future, we plan to further improve PanicCheck to support more Rust-specific features (e.g., lifetime annotation) and to integrate more formal verification techniques (e.g., SeaHorn [50]). In this way, we can assess the verification effectiveness of more techniques, and recommend techniques to developers accordingly.

Figure 2: The UML class diagram showing our strategy-based software design for test generation

Finding 2: PanicCheck revealed 61 panics in 23 projects. So far, 59 of the panics have been addressed by developers. This observation indicates the high quality of PanicCheck's outputs and the high relevance of the revealed panics.

Finding 3: 54 of the 61 panics occurred because unexpected or invalid values were used to call method APIs.

• We conducted a case study by applying PanicCheck to 23 real-world Rust projects, in order to verify hundreds of Rust functions in those projects. No prior work has conducted a study at this scale.
• By observing the runtime behaviors of KLEE and analyzing

u8::abstract_value();
PanicCheck defines Phases I-III to generate a test function from a given annotated function, and defines Phase IV to execute the test function with KLEE.At the end of Phase III, PanicCheck produces two versions of the generated test function: a human-readable Rust code and an executable version for KLEE.
, and their data types. It then generates a test function. The test function is compiled into LLVM bitcode, so that KLEE can verify decode_var(...). The compilation process also injects value checks to guard critical instructions (e.g., arithmetic or bitwise operators), and adds panic! macros when value checks fail. PanicCheck streamlines the verification process by synthesizing tests for given Rust code, and invoking the existing toolchain in RVT for Rust-to-bitcode conversion as well as KLEE application. As shown in Figure 1,

Table 1: PanicCheck's creation of symbolic variables for different Rust data types
slice (No): There is no pass-in parameter to have the data type slice. Instead, a parameter can have the data type of slice reference, which is fully supported by PanicCheck.
str (No): There is no pass-in parameter to have the data type str. Instead, a parameter can have the type of str reference, which is fully supported by PanicCheck.
unit: The unit type has exactly one value "()". When a function parameter has the unit type, PanicCheck generates the constant value instead of creating any symbolic variable, and sends that value to KLEE.
array (Partial): PanicCheck can generate a symbolic array variable for the array type [T;n], where the element type T must be primitive and n is in [1, 30]. To create such a variable, PanicCheck first declares an array variable with the size specified. It then repetitively defines symbolic variables of type T and adds those variables to the array. The partial support is delimited by KLEE's capability.
enum (Partial): PanicCheck creates variables for two enum types, Option⟨T⟩ and Result⟨T, E⟩, which are Rust built-in types widely used to define function parameters. Although developers are allowed to define their own enum data types, self-defined enum data types often have distinct structures. Thus, PanicCheck currently cannot generate variables for those types.
struct (Partial): PanicCheck generates variables for one built-in struct, String, a built-in data type widely used to define function parameters. Although developers can also define their own struct data types, PanicCheck does not currently support variable generation for those self-defined data types.
tuple (Partial): PanicCheck generates variables for tuples with 2-11 primitive-typed elements, because Rust allows at most 11 elements in a tuple.
Collection types
Vec (Partial): For Vec⟨T⟩, PanicCheck generates a vector of 30 T-typed elements. Each element is a symbolic variable separately generated for primitive-type T, and then added to the vector.
VecDeque (Partial): For VecDeque⟨T⟩, PanicCheck generates a queue of 30 elements, with each element a symbolic variable separately generated for primitive-type T.
LinkedList (Partial): For LinkedList⟨T⟩, PanicCheck creates a list of 30 elements, with each element a symbolic variable of primitive-type T.
(heap type; row label not recoverable): PanicCheck generates a heap with five independent symbolic variables. The type T must be primitive.

PanicCheck generates tests based on templates. It has a code template predefined for each data type it supports (as listed in Table 1) to declare variables; it also has a predefined template to call FTV with the newly declared symbolic variables. To simplify code generation and the static reasoning of data types, PanicCheck implements the Strategy design pattern [24] in code templates. The pattern allows us to define alternative algorithms for a specific task (i.e., generating variables given a data type), while PanicCheck decides the actual algorithms to use at runtime depending on FTV. Fig. 2 illustrates our strategy-based software design for the generated test code. Here, bold text highlights the newly generated Rust trait and implementations, while plain text describes the trait predefined by RVT.

Since functions-to-verify (FTVs) share a common pattern, we define a template in PanicCheck. The template contains symbolic variable declaration and FTV call. For each FTV, PanicCheck generates two semantically equivalent versions of one test function: (1) source code and (2) LLVM bitcode.
3.3.1 Source Code Generation.
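A minimal sketch of template-based source code generation with the Strategy pattern is given below. It is not PanicCheck's implementation: DeclStrategy, Primitive, VecOf30, generate_test, and the chosen primitive list are hypothetical. The generated text uses the abstract_value() call that appears in the paper's listing fragment and the fixed Vec length of 30 from Table 1; the generated code itself is only emitted as a string and is not compiled here.

```rust
/// Hypothetical strategy interface: one implementation per supported type
/// family, each producing the declaration of a symbolic variable.
trait DeclStrategy {
    /// Returns the generated Rust source line if this strategy handles `ty`.
    fn declare(&self, name: &str, ty: &str) -> Option<String>;
}

/// Illustrative subset of primitive types (an assumption).
const PRIMITIVES: [&str; 4] = ["u8", "u32", "i64", "bool"];

struct Primitive;
struct VecOf30;

impl DeclStrategy for Primitive {
    fn declare(&self, name: &str, ty: &str) -> Option<String> {
        if PRIMITIVES.contains(&ty) {
            Some(format!("let {} = {}::abstract_value();", name, ty))
        } else {
            None
        }
    }
}

impl DeclStrategy for VecOf30 {
    fn declare(&self, name: &str, ty: &str) -> Option<String> {
        // Accept only `Vec<T>` with a primitive element type T.
        let elem = ty.strip_prefix("Vec<")?.strip_suffix('>')?;
        if !PRIMITIVES.contains(&elem) {
            return None;
        }
        Some(format!(
            "let {n}: Vec<{e}> = (0..30).map(|_| {e}::abstract_value()).collect();",
            n = name,
            e = elem
        ))
    }
}

/// Picks the first applicable strategy for each parameter at run time and
/// appends the call to the function-to-verify (FTV). Returns `None` when
/// some parameter type is unsupported.
fn generate_test(ftv: &str, params: &[(&str, &str)]) -> Option<String> {
    let strategies: [&dyn DeclStrategy; 2] = [&Primitive, &VecOf30];
    let mut lines = Vec::new();
    for (name, ty) in params {
        let decl = strategies.iter().find_map(|s| s.declare(name, ty))?;
        lines.push(decl);
    }
    let args: Vec<&str> = params.iter().map(|(n, _)| *n).collect();
    lines.push(format!("{}({});", ftv, args.join(", ")));
    Some(lines.join("\n"))
}
```

For a hypothetical FTV checked_fn(buf: Vec<u8>, n: u32), generate_test returns two declaration lines followed by the call checked_fn(buf, n); and for an unsupported parameter type such as a HashMap it returns None.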

• The project uses only syntax defined by the Rust 2018 edition, as PanicCheck does not support the new grammar features introduced by the Rust 2021 edition.
• The entry functions (see Section 4.2 for definition), i.e., functions we will annotate with #[panic_check], do not use any self-defined (e.g., struct) or complex data types (e.g., HashMap) that are not supported by PanicCheck.
• The entry functions do not have any parameter decorated with the lifetime annotation (i.e., the apostrophe character), as PanicCheck does not analyze or validate variables declared with the lifetime annotation.

Table 2: The experiment results on 21 open-source projects

Table 3: The root causes of state explosion in projects

Threats to Construct Validity. Our tool implementation is limited by the Rust edition (i.e., 2018) that PanicCheck currently supports and by the KLEE/LLVM versions it uses. Namely, PanicCheck does not support new features introduced by the more recent releases of Rust, nor does it support features that are not well supported by KLEE or LLVM. This is mainly because PanicCheck is based on RVT, and RVT targets Rust 2018. Currently, PanicCheck does not analyze or validate variables declared with the lifetime annotation, nor does it generate symbolic variables for exclusive (i.e., mutable) references. When run on code written for a newer Rust edition, PanicCheck reports an unknown error rather than producing an unsound result. In the future, we plan to modernize RVT, and extend the modernized version with PanicCheck's implementation for both old and new language features.