Clueless: A Tool Characterising Values Leaking as Addresses

Clueless is a binary instrumentation tool that characterises explicit cache side channel vulnerabilities of programs. It detects the transformation of data values into addresses by tracking dynamic instruction dependencies. Clueless tags data values in memory if it discovers that they are used in address calculations to further access other data. Clueless can report on the amount of data that are used as addresses at each point during execution. It can also be specifically instructed to track certain data in memory (e.g., a password) to see if they are turned into addresses at any point during execution. It returns a trace of how the tracked data are turned into addresses, if they are. We demonstrate Clueless on SPEC 2006 and characterise, for the first time, the amount of data values that are turned into addresses in these programs. We further demonstrate Clueless on a micro benchmark and on a case study. The case study is the different implementations of AES in OpenSSL: T-table, Vector Permutation AES (VPAES), and Intel Advanced Encryption Standard New Instructions (AES-NI). Clueless shows how the encryption key is transformed into addresses in the T-table implementation, while explicit cache side channel vulnerabilities are not detected in the other implementations.


INTRODUCTION
Cache side-channel attacks leak information through a microarchitectural covert channel: the cache. By observing changes in the shared cache state, a spy process can bypass process isolation and read secret data from a victim process. Cache side-channel attacks have been demonstrated on processors of different architectures and on different algorithms, e.g., RSA [1], AES [2][3][4][5], and ElGamal [6]. Speculative side-channel attacks such as Spectre [7], Meltdown [8] and their variants [9][10][11][12][13][14] have caused major changes in how the architecture community views security. These attacks exploit speculative instructions that are to be squashed (e.g., instructions in mispredicted branches) to access and then transmit secret data over the shared cache. While non-speculative cache side-channel attacks can usually be mitigated by improving the implementations of vulnerable algorithms (e.g., by avoiding the use of secret data to index large tables), their speculative variants are difficult to prevent by changing software implementations because the information leakage happens in speculation.
Fig. 1 shows Spectre Variant 1, where an attacker can exploit a branch misprediction to access arbitrary program data and transmit the secret over a shared cache [7]. The victim program is correctly implemented with the appropriate bound check, yet it is still vulnerable due to speculative execution.
Speculative side-channel attacks have been found to be an enormous security threat. Different hardware approaches have been proposed to protect against them. For example, InvisiSpec [15] and GhostMinion [16] make speculation invisible in the data cache hierarchy using additional speculative buffers so that secrets cannot be transmitted over cache channels. Delay-on-Miss (DoM) [17] delays all speculative loads that miss in the data cache and thus prevents the observable timing differences, while Speculative Taint Tracking (STT) [18] focuses on blocking only the transmitter instruction. STT uses dynamic information flow tracking (DIFT) to taint secret data. It allows the results of speculative instructions to be forwarded if they cannot leak secrets via any potential covert channel.
This work does not propose new mitigation methods for speculative side-channel attacks. Instead, we intend to understand how prevalent these vulnerabilities are in programs from a new perspective. Side-channel attacks rely on a fundamental programming feature to leak the value of secrets: the transformation of data values into memory addresses. Besides the victim program in Fig. 1, consider for example sorting, hashing, or many other algorithms that create addresses based on data values. While we understand the mechanism that leaks data as addresses, there is no clear indication of how serious the problem is in our workloads: How many values do "leak" as addresses in a given application?
This work aims to shed some light on how exposed we are to this potential vulnerability. Clueless is a tool (based on binary rewriting) that tracks dynamic instruction dependencies and tags data values in memory if it discovers that they are used in address calculations to further access other data.
Clueless can be used in two modes: aggregating mode, where it reports on the amount of data that are used as addresses at each point during execution, and tracking mode, where the tool is specifically asked to track certain data in memory (e.g., a password) to see if they are turned into addresses at any point during execution. Tracking mode returns a trace of how the tracked data are turned into addresses, if they are.

METHODS
Clueless is a dynamic instrumentation tool that analyses instructions at run-time to track values that leak as memory addresses. Values are data that should not be used, directly or indirectly, as memory addresses, e.g., password hashes or private encryption keys. A value can leak as a memory address when there is information flow from the value to a memory address. The scope of the tool is limited to detecting data flow: it tracks data dependences but disregards control dependences. In other words, Clueless is able to detect explicit channels [18], where a value is used as an address in a load instruction, but not implicit channels [18], where the value is leaked through control flow interaction. Assuming secret is a value, the leakage in the code in Fig. 2a will not be detected by the tool because &A[0] and &A[128] only have a control dependence on secret. On the other hand, the tool will detect the leakage in Fig. 2b because secret is involved in the computation of &A[i]. Furthermore, addr will be tagged as a leak point. A leak point is a memory location where a leaked value resides.
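The distinction between the two kinds of channel can be illustrated with a minimal sketch. It is not the code of Fig. 2; the array A, the function names and the index computation are hypothetical and only mirror the description above.

```python
# Hypothetical illustration of implicit vs explicit channels (not Fig. 2 itself).
A = list(range(256))

def implicit_channel(secret: int) -> int:
    # The accessed element depends on `secret` only through the branch
    # (control dependence), which data-flow tracking does not follow.
    if secret:
        return A[0]
    return A[128]

def explicit_channel(secret: int) -> int:
    # `secret` takes part in computing the index (data dependence),
    # so a data-flow tracker sees the value flow into an address.
    i = (secret * 4) % len(A)
    return A[i]
```

In both functions the accessed element reveals something about secret, but only the second one has a data dependence from secret to the address.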
Clueless uses an algorithm based on dynamic information flow tracking (DIFT) [19][20][21]. DIFT has been successfully applied to prevent attacks on software [20][21][22][23][24][25] and has been seen in hardware protection proposals against speculative execution attacks [18]. DIFT tracks information flow by associating taints with data and propagating the taints according to the data flow. In addition, Clueless's algorithm needs to automatically assign taints to data and maintain the taints.

Taint assignment
A new taint is assigned to a memory location whenever a value (i.e., data that should not be used to address memory) is loaded from that memory location. Each taint is associated with the address of a value. In the example in Fig. 3, suppose that values reside at memory locations addrX and addrY: a new taint is assigned to addrX when the load instruction on line 3 executes, and then another taint is assigned to addrY when line 4 executes.
Clueless needs to know if the loaded data is a value. Most contemporary Instruction Set Architectures (ISAs) do not make a distinction between value and address loads nor between value and address registers. As a binary instrumentation tool, Clueless is clueless about which loads actually load values. Clueless provides two solutions to this problem.
Everything is a value. One solution is to regard all data in memory initially as values, i.e., Clueless assumes nothing in memory should be used as a memory address. For every load instruction, a new taint is assigned to the memory address of the load. Consequently, all memory locations that contain memory addresses will be considered leak points. Clueless effectively provides a way to classify any data in memory as memory addresses or non-addresses based on the past execution. This provides a new perspective from which to analyse programs: how much of a process's memory is potentially observable by another process through a cache side channel? We call this aggregating mode. Aggregating mode indicates how visible a program's memory can be; Section 3 presents its results.
Users set watchpoints. Another solution is to let users mark out memory regions that contain values. A new taint is assigned to the memory address of a load only if that address is within a marked memory region. Clueless supports this solution by providing an API that can dynamically register and unregister memory regions to watch. This requires users to modify the source code of instrumented programs by inserting Clueless watchpoint API calls. We call this tracking mode. Section 4 presents its results.
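The watchpoint check described above can be sketched as follows. This is only an illustration of the idea, not the Clueless API: the class WatchRegions and its method names are hypothetical.

```python
class WatchRegions:
    """Hypothetical registry of watched memory regions [start, start + size)."""

    def __init__(self):
        self._regions = set()

    def register(self, start: int, size: int) -> None:
        self._regions.add((start, start + size))

    def unregister(self, start: int, size: int) -> None:
        self._regions.discard((start, start + size))

    def is_watched(self, addr: int) -> bool:
        # A load assigns a new taint only if its address is in a watched region.
        return any(lo <= addr < hi for lo, hi in self._regions)


regions = WatchRegions()
regions.register(0x1000, 16)          # e.g., a 16-byte secret
print(regions.is_watched(0x1008))     # address inside the secret
```

A linear scan is sufficient for a handful of regions; an interval tree would be a natural choice if many regions were registered.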

Taint propagation
Clueless uses bit arrays to store taint sets, where each bit represents a different taint. With this representation, set union operations are equivalent to bitwise OR operations, which are efficient to perform. Each bit array is associated with a register or a memory location. The number of bits in a bit array is finite and can be configured when compiling the tool. Consequently, the maximal number of taints is equal to the number of bits in a bit array.
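A minimal sketch of this representation, using a Python integer as the bit array. The constant NUM_TAINTS and the helper names are assumptions made for illustration; in Clueless the width is a compile-time parameter.

```python
NUM_TAINTS = 64  # assumed width of a bit array

def singleton(taint: int) -> int:
    """Taint set containing exactly one taint."""
    if not 0 <= taint < NUM_TAINTS:
        raise ValueError("taint id out of range")
    return 1 << taint

def union(a: int, b: int) -> int:
    """Set union is a bitwise OR of the two bit arrays."""
    return a | b

def members(taint_set: int) -> list:
    """List the taint ids whose bits are set."""
    return [t for t in range(NUM_TAINTS) if (taint_set >> t) & 1]


print(members(union(singleton(0), singleton(5))))
```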
At the instruction level, data flow can be divided into two categories: register-register flow and register-memory flow. One of the main differences between the two categories in the context of dependence tracking is that the space required by register-register flow tracking is upper-bounded by the number of architectural registers, while that of register-memory flow tracking is upper-bounded by the number of virtual memory locations. For example, pairs of a load and a store (both cause register-memory flow) can copy some data throughout the entire virtual memory and result in every memory location being tainted by the taints of the data, requiring an enormous amount of space to store the taint sets. This might not be an issue when a few pieces of data are tracked because the data are not likely to flow through a large part of the memory. When the number of tracking points is large, however, the space overhead makes complete tracking of register-memory flow impractical. This is the case for Clueless in aggregating mode: it regards everything in memory as a value and tracks the entire memory. On the other hand, storing a taint set for each register requires much less space because the number of architectural registers is low.
Tracking dependences via registers. Clueless tracks register-register flow by examining instructions, identifying source, destination, and memory addressing registers, and following propagation rules. Table 1 demonstrates how taints propagate through the registers as instructions from Fig. 3 are executed. The propagation rules are listed below:
(1) For each load instruction, the taint set of its destination registers becomes either a singleton or an empty set. If a value is loaded, the taint set is a singleton whose element is the new taint associated with the value's address. If what is loaded is not a value, the taint set is the empty set.
(2) For instructions that set their destination registers to a constant (e.g., xor with two identical source registers, mov of a constant to a register), the taint sets of their destination registers become the empty set.
(3) For instructions whose source and destination operands are all registers, except the instructions in rule 2, the taint sets of their destination registers become the union of the taint sets of their source registers.
(4) For load and store instructions, memory addressing registers have their taint sets emptied. All the memory addresses associated with the emptied taints are tagged as leak points.
(5) For store instructions, if the taint sets of all the memory addressing registers are the empty set, the address is no longer a leak point and is untagged.
Expanding dependence tracking to memory. Using register-register flow tracking alone, the taint sets of data could be lost because programs often store some data to memory, use the register containing the data for something else, and later reload the data from memory. These cases require tracking register-memory flow to store and reload the taint sets. Two additional propagation rules are introduced to expand dependence tracking to memory:
(6) For each store instruction, the taint set of the memory address becomes the taint set of the source register that contains the stored data.
(7) For each load instruction, in addition to rule 1, the taint set of a destination register becomes the union of the resulting taint set from rule 1 and the taint set of the memory address.
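The seven propagation rules can be sketched as a small model. This is an illustration under simplifying assumptions, not the Clueless implementation: the class Tracker and its method names are hypothetical, taint sets are Python sets rather than bit arrays, taint ids are unbounded, memory taint sets are kept in an unbounded dictionary, and rule 5 is evaluated on the addressing registers' taint sets as they are before rule 4 empties them.

```python
from itertools import count


class Tracker:
    """Hypothetical model of propagation rules 1-7."""

    def __init__(self):
        self.reg = {}              # register name -> set of taints
        self.mem = {}              # address -> set of taints
        self.origin = {}           # taint -> address it was assigned at
        self.leak_points = set()
        self._fresh = count()

    def _leak(self, addr_regs):
        # Rule 4: taints that reach an addressing register tag their
        # origin addresses as leak points and are then removed everywhere.
        leaked = set().union(*(self.reg.get(r, set()) for r in addr_regs))
        self.leak_points |= {self.origin[t] for t in leaked}
        for taints in list(self.reg.values()) + list(self.mem.values()):
            taints -= leaked
        return bool(leaked)

    def load(self, dst, addr_regs, addr, is_value=True):
        self._leak(addr_regs)
        taints = set(self.mem.get(addr, set()))   # rule 7: reload stored taints
        if is_value:                              # rule 1: new taint for a value
            t = next(self._fresh)
            self.origin[t] = addr
            taints.add(t)
        self.reg[dst] = taints

    def store(self, src, addr_regs, addr):
        if not self._leak(addr_regs):             # rule 5: untag overwritten address
            self.leak_points.discard(addr)
        self.mem[addr] = set(self.reg.get(src, set()))  # rule 6

    def alu(self, dst, srcs):                     # rule 3: union of sources
        self.reg[dst] = set().union(*(self.reg.get(s, set()) for s in srcs))

    def const(self, dst):                         # rule 2: constant clears taints
        self.reg[dst] = set()


t = Tracker()
t.load("rax", [], 0x1000)        # load a value from 0x1000
t.alu("rcx", ["rax", "rbx"])     # rcx is derived from rax
t.load("rdx", ["rcx"], 0x2000)   # rcx is used as an address
print(sorted(hex(a) for a in t.leak_points))
```

In this run the value at 0x1000 flows through rax and rcx into an address, so 0x1000 is tagged as a leak point.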
Although tracking all the register-memory flow using a complete method is impractical due to the space requirements, it is still important to track these flows because temporarily storing data to memory is very common. For this reason, a set-associative cache is used as a best-effort approach to store the taint sets that are associated with memory addresses. The cache uses a first-in-first-out replacement policy. The number of sets as well as the associativity of the cache can be configured when compiling the tool.
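A minimal sketch of such a best-effort store for memory taint sets is given below. The class name TaintCache, the default sizes and the choice of the address modulo the number of sets as the set index are assumptions made for illustration.

```python
from collections import OrderedDict


class TaintCache:
    """Hypothetical set-associative cache of taint sets with FIFO replacement."""

    def __init__(self, num_sets: int = 4, ways: int = 2):
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, addr: int) -> OrderedDict:
        return self.sets[addr % len(self.sets)]   # assumed index function

    def put(self, addr: int, taints: int) -> None:
        s = self._set_for(addr)
        if addr not in s and len(s) >= self.ways:
            s.popitem(last=False)   # FIFO: evict the entry inserted first
        s[addr] = taints            # updating a key keeps its insertion position

    def get(self, addr: int) -> int:
        # A miss returns the empty taint set: the dependence is simply lost.
        return self._set_for(addr).get(addr, 0)
```

An evicted entry makes later loads from that address appear untainted, which is one source of the incompleteness of the tracking.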

Taint maintenance
Clueless has a finite number of taints because of the use of statically sized bit arrays as taint sets. Therefore, taints must be maintained and reused. A taint can only be reused when it is in none of the taint sets. Propagation rules 1, 2 and 4 are the rules that can empty taint sets and make taints reusable. Since the addresses associated with the emptied taints are already tagged as leak points according to propagation rule 4, the emptied taints no longer carry useful information and can thus be removed from all the taint sets, which makes them immediately reusable.
Taints can still be exhausted in spite of the recycling. For example, a program can have a loop that loads many values from memory and sums them. In these cases, Clueless makes the taint assigned by the earliest load available by removing it from all the taint sets.
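The recycling policy can be sketched as a small allocator. The class TaintPool and its interface are hypothetical; taint sets are modelled as integer bit masks stored in a dictionary.

```python
from collections import deque


class TaintPool:
    """Hypothetical allocator that reuses the earliest-assigned taint when exhausted."""

    def __init__(self, num_taints: int):
        self.free = deque(range(num_taints))
        self.in_use = deque()   # oldest assignment first

    def release(self, taint: int) -> None:
        # Called when a taint has been removed from every taint set.
        self.in_use.remove(taint)
        self.free.append(taint)

    def allocate(self, taint_sets: dict) -> int:
        if not self.free:
            oldest = self.in_use.popleft()
            for name in taint_sets:
                taint_sets[name] &= ~(1 << oldest)   # drop it from all taint sets
            self.free.append(oldest)
        taint = self.free.popleft()
        self.in_use.append(taint)
        return taint
```

Forcing the oldest taint out loses the dependence it represented, so this step trades completeness for a bounded number of taints.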

Limitations
No tracking of speculative execution. Clueless is a binary instrumentation tool. It is not a hardware simulator and does not obtain microarchitectural information such as instructions executed in speculation. As a result, Clueless cannot track speculative execution.
Incomplete tracking. Clueless is a characterisation tool as opposed to a verification tool. The tracking of Clueless is incomplete: Clueless can track data dependence only within a limited window. The incompleteness is the consequence of our implementation, which uses a finite number of taints and a finite-sized cache. When compiling the tool, users can adjust these parameters to find the desired size of the tracking window.
Clueless is compiled into a shared library and needs to be loaded by Intel Pin [26]. The propagation algorithms of Clueless are implemented in a platform-independent way, but Intel Pin only supports instrumentation of IA-32, x86-64 and MIC ISAs. As a result, Clueless currently only supports these ISAs.

Source code
The source code of Clueless is published under the GNU General Public License, Version 3. Its git repository is accessible at https://github.com/xiaoyuechen/dift-addr.git.

AGGREGATING MODE
Clueless in aggregating mode regards everything in the instrumented program's memory as values. Clueless in this mode tags any memory locations whose data transform into addresses as leak points. In addition, Clueless collects a set of all memory addresses used by the program, i.e., addresses used in any memory accessing instructions. With the set of leak points and the set of all memory addresses, we can introduce a metric that describes the proportion of data that are used as addresses for a given execution of a program.

Λ of SPEC benchmarks
We used Clueless's aggregating mode on SPEC 2006 to characterise data transformation into memory addresses by analysing how the number of leak points and the number of accessed addresses change and comparing the Λ values of different benchmarks. Since Clueless uses incomplete methods to track the leak points while the tracking of accessed addresses is complete, the reported values of Λ are lower bounds of the actual Λ.
The prevalence of data-address transformations, indicated by Λ, is an innate property of a program. Fig. 4 shows the values of Λ of different benchmark programs. The astar program and soplex program use more than one third of their memory to store addresses. In the bwaves program and sjeng program, on the other hand, such transformations are rarely seen.

A closer look
For more insights into data-address transformation, we further study how much data are transformed into addresses at each point of execution of some benchmark programs. The cause of such fluctuations is that the same blocks of memory containing addresses are repeatedly loaded from and written to.
Bzip2. Fig. 5b shows that the bzip2 program periodically stores new addresses to the same blocks of memory. One common data-address transformation pattern can be found in each of the intervals $i \in [0, 3.9 \times 10^{10}]$, $i \in [3.9 \times 10^{10}, 7.9 \times 10^{10}]$, and $i \in [7.9 \times 10^{10}, 1.75 \times 10^{11}]$, where $i$ is the instruction index: a rapid increase in the number of leak points, which then fluctuates periodically, followed by another rapid but smaller increase, after which it fluctuates periodically again. The cause for the repeated pattern could be that the bzip2 program reallocates memory to store memory addresses, but the same algorithm is used on the reallocated memory.
Calculix. Fig. 5c shows that the calculix program has an obvious periodic memory access pattern. After the initial increase of both

TRACKING MODE
Clueless in tracking mode allows users to dynamically register and unregister watchpoints, i.e., memory blocks that contain values. If data from any watchpoint are transformed into memory addresses, Clueless will provide a detailed diagnosis of each leakage. The diagnostic information includes the leak point, the memory address that the value in the leak point transforms into, a trace of instructions that shows the value-address transformation, and the relevant routine and image names where the leakage happens. This mode can be used to test the side-channel vulnerability of programs and to help understand where and how secrets are leaked if such a vulnerability exists.

The micro benchmark
We demonstrate Clueless in tracking mode on a micro benchmark program in Fig. 6. Array T[] and function foo are defined in a shared library that the victim program links against. The victim program has a secret stored in the s[] array. The victim calls foo with the secret as its parameter. Function foo loads each byte of its parameter array, multiplies the byte value by 64, and uses the result as the index of a constant array T[] to do some lookup. This program is vulnerable to side channel attacks such as Flush+Reload [1]. The attacking program may mmap the shared library and flush the cache lines containing T[], wait for the victim to call the foo function, and measure the time to reload the cache lines to find out which lines are accessed by foo. Assuming that the victim's machine has 64-byte cache lines, the attacker can recover the secret completely: each access of T[s[i]*64] will be on a different cache line, so each byte of the secret can be computed using (l-T)/64 where l is the address of an accessed line. The offset of T can also be found trivially because it is just a symbol in a shared library. In our example, T has an offset of 0x2020.

Pinpointing the leakage
Clueless's aggregating mode can be used to characterise the micro benchmark program. Fig. 7 shows how its number of leak points and number of accessed addresses change. With Clueless in tracking mode, we can pinpoint which increase in the number of leak points corresponds to the secret. For s[0], the address that the value transforms into evaluates to 0x38e0 (after subtracting the image load offset) and is used as a memory address in a load. To recover s[0], we compute (0x38e0-0x2020)/64, which yields 0x63, the ASCII code for 'c'.
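The recovery arithmetic for the micro benchmark can be checked with a short sketch. The constants come from the text (64-byte lines, T at offset 0x2020); the function names are hypothetical. Because 0x2020 is not a multiple of 64, a second helper is included for an attacker who only observes the aligned base of the accessed cache line rather than the exact address.

```python
LINE_SIZE = 64      # cache line size assumed in the text
T_OFFSET = 0x2020   # offset of T in the shared library, from the text

def recover_from_address(offset: int) -> int:
    """Secret byte from the exact accessed offset: (l - T) / 64."""
    return (offset - T_OFFSET) // LINE_SIZE

def recover_from_line_base(line_base: int) -> int:
    """Secret byte from the 64-byte-aligned base of the accessed line."""
    return (line_base + LINE_SIZE - 1 - T_OFFSET) // LINE_SIZE


byte = recover_from_address(0x38E0)
print(hex(byte), chr(byte))
```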

Tracing the transformation
For further understanding of the leakage, Clueless gives a trace of the propagation that causes it. Fig. 8a shows the part of the propagation trace that causes s[0] to leak, and Fig. 8b shows the corresponding instructions.
By analysing the trace of the micro benchmark, we find that the instruction at line 1 loads &s[0], which is 0x7fff41801683, with taint set {0} to register rax. The following 3 instructions propagate {0} from register rax to itself. Then the instruction at line 5 propagates {0} to register rdx by adding register rax to it. {0} is further propagated to register rcx, which is eventually used as the memory address 0x55baae46d8e0.

CASE STUDY: AES
We have seen how the micro benchmark in Section 4 could leak secrets due to its value-address transformations. Some implementations of the Advanced Encryption Standard (AES) [27] are susceptible to cache side-channel attacks for the same reason. These implementations often depend on large tables to speed up the encryption process [28]. If encryption keys are transformed into indices of large tables for lookups, attackers may partially or completely recover the keys by observing the corresponding cache state changes. Numerous attacks on AES exploiting this class of vulnerability have been demonstrated in the past [2][3][4][5]. Different implementations of AES have also been proposed to protect against such attacks while retaining or improving the speed of encryption [3,28,29].
In this case study, we use Clueless in tracking mode to analyse three different implementations of AES present in OpenSSL 3.0.3: T-table, Vector Permutation AES (VPAES), and Intel Advanced Encryption Standard New Instructions (AES-NI). The expanded encryption key is set as the watchpoint in order to observe if it is transformed into memory addresses.

T-table
The T-table implementation of AES in OpenSSL uses 9 T-tables, i.e., pre-computed lookup tables, with 8 of them being 8 Kibit (1 KiB) each and 1 of them being 2 Kibit (256 B). The encryption key is first expanded to round keys. The first round key is combined with the 16-byte plaintext using xor to form the initial state vector. The elements in the state vector are then used as indices of the T-tables to look up values, which are combined with the next round key to form the next state vector. This implementation could be easily broken using Prime+Probe, with the 128-bit encryption key fully recovered after only 300 encryptions [3].
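The first-round key-to-address transformation described above can be sketched as follows. The sketch assumes 4-byte table entries, 64-byte cache lines, a line-aligned table and an attacker who knows the plaintext; the function names are hypothetical and no OpenSSL code is reproduced.

```python
ENTRY_SIZE = 4    # assumed size of one T-table entry in bytes
LINE_SIZE = 64    # assumed cache line size in bytes

def first_round_index(plain_byte: int, key_byte: int) -> int:
    """Initial state byte, used directly as a T-table index."""
    return plain_byte ^ key_byte

def observed_line(index: int) -> int:
    """Cache line (within a line-aligned table) touched by the lookup."""
    return index * ENTRY_SIZE // LINE_SIZE

def key_high_nibble(plain_byte: int, line: int) -> int:
    """Upper four key bits implied by a known plaintext byte and the observed line."""
    return (plain_byte >> 4) ^ line


line = observed_line(first_round_index(0x3A, 0xC5))
print(hex(key_high_nibble(0x3A, line)))
```

Under these assumptions one observed line reveals the upper four bits of a key byte; recovering the remaining bits requires further analysis, as in the attacks cited above.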

CONCLUSION AND FUTURE WORK
We have presented Clueless: a tool characterising values leaking as addresses. Using Clueless in aggregating mode, we have characterised, for the first time, the amount of data values that are transformed into memory addresses in SPEC 2006 benchmark programs. Some benchmark programs use more than one third of accessed memory to reference memory. Clueless in tracking mode has provided the traces of how secrets propagate and leak in a micro benchmark and AES implementations in OpenSSL. The T-table implementation of AES exhibits potential vulnerabilities to cache side-channel attacks while the VPAES and AES-NI implementations are immune to such attacks.
The "leaks" reported by Clueless are to be further studied. We hope to distinguish the value-address transformations that would lead to the danger of leaking sensitive information from the false positives (e.g., secrets transforming to addresses on the same cache lines). We are interested in applying similar dynamic information flow tracking techniques on hardware models to mitigate cache side-channel attacks such as Spectre. The high frequencies of data-address transformations in some programs also indicate optimisation opportunities in cache systems: data that would transform to memory addresses may be associated with the data the transformed addresses point to. This may be a focus of our future work.

Let $L_i$ be the set of leak points and $A_i$ be the set of all addresses after the execution of the $i$:th instruction of a program (trivially, $L_i \subseteq A_i$). Let $N$ be the number of instructions of the entire execution of the program. The metric $\Lambda$, defined by

$$\Lambda = \frac{\sum_{i=1}^{N} |L_i|}{\sum_{i=1}^{N} |A_i|},$$

indicates the average proportion of data that transform into addresses over the entire execution of the program. Figuratively, $\Lambda$ is the area under the $|L_i|$ curve divided by the area under the $|A_i|$ curve in Fig. 5.
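As a small worked example of the metric, the ratio of the summed leak-point counts to the summed address counts can be computed from two per-instruction series. The function name and the sample series are hypothetical.

```python
def leak_ratio(leak_counts, addr_counts):
    """Sum of per-instruction leak-point counts over sum of address counts."""
    if len(leak_counts) != len(addr_counts):
        raise ValueError("both series must cover the same instructions")
    total = sum(addr_counts)
    return sum(leak_counts) / total if total else 0.0


# Four instructions: leak points grow from 0 to 2, addresses from 1 to 5.
print(leak_ratio([0, 1, 1, 2], [1, 2, 4, 5]))
```

Here the sums are 4 and 12, so the ratio is one third.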
We demonstrate Clueless in aggregating mode on SPEC 2006 and characterise, for the first time, the amount of data values that are turned into addresses in these programs. We further demonstrate Clueless in tracking mode on a micro benchmark and on a case study. The case study is the different implementations of AES in OpenSSL: T-table, Vector Permutation AES (VPAES), and Intel Advanced Encryption Standard New Instructions (AES-NI). The T-table AES implementation can be easily broken with a cache side-channel attack (e.g., Prime+Probe), but VPAES and AES-NI are immune to cache-timing attacks. Clueless readily shows how the encryption key is transformed into addresses in the T-table implementation and a lack of the corresponding transformations in the other two implementations.

Table 1: Example Taint Propagation

the number of leak points and the number of accessed addresses, the number of accessed addresses becomes stable while the number of leak points becomes periodic. The amplitude of the number of leak points is relatively large at approximately $2.1 \times 10^{6}$, indicating that blocks containing $2.1 \times 10^{6}$ addresses are repeatedly written with new addresses.

Soplex. Fig. 5d shows how the number of leak points and the number of accessed addresses of the soplex program change. After the initial increase, both become stable. This does not mean that data in this program are transformed to memory addresses only once. After the data are tagged, they may still be transformed into memory addresses multiple times in different ways, but the set of leak points would remain the same. The stable number of leak points only indicates that no new data are tagged, and no tagged memory location is written to.