DefWeb: Defending User Privacy against Cache-based Website Fingerprinting Attacks with Intelligent Noise Injection

Cache-based website fingerprinting (WF) attacks violate user privacy: the attacker leverages the shared last-level cache in CPUs and analyzes the fingerprints through machine learning and deep learning models. WF attacks are applicable even in Incognito mode and anonymized browser platforms, posing a serious threat to the public. Several defense techniques inject random noise during website rendering to degrade the attack success rate; however, these techniques either incur large performance overhead or fail to obfuscate the WF dataset once the attacker retrains a new learning model with noisy fingerprints. In this work, we develop a dynamic generative learning-based defense technique, DefWeb, to protect user privacy against cache-based WF attacks by injecting precise noise into the WFs. For this purpose, (i) we train generative neural networks to represent high-dimensional fingerprints in a low-dimension space while creating distinct clusters for each website; (ii) minimal noise templates are extracted in the low-dimension space to obfuscate the fingerprints efficiently; (iii) we create practical noise templates that can be added to WFs during website rendering by leveraging self-modifying code (SMC). We implement DefWeb in both simulation and real-world setups to degrade the attacker's model accuracy. DefWeb decreases the model accuracy to 1.1% and 28.8% in simulation and real-world setups, respectively, covering the Mozilla Firefox, Google Chrome, and Tor browsers. Finally, the performance overhead introduced by DefWeb is only 9.5%, which is considerably lower than previous defense techniques.


INTRODUCTION
Web browsers are the primary applications to access the Internet on mobile phones, laptops, and desktops. Users share sensitive information, including SSNs and credit card numbers, through websites. Hence, web browsers are expected to protect users against malicious actors by implementing state-of-the-art defense mechanisms. The privacy and integrity of web users have generally been protected through the Hypertext Transfer Protocol Secure (HTTPS) mechanism, which provides end-to-end encryption but is ineffective in concealing user identities, as user-specific information is available in SSL and TLS certificates. As an alternative, the Tor network was designed to anonymize website visitors by encrypting the network packets, hiding the user identity from network observers [6]. However, the Tor browser [16] is not preferred for daily usage since its network delay is considerably higher than that of other browsers such as Google Chrome and Mozilla Firefox.
Website Fingerprinting (WF) is an attack technique to profile user activity on the Internet by observing network packets between a host and a user [26]. In the offline phase, an attacker collects website-specific network packets and extracts common features for each website. In the online phase, the attacker captures real-time network packets to detect the visited website.
Oren et al. [24] showed that the sandbox environment of browsers also provides sufficient attack resources through JavaScript features to profile visited websites from a third-party website in a browser. In this attack model, an attacker utilizes shared hardware resources such as cache usage rather than network activity to profile the visited websites.
The analysis process of the collected dataset depends on the complexity of the attack model and the number of websites profiled during the offline phase. Many WF attacks utilize machine learning (ML) algorithms to efficiently analyze the collected dataset due to the increasing number of websites and system noise. Initially, ML algorithms were leveraged to train multi-class classification models to distinguish the visited websites [12,13]. With the recent advancements in GPU resources and deep learning (DL) algorithms, attackers started to leverage DL-based models to extract useful features for each website [27,33]. Since DL models require a large dataset to update the hidden-layer weights, the number of measurements per website has increased from a few to hundreds. In parallel, attackers obtain higher classification accuracy than before while implementing large-scale WF attacks.
Several defense mechanisms have been developed to prevent cache-based WF attacks. Since measuring the memory access time to distinguish cache hits and misses is essential, the resolution of available timers in the JavaScript environment has been reduced significantly in popular browsers such as Google Chrome and Mozilla Firefox. Other defense techniques modify existing browsers to prevent cache-based WF attacks. Chrome Zero [29] is a browser extension that protects users by eliminating commonly used JavaScript features. Similarly, DeterFox [3] constructs a deterministic JavaScript environment in which an attacker always receives the same information in a given time interval. Other defense techniques rely on creating artificial noise by accessing random cache sets [32], creating random interrupts [4], or introducing dummy I/O operations [21]. While these defense techniques slightly reduce the WF attack success rate, DL algorithms can still extract meaningful features from noisy fingerprints once the models are retrained with the noisy dataset, which makes the proposed techniques less applicable in real-world scenarios.
In this paper, we focus on creating a more efficient WF obfuscation technique that can lower the attacker's model accuracy while introducing minimal performance overhead. For this purpose, DefWeb:
• utilizes Variational Autoencoders (VAEs) to extract useful features from website fingerprints in a low-dimension space and create a separate cluster for each website.
• extracts minimal noise to obfuscate each website fingerprint in the latent space and maps the extracted noise to a higher-dimension space.
• introduces a new deterministic noise generation tool based on self-modifying code (SMC) to construct noise templates in a micro-architecture and browser.
• incurs minimal performance overhead on website rendering through precise noise injection compared to previous defense mechanisms.
• publicly provides all WF datasets, codes, noise template datasets, and performance overhead tools in the DefWeb1 GitHub repository.
Outline. The rest of the paper is organized as follows: Section 2 provides background on the cache occupancy channel, VAEs, and SMC. Section 3 describes our defense methodology, DefWeb. Section 4 explains the automated practical noise generation technique using self-modifying code. Section 5 measures the performance overhead based on a tool that automatically measures the website loading time. Section 6 delivers a discussion on DefWeb and future research scope. Section 7 shows related work on different WF attacks and defense mechanisms. Finally, Section 8 concludes our study.

BACKGROUND
This section gives an overview of the cache occupancy channel used in the attack scenario, VAEs leveraged for dimension reduction, and SMC utilized to create precise noise on WFs.

Cache Occupancy Channel
The cache structure speeds up the data and instruction transfer from memory to the execution units by storing the most recently used memory blocks. While the first two levels of cache are dedicated to each physical core, the last-level cache (LLC) is generally shared between all physical cores in a socket. Hence, applications running on different cores may allocate memory blocks in the same LLC sets. The Prime and Probe (PnP) attack [22,25] consists of three steps. First, the LLC sets are filled with attacker-controlled memory addresses.
Next, the attacker waits for a certain amount of time until the victim accesses the LLC. Finally, the attacker accesses the previously allocated memory regions and measures the access time. Based on these three main steps, the attacker can distinguish the active sets in the LLC by monitoring the access time differences. The cache occupancy channel [32] is a modified version of the PnP attack, in which an attacker allocates a large array equal to the LLC size and measures the access time to the whole memory region. This attack allows attackers to collect precise WFs even though the available time resolution is not high enough to measure each individual memory access. Since current browsers provide a coarse-grain timer resolution (2 ms–100 ms) and the cache occupancy channel is applicable through the browser sandbox environment, we consider the cache occupancy channel technique in our threat model throughout the study.
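The sweep-and-time loop of the cache occupancy channel can be sketched in Python (illustrative only: a real attack runs as JavaScript inside the browser sandbox, and the buffer size, line size, and function name here are our assumptions, with the 12 MB LLC size matching the test CPU used in this paper):

```python
import time
import numpy as np

LLC_SIZE = 12 * 1024 * 1024   # assumed 12 MB last-level cache
LINE_SIZE = 64                # typical cache-line size in bytes

def collect_trace(n_samples, buffer=None):
    """Sweep a buffer the size of the LLC and record how long each sweep takes.

    A busier victim evicts more of the attacker's lines, so each sweep takes
    longer; the sequence of sweep times forms the website fingerprint.
    """
    if buffer is None:
        buffer = np.zeros(LLC_SIZE, dtype=np.uint8)
    trace = []
    for _ in range(n_samples):
        t0 = time.perf_counter()
        # touch one byte per cache line so every LLC line is (re)filled
        buffer[::LINE_SIZE] += 1
        trace.append(time.perf_counter() - t0)
    return trace
```

Each entry of the returned trace grows with the victim's concurrent LLC activity, which is exactly the signal the WF classifier consumes.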

Variational Autoencoder
Variational Autoencoder (VAE) [19] is an advanced version of the Autoencoder [1], which regularizes the training process to avoid overfitting and creates a more organized latent space that enables the generative process, as shown in Figure 1. In a VAE, the encoder E_φ encodes the input x as a probabilistic distribution over the latent space, q(z|x).
Next, a point z from the distribution is sampled in the latent space, and the sampled point is decoded through the decoder unit D_θ, whose model architecture consists of the transpose of the layers in E_φ. The generated sample x′ = D_θ(z) is used to compute the reconstruction error between x and x′. In the last step, the reconstruction error is back-propagated through both encoder and decoder units. While the hyperparameters are updated with the reconstruction error, the regularization in the latent space is enforced with a regularization term expressed as the Kullback-Leibler (KL) divergence, KL(q(z|x) ‖ N(0, I)). The KL divergence metric is utilized to compare the difference between probability distributions, since E_φ is enforced to generate distributions close to a standard normal distribution. The KL term organizes the latent space so that measurements in each class are close to each other and the overlap between classes is kept minimal. Hence, sampling a new point from a given distribution in the latent space is expected to yield the main characteristic features of that class at the output of the decoder unit.
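For a diagonal-Gaussian encoder output, the KL regularizer has a well-known closed form, ½ Σ (σ² + μ² − 1 − log σ²). The following is a minimal numpy sketch (the function names are ours, not part of any DefWeb API):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent code.

    This is the closed-form regularization term a VAE adds to the
    reconstruction error, pulling each encoded distribution toward N(0, I).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def vae_loss(x, x_rec, mu, log_var):
    """Total VAE loss: squared reconstruction error plus the KL regularizer."""
    rec = np.sum((np.asarray(x, dtype=float) - np.asarray(x_rec, dtype=float))**2)
    return rec + kl_to_standard_normal(mu, log_var)
```

The KL term is exactly zero when the encoder outputs the standard normal (μ = 0, log σ² = 0) and grows as the encoded distribution drifts away from it.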

Self-Modifying Code
Self-modifying code (SMC) modifies the program's executable code page by altering its own instructions while the program is being executed [2]. The instruction fetch unit in modern CPUs aggressively prefetches instructions stored in the L1 instruction (L1i) cache to form a pipeline. The CPU starts executing the instructions in the pipeline speculatively even before the retirement of the previous instructions. The SMC writes to the code page to alter instructions that are already prefetched and reside in the pipeline. To modify the executable code page, the SMC execution must have read, write, and execute permissions on the respective code page. Moreover, to ensure that the instructions to be overwritten are already in the pipeline, the target subpage (where the modifications occur) should not be far from the current execution location. In Intel micro-architectures, SMC execution requires writing to a code page in the same 1KB subpage that is being executed to affect the instructions in the prefetch queue [5]. If the SMC is executed, the prefetch queue becomes invalidated since wrong instructions are executed in an out-of-order way. Consequently, the entire pipeline stored in the L1i cache is flushed, and the program returns to the address of its last retired instruction to start execution again. In this study, we leverage the SMC execution to create frequent flushes in the L1i cache to introduce noise on the LLC. Specifically, the SMC execution controls the amplitude and duration of the injected noise by altering the number of SMC executions in a given interval. Further explanation of the SMC usage is given in Section 4.1.

DEFWEB METHODOLOGY
In this section, we introduce the methodology of DefWeb to inject precise noise into WFs.

Threat Model
The threat model consists of offline and online phases, as shown in Figure 2. In each phase, the attacker and defender have different capabilities.
Attacker. In the offline phase, an attacker collects WFs from an active tab in a browser by profiling the shared LLC with the cache occupancy channel [32]. We assume that the attacker obtains a set of WFs belonging to sensitive websites on a device identical to the victim's device. Then, the attacker trains a multi-class classification model with the collected WF dataset. In the online phase, the attacker collects a single WF during the victim's website rendering process, which is classified with the pre-trained learning model.
Defender. In the offline phase, DefWeb collects a set of WFs from the same sensitive websites using the cache occupancy channel. Next, characteristic features of each website are extracted in a lower dimension through the VAE. DefWeb computes the minimal modifications required for each website to change the classification in the lower-dimension space. Each modification is saved as a noise template for the online phase. In the online phase, we assume that the victim has the userspace DefWeb application installed in the system. The victim enters the website URL into the application, and one of the noise templates associated with the visited website is selected randomly to create dynamic noise during website rendering. Hence, the attacker cannot distinguish the visited websites with high accuracy when DefWeb is active in the system.
Note that attackers may have access to the DefWeb application, and they can retrain their learning models with the noisy dataset. We show that DefWeb is still effective against cache occupancy channel attacks even though the attacker has access to the defense tool (see Section 4.2). The notations used for the rest of the paper are given in Table 1.
WF Dataset Collection. We assume that N websites are chosen as sensitive websites by the attacker. The cache occupancy channel has a resolution of r milliseconds, and each WF consists of k samples in total, resulting in k × r milliseconds of profiling. The number of traces per website is t, which creates the entire WF dataset, W = {W_1, W_2, . . ., W_N}, consisting of N × t measurements. We follow the same data collection method described in [32], which is given in Algorithm 2, Appendix A.1.

Simulated Noise Template Generation
In this part, we explain how DefWeb creates precise noise templates by utilizing VAEs in a simulation environment to effectively defend victim browsers against cache-based WF attacks. Our purpose is to find the precise noise amount that can be injected into each individual WF in W_i, where i ∈ {1, . . ., N}, to convert it into a different website W_j, where j ∈ {1, . . ., N} and j ≠ i. If each WF in W_i can be converted into a distinct website, the attacker cannot distinguish the visited website, as fingerprints in W_i will carry dominant features from other websites. After this process is repeated for each website, even if the attacker retrains the learning model with the noisy dataset, website-specific fingerprint features will be evenly distributed among fingerprints, yielding a random guess: 1/N. DefWeb consists of four main steps.
Step 1: Dimension Reduction with VAE. WFs consist of high-dimensional PnP timing readings with excessive noise and misalignment issues. Hence, attackers extract distinct features from WFs through advanced DL models [4,9,11,32]. While we aim to create an automated dynamic obfuscation technique by converting WFs into other WFs, it is challenging to perform such an obfuscation in a high-dimension space due to the excessive system noise. Hence, we focus on extracting website-specific distinct features in a lower-dimension space, namely the latent space Z with dimension d, to simplify the website obfuscation using VAEs.
As depicted in Figure 1, the encoder unit E_φ extracts dominant features from each measurement and groups measurements belonging to each website in the latent space, E_φ: R^k → R^d. Convolution layers are used in a sequential model to construct the E_φ model. The E_φ model maps the WF dataset with dimension k to the d-dimension latent space, creating a separate cluster C_i = {c_1, . . ., c_t}, where c_m = E_φ(W_i^m). Each cluster is represented by its mean value μ_i, which has a dimension of d in the latent space. In Figure 3a, we present a toy example in which two websites, each with 100 fingerprints, are mapped to a two-dimension latent space, and the clusters' mean values are represented with μ_1 and μ_2.
The D_θ model architecture is designed in the reverse direction of the E_φ model to reconstruct a WF from a given point in the latent space, D_θ: R^d → R^k. The reconstructed WFs are expected to have similar features to the original WFs while eliminating the system noise. In the training process of the VAE, D_θ regenerates N × t WFs, and the loss is calculated based on the equation given in Section 2.2, updating the weights in both the E_φ and D_θ neural network models iteratively. The training terminates early once the loss stabilizes below a pre-defined threshold; this early stopping prevents the VAE model from overfitting the training data while avoiding unnecessary computation.
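The early-stopping rule described above can be sketched as a small helper; a minimal sketch in which the `patience` and `min_delta` values are illustrative assumptions, not hyperparameters reported here:

```python
class EarlyStopper:
    """Stop training once the loss stops improving for `patience` checks."""

    def __init__(self, patience=10, min_delta=1e-4):
        self.patience = patience      # how many stale epochs to tolerate
        self.min_delta = min_delta    # minimum improvement that counts
        self.best = float("inf")
        self.stale = 0

    def should_stop(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss          # real improvement: reset the counter
            self.stale = 0
        else:
            self.stale += 1           # loss plateaued this epoch
        return self.stale >= self.patience
```

The training loop calls `should_stop(loss)` once per epoch and breaks as soon as it returns True.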
Step 2: Distance Table Creation in the Latent Space. In this step, DefWeb computes the minimal noise needed to convert a fingerprint from one website to another in the latent space. For this purpose, we calculate the distance vector between each pair of cluster means in the latent space, D_{i,j} = μ_j − μ_i, given in Table 9, which is the shift required to convert a fingerprint of W_i into a noisy fingerprint that has the characteristics of W_j. With this approach, the closest cluster to each cluster C_i is determined, which gives the minimal noise amount needed to convert an original WF into another WF in the d dimension. Consequently, we can move the fingerprints in the latent space, and the pre-trained D_θ unit can then generate a new WF dataset containing only the significant features of the targeted website.
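The distance table reduces to simple mean arithmetic over the latent codes. A minimal numpy sketch (function names are ours) could look like:

```python
import numpy as np

def cluster_means(latent, labels, n_sites):
    """mu_i: mean latent vector of each website cluster.

    latent : (N*t, d) latent codes, labels : (N*t,) website index per code.
    """
    return np.stack([latent[labels == i].mean(axis=0) for i in range(n_sites)])

def distance_table(mu):
    """D[i, j] = mu_j - mu_i: the latent-space shift that moves a
    fingerprint of website i toward the cluster of website j."""
    return mu[None, :, :] - mu[:, None, :]
```

Note that the table is antisymmetric (D[i, j] = −D[j, i]), so only the upper triangle actually needs to be stored.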
Step 3: Noisy WF Dataset Creation. In this step, we move each fingerprint in the C_i cluster to the other website clusters by adding the distance values from Table 9 in Appendix A.3 as follows: ĉ_m = c_m + D_{i,j}, for m ∈ {1, . . ., t}, where ĉ_m is the shifted latent point of the m-th fingerprint of C_i. Consequently, the t measurements in the C_i cluster are distributed to the N − 1 other clusters in the d-dimensional space, creating the new latent space Ẑ. As a result, DefWeb artificially spreads the fingerprints of each website to different clusters in the latent space, leading to obfuscated WFs. As an example, Figure 3b shows that 50% of the measurements from C_1 and C_2 are moved to the other cluster by computing c_1 + D_{1,2} and c_2 + D_{2,1} for N = 2 websites with t = 100 measurements in the d = 2 dimensional latent space. With this method, a re-trained learning model cannot distinguish the websites in the latent space. Finally, the pre-trained decoder unit D_θ generates a noisy WF dataset of dimension k from the newly formed latent space, Ŵ_{i,j} = D_θ(Ẑ_{i,j}), where Ŵ_{i,j} represents a reconstructed noisy WF dataset that contains the characteristics of website j, although each original WF belongs to a different website, as shown in Figure 3c.
Step 4: Simulated Noise Template Creation. Even though the added D_{i,j} values spread the fingerprints into different clusters in the latent space, the exact noise shape that needs to be added to the original WFs in the k dimension is not yet clear. Hence, we extract simulated noise templates T by computing the difference between the noisy WF dataset Ŵ_{i,j} and the original WF dataset W_i: T_{i,j} = Ŵ_{i,j} − W_i, where j ∈ {1, . . ., N}. This procedure is repeated for all N websites and creates N × N × t T templates in total.
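Steps 3 and 4 can be sketched together: shift the latent codes of website i by a distance vector, decode them, and subtract the original fingerprints. In the toy sketch below, `decode` stands in for the trained decoder D_θ (any function from R^d to R^k), and all names are ours:

```python
import numpy as np

def make_noise_templates(z_i, d_ij, decode, w_i):
    """Compute the simulated noise templates T_{i,j} = W_hat_{i,j} - W_i.

    z_i  : (t, d) latent codes of website i
    d_ij : (d,)   distance vector mu_j - mu_i from the distance table
    decode : stand-in for the trained decoder D_theta, R^d -> R^k
    w_i  : (t, k) original fingerprints of website i
    """
    z_shifted = z_i + d_ij                        # move the cluster toward website j
    w_noisy = np.stack([decode(z) for z in z_shifted])
    return w_noisy - np.asarray(w_i, dtype=float)  # T_{i,j}
```

With a toy "decoder" that simply upsamples the latent code, the template is the decoded shift itself, which matches the definition above when the original fingerprints decode to themselves.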
We observed that T has large negative values that cannot be generated in a real-world scenario, since it is not possible to decrease the cache reading values in a processor. Moreover, some websites have relatively large cache activity, preventing the conversion of a high-activity WF into a low-activity WF in the k dimension. Thus, we shift the noise templates upward by an offset derived from the minimum template value before pruning. This method enables DefWeb to convert high-activity WFs into low-activity WFs with a higher success rate, which eventually drops the attacker's model accuracy. Finally, we prune T by zeroing out the remaining negative portion, which generates more realistic noise templates with values between 0 and the maximum value in T.
Figure 4 demonstrates the noise injection into three different google.com WFs. Figure 4a represents the collected WF datasets from the google.com website in the Chrome browser. Figure 4b shows different noise templates that are created with the four steps to convert google.com fingerprints to the 360.cn, nih.gov, and netflix.com websites. The new noisy WFs that belong to google.com are depicted in Figure 4c. Thus, DefWeb dynamically hides the original features in WFs. Although the original WFs come from google.com, the attacker's DL model cannot distinguish distinct features in the WFs due to the injected precise noise.

Simulation-based DefWeb Results
Experiment Setup: The WF datasets are collected on an Intel(R) 11th Generation i7-1165G7 @ 2.80 GHz with Tiger Lake micro-architecture, which has a 12MB LLC, running Ubuntu 20.04 LTS. We collect our datasets using Google Chrome version 101.0.4951.64 and Firefox version 111.0. A WiFi connection is used during the rendering of websites. Moreover, the DL models are trained on an NVIDIA GeForce RTX 3090 GPU card.
Data Collection: The WFs are collected through the cache occupancy channel [32] from the Google Chrome, Mozilla Firefox, and Tor browsers based on Alexa's most visited websites list [14,28]. The number of sensitive websites is chosen as N = 100. We exclude pornography websites and websites with redundant content in different country-specific versions, such as amazon.co.jp and amazon.co.uk. t = 100 measurements are collected for each website, each consisting of k = 6,000 samples for Chrome and Firefox. The dataset reconstructed with the VAE is evaluated to check whether it can still be classified correctly with the pre-trained CNN model. The results show that increasing the latent dimension leads to higher pre-trained model accuracy, as shown in Table 2. However, when the CNN model is re-trained with the reconstructed dataset, latent dimensions of more than 100 are ineffective in removing the noise, leading to low re-trained model accuracy. Similarly, when the noisy dataset is used to re-train the CNN model, the accuracy increases to 24.9%, making DefWeb inefficient against attackers. Hence, we choose the latent space dimension as d = 100 to preserve the website characteristics while removing the system noise from the WF dataset, and we train the VAE model with 300 epochs.

Closed World Scenario Results
In the closed-world scenario, the number of sensitive websites is N = 100, and the entire list is given in Appendix Table 8. The collected WFs can be classified with a 95.7% success rate with CNN and LSTM models, as shown in Table 3. While the classification accuracy is consistent between the Google Chrome and Mozilla Firefox browsers, the Tor browser has a lower accuracy rate due to additional noise. Moreover, we collected 12,000 samples to capture a longer measurement for profiled websites. We use the same LSTM and CNN model hyperparameters and layers as [4,32] to achieve similar accuracy, as given in Appendix A.6. However, our classification accuracy is considerably higher than the reported accuracy in [32] due to the changes in the Alexa Top 100 websites list, OS, micro-architecture, and web browser versions.
In the simulation environment, the VAE model can reshape a given WF by modifying the cache reading values. In other words, the noise added to cache readings can be either negative or positive to increase the conversion accuracy from W_i to W_j. While negative noise values remove the patterns of W_i, positive values inject patterns belonging to W_j. First, we generate the noisy dataset with the original noise templates. Both CNN and LSTM models are retrained with the same hyperparameters 10 times, and the average test accuracy is reported in Table 4. Our simulation-based DefWeb can decrease the success rate to 3.2% in Chrome and 1.1% in Firefox. When we increase the number of websites to 150, the CNN classification accuracy reaches up to 93.5%, which is slightly lower than the 100-website classification. The VAE model architecture is kept the same, and the reconstruction accuracy is 94.8%. After the generated simulation templates are added to the websites, the accuracy drops to 2.17%. These results show that DefWeb can obfuscate the WFs with the VAE-based noise templates, resulting in almost a random guess (1%). When we change the hyperparameters in both CNN and LSTM models to fit them to the noisy dataset, the accuracy does not improve further.
DefWeb can obfuscate the WFs in a simulation environment with high accuracy; however, it is impractical to introduce negative noise values in a real-world scenario, which limits the applicability of DefWeb.To elaborate, negative noise values denote that a website should reduce its access to fewer cache sets.This results in fewer attacker-controlled cache lines being evicted, ultimately lowering the attacker's access time.However, if a website's content is fixed, it is impossible to decrease the number of cache accesses from a website.Hence, we cannot decrease the attacker access time from the defensive perspective.
Therefore, we eliminate the negative noise values in T and add the new noise templates to the WFs. When the new noisy dataset is used to retrain the CNN and LSTM models, the classification accuracy only slightly drops compared to the original dataset. This means that zeroed-out negative noise values still leave the original patterns of W_i in the noisy fingerprint. In particular, websites with high activity are more difficult to convert into low-activity websites. For example, the Netflix website has more activity due to the videos playing by default on the homepage, which is difficult to convert to a Google WF that consists of less activity, leading to low cache readings in Figure 7, Appendix A.4. To solve this problem, we shift up the overall noise amount by adding an offset based on the minimum noise value in the templates: min(T)/β. Hence, we can keep as many distinct noise features as possible, even after zeroing out the negative noise values. To determine the most suitable β value, we try several options for β in the range of 2–5 and report the resulting CNN retraining accuracy in Table 5. We choose β = 3 as the final value, since increasing β leads to higher classification accuracy for the attacker. This indicates that the shifted noise template (SNT) cannot successfully obscure the WF, as many distinctive features are zeroed out in the noise template. Conversely, choosing a value less than β = 3 would require generating higher noise values, which is challenging in practical noise generation.
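The shift-then-prune operation described above can be sketched in a few lines of numpy (the function name is ours; β = 3 follows the choice reported in Table 5):

```python
import numpy as np

def prune_template(t_ij, beta=3):
    """Shift a simulated noise template up by |min(T)|/beta, then zero out
    the remaining negative values so the template only *adds* cache activity.

    Larger beta keeps less of the negative structure (more values clipped
    to zero); smaller beta demands more noise in the practical generator.
    """
    t_ij = np.asarray(t_ij, dtype=float)
    shifted = t_ij - t_ij.min() / beta   # t_ij.min() is negative, so this shifts up
    return np.maximum(shifted, 0.0)      # clip what is still negative
```

The result lies between 0 and the maximum of the shifted template, which is the value range a real noise generator can actually produce.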

Open World Scenario Results
Since the victim may visit non-sensitive websites that are not in the scope of the attacker, non-sensitive websites need to be distinguished from sensitive websites. For this purpose, we follow the open-world environment setup from Panchenko et al. [26] and Shusterman et al. [31], in which 1,000 traces are collected from non-sensitive websites. The profiled non-sensitive websites are added as a separate class in the model training. The CNN and LSTM models achieve 95.5% and 96.5% accuracy for Chrome and Firefox, respectively.
Since non-sensitive websites are not included in the VAE model training, no extracted noise pattern can be added to the non-sensitive WFs.The first option for DefWeb is to leave the non-sensitive WFs without any obfuscation since these websites are not important even though they can be detected by the attacker.If there is no change in the non-sensitive WFs, the classification accuracy increases to 17.7% from 3.2% in Chrome.Similarly, the accuracy increases to 17.2% from 1.1% in Firefox.The second option is to inject random noise on non-sensitive WFs, in which the random noise amount changes between 0 and 15 ms.The random noise injection results in 15.8% and 15.5% success rates for Chrome and Firefox, respectively.The final option is to add pre-computed noise patterns that belong to other WFs.For example, cache access patterns extracted from nih.gov can be added to non-sensitive WFs to change the cache access pattern with a more dominant one.On the other hand, the original patterns remain in the non-sensitive WFs since those patterns are not profiled in the VAE model training, which degrades the obfuscation accuracy.The last option decreases the classification accuracy to 10.2% and 10.1% for Chrome and Firefox, respectively.

PRACTICAL NOISE GENERATION
Even though the created simulation-based noise templates degrade the website detection accuracy significantly, constructing the same noise template on actual hardware is extremely challenging. For this purpose, we propose an automated noise generation technique that can reliably produce deterministic noise during the website fingerprint collection process. We leverage the SMC execution, which can be fine-tuned to generate precise noise templates on Intel-based devices.

Intelligent Noise Injection with SMC
Each simulated noise template T_{i,j} consists of t similar templates, since t measurements from each website are converted to another website. First, we calculate the average of the t noise templates to create a generalized average noise template AT(W_i, W_j) to convert W_i into W_j. Next, each AT(W_i, W_j) template is added to the original fingerprints W_i to replace their unique patterns with the patterns of our target website W_j. However, we observed that averaged noise templates do not obfuscate the website-specific patterns as expected due to the misalignment issue stemming from the loading time variation of the visited websites.
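Averaging the t per-trace templates into one generalized template is a single element-wise mean. As a minimal sketch (the function name is ours):

```python
import numpy as np

def average_noise_template(templates_ij):
    """Element-wise mean of the t simulated noise templates that convert
    website i into website j: input shape (t, k), output shape (k,)."""
    return np.asarray(templates_ij, dtype=float).mean(axis=0)
```

The averaged template smooths out trace-to-trace variation, which is precisely why residual misalignment between traces blurs its useful patterns.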
In order to create practical noise templates from simulated noise templates, four steps are taken as follows: (S1) The first step detects and expands useful patterns in AT to solve the misalignment issue. (S2) All noise templates are segmented into multiple consistent noise blocks to ease the generation of practical noise templates. (S3) The effect of SMC on the cache timings is analyzed, and a look-up table is created to control the noise amount and duration. (S4) Multiple noise blocks are concatenated to create a practical noise template on PnP measurements. The created noise blocks are merged together to form individual practical noise templates, PT. Finally, we automatically inject the created noise templates during the website measurement collection, based on the visited website. We explain the overall procedure in detail:
S1) Misalignment Solution. Each AT has two purposes: 1) mask the dominant features of the original WF W_i; 2) insert distinct features belonging to the target website W_j. Hence, each AT(W_i, W_j) template consists of multiple regions of interest (RoIs), as shown in Figure 5a, in which lower values aim at masking original features and higher values introduce new dominant features. Thus, the masking noise blocks should match the dominant features to mask the original features efficiently.
In order to overcome the misalignment issue, we first detect the sudden changes over the k samples in every template by utilizing the Change Point Detection (CPD) algorithm [17,20]. In the implementation of the CPD algorithm, AT(W_i, W_j) is divided into two sections based on a chosen point p, initially set to the middle point of the measurements. For each section, an empirical estimate of the mean is determined, which is compared with the statistical means of every sample point in that section to calculate the deviations. The total residual error is calculated after adding the deviations of every sample point of the individual sections. The residual error is repeatedly computed while the chosen point p iterates over all the sample points, until the total residual error falls below a predefined threshold. The maximum number of change points in this method is set to 30 empirically.
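The CPD procedure described above amounts to segmentation under a piecewise-constant mean model. The following numpy sketch is a simplified greedy binary-segmentation stand-in for the cited CPD algorithm, not the paper's exact implementation:

```python
import numpy as np

def best_split(x):
    """Split point minimizing total squared deviation from the two
    per-segment means, plus that residual (one binary-segmentation step)."""
    best_k, best_err = None, np.inf
    for k in range(1, len(x)):
        left, right = x[:k], x[k:]
        err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

def change_points(x, max_points=30, tol=1e-6):
    """Greedily split the segment whose split reduces the residual the most,
    until `max_points` change points are found or the gain drops below `tol`."""
    x = np.asarray(x, dtype=float)
    segments, cps = [(0, len(x))], []
    for _ in range(max_points):
        gains = []
        for (a, b) in segments:
            if b - a < 2:
                gains.append((0.0, None, (a, b)))
                continue
            base = ((x[a:b] - x[a:b].mean())**2).sum()
            k, err = best_split(x[a:b])
            gains.append((base - err, a + k, (a, b)))
        gain, k, seg = max(gains, key=lambda g: g[0])
        if k is None or gain <= tol:
            break
        segments.remove(seg)
        segments += [(seg[0], k), (k, seg[1])]
        cps.append(k)
    return sorted(cps)
```

On a template with step-shaped noise blocks, the detected change points fall at the block boundaries, which is what the segmentation step below relies on.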
We calculated the amount of expansion required to fix the misalignment issue by observing the distribution of the misalignment in the original WFs. For this purpose, we compute the cross-correlation coefficient between every pair of WFs while shifting the fingerprints sample by sample. We observed that the misalignment lies in the range [-100, 100], as shown in Figure 8, Appendix A.7, and 99% of the fingerprints have a misalignment value between -50 and 50. Based on this observation, we set the expansion amount to 50 samples on each side. If the width of a RoI is w, the RoI is expanded on both sides by 50 samples, yielding a total width of w + 100, as shown in Figure 5b.

S2) Segmentation of NT(i,j) into Dynamic Noise Blocks. After solving the misalignment issue, the next step is to segment NT(i,j) into multiple dynamic, consistent noise blocks using the CPD method. If the number of change points in NT(i,j) is n_cp, the template is segmented into n_cp + 1 blocks and can be represented as the concatenation of the noise blocks as follows:

NT(i,j) = NB(i,j)_1 || NB(i,j)_2 || ... || NB(i,j)_(n_cp + 1),

where NB(i,j)_k represents the k-th small noise block of the NT(i,j) template, and n_cp indicates the maximum number of change points per template. We selected n_cp as 30 empirically so as not to construct extremely small blocks for a given NT(i,j). Each block NB(i,j)_k has two distinct parameters: its width, width(i,j)_k, and its average amplitude, amp(i,j)_k. width(i,j)_k represents the length of a noise block in samples, while amp(i,j)_k represents the average height of the block in milliseconds.
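The shift estimation and RoI expansion above can be sketched as follows; this is a pure-Python illustration with assumed function names, not the paper's tooling:

```python
def misalignment(a, b, max_shift=100):
    """Estimate the shift between two fingerprints by maximizing their
    normalized cross-correlation over shifts in [-max_shift, max_shift]."""
    def corr(x, y):
        n = min(len(x), len(y))
        x, y = x[:n], y[:n]
        mx, my = sum(x) / n, sum(y) / n
        num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        den = (sum((xi - mx) ** 2 for xi in x) *
               sum((yi - my) ** 2 for yi in y)) ** 0.5
        return num / den if den else 0.0
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: corr(a[max(s, 0):], b[max(-s, 0):]))

def expand_roi(start, end, margin=50, length=None):
    """Expand an RoI [start, end) by `margin` samples on both sides,
    clamped to the template bounds."""
    lo = max(0, start - margin)
    hi = end + margin if length is None else min(length, end + margin)
    return lo, hi
```

For two copies of the same pattern offset by a few samples, the estimator recovers the offset, which is how the [-100, 100] histogram of Figure 8 was obtained.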
Our objective is to generate the noise blocks with the SMC execution and concatenate them to form an entire practical noise template PNT(i,j). We observed that the SMC execution cannot generate noise blocks with a width of fewer than 7 samples; hence, we discard all blocks whose width is below 7.
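The S2 segmentation and per-block parameter extraction, including the width-7 filter just described, can be sketched as follows (illustrative function names, assuming the change points have already been found with the CPD step):

```python
def segment_blocks(template, change_points, min_width=7):
    """Split a noise template at the detected change points and extract
    (width, average amplitude) per block; blocks narrower than
    `min_width` samples cannot be generated with SMC and are dropped."""
    bounds = [0] + sorted(change_points) + [len(template)]
    blocks = []
    for lo, hi in zip(bounds, bounds[1:]):
        width = hi - lo
        if width < min_width:
            continue
        amp = sum(template[lo:hi]) / width  # average height in ms
        blocks.append((width, amp))
    return blocks
```

A template with one change point yields two (width, amp) pairs, matching the n_cp + 1 blocks of the concatenation above.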
Figure 5c demonstrates the segmentation of a noise template into smaller noise blocks that can be generated with the SMC execution. Next, we extract the corresponding width, width(i,j)_k, and average amplitude, amp(i,j)_k, for every block in the NT(i,j) template, where k ∈ {1, ..., n_cp + 1}. Figure 5d demonstrates an example of the parameter extraction from a single block, which has an average amplitude of 12.9 milliseconds and a width of 170 samples.

S3) Look-up Table Creation. In this step, our purpose is to automate the creation of noise blocks with a given width and amplitude using the SMC execution. As explained in Section 2.3, an SMC execution flushes the entire instruction cache and Translation Look-aside Buffer (TLB) as well as the pipeline. Consequently, frequent flushes slow down the memory accesses, which results in high access times for the cache readings. First, we explain how the SMC execution can be created with assembly code; then, we describe the automated noise block creation with a parameterized SMC code.
Algorithm 1 shows the pseudo-code for the SMC execution in assembly. The main block of the SMC function is located between lines 15-36 and takes two pointers as inputs. In lines 1-2, two pointers are defined to allocate a buffer in memory; the first indicates the starting position of the buffer, and the second refers to its end position. The rdi and rsi registers point to the start and end positions, respectively. A payload in line 9 returns NOP instructions after each call. The buffer pointers in rdi and rsi are stored in the r8 and r10 registers to hold the start and end positions of the buffer. The register r8 tracks the address of the current buffer pointer, which equals rdi at the beginning. In line 25, the rsi register loads the effective address of the payload's return statement. The "rep movsb" instruction (line 26) copies one byte from the source location into the destination location and repeats rcx times. The rsi register indicates the source location, and the destination is pointed to by rdi, which specifies the current location of the buffer. As rcx is set to 64, 64 bytes of NOP instructions are stored in the buffer. The call r9 instruction executes all 64 bytes of NOP instructions starting from the address pointed to by register r9. The current buffer position tracked by r9 is updated during each iteration to move the buffer pointer forward by 64 bytes. These steps are repeated 100 times, until the value of rax is decremented to zero. The call r9 instruction causes mispredictions because the previous instructions still remain in the pipeline; due to this speculative execution, the entire L1 instruction cache is flushed.
In order to make the noise generation deterministic, two input parameters control the effect of the SMC execution on the cache accesses: the number of repetitions (nr) and the sleep time (st). We observed that if the SMC execution is repeated more frequently, the cache access timings increase, leading to a noise block with a higher amplitude. In contrast, sleeping between SMC executions increases the width of the noise block, since the SMC execution takes more time overall. Therefore, we use these two parameters as inputs to the SMC execution (lines 3-5) to create more deterministic noise blocks.
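The nr/st-parameterized SMC mechanism can be illustrated with a minimal Linux/x86-64 sketch: a writable+executable page is rewritten with NOP instructions and immediately executed, repeated nr times with st microseconds of sleep in between. This shows the store-to-instruction-stream mechanism, not the paper's exact Algorithm 1:

```python
import ctypes, mmap, time

def smc_noise_block(nr, st_us):
    """Rewrite and execute a 64-byte code buffer `nr` times; each store
    to the instruction stream forces the CPU to flush its front end,
    producing cache-timing noise. Returns the elapsed time in seconds.
    Linux/x86-64 only (0x90 = NOP, 0xC3 = RET)."""
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    code = b"\x90" * 63 + b"\xc3"           # 63x NOP + RET
    anchor = ctypes.c_char.from_buffer(buf)  # pin the mapping's address
    fn = ctypes.CFUNCTYPE(None)(ctypes.addressof(anchor))
    start = time.perf_counter()
    for _ in range(nr):
        buf.seek(0)
        buf.write(code)                      # self-modify the code page
        fn()                                 # execute the fresh bytes
        if st_us:
            time.sleep(st_us / 1e6)
    return time.perf_counter() - start
```

Larger nr raises the amplitude of the resulting noise block, while larger st_us stretches its width, mirroring the two knobs described above.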
We profile the effect of the nr and st parameters on the width(i,j) and amp(i,j) values of a noise block. For this purpose, we create pairs of nr and st values, starting from [50, 0] and increasing in steps of 50 up to [1300, 2000]. We recorded the resulting width(i,j) and amp(i,j) values for each combination of nr and st, which results in 1107 pairs in total.
During the individual noise block generation process, we leverage the look-up table to choose appropriate nr(i,j) and st(i,j) pairs to be used in the practical noise generation. As an example, if NT(i,j) is segmented into n windows, we extract a set of parameters {width, amp}(i,j) with dimension n × 2. Finally, the look-up table converts the width and amplitude template into {nr, st}(i,j) pairs with the same dimension of n × 2. The overall process of creating the {nr, st}(i,j) pairs is represented as follows:

{width, amp}(i,j) → look-up table → {nr, st}(i,j).

S4) Practical Noise Template and Noisy WF Generation. In this step, we execute Algorithm 1 in parallel with the cache occupancy channel profiling. Algorithm 1 takes the {nr, st}(i,j) pairs as input and generates precise practical noise templates, PNT(i,j). The injected noise adds the most dominant features of the target website W_j. In order to compare our practical noise templates PNT with the average noise templates NT, we calculate the absolute difference between each sample in PNT and NT, and divide the difference by the NT sample value. This process is repeated for all PNT-NT pairs, which leads to a 33.6% error rate. This deviation is expected due to the misalignment and the high variance in the practical noise templates, as generating noise in a physical system often introduces residual noise stemming from inherent properties of the system architecture. Example average and practical noise templates for Google Chrome are given in Figure 9, Appendix A.8. Finally, DefWeb injects the practical noise into the original WFs during data collection and website rendering. Hence, Algorithm 1 is executed in parallel with the cache occupancy channel attack conducted by the attacker, generating noisy cache access timings; the generated practical noise is thus added to the original WFs.
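The look-up-table conversion from {width, amp} to {nr, st} pairs can be sketched as a nearest-neighbour search over the calibration sweep; the function names and the scaled Euclidean metric are illustrative assumptions:

```python
def build_lut(profile):
    """`profile` maps calibrated (nr, st) -> measured (width, amp);
    flatten it into a list for nearest-neighbour lookup."""
    return [((w, a), (nr, st)) for (nr, st), (w, a) in profile.items()]

def to_nr_st(lut, width, amp, w_scale=1.0, a_scale=1.0):
    """Pick the (nr, st) pair whose profiled (width, amp) is closest to
    the requested block parameters."""
    def dist(entry):
        (w, a), _ = entry
        return ((w - width) * w_scale) ** 2 + ((a - amp) * a_scale) ** 2
    return min(lut, key=dist)[1]
```

Applying `to_nr_st` to each of the n extracted {width, amp} rows yields the n × 2 table of {nr, st} pairs that drives Algorithm 1.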

Practical Noise Evaluation
In this section, we evaluate the performance of DefWeb in both closed-world and open-world scenarios in the same experiment setup as given in Section 3.3. In the real-world setup, the user enters the website URL into the noise generation application, which opens the browser to initiate the webpage rendering. Once the rendering process begins in the browser, the SMC-based noise generation starts automatically and generates practical noise based on the noise templates, as detailed in the previous section. Throughout this process, the attacker's JavaScript code [32] also executes concurrently within the browser. During the data collection process, the specified target websites (given in Table 8) are visited while the browser cache remains enabled. Each website is profiled for 30 seconds. Note that in the simulation setup, we assume there is no misalignment or system noise during the noise injection process, since it is performed in the Python environment.
In the closed-world scenario, DefWeb achieves a considerable decrease in classification accuracy when the model is retrained with the noisy WF dataset. The classification accuracy for 100 websites drops to 28.8%, 29.7%, and 5.2% for Chrome, Firefox, and Tor, respectively. DefWeb outperforms random noise injection methods such as cache sweeping [32] and interrupt injection [4], as given in Table 6. The retrained model accuracy drops more than three times, while previous techniques achieve at most a 1.54× accuracy decrease, which shows the applicability of DefWeb in a real-world scenario.
We also test the efficiency of DefWeb with an increasing number of websites in the Chrome browser. We add 50 more websites to the target website list and generate noise templates with the VAE. The classification accuracy for 150 websites drops to 24% when the CNN model is retrained with the noisy dataset. This decrease in accuracy is expected, since the model needs to learn a more complex dataset with more websites. The difference between simulation-based and practical noise is considerably high due to the limitations of SMC-based noise injection. In the simulation setup, we add noise with negative values (Table 4) to mitigate WF attacks. However, in the real-world setup, we encounter several limitations associated with SMC-based noise injection: (1) an upper bound on the amount of noise that can be generated, (2) the inability to create negative noise values in the real world, and (3) misalignment between simulation-based and SMC-based noise templates. These factors collectively contribute to the accuracy gap between the simulation and real-world setups.
To be specific, the first reason is the upper bound of SMC-based noise generation, which is 13 ms. However, the extracted noise templates for the sensitive websites include noise amounts of more than 13 ms, which limits the efficiency of the practical noise. The second reason is the difficulty of converting high-activity website fingerprints to low-activity fingerprints. Since the cache readings are already considerably high compared to other websites, it is more difficult to generate a sufficient amount of noise within a given time resolution (5 ms in our case); hence, not all websites can be converted into another website with high accuracy. The third reason is the misalignment between the simulation-based noise templates and the SMC-based noise generation process, since the noise injection location is affected by the network speed, which changes during the measurement collection.

PERFORMANCE OVERHEAD
Additional noise injected during the WF collection eventually slows down the website rendering process. Cache Shaping [21] was recently proposed to obfuscate cache access timings by introducing dummy I/O operations. However, Cache Shaping causes large performance overhead due to the excessive amount of repeated dummy I/O operations. Previous defense techniques either decrease the retrained learning model accuracy at a high performance overhead (up to 71.8% [21]) or incur relatively low overhead while leaving a high retraining accuracy [4, 32]. DefWeb tackles this problem by injecting noise only in certain time slots during the website rendering, which reduces both the performance overhead and the retraining accuracy. Since DefWeb is only active during website rendering, we use WebAPI functions and the Selenium library [15] to measure the performance overhead introduced by DefWeb. The Selenium library is leveraged to render websites based on a URL. We record the start and end times of the website rendering process by monitoring the performance.timing.navigationStart and performance.timing.loadEventEnd properties, respectively.
Since the utilized WebAPIs are available in both Chrome and Firefox browsers, the performance overhead can be measured in different browsers.
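The measurement described above can be sketched with Selenium and the Navigation Timing API as follows; the function names and the driver setup are illustrative assumptions, not the paper's tool:

```python
import statistics

def measure_load_time_ms(driver, url):
    """Render `url` in a Selenium-driven browser and return the load
    time reported by the Navigation Timing API, in milliseconds."""
    driver.get(url)
    return driver.execute_script(
        "return performance.timing.loadEventEnd"
        " - performance.timing.navigationStart;")

def overhead_percent(baseline_ms, defended_ms):
    """Average per-site overhead of the defended runs over the
    baseline runs (paired by website)."""
    per_site = [(d - b) / b * 100.0
                for b, d in zip(baseline_ms, defended_ms)]
    return statistics.mean(per_site)

# Usage (assumed): driver = selenium.webdriver.Chrome()
# base = [measure_load_time_ms(driver, u) for u in urls]   # DefWeb off
# defn = [measure_load_time_ms(driver, u) for u in urls]   # DefWeb on
# print(overhead_percent(base, defn))
```

Averaging the per-site overheads in this way is what produces the 9.5% figure reported below.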
We leverage our tool to measure the loading time of 30 websites with and without DefWeb enabled in the system. The performance overhead varies between 0% and 100% for the visited websites, as given in Figure 6. While some websites are not affected by the DefWeb execution, the loading time of one website doubles. On average, the performance overhead incurred by DefWeb is 9.5%, which is significantly lower than the Cache Shaping [21] and random interrupt injection [4] methods, as shown in Table 7. The random interrupt injection method increases the average website loading time by 15.7%, which DefWeb reduces by 1.6×. The Cache Shaping defense slows down the CPU performance from 51.4% up to 71.8%, depending on the efficacy of their defense tool, whereas DefWeb maintains a steady efficacy with a performance overhead of only 9.5%. We observed that DefWeb creates more performance overhead if the used noise template injects more noise at the beginning of the website rendering. In other words, the performance overhead depends on the noise template in use, which explains the varying overhead results across websites. The results demonstrate that our approach with intelligent noise injection has a lower overhead than other noise-injecting defense mechanisms.

Limitations. The fundamental idea of DefWeb is to convert the fingerprints of source websites into the fingerprints of target websites by introducing intelligent noise. In this study, the source websites are the low-activity websites and, conversely, the target websites are the high-activity websites. DefWeb is capable of converting a low-activity website into a high-activity one, but not the other way around. This limitation stems from the fact that we cannot create negative noise in a practical environment, as explained in Section 3.4. Because of these practical issues, our practical noise leads to higher classification accuracy than simulation-based noise injection.
In practice, noise injection always has an additive impact on the existing signal. Therefore, it is not feasible to introduce negative noise that would reduce the workload of the system and decrease the amplitudes of a high-activity website fingerprint. However, we can still select separate high-activity websites as a source and a target. Even though this would lower the entropy for the classification model, the conversion between websites would be more successful, which can eventually decrease the classification rate.

RELATED WORK

WF Attack Mechanisms
Vila et al. [35] discovered new vulnerabilities targeting the Google Chrome browser by leveraging shared event loops. PerfWeb [10] classified unique website fingerprints by monitoring hardware performance events with various ML techniques. Panchenko et al. [26] collected data traces by monitoring the loading process of the web page, particularly the packet size, direction, and order in the cell, TLS, and TCP layers. Dipta et al. [7] showed that dynamic frequency scaling values in the Linux and Android OSes provide distinct fingerprints for different websites. Zhang et al. [36] leveraged the Intel Running Average Power Limit (RAPL) interface to collect WFs and distinguish different websites. JavaScript-based cache attacks [24] showed that the cache structure can be utilized to create unique WFs from the browser environment. Similarly, Shusterman et al. [32] introduced a cache attack from a web browser that profiles the website activity in the LLC by accessing a memory region whose size equals the LLC size, namely the cache occupancy channel. This attack achieves high website detection accuracy with a low-resolution timer. However, Cook et al. [4] showed that website fingerprints have unique behaviors due to system interrupts, and they proposed a loop-counting attack without any memory accesses as an enhanced version of the sweep-counting attack [32].

WF Defense Mechanisms
Clock Resolution Reduction. Oren et al. [24] suggested reducing the resolution of the clock to mitigate their browser-based attacks. However, Schwarz et al. [30] showed that an adversary can restore the resolution of the timer with free-running and blocking timers to construct timing primitives that use clock interpolation and edge thresholding. Shusterman et al. [31] proposed CSS PnP attacks that completely bypass these defense techniques even when JavaScript is disabled. Therefore, cache partitioning [8, 18] is a promising mechanism to effectively prevent shared-resource side-channel attacks. Cook et al. [4] proposed a randomized timer to mitigate their loop-counting attack.

Random Noise Injection. Shusterman et al. [32] proposed a defense technique that injects dummy cache activities during the WF collection. However, their approach is ineffective when an attacker retrains the DL model with noisy fingerprints. Li et al. [21] introduced a cache shaping technique that creates dummy I/O operations and randomly switches between different amounts of I/O operations. Their technique creates high performance overhead due to the excessive number of parallel I/O operations. Gulmezoglu et al. [9] proposed an XAI-based defense technique that detects the most useful features in the WFs. However, the proposed defense technique is not tested in a real-world setup. Cook et al. [4] also created a countermeasure technique that injects spurious interrupts to generate random noise. However, the proposed technique does not lower the re-trained model accuracy significantly.

Browser-based Defenses. Aside from the aforementioned defense mechanisms, some defenses aim to restrict the attacker's capabilities in the browser environment. The NoScript extension [23] restricts JavaScript code execution in the browser. Furthermore, Snyder et al. [34] suggested a browser extension that selectively disables high-hazard features based on the visited website. Schwarz et al. [29] suggested a generic browser extension, JavaScript Zero, which mitigates JavaScript-based side-channel attacks without negative effects on the website.

CONCLUSION
DefWeb demonstrates that intelligent noise injection can decrease the attacker model's accuracy significantly compared to random noise injection methods, even when the attacker has access to the DefWeb-created WFs. First, we demonstrate the decrease in DL model accuracy in a simulated environment to show the applicability of DefWeb. Next, we create an automated noise template generation tool based on the SMC execution that can be applied to different micro-architectures. DefWeb creates practical noise templates with a shape similar to the simulated noise templates, which obfuscate the features of the source website while injecting new features belonging to the target website. Our results show that we achieve 1.6% and 29.7% classification accuracy in Firefox for the simulation and practical experiment setups, respectively. Furthermore, the performance overhead introduced by DefWeb is lower than that of previous defense techniques while degrading the attacker's accuracy considerably. Finally, we show that DefWeb can also be applied to protect users against other types of WF attacks, such as loop-counting attacks [4].

Figure 1 :
Figure 1: The structure of Variational Autoencoders (VAEs). VAEs consist of the Encoder and Decoder units.

Memory lines from different processes can map to the same cache set, creating contention among different applications in the LLC. A Prime and Probe (PnP) attack [22, 25] consists of three steps. First, the LLC sets are filled with attacker-controlled memory addresses. Next, the attacker waits for a certain amount of time until the victim accesses the LLC. Finally, the attacker accesses the previously allocated memory regions and measures the access time. Based on these three steps, the attacker can distinguish the active sets in the LLC by monitoring the access time differences. The cache occupancy channel [32] is a modified version of the PnP attack, in which an attacker allocates a large array equal to the LLC size and measures the access time to the whole memory region. This attack allows attackers to collect precise WFs even though the available time resolution is not high enough to measure each individual memory access. Since current browsers provide a coarse-grain timer resolution (2 ms-100 ms) and the cache occupancy channel is applicable from the browser sandbox environment, we consider the cache occupancy channel technique in our threat model throughout the study.
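The cache occupancy measurement loop can be sketched as follows. Real attacks run as JavaScript with a coarse timer [32]; this Python sketch, with an assumed LLC size, only illustrates the sweep-and-time structure:

```python
import time

LINE = 64  # cache line size in bytes

def occupancy_trace(n_samples, period_s=0.005, buf_size=16 * 1024 * 1024):
    """Cache occupancy channel sketch: repeatedly time a sweep over an
    LLC-sized buffer (buf_size is an assumed 16 MB LLC). Victim cache
    activity evicts our lines, raising the measured sweep time."""
    buf = bytearray(buf_size)
    trace = []
    for _ in range(n_samples):
        t0 = time.perf_counter_ns()
        s = 0
        for off in range(0, buf_size, LINE):  # touch one byte per line
            s += buf[off]
        trace.append(time.perf_counter_ns() - t0)
        time.sleep(period_s)  # 5 ms default matches the paper's resolution
    return trace
```

The sequence of sweep times collected during 30 seconds of rendering is exactly the fingerprint that DefWeb later obfuscates.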

Figure 2 :
Figure 2: Threat model for the attacker and the defense mechanisms in offline and online phases.

Figure 3 :
Figure 3: Example of two websites, N = 2, mapped into a d = 2 latent space. (a) Before adding the distance vector, (b) after adding the distance vector, (c) the new noisy latent space vector (Appendix A.3).

d(i,j) is the distance vector that can be added to the k-th fingerprint of W_i in the latent space in order to convert it to a noisy fingerprint that has the characteristics of W_j. With this approach, the closest distance between each pair of clusters is determined, which eventually gives the minimal noise amount needed to convert an original WF into another WF in the latent dimension. Consequently, we can move the fingerprints in the latent space, and the pre-trained decoder unit can then generate a new WF dataset containing only significant features belonging to the targeted website.
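The distance-vector step can be sketched with cluster centroids as follows; this is a pure-Python illustration with assumed function names, standing in for whatever cluster statistics the full pipeline uses:

```python
def centroid(points):
    """Mean of a list of d-dimensional latent points (cluster centre)."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def distance_vector(cluster_i, cluster_j):
    """Vector d(i,j) moving latent codes of website i toward the centre
    of website j's cluster (the minimal-noise direction)."""
    ci, cj = centroid(cluster_i), centroid(cluster_j)
    return [b - a for a, b in zip(ci, cj)]

def shift(z, d):
    """Apply the distance vector to one latent fingerprint z."""
    return [zi + di for zi, di in zip(z, d)]
```

Decoding the shifted codes with the pre-trained decoder then yields fingerprints carrying the target website's features.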

Figure 4 :
Figure 4: Simulated Noise Template (SNT) injection. (a) Original WF dataset: google.com; (b) created noise templates; (c) noise-injected websites: 360.cn, nih.gov, and netflix.com.

We increase the cache reading values of low-activity WFs by artificially introducing an additional constant amount of noise to all low-activity WFs. The constant value is determined by the minimum noise amount multiplied by a constant factor. This method enables DefWeb to convert a high-activity WF to a low-activity WF with a higher success rate, which eventually drops the attacker's model accuracy. Finally, we prune our noise templates by zeroing out the remaining negative portion, which generates more realistic noise templates with values between 0 and the maximum value in the template. Figure 4 demonstrates the noise injection into three different google.com WFs. Figure 4a represents the WF dataset collected from the google.com website in the Chrome browser. Figure 4b shows the different noise templates created with the four steps to convert google.com fingerprints to 360.cn, nih.gov, and netflix.com. The new noisy WFs that belong to google.com are depicted in Figure 4c. Thus, DefWeb dynamically hides the original features in the WFs. Although the original WFs come from google.com, the attacker's DL model cannot distinguish distinct features in the WFs due to the injected precise noise.
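The constant-offset lifting and zero-pruning described above can be sketched as follows (simulation setup only; the function name and offset parameter are illustrative):

```python
def simulate_noisy_wf(wf, noise, offset=0.0):
    """Simulation-setup injection: optionally lift a low-activity WF by
    a constant `offset`, then add the noise template with its negative
    portion zeroed out, keeping values non-negative and realistic."""
    pruned = [max(0.0, n) for n in noise]  # zero out negative noise
    return [x + offset + n for x, n in zip(wf, pruned)]
```

Applying this to every google.com fingerprint with the 360.cn, nih.gov, and netflix.com templates produces the three noisy datasets shown in Figure 4c.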

Figure 5 :
Figure 5: Visualization of the misalignment issue solution (S1) and the segmentation of NT into smaller noise blocks (S2). a) The graph shows an NT(1, 87) used to convert WFs from 360.cn to vimeo.com. b) The graph shows the modified NT(1, 87) after the expansion that solves the misalignment issue. c) The segmentation of NT(1, 87) is visualized. d) The width(i,j)_k and amp(i,j)_k parameters are extracted from every noise block.

Figure 6 :
Figure 6: Average website loading time with DefWeb in the Chrome browser. The horizontal red line represents the average overhead.

Figure 8 :
Figure 8: Histogram of misalignment values between same websites.

Figure 9 :
Figure 9: Average noise template vs. practical noise template. The dashed line shows the template obtained from the simulation. The solid line is the noise template generated by using SMC conflicts for the Google Chrome browser on the Intel Tiger Lake processor.

Table 1 :
Key notations used in the text. VAE = (Enc, Dec): VAE structure with encoder and decoder units.

Table 2 :
Latent space dimension selection based on the CNN model accuracy. Dec(Enc(W)) represents the reconstructed WF dataset without noise, and W' is the noisy WF dataset.

... and n = 12,000 samples for the Tor browser. The data collection resolution is chosen as 5 ms. In Figure 7 (Appendix A.4), we show example website fingerprints collected with the cache occupancy channel. While some websites have distinctive cache patterns (google.com and nih.gov), others lack specific patterns (netflix.com). VAE Model Training: The encoder unit Enc has three 1-dimensional convolutional layers with kernel sizes 32, 64, and 128. The decoder unit Dec consists of three 1-dimensional transpose convolutional layers with kernel sizes 128, 64, and 32. We analyze the effects of different latent space dimensions on the pre-trained and re-trained CNN model accuracy. First, we reconstruct the entire dataset (Dec(Enc(W)))

Table 3 :
WF accuracy (mean and standard deviation) obtained from Chrome, Firefox, and Tor browsers with CNN and LSTM models, compared with previous attacks.

Table 4 :
Website classification accuracy (mean and standard deviation) obtained in the closed-world scenario. NNV: noise added with negative values; NZO: the negative parts of the noise are zeroed out.

Table 6 :
Comparison of DefWeb with the Cache-Sweep [32] and Interrupt Injection [4] defense techniques. The accuracy drops in the retrained DL models are reported for each defense technique. The accuracy drops of the Cache-Sweep and interrupt-based defense techniques are taken from [4].

DefWeb can also be applied in the open-world scenario by adding random noise templates during the fingerprint collection of non-sensitive websites. When random noise templates are injected, the open-world classification accuracy drops to 32.9% and 33.7% in Chrome and Firefox, respectively. These results show that DefWeb can significantly degrade the performance of the retrained CNN and LSTM models.