Going Incognito in the Metaverse: Achieving Theoretically Optimal Privacy-Usability Tradeoffs in VR

Virtual reality (VR) telepresence applications and the so-called “metaverse” promise to be the next major medium of human-computer interaction. However, with recent studies demonstrating the ease with which VR users can be profiled and deanonymized, metaverse platforms carry many of the privacy risks of the conventional internet (and more) while at present offering few of the defensive utilities that users are accustomed to having access to. To remedy this, we present the first known method of implementing an “incognito mode” for VR. Our technique leverages local ε-differential privacy to quantifiably obscure sensitive user data attributes, with a focus on intelligently adding noise when and where it is needed most to maximize privacy while minimizing usability impact. Our system is capable of flexibly adapting to the unique needs of each VR application to further optimize this trade-off. We implement our solution as a universal Unity (C#) plugin that we then evaluate using several popular VR applications. Upon faithfully replicating the most well-known VR privacy attack studies, we show a significant degradation of attacker capabilities when using our solution.


INTRODUCTION
Recent years have seen explosive growth in research and investment into the "metaverse," which comprises immersive augmented and virtual reality (AR/VR) applications that claim to realize the next major iteration of the internet as a multi-user 3D virtual environment. Such platforms, by their very nature, transform every movement of their users into a stream of data to be rendered as a virtual character model for other users around the world.
Since at least the 1970s, researchers have understood that individuals exhibit distinct biomechanical motion patterns that can be used to identify them or infer their personal attributes [13,39]. Thus, attention has rightly shifted toward the unique security and privacy threats that metaverse platforms may pose, with recent studies showing that seemingly-anonymous VR users can easily and accurately be profiled [63] and deanonymized [54,64] from just a few minutes of tracking data. They further show that while "the potential scale and scope of this data collection far exceed what is feasible within traditional mobile and web applications" [63], users are less broadly aware of security and privacy risks in VR than they are of similar risks in traditional platforms like social media [11].
Of course, data privacy challenges are not unique to VR. Nearly every major communications technology advancement of the past century has been accompanied by corresponding privacy risks. For example, on the web, browser cookies pose a widely understood risk to privacy by attaching identifiers and tracking users across websites [8]. However, the maturation of web technologies has also brought an enhanced understanding of, and countermeasures to, such attacks, with technologies such as private browsing (or "incognito") mode in browsers providing users with vital defensive tools for reclaiming control of their data. By contrast, equivalent comprehensive privacy defenses have yet to be developed for the metaverse. We thus find ourselves now in the dangerous situation of facing unprecedented privacy threats in VR while lacking the defensive resources we have become accustomed to on the web.
In this paper, we aim to begin addressing this disparity by designing and implementing the first "incognito mode" for VR. Our method leverages local ε-differential privacy to provide quantifiable resilience against known VR privacy attacks according to a user-adjustable privacy parameter ε. In doing so, it allows for inherent privacy and usability trade-offs to be dynamically rebalanced, along a theoretically optimal continuum, according to the risks and requirements of each VR application, with a focus on the targeted addition of noise to those parameters which are most vulnerable.
We provide an open-source implementation of our solution as a Unity plugin, which we then use to replicate three existing VR privacy attack studies. Our results show a significant degradation of attacker capabilities when using our extension.
Finally, we provide statistical bounds for the perceived error that users may experience when using our technique. We argue that these bounds are well within the range that VR users can naturally adapt to according to past research on homuncular flexibility [87].
security and privacy of eye tracking [37,40]. We place a limited emphasis on eye tracking in this paper, as features such as foveated rendering are not yet widespread, and there are already known effective countermeasures [14,34,43,47,77]. We similarly set aside the privacy of full-body motion capture systems [44,59,67,72]. Instead, we focus on the simple setup of a headset plus two handheld controllers, as is found on most consumer VR devices today.
Comprehensive Attacks. The attacks most relevant to this paper are those of the 2020 Miller et al. "TTI" study [54], the 2022 Nair et al. "MetaData" study [63], and the 2023 Nair et al. "50k" study [64]. First, the TTI study demonstrated that 511 seemingly-anonymous VR users could be deanonymized with 95% accuracy from just 5 minutes of tracking data. The MetaData study expanded on this result, showing that a malicious VR application can also ascertain more than 25 private data points from its users, including various environmental, demographic, and anthropometric attributes. Finally, the 50k study showed that 55,541 VR users can be uniquely identified with 94.33% accuracy from 100 seconds of motion data collected from the popular "Beat Saber" VR game.
Together, the below attributes are those that the literature suggests can be harvested from VR users and that our techniques aim to protect:
• Anthropometrics: height, wingspan, arm lengths, fitness, interpupillary distance, handedness, reaction time [63].
• Environment: room size, geolocation [63].

Metaverse Threat Model
We present a threat model to contextualize our contributions within the broader ecosystem of VR privacy. Our model is adapted from the standard model proposed by Garrido et al. [27]. We consider a target user who interacts with the metaverse over multiple usage sessions. The parties which could plausibly observe a session are:
• A (I) Hardware Attacker, which controls the hardware and firmware of the target user's VR device, and thus has access to raw sensor data from the VR hardware.
• A (II) Client Attacker, which controls the client-side VR application running on the target user's device, and thus has access to data provided by the device APIs.
• A (III) Server Attacker, which controls the external server used to facilitate multi-player functionality, and thus receives a stream of telemetry data from the client.
• A (IV) User Attacker, which represents another end-user of the same VR application, and thus naturally receives from the server a stream of data about the target user.
In our model, the goals of an attacker are to correctly observe attributes of the target user, or to identify them across multiple sessions. Fig. 1 shows that the four attackers lie on a continuum; the later attackers have less privilege and attack accuracy, but can more easily conceal their attacks. Generally, each attacker inherits a subset of the capabilities of the previous attackers as data streams become increasingly processed and filtered at each step.

[Figure 1: Attackers (I) through (IV), ordered by decreasing capability and fidelity and by increasing ease and concealment.]
In this paper, we present algorithmic statistical defenses for the vulnerable attributes of §2.1 that can be implemented at either the device firmware or client software level. Tab. 1 shows the attackers covered by each implementation possibility. In practice, lacking any special access to VR device firmware, our evaluated systems were all implemented at the software level.

Table 1: Coverage of proposed defenses (columns: Software Incognito, Firmware Incognito).
Overall, the "VR incognito mode" defenses proposed in this paper are unable to address the threat of hardware and firmware level attackers. We argue that this is a necessary concession of a software-based defense, and that unlike the client, server, and user attackers we cover, hardware and firmware attacks can be discovered via reverse engineering. Still, in an ideal world, VR devices would contain hardware-based mechanisms for ensuring user privacy. As it stands, VR firmware is tightly controlled and not alterable by researchers without cooperation from OEMs, who are presently disincentivized from implementing hardware-level privacy protections.

Private Web Browsing
We now detour briefly to the more mature field of private web browsing to seek inspiration from the web privacy solutions which have stood the test of time.
The research community has surveyed the field of web privacy [58,81], and identified observable attributes ranging from tracking cookies [8] and HTTP headers [41] to browsing histories [45] and motion sensor data [89]. As in VR, these attributes can be combined to achieve profiling [23,28], fingerprinting [41] and deanonymization [90]. Further, the attack model used by web privacy researchers resembles the metaverse threat model presented in §2.2, with most defenses focusing on web servers and other users, some on client-side applications, and relatively few on the underlying hardware.
In response to these threats, proposed solutions have included proxies, VPNs [35], Tor [17,36], and, of course, private browsing or "incognito" mode in browsers, as well as dedicated private browsers and search engines, e.g., Brave [1] and DuckDuckGo [19]. Of these solutions, "incognito mode" stands out due to its ease of use: a wide range of defensive modifications to protocols, APIs, cookies, and browsing history can all be deployed with a single click [4]. Due perhaps to this outward simplicity, surveys of web privacy protections used in practice have found private browsing mode to be by far the most popular at 73% adoption [30].
In summary, web privacy is highly analogous to metaverse privacy; although the data attributes being protected are vastly different, the threat of combining attributes to profile and deanonymize users is a constant, as is the threat model used to characterize both fields. On the other hand, the size and scope of data collection in VR potentially exceed that of the web [63], while users are simultaneously less aware of the threat in VR [49], and the equivalent privacy tools are not generally available. We are motivated by the popularity of incognito mode on the web to seek an equivalent for VR, with the same fundamental goal as in browsers: allowing users, at the flick of a switch, to become harder to trace across sessions.

Differential Privacy
Having established our motivation for pursuing a metaverse equivalent to "incognito mode," we now lay out the tools necessary to enable its realization. Chief among these is differential privacy [21], which provides a context-agnostic mathematical definition of privacy that statistically bounds the information gained by a hypothetical adversary from the output of a given function M(·):
Definition 1. (ε-Differential Privacy [22]). A randomized function M(·) is ε-differentially private if for all input datasets D and D′ differing on at most one element, and for all possible outputs S ⊆ Range(M):
$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]$$
A function M(·) fulfills differential privacy if its outputs with and without the presence of an individual input element are indistinguishable with respect to the privacy parameter ε ≥ 0. In practice, a randomized function M(·) typically ensures differential privacy by adding calibrated random noise to the output of a deterministic function, M(x) = f(x) + Noise. Lower ε values correspond to higher noise, making it harder to distinguish outputs and strengthening the privacy protection. In addition to ε, the required noise is affected by the sensitivity (Δ) of the deterministic function.
Another aspect worth highlighting is sequential composition [22]: if M(·) is computed n times over D with $\varepsilon_i$, the total privacy budget consumed is $\sum_{i=1}^{n} \varepsilon_i$. Thus, users' attributes become less protected with every query execution. Differentially private outputs are also immune to post-processing [22]; an adversary can compute any function on the output (e.g., rounding) without reducing privacy.
In practice, differential privacy can be used centrally, whereby a server adds noise to an aggregation function computed over data from multiple clients, or locally, whereby clients add noise to data points before sharing them with a server. While local differential privacy is noisier than the central variant, it also requires less trust of the server. Since servers are considered potential adversaries in our threat model (§2.2), we use local differential privacy to protect VR users in this paper. Specifically, we implement local differential privacy using the Bounded Laplace Mechanism [22,31] for continuous attributes and randomized response [86] for Boolean attributes.
Bounded Laplace Mechanism. The Laplace mechanism [22], also known as the "workhorse of differential privacy" [31], is a popular method of implementing local differential privacy for continuous attributes. Laplacian noise satisfies a stronger notion of ε-differential privacy than Gaussian noise, which only satisfies a weaker (ε, δ)-differential privacy [91]. However, its unbounded noise can yield semantically absurd edge cases (e.g., a negative value for the height attribute). Thus, in this paper, we use the Bounded Laplace mechanism [31], which transforms the noise distribution according to the privacy parameters and deterministic value, then samples outputs until a value falls within pre-determined bounds without compromising differential privacy. Inputs that fall outside the bounds are automatically clamped to the nearest bound. Additionally, we employ the modified sampling technique of Holohan et al. [32] to avoid a known vulnerability associated with the use of finite-precision floating-point arithmetic in other differential privacy implementations [57].
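The clamp-then-resample behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism of [31] itself: it uses the plain Laplace scale Δ/ε and omits both the scale recalibration that the Bounded Laplace mechanism relies on for its formal guarantee and the floating-point-safe sampling of [32]. The function names and the height bounds in the usage line are hypothetical.

```python
import random


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def bounded_laplace_sketch(value: float, lower: float, upper: float,
                           epsilon: float, rng: random.Random) -> float:
    """Clamp the input, then resample noise until the output lies in bounds.

    Assumption: scale = sensitivity / epsilon with sensitivity = upper - lower.
    The real mechanism additionally recalibrates this scale (omitted here).
    """
    value = min(max(value, lower), upper)   # out-of-range inputs are clamped
    scale = (upper - lower) / epsilon
    while True:
        noisy = value + laplace_sample(scale, rng)
        if lower <= noisy <= upper:         # reject semantically absurd outputs
            return noisy


# Hypothetical usage: a height of 1.75 m with assumed bounds of 1.4 m to 2.0 m.
noisy_height = bounded_laplace_sketch(1.75, 1.4, 2.0, 1.0, random.Random(0))
```

Rejection sampling keeps every output semantically valid, at the cost of a variable number of draws per call.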
Randomized Response. To achieve local differential privacy for Boolean attributes, we can apply the randomized response method from Warner [86]: (i) the client flips a coin, (ii) if heads, the client sends a truthful response, (iii) else, the client flips a coin again and sends "true" if heads and "false" if tails. This method has been shown to be (ε = ln 3)-differentially private with a fair coin [22], though one can vary ε by changing the bias of the first coin flip.
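As a worked illustration of how the bias of the first coin relates to ε, consider a first coin that lands heads with probability p and a fair second coin. The truthful answer is then reported with probability (1 + p)/2 and the opposite answer with probability (1 − p)/2, so ε = ln((1 + p)/(1 − p)), which gives ln 3 for p = 1/2. The sketch below inverts this relation; the function names are illustrative and are not taken from the paper's Alg. 1.

```python
import math
import random


def truth_probability(epsilon: float) -> float:
    """Bias of the first coin that yields the requested epsilon."""
    return (math.exp(epsilon) - 1.0) / (math.exp(epsilon) + 1.0)


def epsilon_for(p_truth: float) -> float:
    """Privacy level implied by first-coin bias p_truth and a fair second coin."""
    return math.log((1.0 + p_truth) / (1.0 - p_truth))


def randomized_response(truth: bool, epsilon: float, rng: random.Random) -> bool:
    if rng.random() < truth_probability(epsilon):
        return truth                  # first coin heads: answer truthfully
    return rng.random() < 0.5         # otherwise report a fair coin flip
```

With a very large ε the first coin almost always lands heads and the response is almost always truthful; with ε = 0 the response is a pure coin flip.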

Homuncular Flexibility
While differential privacy can be used to quantifiably address the problem of data leakage from VR telemetry, it does so by introducing noise to the VR data, thus potentially degrading the user experience. However, past research on "homuncular flexibility" has shown that users can learn to control bodies that are different from their own, particularly in virtual reality [2,87]. Thus, the remainder of this work focuses on deploying differential privacy in VR in a way that users can rapidly learn to ignore. By transforming the virtual object hierarchy according to known usable non-linear interaction techniques (e.g., the Go-Go technique [68]), the corresponding attributes (e.g., wingspan) can be obscured while allowing users to flexibly adapt to their new environment.

VR PRIVACY DEFENSES
In this section, we provide a differentially-private framework for user data attribute protection in VR. We define each attribute defense in terms of abstract coordinate transformations, without regard to any specific method of implementation. Later, in §4, we describe a concrete system for implementing these defenses within VR applications via a universal Unity plugin.
Our "incognito mode" defenses aim to prevent adversaries from tracking VR users across sessions in the metaverse. In practice, this means limiting the number of data attributes adversaries can reliably harvest from users and use to infer their identity. Local differential privacy (LDP) is the primary tool that allows us to achieve this with a mathematically quantifiable degree of privacy. LDP has the effect of significantly widening the range of attribute values observed by an adversary given a particular ground truth attribute value of a user. In doing so, it ensures that the observable attribute profile of a user always significantly overlaps with that of at least several other users, thus making a precise determination of identity infeasible. The noise added by LDP may have some negative impacts on user experience, as is the case with incognito mode in browsers. However, users can tune the privacy parameter (ε) to reduce the impact of noise on user experience as required.
Upon initiating a new metaverse session (i.e., connecting to a VR server), the defenses generate a random set of "offset" values, which are then used throughout the session to obfuscate attributes within the VR telemetry data stream through a set of deterministic coordinate transformations. The re-randomization of offset values at the start of each session ensures that all usage sessions of a user are statistically unlinkable.² On the other hand, these offsets remain consistent within a session to ensure adversaries never receive more than one view of sensitive attribute values.
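The session lifecycle described above (sample offsets once per session, then reuse them unchanged) can be sketched as follows. The class and the sampler are hypothetical; in particular, the uniform sampler is only a placeholder standing in for a differentially private one and provides no privacy guarantee by itself.

```python
import random
from typing import Callable, Dict, Iterable


class SessionOffsets:
    """Holds one offset per attribute, fixed for the lifetime of a session."""

    def __init__(self, attributes: Iterable[str],
                 sample_offset: Callable[[str], float]):
        # Offsets are sampled exactly once, when the session starts.
        self._offsets: Dict[str, float] = {
            name: sample_offset(name) for name in attributes
        }

    def offset(self, name: str) -> float:
        # Later lookups return the stored value; nothing is resampled.
        return self._offsets[name]


def placeholder_sampler(rng: random.Random) -> Callable[[str], float]:
    # Stand-in only: uniform noise is NOT differentially private.
    return lambda name: rng.uniform(-0.1, 0.1)


rng = random.Random(1)
first_session = SessionOffsets(["height", "wingspan"], placeholder_sampler(rng))
second_session = SessionOffsets(["height", "wingspan"], placeholder_sampler(rng))
```

Constructing a new object per connection mirrors the re-randomization at session start, while repeated lookups within one object mirror the within-session consistency.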
What follows are the specific differentially-private coordinate transformations that protect user data attributes (and thus allow them to "go incognito") in VR. While for simplicity this section considers the protections for each attribute in isolation, in practice, our implementation uses a relative transformation hierarchy to allow any set of enabled defenses to seamlessly combine with each other (see §4.5). The coordinates used throughout this paper refer to the left-handed, Y-up Unity coordinate system, pictured in Fig. 2.
² Methods for tracking users that are not unique to VR (such as via their IP addresses) are not considered to be within the scope of this paper; defenses like VPNs are widespread.

Preliminaries
In our setting, LDP protects against adversaries with knowledge of observed attributes across all user sessions except for the current session of a target user (D′). Sequential composition allows us to provide an upper bound for a user's privacy budget as the sum of each ε used per attribute.
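As a small arithmetic illustration of this bound, the sketch below sums per-attribute ε values into a session budget. The attribute names and ε values are hypothetical and chosen only for the example.

```python
from typing import Dict


def total_privacy_budget(epsilons: Dict[str, float]) -> float:
    """Upper bound on the session budget under sequential composition."""
    if any(eps < 0 for eps in epsilons.values()):
        raise ValueError("each epsilon must be non-negative")
    return sum(epsilons.values())


# Hypothetical per-attribute settings for one session.
session_epsilons = {"height": 1.0, "wingspan": 1.0, "ipd": 0.5, "handedness": 1.0}
session_budget = total_privacy_budget(session_epsilons)  # 1.0 + 1.0 + 0.5 + 1.0
```

Disabling a defense removes its ε from the sum, but also leaves that attribute unprotected.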
We identified the Bounded Laplace mechanism [31] as our tool of choice for protecting continuous attributes like height, wingspan, and room size in VR because it produces random noise centered around the sensitive value (e.g., height) while preserving the semantic consistency of the attribute (e.g., height > 0). The Laplacian noise distribution is preferable over, e.g., simply imbuing uniformly distributed random noise, because it has the property of minimizing the mean-squared error of any attribute at a given privacy level (ε) [38], thereby minimizing its impact on the user experience.
Where Boolean attributes are concerned, we use randomized response [86] with a weighted coin to provide ε-differential privacy for chosen values of ε. The use of randomized response over simpler mechanisms (e.g., a single coin flip) aligns Boolean attributes with the same ε-differential privacy framework as continuous attributes, and thus allows the ε values of multiple attributes to be combined into a single "privacy budget" if desired.
Throughout this paper, we use the following standard variable notation in our algorithm statements: For a given attribute a (e.g., height), we use a′ (e.g., height′) to denote the LDP-protected value an adversary observes. Our use of local differential privacy requires Δ to cover the entire range of the bounded interval [l, u] (Δ = |u − l|). Alg. 1 contains helper functions for the mechanisms discussed here that will be used throughout §3.

Continuous Attributes
Using the preliminaries established above, and in particular the Bounded Laplace mechanism, we now describe coordinate transformations for protecting continuous attributes in VR. Each defense begins by calculating an offset using the LDPNoisyOffset helper function before diverging into two distinct categories: additive offset defenses, which protect attributes such as interpupillary distance (IPD) that are not expected to change over the course of a session, and multiplicative offset defenses, which protect attributes like observed height that might be updated each frame.

Additive Offset
There are two continuous attributes that we can protect by simply adding a fixed offset value to the ground truth as a one-time transformation: interpupillary distance (IPD), and voice pitch. The use of an additive offset is sufficient to protect these attributes without impacting usability due to the relatively static nature of such attributes throughout a session, with the resulting defenses being shown in Alg. 2.
IPD. We start with IPD as it is amongst the easiest attributes to defend due to the fact that it should not reasonably be expected to change during a session. Our suggested countermeasure to attacks on IPD defends the player by scaling their avatar such that when an adversary measures the gap between their left and right eyes, the distance will correspond to a differentially private value.
Voice Pitch. An attacker can also fingerprint a VR user by observing the median frequency of their speech as measured by a microphone on their VR device, which they can use in particular to infer a user's gender in addition to simply being a unique identifier. Thus, we suggest pitch-correcting the voice stream according to the differentially-private offset. As with IPD, the attacker can now only observe a differentially private pitch + offset value. Incidentally, we found that this defense is also sufficient to confuse machine learning models which attempt to infer the user's ethnicity based on their accent (see §5), though that effect may be less resilient.
Studies which focus entirely on speech privacy [92] have presented more sophisticated techniques for obfuscating voice than the ones discussed here, but we include this differentially-private defense for completeness given the inclusion of speech attributes in VR attack papers [63].

Multiplicative Offset
We now turn our attention to the bulk of attributes for which a multiplicative offset is required. Consider, for example, the case of wingspan, where the perceived distance between a user's hands should appear to be 0 when their hands are touching, but should reflect wingspan + offset when their hands are fully extended. Simply adding offset to the distance in all cases, as per the additive offset approach, is insufficient to achieve this property. Instead, we scale the entire range of values by a′/a as shown in Fig. 3. As a result, observable attributes attain a differentially-private value at their extremes, while their zero-point is maintained. We present in this section multiplicative offset defenses for a variety of attributes, as summarized in Alg. 3.
Height. A typical method for inferring the height of a VR user is to record the y-coordinate of the VR headset ($y_h$) over the course of a session, and then use the highest observed coordinate (or, e.g., the 99th percentile) as a direct linear correlate of height. This attack is effective because $y_h$ = height when a user is standing upright, which they generally are for a large portion of their session.

While one may be tempted to simply adjust $y_h$ by offset at all times, the relative error of a fixed offset can grow disproportionately large in applications where users are required to get close to the ground. In fact, in an extreme scenario where a user decides to lie flat on the ground, an adversary may observe $y_h'$ = 0 + offset, which could defeat the privacy of this method by revealing offset.
Therefore, our suggested countermeasure is to use a multiplicative offset, whereby $y_h' = y_h \cdot (height'/height)$. When $y_h$ = height, the adversary now observes the differentially-private value $y_h'$ = height + offset, while $y_h'$ = 0 when $y_h$ = 0, as shown in Fig. 4. We also suggest adjusting $y_l$ and $y_r$ such that the relative distance between the user's head and hands appears to remain unchanged.
Squat Depth. Prior works have shown that an adversary can assess a proxy of a user's physical fitness by covertly prompting the users to squat and measuring their squat depth, i.e., depth = height − $y_h$, where $y_h$ is the lowest headset coordinate recorded during the squat. The aim of this defense is to ensure that an adversary can only observe a differentially private depth value. While this could be achieved by setting a strict lower bound on $y_h$, doing so has the potential to be disorienting and could negatively impact the VR user experience. Instead, our suggested defense offsets $y_h$ using a dedicated transformation (independent of any defenses to height); see Alg. 3.
Wingspan. The wingspan attribute is harvested in a similar way to height, with an adversary monitoring the distance d between the left and right controllers over the course of a usage session and using the maximum observed value of d as a strong correlate of the user's wingspan. A VR application could require a user to fully extend their arms for seemingly legitimate gaming purposes, thus revealing their wingspan to potential attackers. The defense must therefore modify the observed distance d when the user's arms are extended. However, as discussed at the start of this section, simply adding a fixed offset to d does not allow d = 0 when the user's hands are touching, which is desirable for UX.
In function Wingspan of Alg. 3, we formally introduce our recommended defense, where $arm_l$ and $arm_r$ are the arm length measurements in VR. As with our protection of squat depth, we ensure that the noise scales smoothly to preserve the user experience. As a result, when the user's hands are at the same coordinates, the observed distance is 0; thus, when the user touches their physical hands, the virtual hands also touch. On the other hand, when the arms are extended completely, the real-time distances between the controllers and their midpoint become $d_l = arm_l$ and $d_r = arm_r$, where $d_l + d_r = d$. In such a position, the observed wingspan becomes differentially private: the defense adds half the total offset to each arm, so that $d_l' + d_r' = (arm_l + offset/2) + (arm_r + offset/2)$ = wingspan + offset. Consequently, the adversary will only observe a differentially private wingspan value when using the controllers' coordinates to calculate the distance. In VR research, this is known as the "go-go technique" [68]; here, we use a small scale factor to obscure the user's wingspan (rather than to extend reach). As with the other multiplicative offset defenses, post-processing immunity protects the sensitive values when multiplied by a ratio in [0, 1], and the adversary can only learn wingspan′ from the observed distances in the range [0, wingspan′].
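The per-arm scaling described for wingspan can be sketched in one dimension as follows. This is an illustration under stated assumptions rather than the paper's Alg. 3: each controller's distance from the body midpoint is scaled so that a fully extended arm appears longer (or shorter) by half the total offset. The function name and the numeric values are hypothetical.

```python
def protected_arm_distance(d_arm: float, arm_length: float,
                           total_offset: float) -> float:
    """Scale one controller's distance from the midpoint.

    At full extension (d_arm == arm_length) the result is
    arm_length + total_offset / 2; at d_arm == 0 the result is 0.
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    scale = (arm_length + total_offset / 2.0) / arm_length
    return d_arm * scale


# Hypothetical measurements in metres, with a session offset of +6 cm.
arm_l, arm_r, offset = 0.85, 0.87, 0.06
observed_wingspan = (protected_arm_distance(arm_l, arm_l, offset)
                     + protected_arm_distance(arm_r, arm_r, offset))
```

Because the scale factor multiplies the distance, touching hands still coincide, while the maximum observable span equals the true wingspan plus the offset.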
Arm Length Ratio. If an adversary manages to measure the wingspan of a user, determining the arm length ratio is possible by using the headset as an approximate midpoint. As function Arms of Alg. 3 shows, the corresponding defense is almost equivalent to that of the user's wingspan, but while the wingspan protection adds noise symmetrically to both arms, in this case, we add noise asymmetrically to obfuscate the ratio of arm lengths. This reflects a unique deployment of the go-go technique with different scale factors used for each arm to obscure length asymmetries.
Room Size. Lastly, previous works have demonstrated that an adversary can determine the dimensions of a user's play area by observing the range of their movement. Once again, an additive offset would fail to defend against this attack by simply shifting the user's position rather than affecting their movement range. We therefore employ a similar technique as with the other multiplicative offset transformations in that the dynamic noise at the center of the room is 0, which increases as the user approaches the edges of their play area.
When the user is at the center of the room, $(x_h, z_h) = (0, 0)$, the offsets are 0. When the user is at a corner of the room, e.g., at $(width/2, length/2)$, the offsets become half the noise added to each room dimension, $(Noise_x/2, Noise_z/2)$. Consequently, the adversary can only collect the noisy room dimensions, e.g., for width: $x_h' = width/2 + Noise_x/2 = width'/2$. Thus, the adversary would only learn a differentially private room dimension from observing $x_h'$ in the range [0, width′/2], with the same being true of length. Note that offsets added to $x_h$ and $z_h$ are intentionally chosen independently so that the adversary cannot even learn the proportions of the room.
Security Arguments. We conclude by arguing why the multiplicative offset approach maintains differential privacy, emphasizing that applying a fixed offset multiplicatively is very different from resampling the random offset value.
Proposition 1. Given a single individual's ground truth value v ∈ [l, u] collected locally once, where l and u are the lower and upper bounds of possible values of v, and an offset N sampled once from a differentially private distribution, broadcasting any x′ = (x/v)(v + N) to a server protects v with differential privacy, where x ∈ [0, v] is a real-time value continuously generated locally.
Proof: Firstly, an adversary cannot learn the sensitive value from the ratio x/v ∈ [0, 1] without knowing x. Thus, an adversary can only learn v + N from the possible stream of broadcasted values x′ = {0, ..., v + N} sent to the server. Given that N is sampled from a differentially private distribution s.t. v + N is centered around v, v + N is immune to post-processing and is thus differentially private [22]. □
To provide a concrete example, consider again the attribute of height: v = height, v′ = height + offset, x = $y_h$. Given that height′ is differentially private, an adversary who does not know the user's current $y_h$ value (between 0 and height) will only be able to observe the current $y_h'$ value (between 0 and height′), which cannot be used to find height.
[Algorithm 3: multiplicative offset defenses, beginning with Function Height.]
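To make the height example concrete, the sketch below contrasts an additive view of the headset coordinate with the multiplicative view x′ = (x/v)(v + N) from Proposition 1. The function names and numeric values are hypothetical, and the offset is simply given rather than sampled.

```python
def additive_view(y_h: float, offset: float) -> float:
    """Naive defense: shift the headset coordinate by a fixed offset."""
    return y_h + offset


def multiplicative_view(y_h: float, height: float, offset: float) -> float:
    """Proposition 1 with v = height, N = offset, x = y_h."""
    return (y_h / height) * (height + offset)


height, offset = 1.75, 0.05   # hypothetical ground truth and session offset (m)

lying_down_additive = additive_view(0.0, offset)                      # equals offset
lying_down_multiplicative = multiplicative_view(0.0, height, offset)  # equals 0
standing_multiplicative = multiplicative_view(height, height, offset)
```

The additive view exposes the offset itself when the user lies on the ground, whereas the multiplicative view keeps the zero point fixed and reveals only height + offset at the top of the range.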

Binary Attributes
We now switch our focus to attributes like handedness which can be represented as Boolean variables. For such attributes, we deploy the RandomizedResponse function of Alg. 1. If randomized response suggests an untruthful response, the user's virtual avatar is mirrored for other users, as is their view of the virtual world. While the user can still interact with the world and other avatars normally, we found that this approach comes at the cost of all text appearing to be backwards absent any special corrective measures.
Handedness. An adversary may observe a user's behavior, e.g., which hand they use to interact with virtual objects, to determine their handedness over time. Mirroring the user's avatar randomly on each VR session obfuscates handedness.
Arm Length Asymmetry. Using a mirrored avatar also provides plausible protection against adversaries observing which arm is longer; however, there is a large degree of overlap between this defense and that of arm length ratio.
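The mirroring step can be sketched as a reflection of the tracked pose. This sketch assumes the mirror plane is x = 0 in an avatar-local coordinate frame and that the left and right controllers swap roles under reflection; both are illustrative assumptions, as are the type and function names. The `mirrored` flag is taken as given, decided once per session by randomized response.

```python
from typing import NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class Pose(NamedTuple):
    head: Vec3
    left_hand: Vec3
    right_hand: Vec3


def mirror_x(p: Vec3) -> Vec3:
    """Reflect a point across the plane x = 0."""
    return (-p[0], p[1], p[2])


def mirror_pose(pose: Pose, mirrored: bool) -> Pose:
    """Return the pose other users observe for this session."""
    if not mirrored:
        return pose
    # Under reflection, the physical right hand drives the observed left hand.
    return Pose(head=mirror_x(pose.head),
                left_hand=mirror_x(pose.right_hand),
                right_hand=mirror_x(pose.left_hand))


true_pose = Pose(head=(0.0, 1.7, 0.0),
                 left_hand=(-0.3, 1.2, 0.4),
                 right_hand=(0.35, 1.1, 0.5))
observed_pose = mirror_pose(true_pose, mirrored=True)
```

A user who reaches with the right hand then appears to reach with the left, and the longer arm appears on the opposite side.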

Network Communication Attributes
Finally, we turn our attention to network-layer attributes, namely latency, which can reveal geolocation via multilateration, and throughput, which can reveal the VR device model. Such attributes are extremely difficult to protect with differential privacy due to their one-way boundedness; for example, while we can add artificial delay to increase perceived latency, there is no way to decrease the latency of a system below its intrinsic value, which would be necessary to provide differential privacy based on the ground truth. Instead, we resort to clamping, which has the effect of grouping observed attribute values into distinct clusters that effectively anonymize users within their cluster. The defenses of this section are not intended to be a primary contribution of our paper, but are a necessary component of a complete VR privacy solution.
Geolocation. A server attacker can observe the round-trip delay of a signal traveling between a VR client device and multiple servers to determine a user's location via multilateration (hyperbolic positioning). Furthermore, prior works suggest a user attacker can also use the round-trip delay of the target's audio signal as a proxy for latency. In response, our defense clamps the latency of all broadcasted signals to a fixed round-trip delay by artificially delaying each packet. Due to the sensitivity of hyperbolic positioning, even a 1 ms offset can skew the adversaries' prediction by ≈ 300 km.
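The latency clamp and the 300 km figure can be illustrated with a short calculation. The sketch assumes that the clamp is applied by padding each packet up to a fixed target round-trip delay and that signals travel at most at the vacuum speed of light; the function names and the target value are hypothetical.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299.792458   # kilometres travelled per millisecond


def padding_delay_ms(measured_rtt_ms: float, target_rtt_ms: float) -> float:
    """Artificial delay needed to reach the target round-trip time.

    Latency can only be increased, so a connection that is already slower
    than the target receives no padding and remains distinguishable.
    """
    return max(0.0, target_rtt_ms - measured_rtt_ms)


def max_distance_error_km(timing_error_ms: float) -> float:
    """Distance a signal could cover during the given timing error."""
    return timing_error_ms * SPEED_OF_LIGHT_KM_PER_MS


added_delay = padding_delay_ms(42.0, 100.0)   # pad a 42 ms link up to 100 ms
skew_per_ms = max_distance_error_km(1.0)      # roughly 300 km per millisecond
```

The target should therefore sit above the intrinsic latency of most users, since anyone above it falls outside the anonymizing cluster.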
Reaction Time. Likewise, adversaries can measure a user's reaction time by timing the delay between a stimulus (e.g., a visual or audio cue) and the user's response. In addition to being a further identifying metric, reaction time is also highly correlated with age [88].
While the technical defense for reaction time is largely equivalent to that of geolocation, the specific clamping values are different because the sensitivities of the underlying attributes vary greatly. If the defenses for geolocation and reaction time are simultaneously enabled, the higher latency clamp should be applied to protect both.
Refresh/Tracking Rate. Finally, a server attacker can use the telemetry throughput to ascertain the VR headset's refresh and tracking rate and thus potentially identify the make and model of a user's VR device. Moreover, user attackers can leverage a VR environment with moving objects that users perceive differently depending on their refresh rates to determine the refresh rate of the VR display. Thus, our defenses clamp the rate at which the VR device broadcasts its tracked coordinates to obfuscate the true device specifications.
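One way to picture the broadcast-rate clamp is as resampling the tracking stream onto a fixed outgoing clock, so that the send schedule no longer depends on the device's native rate. The Python sketch below is an illustrative assumption about how such a clamp could work (sample-and-hold on a fixed clock) and is not a description of the MetaGuard implementation; the function name is hypothetical.

```python
import bisect


def resample_to_clamp(frame_times_s, clamp_hz, duration_s):
    """Pick one frame per tick of a fixed clamp_hz clock.

    frame_times_s must be sorted capture timestamps in seconds. For each
    tick, the index of the latest frame captured at or before that tick
    is selected (sample-and-hold), so sends occur on a uniform schedule.
    """
    n_ticks = int(duration_s * clamp_hz)
    picks = []
    for k in range(n_ticks):
        tick = k / clamp_hz
        # Small tolerance guards against floating-point ties at tick times.
        idx = bisect.bisect_right(frame_times_s, tick + 1e-9) - 1
        if idx >= 0:
            picks.append(idx)
    return picks


if __name__ == "__main__":
    native_90hz = [i / 90 for i in range(90)]
    native_120hz = [i / 120 for i in range(120)]
    # Both devices end up broadcasting the same number of updates per second.
    print(len(resample_to_clamp(native_90hz, 60, 1.0)))
    print(len(resample_to_clamp(native_120hz, 60, 1.0)))
```

A device whose native rate is below the clamp would repeat frames under this scheme, which is one reason a clamp can only hide users among devices at or above the chosen rate.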
Summary. While our aim in this section was to be as thorough as possible with regard to covering known VR privacy attacks, we by no means claim to have comprehensively addressed every possible VR privacy threat vector. Instead, we hope to have accomplished two simple goals. Firstly, we believe the combined defenses of this section are sufficient to significantly hinder attempts to deanonymize users in the metaverse. Within a large enough group of users, adversaries may have to combine dozens of unique attributes to reliably identify individuals; the absence of the low-hanging attributes discussed herein should obstruct their ability to do so. Secondly, we hope that the attributes covered in this section were diverse enough, and the corresponding defenses flexible enough, to be extended to future VR privacy threats.

VR INCOGNITO MODE
In this section, we introduce "MetaGuard," our practical implementation of the defenses presented in §3 and the first known "incognito mode" for the metaverse. We built MetaGuard as an open-source Unity (C#) plugin that can easily be patched into virtually any VR application using MelonLoader [50]. We begin by describing the options and interface made available to MetaGuard users. We then discuss our choice of DP parameters (ε, bounds, etc.) and outline how MetaGuard calibrates noise to each user. Finally, we describe the concrete game object transformations applied to the virtual world to implement the defenses of §3. Fig. 5 shows a mixed reality photo of a player using the MetaGuard VR plugin within a VR game.

Settings & User Interface
The main objective of MetaGuard is to protect VR user privacy while minimizing usability impact. The flexible interface of MetaGuard (shown in Fig. 6) reflects this goal, allowing users to tune the defense profile according to their preferences and to the needs of the particular VR application in use. Specifically, we expose the following options:
(A) Master Toggle. The prominent master switch allows users to "go incognito" at the press of a button, with safe defaults that invite (but don't require) further customization.
(B) Feature Toggles. The feature switches allow users to toggle individual defenses according to their needs; e.g., in a game like Beat Saber [6], users may wish to disable defenses that interfere with gameplay (i.e., wingspan and arm lengths), while keeping the other defenses enabled.
(C) Privacy Slider. Lastly, we present users with a "privacy level" slider that adjusts the privacy parameter (ε) for each defense, allowing users to dynamically adjust the inherent trade-off between privacy and accuracy when using the defenses of §3. Users can choose from the following options, which we generally refer to simply as the "low," "medium," and "high" privacy settings:
• High Privacy, intended for virtual telepresence applications such as VRChat

Selecting Epsilon Values & Attribute Bounds
As discussed in §2.4, the level of privacy provided by the defenses of §3 depends on the appropriate selection of DP parameters, namely ε, Δ, and attribute bounds. Although our approach in MetaGuard is to allow users to adjust the privacy parameter (ε) according to their preferences, we must nevertheless translate the semantic settings of "low," "medium," and "high" privacy into concrete ε-values, noting that a given privacy level may translate to a different ε-value for each attribute depending on its sensitivity to noise. Furthermore, the specific lower bound and upper bound of each attribute (and thus Δ, the absolute difference between the two bounds) must be determined in order to use the Bounded Laplace mechanism. This section outlines our method of selecting these values, with the results shown in Tab. 2.

Selecting 𝜀-Values & Clamps
Continuous Anthropometrics. We conducted a small empirical analysis using the primary authors of this paper (one novice VR user and one expert) to select appropriate ε-values for each of the continuous anthropometric attributes at each privacy level. We began by selecting three VR applications (VRChat [33], Tabletop Simulator [25], and Beat Saber [6]) that represent the most popular examples of the intended use cases for the high, medium, and low privacy modes respectively. We then tested a wide range of ε-values for each attribute in each application while monitoring their effect on usability. For example, in Beat Saber, we had both a novice and expert-level player complete the same challenges at different ε-values to evaluate the impact of noise on in-game performance. By contrast, in VRChat, we were simply interested in the impact of noise on the ability to hold a conversation (e.g., to maintain virtual "eye contact"). Next, we analyzed the concrete privacy impact of candidate ε choices by simulating attackers at a variety of ε-values. For example, Fig. 7 illustrates that for the height attribute, the vast majority of privacy benefit is already realized at ε = 1. We combined these results with the findings of our usability analysis to produce the final ε-values shown in Tab. 2 according to the appropriate balance of privacy and usability for the intended use of each level.
Binary Anthropometrics. For attributes where the defenses of §3 suggest the use of randomized response, we selected ε-values such that the corresponding prediction accuracy was degraded by 15%, 50% and 85% at the low, medium, and high privacy levels respectively.
Voice. Although technically a continuous anthropometric, vocal frequency cannot be calibrated via playthroughs due to the lack of a tangible impact on gameplay performance. Instead, we selected ε-values which degraded inference of gender by roughly 25%, 50% and 75% at the low, medium, and high privacy levels respectively.
Clamps. Finally, for attributes where the corresponding defense of §3 suggests clamping, we chose clamp values which have the effect of anonymizing users within progressively larger groups. For example, for refresh/tracking rate, we selected clamps which hide users within the set of high (90 Hz [26]), medium (72 Hz [52]), and low (60 Hz [75]) fidelity VR devices. For the latency-related attributes, we selected values below the perceptible 100 ms threshold [9,56,60] that significantly decreased prediction accuracy.

Selecting Attribute Bounds
Finally, beyond ε, the Bounded Laplace mechanism also requires attribute bounds to constrain the outputs to semantically consistent values. We used public datasets to obtain the 95th percentile bounds for anthropometric measurements [10,18,70,73]; our use of local DP causes Δ to reflect the full range of possible values. For room size, we extracted the bounds from official VR setup specifications [85]. We list the bounds and corresponding references in Tab. 2. We emphasize that the sole purpose of our informal experimentation in this section is to set a reasonable range of ε-values that cover a variety of VR use cases. Given the lack of consensus on a formal method for selecting DP parameters [20], our choices simply serve to establish a plausible spectrum of ε-values corresponding to our perceived boundaries of the privacy-usability trade-off. The power to select exactly which point on this spectrum is best suited for a particular application remains with the end user.
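To make the role of ε and the attribute bounds concrete, the following Python sketch draws a noisy attribute value from a Laplace distribution centered on the true value and restricted to the bounds by rejection sampling. Several simplifying assumptions are made and should be read as such: the scale is set to Δ/ε, whereas a formally private bounded Laplace mechanism may require a larger, specially calibrated scale; out-of-range inputs are clamped to the bounds; and the bounds in the demo are hypothetical rather than the values of Tab. 2.

```python
import random


def bounded_laplace(value, lower, upper, epsilon, rng):
    """Sample value + Laplace noise, redrawing until the result lies in [lower, upper].

    Assumptions (not taken from the paper): scale = (upper - lower) / epsilon,
    and inputs outside the bounds are clamped before noise is added.
    """
    value = min(max(value, lower), upper)
    scale = (upper - lower) / epsilon
    while True:
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        candidate = value + noise
        if lower <= candidate <= upper:
            return candidate


if __name__ == "__main__":
    rng = random.Random(7)  # seeded only to make the demo repeatable
    # Hypothetical height bounds in metres, for illustration only.
    print(bounded_laplace(1.70, 1.50, 1.90, epsilon=1.0, rng=rng))
```

Because every accepted draw lies within the bounds, the resulting avatar dimensions remain plausible for a human user regardless of how small ε is.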

Rerandomization & Linkability
By default, we suggest randomly resampling offset values according to the algorithms of §3 at the start of each session. Assuming that MetaGuard users cannot be linked across sessions, adversaries will be unable to aggregate measurements across multiple sessions to obtain user data. Alternatively, one-time randomization can be used, allowing linkability but ensuring that no attribute leakage occurs.

Calibration & Noise Centering
One final parameter is required to successfully implement the continuous attribute defenses of §3: the ground truth attribute values of the end user. Centering the Laplacian noise distribution around the ground truth attribute values of the current user has the effect of minimizing noise for as many users as possible, particularly those who are outliers, thus achieving theoretically optimal usability.
To achieve this, the MetaGuard extension calculates instantaneous ground truth estimates upon instantiation using the method shown in Fig. 8. Specifically, the OpenVR API [82] provides MetaGuard with one-time snapshot locations of the user's head, left and right eyes, left and right hands, and a plane representing the play area. Estimates for the ground truth values of height, wingspan, IPD, room size, and left and right arm lengths can then be derived from these measurements. We note that the privacy of MetaGuard is not dependent on the accuracy of the ground truth estimates, which exist only to ensure that the added noise is not more than the level necessary to protect a given user.

Defense Implementation
We now finally provide a complete description of our "VR Incognito Mode" system for implementing the defenses of §3 in light of the interface, ε-values, bounds, and calibration procedures described above. Our implementation follows two phases: a setup phase, which executes exactly once on the frame when a defense is enabled, and an update phase, which executes every frame thereafter.

Setup Phase. When a defense is first enabled, MetaGuard uses the calibration procedures of §4.4 to estimate the ground truth attribute values of the user. These values are then used in combination with the ε-values and bounds of §4.2 to calculate noisy offsets corresponding to each privacy level using the methods outlined in §3, and are then immediately discarded from program memory (with only offsets retained) so as to minimize the chance of unintentional data leakage. By default, the Unity game engine uses telemetry data from OpenVR [84] to position game objects within a virtual environment, which are then manipulated by a VR application. During the setup phase, the system modifies the game object hierarchy by inserting intermediate "offset" objects as shown in Fig. 9.
Update Phase. During the update phase, the system first checks which defenses the user has enabled in the interface (see §4.1). For all disabled attributes, the corresponding offset transformations in the game object hierarchy (as shown in Fig. 9) are set to the identity matrix. For each enabled feature, the system implements the corresponding defense of §3 by fetching the noisy attribute value calculated during the setup phase for the currently-selected privacy level and enabling the relevant coordinate transformation on the inserted offset objects such that the observable attribute value matches the noisy attribute value. Specifically, Fig. 9 illustrates how the position of each game object is defined with respect to another object in the hierarchy, and how the defenses modify the relative position or scale of each object with respect to its parent.
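The two phases can be sketched for a single attribute as follows. This Python sketch is a simplified illustration rather than the Unity (C#) implementation: it assumes a multiplicative scale offset for height, stores one offset per privacy level, and uses hypothetical class and method names. The ground truth value is used only inside the setup call and is never stored on the object.

```python
from dataclasses import dataclass, field


@dataclass
class HeightDefense:
    """Retains only per-level scale offsets, never the ground truth height."""
    offsets: dict = field(default_factory=dict)
    enabled: bool = False
    level: str = "medium"

    def setup(self, true_height_m: float, noisy_heights_m: dict) -> None:
        """Setup phase: convert noisy target heights into scale offsets."""
        self.offsets = {lvl: noisy / true_height_m
                        for lvl, noisy in noisy_heights_m.items()}
        self.enabled = True

    def update(self, tracked_head_y_m: float) -> float:
        """Update phase: apply the selected offset, or the identity if disabled."""
        factor = self.offsets[self.level] if self.enabled else 1.0
        return tracked_head_y_m * factor


if __name__ == "__main__":
    defense = HeightDefense()
    # Hypothetical noisy heights for each privacy level.
    defense.setup(1.80, {"low": 1.78, "medium": 1.71, "high": 1.62})
    print(defense.update(1.80))  # observable standing height at the medium level
    print(defense.update(0.90))  # crouching scales by the same factor
```

Using a scale factor rather than a fixed displacement keeps the observable height proportional to the tracked height, so the avatar still reaches the floor when the user crouches.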

SYSTEM EVALUATION & RESULTS
In this section, we demonstrate the effectiveness of the defenses introduced in §3 by evaluating their impact on the accuracy of a theoretical attacker. To do so, we faithfully replicated the attacks of the TTI [54], MetaData [63], and 50k [64] studies to measure their accuracy both with no defenses and with the MetaGuard extension at the low, medium, and high privacy levels. The results of this evaluation are summarized in Tab. 3 of §A. The presented accuracy values represent what a server attacker could achieve, and also provide an upper bound for the capabilities of user attackers.

Evaluation Method
We obtained from the original authors anonymized frame-by-frame telemetry data recordings of the 511 users from the TTI [54] study, 30 users from the MetaData [63] study, and 55,541 users from the 50k [64] study. Using this data, we could virtually "replay" the original sessions exactly as they occurred, and were able to reproduce the identification and inference attacks described in the original studies with nearly identical results. Next, we repeated this process for each session with MetaGuard enabled at the low, medium, and high levels. The resulting decrease in attack accuracy for each attribute at each privacy level is shown in §A.
To emulate a realistic metaverse threat environment, we streamed telemetry data from the client to a remote game server via a WebSocket. The MetaGuard extension was allowed to clamp the bandwidth and latency of this data stream as discussed in §3. The network-related attacks were then run on the server side.
Beyond the attacks which deterministically harvest sensitive data attributes, all three studies use machine learning to identify users or profile their demographics. We used sklearn to replicate the published methods as closely as possible, using the same model types and parameters as in the original papers. Once again, we replicated the original results with similar accuracy, with the decrease in identification corresponding to the use of the low, medium, and high privacy levels of MetaGuard being shown in Tab. 3C of §A.

Ethical Considerations
Other than the ε-calibration effort described in §4.2, which was performed by the authors, this paper does not involve any original research with human subjects. Instead, our results rely on the replication of prior studies using anonymous data obtained either from public online repositories or directly from the authors of those studies. We verified that all original studies from which we obtained data were non-deceptive and were each subject to individual ethics review processes by OHRP-registered institutional review boards. Furthermore, the informed consent documents of those studies explicitly included permission to re-use collected data for follow-up studies, and we strictly followed the data handling requirements of the original consent documentation, such as the promise to only publish statistical aggregates rather than individual data points.

Primary & Secondary Attributes
Continuous Anthropometrics. Tab. 3A shows that our defenses effectively reduce the coefficients of determination to values below 0.5 for the targeted continuous attributes. We found that physical fitness (squat depth) is the most challenging attribute to protect while preserving user experience, as it shows the smallest drops in prediction accuracy. The remaining attributes show significant decreases in attack accuracy even at the low privacy level: IPD (−67.53%), room size (−55.89% within 2 m²), wingspan (−33.07% within 7 cm) and height (−16.93% within 5 cm).
Binary Anthropometrics. An advantage of the randomized response technique is precise control over attacker accuracy levels by choosing the values of ε. Unsurprisingly, the prediction accuracy of handedness (92.5%, 75%, and 57.5% for the low, medium, and high privacy levels) corresponded to the chosen ε-values.
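The reported accuracies are consistent with one particular reading of the calibration targets, sketched below in Python. Two assumptions are ours and are not stated in the text: that a degradation of 15%, 50%, or 85% is measured as a fraction of the attacker's margin above 50% chance from a 100% baseline, and that the randomized response mechanism answers truthfully with probability e^ε / (1 + e^ε), so that an attacker who trusts the reported bit is correct with exactly that probability. The function names are hypothetical.

```python
import math


def accuracy_after_degradation(baseline: float, degradation: float,
                               chance: float = 0.5) -> float:
    """Remove a fraction of the attacker's margin above chance (assumed reading)."""
    return baseline - degradation * (baseline - chance)


def epsilon_for_accuracy(accuracy: float) -> float:
    """Invert accuracy = e^eps / (1 + e^eps) for standard randomized response."""
    return math.log(accuracy / (1.0 - accuracy))


if __name__ == "__main__":
    for level, degradation in (("low", 0.15), ("medium", 0.50), ("high", 0.85)):
        acc = accuracy_after_degradation(1.0, degradation)
        print(level, round(acc, 3), round(epsilon_for_accuracy(acc), 3))
```

Under these assumptions the three targets map to accuracies of 92.5%, 75%, and 57.5%, matching the values reported for handedness; the ε-values this inversion produces are illustrative and may differ from those listed in Tab. 2.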
Network Attributes. The prediction accuracy of the attributes dependent on latency and throughput dramatically dropped thanks to clamping (except for reaction time, which showed a modest accuracy drop of 8.3% at the low privacy level). Altogether, the low accuracy of these predictions significantly impedes the ability of adversaries to determine which VR device an individual is using.

Inferred Attributes
The machine learning models of the MetaData study primarily use the attributes discussed above as model inputs to infer demographics. Clearly, the reduction in accuracy of these primary attributes will have a negative impact on the accuracy of inferences based on them; nonetheless, we ran the models on the noisy attributes to quantify this impact. The results show significant accuracy drops in predicting gender (−23.5%), age (−58.25%), ethnicity (−48.75%), and income (−73.85%), even at the lowest privacy setting. Most importantly, the three identification models simulating an attacker identifying a user amongst a group all had a significant drop in accuracy (see Tab. 3C); thus, MetaGuard empirically succeeds at its primary goal of preventing users from being deanonymized.

DISCUSSION
In this study, we set out to design, implement, and evaluate a comprehensive suite of VR privacy defenses to protect VR users against a wide range of known attacks. In the absence of any defenses, these attacks demonstrated the ability to not only infer specific sensitive attributes, but also to combine these attributes to infer demographics and even deanonymize users entirely.
Through our evaluation of MetaGuard, our practical implementation of a "VR incognito mode" plugin, we have demonstrated that ε-differential privacy can provide an effective countermeasure to such attacks. Our results show a considerable accuracy reduction in the identification and profiling of users using real VR user data from 56,082 participants across three popular VR privacy studies. By evaluating our system using telemetry data from these existing studies, we were able to independently measure the performance of each defense at each supported privacy level, a feat that would otherwise have required an infeasible number of laboratory trials.
MetaGuard allows users to "go incognito" by randomizing their fictitious measurements, such as height and wingspan, at the start of each new session, thus thwarting cross-session linkability. Alternatively, if users do not mind being linked across sessions, they do not need to re-randomize their fictitious measurements between sessions, allowing adversaries to track them across sessions without revealing their true attribute values in the process.
Our use of bounded Laplacian noise allows us to achieve a theoretically optimal balance between privacy and usability, minimizing the mean squared tracking error a user is expected to experience for a given privacy level (ε) [22,31]. This, in turn, allows us to leverage homuncular flexibility to implement the defenses in a way that users can rapidly learn to ignore [2,87]. For example, the average wingspan offset at the medium privacy level is 4.5 cm, which is well within the range that VR users can flexibly adapt to [68]. Even those transformations which do not directly affect the player model can be thought of as equivalent to body modifications. For example, room size is not necessarily implemented as a body manipulation, but changing the room-to-avatar ratio can be thought of as equivalent to changing the size of the entire avatar and thereby hiding the relative size of the room. As such, we expect homuncular flexibility to be applicable to such transformations as well.
Overall, MetaGuard constitutes the first attempt at producing a privacy-preserving "incognito mode" solution for VR. Grounded in theoretical privacy, and demonstrated using thorough empirical evaluation, we aim to provide a solid foundation for future work in this area. The importance of privacy-enhancing software like MetaGuard will become more pronounced as current market trends make virtual reality increasingly ubiquitous and shape the next generation of the social internet, the so-called "metaverse" [61,71,76]. As it stands, VR device manufacturers have been observed selling VR hardware at losses of up to $10 billion per year [69], presumably with the goal of recouping this investment through software-based after-sale revenue, such as via targeted advertisement [3,12].
But despite using the terms "attacker" and "adversary" throughout our writing, it's likely that such actions would in practice be entirely above board, with users agreeing (knowingly or otherwise) to have their data collected. It is more important than ever to give users the ability to protect their data through technological means, independent of any warranted data privacy regulations, in a way that is as easy to use as the privacy tools used on the web today.

Limitations. Our decision to base our evaluation on data from prior studies means that we inherit the biases of the original studies. In particular, the test subjects of the studies from which our data is derived were not perfectly representative of the general population of VR users. While our evaluation method does precisely replicate the telemetry stream that would have been generated by the original participants were they using the MetaGuard extension, it does so under the assumption that their use of MetaGuard would not have changed their behavior. The accuracy of MetaGuard could be somewhat diminished if it turns out that users modify their behavior to compensate for the added noise. Further, our study considers a limited set of data attributes, which may not be comprehensive with respect to the attributes inferable in VR. MetaGuard may not be effective at protecting attributes beyond those that we directly considered. Finally, the mean-squared-error definition of "usability" by which our system is theoretically optimal may in some cases fail to align with the true user experience in VR.
Future Work. Lacking access to VR device firmware, we implemented the MetaGuard extension described in this paper at the client software layer, providing an effective defense against server and user attackers. In future work, we believe the same defenses could be easily applied at the firmware level, allowing data to also be protected from client attackers. However, protecting data from hardware or firmware-level adversaries will likely require entirely different methods to the ones presented in this paper.
While our aim in this paper was to be as comprehensive as possible when addressing VR privacy attacks, there were a few niche VR hardware features that we specifically excluded. Future systems could extend the techniques of this paper to less common VR accessories, such as pupil tracking and full-body tracking systems, that we did not address in this work. Moreover, we think it is necessary to enlarge the body of known VR privacy attack vectors, and we hope the framework of the MetaGuard extension is modular enough to support the implementation of their corresponding defenses.
An important aspect of the MetaGuard system is the ability for users to toggle individual VR defenses according to the requirements of the application being used. While this process is entirely manual in our implementation, in the future, the "incognito mode" system could be configured to automatically profile VR applications and determine which defenses are appropriate for a given scenario. Furthermore, the application could incorporate the differential privacy concept of a "privacy budget," adding more noise to enabled attributes to compensate for the privacy loss of disabled attributes and maintain the same level of overall anonymity. Our method of selecting ε-values was somewhat informal, in part due to the lack of a quantitative metric of usability impact for noise in VR. Therefore, we look forward to future work that performs user studies to rigorously quantify the impact of adding noise to various attributes on the VR user experience, so as to better shed light on the costs vs. benefits of noisy mechanisms like differential privacy in VR.
Finally, there are methods other than differential privacy that, while relinquishing the provability of our approach, may produce a better experience for the end user. In the future, we hope to evaluate techniques that utilize machine learning to develop corruption models that hide user data while maintaining functionality.

RELATED WORK
We analyzed a large number of VR/AR/XR security and privacy literature reviews [3,7,15,16,24,42,48,61,66,78,79] to assess the current state of the art with respect to metaverse privacy attacks and defenses. The variety of attacks mentioned in these works was a major motivation for producing this paper, as discussed in §2.1.
With respect to defenses, there are a limited number of studies proposing the use of differential privacy in VR. Related works have primarily focused on using differential privacy to protect eye-tracking data [14,34,43,47,77] without regard to other types of VR telemetry. For example, Steil et al. [77] and Ao et al. [47] use differential privacy to protect visual attention heatmaps, while Johnn et al. [34] propose the use of "snow" pixels to obscure the iris signal and prevent spoofing while preserving gaze.
A few of the defenses proposed in this paper have also previously been discussed outside the context of VR. For example, Avery et al. [5] discuss defenses against attacks inferring handedness in the context of mobile devices, and Sun et al. [92] propose countermeasures to inferring attributes from speech in mobile applications.
In summary, MetaGuard fills an important gap in the VR privacy landscape, not only by being the first to defend various anthropometric, environmental, demographic, and device attributes, but also in general by presenting a comprehensive usable metaverse privacy solution rather than focusing on any one particular data point.

CONCLUSION
In this paper, we have presented the first comprehensive "incognito mode for VR." Specifically, we designed a suite of defenses that quantifiably obfuscate a variety of sensitive user data attributes with ε-differential privacy. We then implemented these defenses as a universal Unity VR plugin that we call "MetaGuard." Our implementation, which is compatible with a wide range of popular VR applications, gives users the power to "go incognito" in the metaverse with a single click, with the flexibility of adjusting the defenses and privacy level as they see fit.
Upon replicating well-known VR privacy attacks using real user data from prior studies, we demonstrated a significant decrease in attacker capabilities across a wide range of metrics. In particular, the ability of an attacker to deanonymize a VR user was degraded by as much as 96.0% while using the MetaGuard extension.
Over the course of decades of research in web privacy, private browsing mode has remained amongst the most ubiquitous privacy tools in popular use today. We were inspired by the success of "incognito mode" on the web to produce a metaverse equivalent that is just as user-friendly, while serving the same fundamental purpose of helping users remain untraceable across multiple sessions. We hope our open-source MetaGuard plugin and promising results serve as a foundation for other privacy practitioners to continue exploring usable privacy solutions in this important field.

Figure 4: Use of additive vs. multiplicative offset for height.

Algorithm 3: Local differential privacy for continuous attributes with multiplicative offsets.

Figure 5: Mixed reality photo of a player using "MetaGuard," our implementation of incognito mode for VR.

Figure 7: Coefficients of determination of height from predictions on actual vs. noisy data as ε increases.

Figure 9: Game object hierarchy with existing (dark grey) and inserted (light grey) game objects, and coordinate transformations used to implement VR Incognito Mode defenses.