XRLoc: Accurate UWB Localization to Realize XR Deployments

Understanding the location of ultra-wideband (UWB) tag-attached objects and people in the real world is vital to enabling a smooth cyber-physical transition. However, most UWB localization systems today require multiple anchors in the environment, which can be very cumbersome to set up. In this work, we develop XRLoc, which provides an accuracy of a few centimeters in many real-world scenarios. This paper delineates the key ideas that allow us to overcome the fundamental restrictions that prevent a single anchor point from localizing a device to within an error of a few centimeters. We deploy a VR chess game using everyday objects as a demo and find that our system achieves 2.4 cm median accuracy and 5.3 cm 90th percentile accuracy in dynamic scenarios, performing at least 8× better than state-of-the-art localization systems. Additionally, we implement a MAC protocol to furnish these locations for over 10 tags at update rates of 100 Hz, with a localization latency of ~1 ms. We have additionally open-sourced our system's codebase at https://github.com/ucsdwcsng/xrloc.git


Introduction
Extended Reality (XR), broadly encompassing virtual, augmented, and mixed reality technologies, can potentially revolutionize fields such as education, healthcare, and gaming [5,79,93]. The primary ethos for XR is to provide immersive, interactive, and realistic experiences for users. A key component of delivering this user experience is to transfer the physical world into the virtual space. For example, our everyday spaces and objects can be transformed into video game assets (like tennis racquets, swords, or chess pieces) for interactive gaming applications. To enable these applications, we find a common thread: any XR system should localize and track objects in an environment. Specifically, this object-tracking system needs to satisfy three key requirements to realize XR applications:
R1. Ease of anchor deployment: Any asset localization system must require little deployment effort and should ideally be embeddable within common electronics like TVs or soundbars. This single module should be smaller than 1 m.
R2. Accurate and reliable: Assets must be localized to an accuracy within a few centimeters in room-scale scenarios. We place a stringent requirement of a few centimeters of accuracy to provide a glitch-free user experience. Because XR experiences are immersive, even small user or object tracking errors are obvious and severely impede the adoption of XR [91]. Specifically, the localization system must be reliable during movement, under occlusions, and consistently track assets within an accuracy of a few cm.
R3. Multi-asset low latency localization: Finally, an XR system needs to localize multiple objects in an environment in real time. In dynamic scenarios, this can mean we must localize tens of objects with a 60-80 Hz update rate, as people naturally perceive their surroundings at 60-75 Hz [21], and delays in updating object locations in a dynamic scenario can break the immersive experience.
However, none of the existing asset localization systems meet these three key requirements to deliver XR applications in everyday scenarios (see Table 1). Camera and visual sensors are susceptible to poor lighting and visual occlusions, consequently failing to provide reliable localization (R2). Additionally, deploying a camera-based system can be privacy invasive [68] in home and public settings. Acoustic systems [50] provide accurate localization but struggle to localize multiple assets simultaneously with low latency (R3). Radar systems [41,60,96] can provide low-latency object tracking from a single module but fail to track occluded objects or those with small radar cross-sections (RCS). Some RFID systems have succeeded in realizing low latency [52,73,99]. Their asymmetric architecture (cost-effective tags and expensive readers) better suits large-scale deployments in the retail and industrial sectors. However, long-range RFID systems (> 6 m) are expensive and bulky to integrate into consumer electronics, precluding wide-scale deployments (R1).
Alternatively, many single RF module localization solutions [15,25,27,31,32,43,48,55,75,88,89,100] leveraging WiFi/BLE or ultra-wideband (UWB) are easy to deploy because their transceivers can be inexpensively integrated into consumer electronics. However, they fail to provide the necessary cm-level accuracy. None of the existing systems simultaneously satisfies all three stringent requirements to enable XR applications; prior art is considered more carefully in Sec. 8.
To address the need for XR-compliant localization, we develop XRLoc, which consists of two parts: a localization tag, attachable to objects of interest, and a single localization module that furnishes few-cm level locations from a single vantage point. The localization module is smaller than 1 m and can be easily incorporated within everyday electronics such as televisions or soundbars (satisfying R1). It leverages the tag's single UWB transmission for few-cm accurate localization. An accompanying MAC protocol also supports the localization of multiple tags at an update rate of 100 Hz (satisfying R3). An example deployment of XRLoc is showcased in Fig. 1, where beverage cups are fitted with off-the-shelf UWB tags. XRLoc is leveraged to transform an office space into a life-sized chess board, with these cups taking the place of chess pieces and localized with cm-level accuracy. A video demo of this case study is also included. However, to simultaneously meet all the aforementioned requirements, we need to solve four key challenges:
1. Geometric dilution of precision: In most UWB localization systems, three or more UWB anchors need to be placed in diverse locations in a room to localize the UWB tag, increasing deployment efforts and breaking away from R1. Alternatively, we can place these UWB anchors within a single localization module constrained to a 1 m space. However, reducing the spatial diversity can worsen the localization accuracy by 10×. This accuracy degradation is called 'geometric dilution of precision' [77] (GDOP). A potential strategy to overcome GDOP is to borrow techniques from RFID systems [52,73,99] that achieve real-time cm-scale accuracy from a single RFID reader. However, we observe UWB systems provide 15× worse measurement accuracy compared to RFID systems [99], owing to an RFID system sharing the same clock at the transmitter and receiver (monostatic architecture). Hence, a direct consequence of GDOP is XRLoc's reduced resilience to measurement noise, which precludes us from directly borrowing techniques from RFID-based systems.
To reduce our measurement noise, we could increase transmit power to improve signal quality, increase transmission length for better averaging, or choose better hardware with lower noise floors. However, these solutions come at the cost of increased battery consumption at the tag, increased localization latency, or expensive tag design, respectively. Alternatively, XRLoc makes a key observation when looking at the phase difference of the received UWB signal measurements (PDoA) between a pair of anchors: PDoA measurement quality can be improved proportionally to the distance between the pair of anchors. This simple observation forms the cornerstone of XRLoc's design and allows us to satisfy the first requirement R1.
2. Ambiguous location predictions: However, this improved PDoA measurement quality comes at a detrimental cost: increasing the anchor spacing creates multiple ambiguous location predictions as phase measurements wrap around at $2\pi$. The changes in these ambiguities mirror the changes in the true location of the tag, so they do not affect tracking systems [13,87], which leverage phases to provide cm-level tracking accuracy for handwriting recognition. However, incorrectly choosing an ambiguous absolute location can degrade the accuracy by several tens of centimeters and may create glitches within the XR system.
To predict accurate locations despite phase wrap-around, XRLoc leverages a simple observation: unlike phase measurements, time of arrival measurements do not suffer from ambiguity. Specifically, the time difference of arrival (TDoA) between a pair of anchors, although too inaccurate to furnish cm-level localization, can help to detect and filter out ambiguities. By cleverly fusing these time-difference and phase-difference measurements, XRLoc can provide cm-level accurate locations from a single UWB transmission and satisfy the second requirement R2.
3. Hardware biases: We find that hardware biases can corrupt our location estimates and degrade our location accuracy by over 2×. Specifically, through empirical measurements, and as observed in previous studies [19], UWB modules [20] suffer from a distance-sensitive measurement bias. We model, estimate, and calibrate for these biases via a three-point calibration procedure. We fuse the time and phase measurements with a corrected PDoA and TDoA measurement model by leveraging a particle filter to provide cm-accurate and low-latency location estimates, satisfying R2.
4. High update rate multi-tag operation: In addition to providing low-latency localization, XRLoc must furnish locations for multiple objects in the environment. Often, the UWB transmissions for localization from multiple tags in an environment can cause packet collisions at XRLoc's module. These collisions cause localization failure 25% of the time. To alleviate packet collisions, we leverage a low-power wireless side channel to design a power-efficient medium access control (MAC) protocol. Specifically, XRLoc deploys a LoRa-based MAC to support consistent localization for tens of tags at over 80 Hz, satisfying R3.
XRLoc brings together these key techniques to build a 1 m sized module, consisting of 6 Decawave DW1000 [63] UWB modules for localization, along with a Semtech LoRa SX1272 [72] to furnish a side-channel for the MAC protocol. Additionally, we prototype a simple UWB + LoRa Tag using the Decawave EVB1000 and a LoRa SX1272. Through extensive evaluations, we find that XRLoc satisfies all three stringent requirements with (1) Static localization error with median and 90th percentile accuracy of 1.5 cm and 5.5 cm, an improvement of 9.5× and

Why is this problem hard?
We have established the need for localizing users and objects within a few centimeters from a single vantage point. In this section, we will find that restricting our sensing to within a space of 1 m reduces our geometric diversity, leading to localization errors of many tens of centimeters. This phenomenon is commonly referred to as geometric dilution of precision. We will explore the use of three common UWB measurements, two-way ranging (TWR), time-difference-of-arrival (TDoA), and angle-of-arrival (AoA), and find that systems relying on these measurements fail to furnish the required accuracy. Additionally, we'll explore fusing and jointly optimizing for these measurements to improve localization accuracy. However, even this measurement fusion is insufficient. To test this hypothesis, we build a simple simulation environment described below.
Simulation environment: We perform extensive simulations in a 3 m × 3 m environment, a standard room size, to find the best-case localization accuracy. We use 6 UWB transceivers, placed either diversely in the environment (red diamonds in Fig. 2(a)) or in a limited space near the bottom wall (see Fig. 2(b, c, d)). Next, we divide this space into a 1 mm grid and place tags in each position to measure the location accuracy. The pixels of the 'heatmaps' represent these tag locations, and the pixel color intensity quantifies the median localization accuracy across 100 simulated trials.
Simulating TWR: Many UWB radios measure the time of flight (ToF) of the signal between the transmitter and receiver with a resolution of up to 15.6 ps [22]. The ToF is measured via multiple packet exchanges, taking at least 0.3 ms [17].
And clock drifts at the receiver during this TWR event can lead to a ToF measurement deviation of 150 ps for a 0.  deviation of 1.5°, as independently verified in [37,101]. Consequently, we simulate our AoA measurements as zero-mean Gaussian with 1.5° standard deviation.

Quantifying localization errors
TWR, or distance measurements between a tag and multiple receivers placed diversely in an environment, can be used to trilaterate the tag's position to achieve few-cm level accuracy. From Fig. 2(e), we find that the median localization error is 2.9 cm. Additionally, this error is consistent (with a variation of a few centimeters) across the space (see heatmap in Fig. 2(a)). However, when we place all the receivers within a 1 m linear form factor to satisfy R1, we find that the accuracy degrades by over 8× as compared to the diverse antenna placement. Additionally, we observe a non-uniform performance with errors as large as 1 m. In meeting R1, we have made our localization system too erroneous to be usable. The fundamental reason for the performance degradation is the reduced geometric diversity when the antennas are closer. With the antennas placed around the environment, trilateration is more resilient to errors in distance measurements. We similarly quantify the localization errors when leveraging TDoA or AoA measurements and summarize the results in Fig. 2(e). We find that under low geometric diversity, the median localization errors can be close to 54.4 cm and 40.9 cm for TDoA and AoA measurements, respectively.
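The effect of geometry on range-based localization can be illustrated with a textbook GDOP computation. The sketch below is not the simulation used in this paper: the anchor coordinates, the tag position, and the helper `range_gdop` are assumptions chosen only to contrast a spread-out placement with a 1 m linear array in a 3 m × 3 m room.

```python
import math

def range_gdop(anchors, tag):
    """2-D GDOP for range (TWR) measurements: sqrt(trace((H^T H)^-1)),
    where each row of H is the unit vector from an anchor to the tag.
    The tag must not coincide with an anchor."""
    a = b = c = 0.0  # entries of the symmetric 2x2 matrix H^T H = [[a, b], [b, c]]
    for ax, ay in anchors:
        dx, dy = tag[0] - ax, tag[1] - ay
        r = math.hypot(dx, dy)
        ux, uy = dx / r, dy / r
        a += ux * ux
        b += ux * uy
        c += uy * uy
    det = a * c - b * b
    # trace of the inverse of a 2x2 symmetric matrix is (a + c) / det
    return math.sqrt((a + c) / det)

# Assumed layouts (metres): six anchors spread around the room versus
# six anchors spaced 20 cm apart along the bottom wall.
diverse = [(0, 0), (3, 0), (0, 3), (3, 3), (1.5, 0), (1.5, 3)]
linear = [(1.0 + 0.2 * k, 0.0) for k in range(6)]
tag = (1.5, 1.5)

gdop_diverse = range_gdop(diverse, tag)
gdop_linear = range_gdop(linear, tag)
```

A larger GDOP means the same ranging noise maps to a larger position error, so `gdop_linear` exceeding `gdop_diverse` is the qualitative trend described above. The ratio at a single tag position is not expected to match the simulated median errors, which are taken over the whole space.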

Fusing all measurements
Similar to many robotics applications [4], we can use TWR, AoA, and TDoA measurements together to provide higher accuracy. This fusion is done by jointly optimizing the error function from TWR, AoA, and TDoA [49] measurements. Specifically, in Fig. 2(c), we take 6 TWR measurements, one from each receiver (red diamonds), 3 AoA measurements from each closely-spaced pair of UWB receivers, and 3 TDoA measurements between one antenna from each of these paired groups. The measurement-fusion efforts provide a median localization error of 23.3 cm. However, this still fails to meet our criterion of a few-cm localization error.
No existing state-of-the-art system can surmount the challenge of localizing from a single vantage point and deliver the stringent requirements set forth by our application use case. In XRLoc, we develop the algorithm (Sec. 3) and prototype a system (Sec. 4 and 5) to achieve this small-form-factor, high-accuracy (median accuracy of 3.3 cm as seen from Fig. 2(d) and Fig. 2(e)), and multi-asset localization system, for use within VR systems and immersive audio applications. In the following section, we will delineate the key ideas which allow XRLoc to circumvent the challenges posed by geometric dilution of precision.

Circumventing low-spatial diversity
In the following sections, we tackle the fundamental challenge in single-vantage point localization. First, we will explore improving our phase measurements to improve location accuracy by increasing the antenna separation (Sec. 3.1). However, this comes with the unintended side-effect of introducing ambiguities to our location prediction. So, we explore the use of time difference of arrival (TDoA) measurements to combat these ambiguities (Sec. 3.2). Finally, we explore fusing these measurements in an accurate and low-latency fashion by leveraging a particle filter (Sec. 3.3). By exploring the key ideas here, XRLoc will fulfill R2 and furnish few-cm level localization.

Improving localization resolution
The prior learning from Sec. 2 is that we reduce our resiliency to noise when we try to localize tags from a single vantage point. Lacking spatial diversity makes the optimization vulnerable to noise, creating large outlier measurements and preventing few-cm scale localization. However, when we have two closely (less than half-wavelength) separated antennas, we can find the phase difference ($\Delta\phi$) between this pair as
$$\Delta\phi = \frac{2\pi d}{\lambda} \sin\theta,$$
where $\theta$ is the incoming angle of arrival w.r.t. the normal of this pair of antennas, $d = \lambda/2$ is the distance between them, and $\lambda$ is the wavelength at the 3.5 GHz UWB center frequency. However, the typical UWB phase has a resolution of around 8 bits, which provides a phase resolution of 1.4°, and consequently a localization resolution of 2.1 cm at a distance of 3 m from the localization module. Fortunately, increasing the inter-antenna separation, $d$, linearly increases the measured phase difference. We can leverage this to improve our localization resolution to the ∼1 mm limit when the antenna separation is 1 m.
Prior works [13,87] have leveraged this fact to increase accuracy for handwriting tracking purposes. But widening this separation comes at the cost of introducing more phase ambiguities. This is apparent when we return to the AoA equation and observe that our phase-difference measurements, $\Delta\phi$, wrap over $2\pi$ for separations larger than half a wavelength for angles between $-90^\circ < \theta < 90^\circ$. This is not an issue for tracking purposes, where the changes in location of these ambiguities mirror the true changes in the location and continue to provide a similar trajectory estimate. However, for XRLoc, we find that predicting and tracking these incorrect locations can degrade the localization accuracy by an order of magnitude, to several tens of centimeters.
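The trade-off between resolution and ambiguity can be checked with a few lines of arithmetic. The sketch below assumes a tag near broadside, the far-field AoA relation, and an 8-bit phase quantizer; the function names are illustrative, and the simplified model is only expected to land in the same range as the figures quoted above, not to reproduce them exactly.

```python
import math

C = 299_792_458.0        # speed of light, m/s
WAVELENGTH = C / 3.5e9   # roughly 8.6 cm at the 3.5 GHz center frequency

def localization_resolution(separation_m, range_m=3.0, phase_bits=8):
    """Cross-range step (m) caused by one phase quantization step,
    for a tag near broadside at the given range."""
    phase_step = 2 * math.pi / 2 ** phase_bits  # about 1.4 degrees for 8 bits
    # d(delta_phi)/d(theta) = 2*pi*d/lambda at theta = 0
    angle_step = phase_step * WAVELENGTH / (2 * math.pi * separation_m)
    return range_m * angle_step

def phase_cycles_spanned(separation_m):
    """Number of 2*pi cycles the phase difference sweeps as the angle
    goes from -90 to +90 degrees; values above 1 imply ambiguity."""
    return 2 * separation_m / WAVELENGTH

half_wavelength = localization_resolution(WAVELENGTH / 2)  # a couple of cm
one_metre = localization_resolution(1.0)                   # about 1 mm
```

Under these assumptions a half-wavelength pair spans exactly one phase cycle (no ambiguity) but resolves only a couple of centimeters at 3 m, while a 1 m pair resolves about a millimeter at the price of more than 23 phase cycles across the field of view.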

Ruling out ambiguities
To overcome ambiguities, a simple solution is adding more antennas between the two we have placed so far. These additional antennas will help eliminate phase ambiguities by reducing the distance between consecutive antennas while retaining a 1 m antenna array aperture. Fig. 3(a, b) depicts the ambiguities that exist in such a system by showing the likely positions of the tag. Considering the simulation environment from Sec. 2, we deploy two arrays with spacings of 33.3 cm and 25 cm for the same antenna aperture of 1 m. Next, we deploy a tag at the center of the space and predict its potential locations (pixel color intensities) in both scenarios. We observe that, keeping the same aperture of 1 m, we have similar measurement errors (peak widths) in both cases, consistent with our previous findings, but reducing the separation creates fewer ambiguities. Deploying 23 antennas within this 1 m, each spaced half a wavelength apart, will remove all our ambiguities at the cost of increased hardware complexity.
Alternatively, we observe that TDoA measurements are free from ambiguities and can potentially be leveraged to disambiguate the predictions from PDoA. Similarly to the previous PDoA images, in Fig. 3(c, d), we show the tag's location likelihoods when relying only on TDoA measurements. The TDoA peak, although very erroneous (larger peak widths), is unambiguous. Additionally, increasing the number of antennas reduces this error/peak width. To recap, by reducing the antenna separation (or increasing the number of antennas), we increase the separations between the ambiguities coming from PDoA measurements and tighten the peak widths coming from TDoA. Consequently, at the correct antenna spacing, our ambiguous peaks will be spaced far enough apart to be rejected by our TDoA measurements. We find this sweet spot when we use 6 antennas, 4× fewer antennas than would have otherwise been required.
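The interplay between the two measurements can be sketched for a single antenna pair. This is a simplified far-field illustration, not the joint estimator developed in the next subsection: the 20 cm spacing, the 50 ps timing error, and the helper names are assumptions made for the example.

```python
import math

C = 299_792_458.0        # speed of light, m/s
WAVELENGTH = C / 3.5e9   # roughly 8.6 cm at 3.5 GHz

def wrap(phase):
    """Wrap a phase in radians into [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

def candidate_sines(pdoa, spacing):
    """All sin(theta) values consistent with one wrapped PDoA measurement."""
    cycles = wrap(pdoa) / (2 * math.pi)
    k_max = math.ceil(spacing / WAVELENGTH) + 1
    sines = [(cycles + k) * WAVELENGTH / spacing for k in range(-k_max, k_max + 1)]
    return [s for s in sines if abs(s) <= 1.0]

def disambiguate(pdoa, tdoa, spacing):
    """Pick the PDoA candidate closest to the coarse, unambiguous TDoA
    estimate of sin(theta); returns the angle in degrees."""
    coarse = C * tdoa / spacing
    best = min(candidate_sines(pdoa, spacing), key=lambda s: abs(s - coarse))
    return math.degrees(math.asin(best))

# Hypothetical example: tag at 20 degrees, antennas 20 cm apart,
# exact phase measurement and a TDoA measurement that is off by 50 ps.
spacing = 0.2
true_sin = math.sin(math.radians(20.0))
pdoa = wrap(2 * math.pi * spacing * true_sin / WAVELENGTH)
tdoa = spacing * true_sin / C + 50e-12
angle = disambiguate(pdoa, tdoa, spacing)
```

Here the wrapped phase admits several candidate angles, and the coarse TDoA estimate only has to be accurate to within half the candidate spacing to select the right one. With larger timing errors a single pair becomes unreliable, which is one reason to combine several antenna pairs.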

Jointly optimizing for TDoA and PDoA measurements
We can now extend the key intuitions to leverage TDoA and PDoA to develop a localization algorithm to meet our few-cm accuracy requirement. As further explained in Sec. 4, via careful engineering and hardware design choices, we measure PDoA with a standard deviation $\sigma_\phi = 5^\circ$ and TDoA with a standard deviation of $\sigma_t = 150$ ps. The noise in these measurements can be modeled as zero-mean Gaussian:
TDoA between Rx $i$ and $j$: $n^{t}_{i,j} \sim \mathcal{N}(0, \sigma_t)$
PDoA between Rx $i$ and $j$: $n^{\phi}_{i,j} \sim \mathcal{N}(0, \sigma_\phi)$
Additionally, given a candidate tag location, $\vec{p}$, and receiver locations $\vec{x}_i$, $\forall i \in [1, 2, \ldots, N]$, we can also compute the expected PDoA and TDoA as
$$\hat{t}_{i,j}(\vec{p}) = \frac{|\vec{p} - \vec{x}_i| - |\vec{p} - \vec{x}_j|}{c}, \qquad \hat{\phi}_{i,j}(\vec{p}) = \frac{2\pi}{\lambda}\left(|\vec{p} - \vec{x}_i| - |\vec{p} - \vec{x}_j|\right) \bmod 2\pi, \tag{1}$$
where $\vec{p}$ is the location of the tag and $\vec{x}_i$/$\vec{x}_j$ are the locations of the 6 UWB antennas placed within a linear 1 m array. $c$ and $\lambda$ are the speed of light and UWB wavelength, respectively. Note here we forgo the far-field assumption made in Sec. 3.1.
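The location ($\vec{p}$) which gives the closest expected measurements to the actual measurements is the likely tag location,
$$\vec{p}^{*} = \arg\min_{\vec{p}} \begin{bmatrix}\vec{e}_t \\ \vec{e}_\phi\end{bmatrix}^{T} \Sigma^{-1} \begin{bmatrix}\vec{e}_t \\ \vec{e}_\phi\end{bmatrix}, \tag{2}$$
where $\vec{e}_t$ and $\vec{e}_\phi$ measure the error between our predictions and the actual measurements, and $\Sigma = \mathrm{diag}(\sigma_t^2, \ldots, \sigma_\phi^2, \ldots)$ is a diagonal covariance matrix built from the TDoA and PDoA measurement standard deviations. Note here that since each receiver on XRLoc's localization module independently measures the TDoA and PDoA, we have a diagonal covariance matrix.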
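The expected-measurement model and the weighted error it feeds can be sketched with NumPy as follows. This is a minimal illustration under assumptions, not the released implementation: the array layout, the use of all antenna pairs, the sign convention of the phase, and the function names are choices made for the example.

```python
import itertools
import numpy as np

C = 299_792_458.0               # speed of light, m/s
WAVELENGTH = C / 3.5e9          # UWB wavelength at 3.5 GHz, m
SIGMA_T = 150e-12               # TDoA standard deviation, s
SIGMA_PHI = np.deg2rad(5.0)     # PDoA standard deviation, rad

def wrap(phase):
    """Wrap phases in radians into [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi

def expected(tag, anchors, pairs):
    """Near-field expected TDoA (s) and wrapped PDoA (rad) for each pair."""
    d = np.linalg.norm(anchors - tag, axis=1)
    diff = np.array([d[i] - d[j] for i, j in pairs])
    return diff / C, wrap(2 * np.pi * diff / WAVELENGTH)

def cost(tag, anchors, pairs, tdoa_meas, pdoa_meas):
    """Sum of squared errors, each normalized by its standard deviation
    (equivalent to a diagonal covariance weighting)."""
    tdoa_exp, pdoa_exp = expected(tag, anchors, pairs)
    e_t = (tdoa_meas - tdoa_exp) / SIGMA_T
    e_phi = wrap(pdoa_meas - pdoa_exp) / SIGMA_PHI  # compare phases modulo 2*pi
    return float(e_t @ e_t + e_phi @ e_phi)

# Assumed geometry: 6 antennas evenly spread over a 1 m line, tag about 2 m away.
anchors = np.column_stack([np.linspace(-0.5, 0.5, 6), np.zeros(6)])
pairs = list(itertools.combinations(range(6), 2))
true_tag = np.array([0.4, 2.0])
tdoa_meas, pdoa_meas = expected(true_tag, anchors, pairs)  # noise-free measurements

cost_at_truth = cost(true_tag, anchors, pairs, tdoa_meas, pdoa_meas)
cost_elsewhere = cost(np.array([0.6, 2.0]), anchors, pairs, tdoa_meas, pdoa_meas)
```

With noise-free measurements the cost vanishes at the true location and is positive elsewhere. Because of phase wrapping the surface has many local minima, which is why the choice of search strategy matters.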
The simplest way to find this best tag location is to perform a grid search over our space to find the minimum point for Eq. 2. Aiming for cm-level localization, we choose a grid size of 1 × 1 mm. But this exhaustive search can be time-consuming (around 61.2 s / location on a 12-core CPU), precluding real-time localization in dynamic situations. Alternatively, we can leverage gradient descent-based optimization techniques [26] to arrive at the most likely tag position. However, these techniques fail when we do not have a good initial estimate of the location, which is the case when looking to localize a tag in a large environment [28].
To surmount this challenge, we provide the final insight: selectively searching over the large space can reduce the computational complexity of localization. The brute force approach unnecessarily searches over each grid point for every packet. We can instead sample our environment more sparsely and slowly converge to our ideal location over a few packets. This is, in fact, the key idea behind particle filters [2], which are commonly used in state estimation scenarios with highly non-convex error functions and poor initialization.
Armed with this insight, for the first packet we receive, we uniformly distribute a set of particles (500 particles/m²) in our environment and compute the likelihood of these positions. When we receive consecutive packets, we can resample the set of particles with the highest probability and continue converging to our true locations. However, despite the fewer likelihood computations required, particle filters commonly furnish non-real-time estimates (with a latency of 7.2 ms on a 12-core CPU). To combat this problem, XRLoc adaptively re-samples and reduces the number of particles based on the current confidence of the estimate. As we do not know the tag's location, many particles are initially required to sample the search space uniformly. However, our particles converge close to the true location over time, improving our confidence in the location estimate. We can then reduce the number of particles, as we no longer need to explore the space uniformly. Empirically, this adaptive particle filter implementation converges within five measurements and provides a location estimate with a 1.2 ms latency on a 12-core CPU.
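One way such an adaptive resampling step could look is sketched below. The rule that ties the particle count to the weighted spread of the particles, the bounds `n_min` and `n_max`, the reference spread, and the 5 mm jitter are all assumptions made for illustration; the text above does not specify how confidence is mapped to a particle count.

```python
import numpy as np

def adaptive_resample(particles, weights, rng, n_min=50, n_max=4500,
                      spread_ref=1.0, jitter_m=0.005):
    """Resample particles by weight and shrink the particle count as the
    weighted spread (a stand-in for estimate confidence) decreases.

    particles: (N, 2) array of candidate positions in metres.
    weights:   (N,) non-negative likelihoods, not necessarily normalized.
    """
    weights = weights / weights.sum()
    mean = weights @ particles
    # Weighted RMS distance of the particles from their weighted mean.
    spread = np.sqrt(weights @ np.sum((particles - mean) ** 2, axis=1))
    # Assumed rule: particle count scales linearly with the spread.
    n_new = int(np.clip(n_max * spread / spread_ref, n_min, n_max))
    idx = rng.choice(len(particles), size=n_new, p=weights)
    # Small jitter keeps duplicated particles from collapsing onto one point.
    return particles[idx] + rng.normal(scale=jitter_m, size=(n_new, 2)), n_new

# Hypothetical usage in a 3 m x 3 m room at 500 particles per square metre.
rng = np.random.default_rng(0)
initial = rng.uniform(0.0, 3.0, size=(4500, 2))

# First packet: flat likelihood, so the full particle budget is kept.
_, n_first = adaptive_resample(initial, np.ones(len(initial)), rng)

# Later packet: likelihood concentrated near (1.5, 1.5), so fewer particles.
dist2 = np.sum((initial - np.array([1.5, 1.5])) ** 2, axis=1)
peaked = np.exp(-dist2 / (2 * 0.05 ** 2))
resampled, n_later = adaptive_resample(initial, peaked, rng)
```

In a complete filter the weights would come from the likelihood of each particle under the measurement model, and the resampled set would be re-weighted with every new packet.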

Challenges with prototyping XRLoc
Additional considerations arise when employing the ideas from Sec. 3 while prototyping XRLoc using off-the-shelf components. First, we need to acquire low-noise phase measurements. As discussed in Sec. 4.1, selecting the right clock is imperative to ensure low phase noise. Second, due to hardware imperfections, we find that the expected PDoA measurements (Eq. 1) do not match the real-world measurements. To account for the offsets, we devise a calibration scheme and re-consider the formulation of the expected PDoA measurements in Sec. 4.2. Finally, we explore the effects of multipath reflection on the TDoA measurements in Sec. 4.3.

Acquiring accurate time and phase
Before prototyping XRLoc, we conducted extensive simulations to investigate the minimum phase and time acquisition accuracy needed to achieve few-centimeter positioning accuracy, assuming 6 antennas equally spaced in a 1-meter region. In a 3 m × 3 m environment, we implemented the algorithm presented in Sec. 3.3 at varying phase and time acquisition noise levels. Our simulation results are presented in Fig. 4(a), where the horizontal axis represents the standard deviation of the phase error, and the vertical axis represents the 50th percentile of the localization error. Each line corresponds to a different standard deviation of the time error.
From this simulation, we make two key observations. First, we see that time errors between 3-250 ps provide similar localization accuracy, and these lines are grouped in the plot. However, exceeding 300 ps in time error significantly increases localization error, as TDoA fails to resolve the ambiguities introduced by PDoA. Second, these simulations clarify that few-cm level accurate localization requires high phase accuracy. Specifically, the red vertical line marks a threshold of 5° of standard deviation in phase measurement needed to achieve few-cm accurate locations.
The synchronization clock is the main factor affecting this phase noise in our system. The phase of the UWB signal is measured by first down-converting the received signal with the carrier signal. It is measured relative to this carrier signal by the baseband processing unit [22]. When we consider measuring the PDoA, we look at the difference in the phase of any two receivers. In this situation, if both receivers share the same carrier clock, then the PDoA they measure will be induced purely by the relative distance traveled by the signals to each receiver (see Eq. 1). A simple way to achieve this is to connect the two receive antennas to the same UWB module [13]. However, we observed that the overhead of extracting the complete CIR when implementing these systems is large (∼1.2 ms), precluding low-latency localization. Specifically, we have the API overhead to measure the data and the data extraction overhead over USB, requiring 599 µs and 612 µs, respectively.
Alternatively, we prototype our system using independent UWB modules [63] for each receiver, eliminating the need to export CIR measurements. This reduces the data acquisition latency by ∼4× to ∼340 µs. However, we cannot synchronize the carrier clocks on these independent modules; instead, we synchronize a lower-frequency 38.4 MHz clock, which leads to phase measurement errors. Via measurements with different clocks, we find that the phase noise in this input clock can largely influence the noise in the PDoA measurements. Specifically, from the oscillator's data sheet [18], we can obtain the phase noise of the oscillator, $L(f_{\text{offset}})$, where $f_{\text{offset}}$ is the frequency offset from the center frequency of the oscillator. Using $L(f_{\text{offset}})$, the standard deviation of clock jitter, $\sigma_{\text{jitter}}$, can be expressed as follows.
where $\Delta f$ is the bandwidth of the measurement and $f_{\text{osc}}$ is the oscillator frequency. We measure the standard deviation of the phase error ($\sigma_\phi$) and time stamping error ($\sigma_t$) as: where $f_s$ is the sampling frequency, $f_t$ is the frequency of the clock used to measure time-of-arrival, and $c$ is the speed of light. We can choose an appropriate clock to meet our phase and time measurement thresholds by modeling this noise behavior. Many off-the-shelf clocks [1, 18] satisfy these requirements at reasonable price points, and we employ [18] in prototyping XRLoc. For example, according to the datasheets provided by Crystek [18] and Abracon

Combating hardware biases
In Eq. 1, we provided an expression for the expected PDoA measurement if we know the underlying tag and receiver locations. In reality, however, we see a large deviation when we compare the expected PDoA measurements with true PDoA measurements. To verify this, we perform an experiment varying the distance of a tag from XRLoc's localization module. In Fig. 4(b), the green 'RAW' measurements are shifted from the black ground truth 'GND' measurements. Visually, we observe three deviations: a constant additive bias, which contributes to a downward shift; a multiplicative bias w.r.t. distance, affecting the slope of the line; and an exponential bias w.r.t. distance, affecting the curvature (not visualized in the figure). We assume these biases result from ADC saturation when the tag is too close and propose a 3-point calibration to compute these hardware-specific calibration parameters below. Subsequently, we modify our expected PDoA measurements from Eq. 1 to include these three calibration parameters as a function of $d_i = |\vec{p} - \vec{x}_i|$, the distance between the tag and UWB receiver. We replace Eq. 1 with this updated expected PDoA equation for the particle filter described in Sec. 3.3.
To estimate these calibration parameters, we perform a three-point calibration. First, we model the phase measured at each UWB module as the calibrated phase $\Phi$ distorted by these biases. Next, we measure the received phase at each UWB receiver for three known locations within our space. Finally, we use regression to find the calibration parameters which minimize the deviation between the measured and expected phases according to this model.

Handling multipath reflections
In common indoor settings, reflections of the RF signal can potentially lead to ambiguities in TDoA measurement [71]. Despite our best efforts to acquire bias-corrected PDoA measurements, the presence of multipath can prevent us from ruling out ambiguous location predictions. However, UWB signals are sampled at a rate of 1 GHz, implying a time resolution of 1 ns. This fine time resolution implies we are only corrupted by reflected paths whose additional travel distance is within 30 cm. In indoor environments, finding such close-by reflected paths is unlikely, and we find that our direct path and reflected signals are separable in the time domain. With this in mind, we measure the time of arrival and phase of the signals at the hardware-reported first peak index, FPI [65], at the 6 UWB receivers in XRLoc's localization module.

Enabling multi-tag operation
Through the ideas presented in Sec. 3 and 4, XRLoc fulfills the first two requirements for a localization system to be compatible with XR applications: ease of deployment (R1) and accuracy (R2). However, when we extend the current system to localize multiple tags in an environment, packet collisions amongst various tags can detrimentally affect our localization rates, resulting in a packet drop of 25%. Instead of allowing tags to transmit arbitrarily, we can schedule individual tags at specific time intervals and leverage time-division multiple access (TDMA) to prevent collisions.
We seek to enable a total localization rate of 1000 Hz at XRLoc's receiver, which means localizing 1000 tags at a rate of 1 Hz or 10 tags at 100 Hz. Specifically, we explore leveraging low-power wireless technologies [35,70] as a side channel for MAC protocol operation. A MAC controller needs to perform three tasks: onboarding new tags, providing time synchronization, and applying corrections to tags that deviate from their time slots. Existing systems [8,54,81] leverage UWB signals for providing this MAC control. However, we observe that when a large number of tags need to be onboarded or corrections to the tags' time slots need to be made, frequent collisions between UWB beacons for localization and UWB transmissions for MAC control can exacerbate the problem we seek to solve. Alternatively, we propose using an additional side-channel leveraging low-power wireless technologies [35,70] to simplify the MAC control and allow for independent tag management and localization functions. UWB is known to have high power consumption (e.g., about 416 mW for DW1000) during reception due to its use of wide bandwidth and despreading processing [9]. From the viewpoint of low-power reception, using LoRa (e.g., about 20 mW for SX1280) or BLE (e.g., about 16 mW for nRF52832) for the side channel is practical. We employ LoRa as a side channel to furnish reliable and low-power MAC control for multiple tags. LoRa and UWB operate at 900 MHz and 3.5 GHz, respectively, allowing them to co-exist with minimal interference. We also note that alternative side channels like BLE can be employed; however, we choose to implement this prototype with LoRa given its simplicity.
The MAC protocol consists of two components -a LoRa MAC controller (gateway), which is deployed along with the localization module we have built so far, and a LoRa Receiver (LoRa RX) connected to the UWB tag.The gateway performs the three core functions of the MAC protocol.
Discovery and Onboarding: New tags introduced to a system transmit beacon packets to announce their presence. Subsequently, the gateway invites these new tags to join the network by assigning each a specific time slot in which to transmit its UWB localization packets. The number and duration of the transmit slots are determined by the maximum number of tags and their localization rate. Currently, we support 1000 slots with a 1 ms slot width. Fig. 5(c) illustrates a block diagram of operation.
Global Time sync: Each tag must have a consistent notion of time slots, which requires global time synchronization accurate to within half the slot width. Previous works [66] have showcased μs accuracy in synchronizing clocks, and we leverage these works to provide time synchronization. Specifically, the gateway transmits time-sync packets every 100 s, the time it takes for the 5 ppm clocks to drift by 500 μs, which is half the slot width for each tag. The LoRa RX receives these sync packets and corrects for its clock drift.
Correcting erroneous tags: Finally, as a precautionary measure, we develop a correction mechanism to re-slot colliding tags. A time-sync failure at a tag may result in transmission in an incorrect time slot, leading to consistent collisions among groups of tags. By tracking the tags that suffer consistent collisions, the gateway broadcasts a correction packet over LoRa to re-slot the erroneous tag.
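The slot arithmetic above can be illustrated with a short Python sketch. It computes the longest time-sync interval that keeps worst-case clock drift under half a slot, and interleaves tag slots within a one-second frame. The function names and the interleaved assignment policy are assumptions made for illustration only; the text does not specify how the gateway orders slots.

```python
def max_sync_interval_s(slot_width_s: float, clock_ppm: float) -> float:
    """Longest gap between time-sync packets that keeps worst-case
    clock drift below half a slot width."""
    return (slot_width_s / 2.0) / (clock_ppm * 1e-6)


def assign_slots(num_tags: int, rate_hz: float,
                 num_slots: int = 1000, slot_width_s: float = 1e-3) -> dict:
    """Hypothetical interleaved slot assignment (not the authors' policy).

    Each tag receives rate_hz * frame_duration evenly spaced slots per frame.
    """
    frame_s = num_slots * slot_width_s           # 1 s for 1000 slots of 1 ms
    per_tag = round(rate_hz * frame_s)           # slots each tag needs per frame
    if per_tag < 1 or num_tags * per_tag > num_slots:
        raise ValueError("requested rates do not fit the slot budget")
    stride = num_slots // per_tag                # spacing between a tag's slots
    return {tag: [tag + k * stride for k in range(per_tag)]
            for tag in range(num_tags)}


if __name__ == "__main__":
    # 5 ppm clocks and 1 ms slots, as described in the text.
    print(max_sync_interval_s(1e-3, 5))
    # 10 tags at 100 Hz fill the 1000-slot frame exactly.
    plan = assign_slots(num_tags=10, rate_hz=100)
    print(plan[3][:5])
```

Under these assumptions, both operating points named in the text (1000 tags at 1 Hz and 10 tags at 100 Hz) use the full 1000-slot budget, and any request beyond it is rejected.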

Implementation
We have seen that XRLoc consists of three core components: the localization module, the LoRa MAC handler, and the UWB+LoRa tag. This section will take a closer look at prototyping these components.
Localization Module: XRLoc's primary contribution is a single-vantage point localization module using off-the-shelf components with a size of 1 m. This small size allows the localization module to be deployed within common electronics like TVs or soundbars. Fig. 5 shows the implemented prototype. The prototype is built with 6 EVB1000 UWB receivers [63], with Table 5(e) detailing the configuration parameters. We synchronize the UWB modules to a common clock (OCXO [18]) via a clock distributor module [78], as shown by the 'blue' path in Fig. 5(a). In addition to the clock modification discussed in Sec. 4.1, we expose the EVB1000's 'SYNC' pin to reset the time on the UWB modules to reduce bias in TDoA measurements. This sync is handled by an Arduino Due and is indicated by the 'red' path, with additional details provided in [64]. When each EVB1000 receives a single 'blink' signal for localization from the UWB tag, the receiver reports the first-peak index (FPI) of the direct path in the channel impulse response, the signal phase at this point, the time of arrival (RXTIME), and a carrier phase correction (RCPHASE) via the data path (shown in black).
LoRa MAC gateway: The LoRa gateway is the central controller to initialize, discover, and onboard all the tags in the environment. It is prototyped with a LoRa SX1272 [72] transmitter. This handler maintains the MAC state machine and performs all the functions described in Sec. 5.
Tags: We prototype the tag (shown in Fig. 5(d)) using the EVB1000 [63] and program it with the parameters in Table 5(e). The tag transmits 'blink' packets at 60 Hz, with each transmitted frame having 14 bytes of payload, including packet number and MAC address, to facilitate and test the MAC protocol. Operating in parallel, the LoRa SX1272 receives time-sync packets from the gateway module to maintain the UWB transmit slots and provide medium access control. An interrupt pin is raised by the LoRa RX (shown in blue in Fig. 5(d)) to initiate a UWB 'blink' transmission at the correct time slot.

Evaluation
XRLoc takes strides toward few-cm-scale localization in static and dynamic conditions. We rigorously test the system over eight different moving datasets and at multiple static points in various environments, including line-of-sight (LOS at Env-1 and 2) and non-line-of-sight (NLOS at Env-3) conditions, as shown in Fig. 6. To create the NLOS condition in Env-3, a 2.5 cm thick wooden board was placed 30 cm in front of the XRLoc anchor. Additionally, we re-implement the state-of-the-art AoA-based UWB localization system ULoc [101] based on its open-source documentation. We place its 3 anchors in two configurations: a diverse scenario, with the anchors forming a triangle in the space, and a constrained linear scenario, with the anchors along a 1 m straight line. We test ULoc with the same static and dynamic positions.

Static Localization Accuracy
One of the key use cases targeted by XRLoc is to provide accurate locations of real-world objects and place them in the virtual realm. These objects of interest could be tagged with inexpensive and long-lasting UWB tags, which will relay their location to the VR system. To simulate this use case, we place multiple tags in the environment with the simple goal of recreating a life-size chess game. In this static scenario, from Fig. 7(a), we observe a median and 90th percentile error of 1.5 cm and 5.5 cm, respectively. We additionally observe that XRLoc provides a 9.5× and 4.0× improvement at the median over ULoc in the linear (AoA-L) and diverse (AoA-D) placement scenarios, which have (median, 90th percentile) errors of (14.6 cm, 28.7 cm) and (6.1 cm, 13.7 cm), respectively. The evaluation at different ranges shows median errors of 6.8 cm and 15.2 cm at 4 m and 5 m in the LOS condition, respectively, and 35.3 cm and 34.0 cm in the NLOS condition, as shown in Fig. 7(b).
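The median and 90th percentile figures quoted above are summary statistics over per-point Euclidean errors. A minimal Python sketch of that computation is shown below, assuming 2D estimates and ground-truth positions in the same units; the helper names and the linear-interpolation percentile are illustrative choices, not taken from the XRLoc codebase.

```python
import math
import statistics


def localization_errors(estimates, ground_truth):
    """Euclidean distance between each estimated and true 2D position."""
    return [math.hypot(ex - gx, ey - gy)
            for (ex, ey), (gx, gy) in zip(estimates, ground_truth)]


def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(errors):
    """Median and 90th percentile, the two statistics reported in the text."""
    return {"median": statistics.median(errors), "p90": percentile(errors, 90)}


if __name__ == "__main__":
    est = [(0.00, 0.00), (1.02, 0.99), (2.05, 2.00)]   # made-up positions (m)
    gnd = [(0.00, 0.01), (1.00, 1.00), (2.00, 2.00)]
    print(summarize(localization_errors(est, gnd)))
```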

Moving Localization Accuracy
Continuing with the motivation of playing a life-size chess game, we characterize XRLoc's localization accuracy in dynamic scenarios. Fig. 8(a) and 8(b) showcase two characteristic movement patterns we tested. We tested 8 movements, as shown in the demo video (footnote 1), and achieved median and 90th percentile errors of 2.4 cm and 5.3 cm, respectively, as shown in Fig. 8(c). We observe an 11× and 3.2× improvement at the median over ULoc in the linear (AoA-L) and diverse (AoA-D) placement scenarios, which have (median, 90th percentile) errors of (26.0 cm, 43.3 cm) and (7.5 cm, 17.4 cm), respectively.
In Fig. 8(d), we show the time-series localization error for the movement scenario in Fig. 8(b). We note that opting to use a particle filter over a brute-force approach provides a localization latency of 1 ms, compared to the exhaustive grid search's latency of 61.2 s on a 12-core CPU, as explained in Sec. 3.3. However, because the particle filter performs a sparse sampling over the entire space, XRLoc may initialize the tag's location incorrectly. This is visible in the inset shown in Fig. 8(d). Within 5 received packets, however, the location converges to the true location, and XRLoc subsequently provides accurate location predictions.

Figure 9: MAC protocol performance: (a) Packet success ratio across ten tags with (blue) and without (red) MAC protocol. (b) Packet success ratio over time for Tag 2 (best performing without MAC) and Tag 9 (worst performing without MAC). In all cases, the MAC protocol provides a success rate of over 99.5%.
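The particle-filter behavior described for Fig. 8(d), a sparse initialization that tightens over a handful of packets, can be illustrated with a generic bootstrap particle filter. The sketch below is not XRLoc's implementation: it substitutes a Gaussian likelihood around a noisy 2D position measurement for the TDoA/PDoA likelihood of Sec. 3, and the room size, particle count, and noise levels are made-up values.

```python
import math
import random


def particle_filter_step(particles, measurement, meas_sigma, jitter, rng):
    """One predict / weight / resample cycle; returns (particles, estimate)."""
    # Predict: random-walk motion model.
    moved = [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter))
             for x, y in particles]
    # Weight: stand-in Gaussian likelihood (assumption, see lead-in).
    mx, my = measurement
    weights = [math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * meas_sigma ** 2))
               for x, y in moved]
    total = sum(weights)
    if total == 0.0:                      # all weights underflowed
        weights = [1.0] * len(moved)
        total = float(len(moved))
    # Estimate: weighted mean of the particles.
    est = (sum(w * p[0] for w, p in zip(weights, moved)) / total,
           sum(w * p[1] for w, p in zip(weights, moved)) / total)
    # Resample in proportion to the weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, est


def run_demo(num_particles=2000, num_packets=5, seed=0):
    """Track a static tag in a hypothetical 5 m x 5 m room; returns errors (m)."""
    rng = random.Random(seed)
    truth = (2.0, 3.0)
    # Sparse uniform initialization over the whole space.
    particles = [(rng.uniform(0, 5), rng.uniform(0, 5))
                 for _ in range(num_particles)]
    errors = []
    for _ in range(num_packets):
        meas = (truth[0] + rng.gauss(0, 0.03), truth[1] + rng.gauss(0, 0.03))
        particles, est = particle_filter_step(particles, meas, 0.1, 0.05, rng)
        errors.append(math.hypot(est[0] - truth[0], est[1] - truth[1]))
    return errors


if __name__ == "__main__":
    for i, err in enumerate(run_demo(), start=1):
        print(f"packet {i}: error = {err * 100:.1f} cm")
```

Each update costs time proportional to the number of particles rather than to the number of cells in a dense grid, which is the general reason a particle filter can be much cheaper than exhaustive search.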

MAC Protocol Efficacy
In the previous sections, we have shown that XRLoc can achieve few-cm level localization from a single localization module, meeting the first two requirements (R1 and R2). To allow multiple tags to be localized with this accuracy, XRLoc leverages a LoRa side channel to develop a power-efficient MAC protocol, as described in Sec. 5. To evaluate its efficacy, we set up 10 tags to transmit at 100 Hz for a half-hour period. Fig. 9(a) showcases the packet success ratio, and we find that over 99.5% of the packets are received by XRLoc's localization module. In contrast, without a MAC protocol, we have an average success rate of 76%, ranging between 56% and 87%. Specifically, considering the best and worst tags, we plot the packet arrival rate in Fig. 9(b) over the 30 min period and observe that there are long periods when packets from Tag 9 are not received, likely due to collisions with either Tag 2 or any of the other tags in the environment. With the MAC protocol, we instead see a consistent packet arrival rate. Clearly, a MAC protocol is necessary to achieve multi-tag tracking and localization at high rates and fulfill R3.
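Since each blink frame carries a packet number, a packet success ratio like the one in Fig. 9 can be computed from the sequence numbers seen at the receiver. The Python sketch below is one way to do this; it assumes packet numbers start at 0 and increase by one per blink, and the function names are hypothetical.

```python
def expected_packets(rate_hz: float, duration_s: float) -> int:
    """Number of blinks a tag should have sent over the test period."""
    return round(rate_hz * duration_s)


def packet_success_ratio(received_seq, sent_count: int) -> float:
    """Fraction of transmitted packets received at least once."""
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    seen = {s for s in received_seq if 0 <= s < sent_count}
    return len(seen) / sent_count


def windowed_success(received_seq, sent_count: int, window: int):
    """Success ratio per consecutive window of sequence numbers,
    useful for plotting the ratio over time."""
    seen = set(received_seq)
    ratios = []
    for start in range(0, sent_count, window):
        end = min(start + window, sent_count)
        got = sum(1 for seq in range(start, end) if seq in seen)
        ratios.append(got / (end - start))
    return ratios


if __name__ == "__main__":
    # One tag at 100 Hz for 30 minutes, as in the experiment described above.
    print(expected_packets(100, 30 * 60))
    print(packet_success_ratio([0, 1, 2, 2, 4], 5))
    print(windowed_success([0, 1, 2, 2, 4, 5], 8, 4))
```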

Justifying design choices
The evaluations from the previous sections demonstrate XRLoc's ability to fulfill the stringent requirements set in Sec. 1. In the following section, we will answer key questions about the design choices made when developing XRLoc.
Are TDoA and PDoA both needed?: As we have discussed, a system relying purely on time-based measurements will not meet the stringent requirements of few-cm localization accuracy. We further evaluate this on our datasets in Fig. 10(a). We see a median localization accuracy of 2.4 cm, deviating over an order of magnitude from our few-cm level accuracy requirement. This reiterates the challenge of achieving single-vantage point localization.
However, we claimed in Sec. 3.2 that TDoA measurements play an important role in ruling out ambiguous initialization caused by PDoA-only localization. To confirm this, the same figure shows that when PDoA alone is used for localization, the median accuracy is 49.1 cm. Clearly, ambiguities from phase wrap-around can be detrimental to XRLoc's performance, emphasizing TDoA's role. Through this micro-benchmark, it is apparent that TDoA and PDoA work hand-in-hand to provide few-cm location accuracy.
How does the aperture affect the localization?: In Sec. 3.1, we discussed the importance of the antenna aperture in bringing resilience to phase measurement error. Consequently, a wider distance between the first and last antenna helps to improve localization accuracy. To ensure easy integration within everyday consumer electronics (like TVs or soundbars), we restrict XRLoc's size to less than 1 m wide. However, how important is antenna aperture to our localization performance? For this, we reduce the maximum antenna aperture to 80, 60, and 40 cm and report the results in Fig. 10(b). Clearly, a reduction in the aperture size affects the localization accuracy, with the median localization accuracy degrading to 9.6, 19.3, and 35.0 cm, respectively. In fact, we see a steep drop-off in accuracy when we have an aperture of 40 cm. Furthermore, we see that a minimum aperture of 1 m is required to achieve the required localization accuracy. Under space constraints, smaller apertures may be used at the cost of lower accuracy.
How many antennas are needed?: Clearly, a minimum aperture of 1 m is needed. However, within this aperture, how many antennas are needed to meet the localization requirements? This is an important question to consider to make XRLoc cost-effective. In the previous localization accuracy analysis, we consider an array with 6 antennas. In Fig. 10(c), we reduce the number of antennas placed within the 1 m aperture. For 6, 5, and 4 antennas, we see median location accuracies of 4.7, 6.9, and 28.7 cm, respectively. As few as 4 antennas are enough to meet the required few-cm localization accuracy at the median, although we observe a sharp reduction in localization accuracy at the 90th percentile. More antennas provide a better averaging effect and reduce erroneous TDoA and PDoA measurements, hence improving the localization performance at higher percentiles. From these experiments, we empirically observe that choosing at least 6 antennas meets the few-cm level accuracy required for XR applications.
Are there better antenna spacings we can choose?: So far, we have considered placing our antennas in a uniform linear array (ULA), separated by 20 cm. However, many works [82,87] showcase antenna patterns that are more optimal than a ULA. To investigate the improvements from these co-prime antenna arrays, we leverage our simulator from Sec. 2 to carry out extensive simulations and showcase the results in Fig. 10(d). We see a slight degradation in accuracy when using co-prime arrays. However, co-prime arrays can be leveraged to reduce the number of antennas required by XRLoc to achieve similar location accuracy.
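The phase wrap-around ambiguity that motivates fusing TDoA with PDoA can be made concrete for the 20 cm spacing mentioned above. The Python sketch below lists every angle of arrival consistent with a single wrapped PDoA reading between two antennas, under a far-field plane-wave model. The 3.5 GHz carrier is taken from the side-channel discussion in Sec. 5, and the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def candidate_angles_deg(pdoa_rad: float, spacing_m: float, freq_hz: float):
    """All angles of arrival (degrees from broadside) consistent with a
    wrapped phase difference between two antennas.

    Far-field model: pdoa = 2*pi * spacing * sin(theta) / wavelength (mod 2*pi).
    """
    wavelength = SPEED_OF_LIGHT_M_S / freq_hz
    k_max = int(spacing_m / wavelength) + 1      # no solutions beyond this wrap
    angles = []
    for k in range(-k_max, k_max + 1):
        sin_theta = wavelength * (pdoa_rad / (2 * math.pi) + k) / spacing_m
        if -1.0 <= sin_theta <= 1.0:
            angles.append(math.degrees(math.asin(sin_theta)))
    return angles


if __name__ == "__main__":
    # 20 cm spacing is more than two wavelengths at 3.5 GHz (about 8.6 cm),
    # so a single phase reading maps to several candidate angles.
    for angle in candidate_angles_deg(0.0, 0.20, 3.5e9):
        print(f"{angle:6.1f} deg")
```

With half-wavelength spacing the same reading maps to a single angle, but a 1 m aperture would then need many more antennas. Coarse time-based measurements are the alternative way to rule out the extra candidates.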
Why do we need fine-grained bias compensation?: Finally, we evaluate the system-level measurements. In XRLoc, we choose the appropriate clock sources to achieve the required accuracy in both TDoA and PDoA measurements (Sec. 4.1) and additionally calibrate for TDoA and PDoA hardware biases via a 3-point calibration scheme (Sec. 4.2). In Fig. 11(a), we showcase the importance of this bias calibration, observing median localization accuracy degrade by 1.8× to a median accuracy of 2.4 cm without applying

Related Works
Providing indoor location information for people and various inanimate objects is a well-studied problem. This section will broadly cover various techniques leveraged to address this problem. We will find that none of the existing techniques meets the stringent requirements we set up earlier in Sec. 1. Recall that we seek to provide easy-to-deploy (R1), few-cm accurate localization (R2) in dynamic scenarios for multiple people or objects of interest (R3). A few key technologies which can be considered are:
Visual sensing: Under this broad umbrella, we have many distinct technologies. Existing VR systems utilize external IR-based sensors [11] or specialized cameras [85] to furnish accurate ground truth locations. There are also works that deploy a single Lidar [36] for person tracking or utilize headset-mounted cameras [58]. However, these systems are sensitive to visual occlusions, hindering the user experience. Recent works [51,69,102] leverage machine learning to track objects despite occlusions. Alternatively, other studies [34,47,92,94,97] seek to deploy multiple cameras, equip tags with a camera, or utilize special light sources to be robust to occlusions. However, no studies have simultaneously solved all the problems of ease of anchor deployment (R1), accuracy (R2), and the risk of security and privacy [86]. Moving away from deploying privacy-invasive cameras, other works [57] seek to use the cameras on-board VR setups fused with occlusion-resilient radio-frequency [41,96] have looked at furnishing human pose with these radars from a single radar. Recent work [60] has shown that the human body can act as a strong blockage at these frequencies. These blockages can hinder tracking multiple people and objects in an environment and affect user experience. Additionally, tracking and identifying smaller assets in an environment can be challenging, as radar reflections depend on an object's radar cross-sectional area (RCS). Alternatively, many works [76] propose placing retro-reflective tags on objects with small RCS to guarantee their detection; however, these systems suffer from poor localization accuracy.
RF-based sensing: The robustness of sub-6 GHz RF signals to occlusions [74] and their low privacy risk make them a promising technology to consider. The common mode of operation is for multiple RF radios to jointly localize an active RF transmitter or a passive RF reflector (tags). Many works have looked at leveraging WiFi [42,59,83,95], LoRa [38], or BLE [7] to achieve robust user localization. However, these systems fail to provide the required localization accuracy due to bandwidth limitations. RFID has a strong asymmetry in the reader-tag relationship, and the transmitter and receiver share the same clock, which allows for highly accurate phase acquisition. According to [52,73,99], RFID systems do not have carrier and sampling frequency offset and enjoy a phase measurement accuracy of 0.085° [99], 15× better than UWB, which provides an accuracy of 1.4°. Using the highly accurate phase, [39,52,53,87,99] have succeeded in tracking or localization at the few-cm level. However, due to the asymmetric nature, RFID readers, whose range is several meters, are not suitable for embedding into consumer electronics (R1) because of their high power consumption and cost (e.g., the Impinj Speedway R420 costs $1666). The main target of RFID is industrial or retail store settings where thousands of tags must be deployed inexpensively, and the readers' one-time cost is justifiable. For instance, [73] looks at item ordering in manufacturing lines, retail stores, or libraries, and [52,99] examine industrial robotics or baggage handling tasks.
Unlike RFID, ultra-wideband provides a more symmetric architecture where localization modules can cost $10 to $100. Consequently, we have seen their increased adoption in smartphones and smart tags. UWB provides over 500 MHz of bandwidth and a time resolution of 1 ns, providing localization accuracy to a few tens of centimeters. Many current UWB localization schemes leverage the accurate time resolution for Two-Way Ranging (TWR) [3,10,24,29,40,44,62,103] and localize objects via trilateration. However, these multiple-packet exchanges increase localization latency and prevent real-time tracking of multiple objects of interest (R3). Many works instead leverage the TDoA or PDoA of the UWB signal to multiple time-synchronized anchors [13,14,30,80,84], or AoA measurements [22,37,101] at multiple anchors, to furnish locations using a single packet. Some works [98] employ alternative transmission schemes to TWR to reduce the packet overhead. However, these systems only meet the necessary localization accuracy when the UWB anchors are placed in diverse locations, increasing deployment efforts and deviating from R1.
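The link between time resolution and ranging accuracy stated above follows from the speed of light: a timing uncertainty maps to a path-length uncertainty of c times that interval. The small Python sketch below converts the 1 ns resolution, along with the 150 ps threshold and 180.7 ps measured error quoted in the captions of Figs. 4 and 11, into distances. It is only an illustrative conversion and does not model how XRLoc combines measurements.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def time_to_range_cm(delta_t_s: float) -> float:
    """Path-length uncertainty (cm) corresponding to a timing uncertainty (s)."""
    return SPEED_OF_LIGHT_M_S * delta_t_s * 100.0


if __name__ == "__main__":
    for label, dt in [("1 ns time resolution", 1e-9),
                      ("150 ps TDoA threshold", 150e-12),
                      ("180.7 ps measured TDoA error", 180.7e-12)]:
        print(f"{label}: {time_to_range_cm(dt):.1f} cm")
```

A 1 ns resolution corresponds to roughly 30 cm of path length, consistent with the few-tens-of-centimeters accuracy of time-only schemes.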
As discussed in Sec. 2, few-cm accurate localization is challenging due to geometric dilution of precision. To circumvent this problem, three common techniques are leveraged. First, by leveraging reflected paths in the environment, many systems [15,31,43,48,55,75,100] create additional "virtual" radios in the environment. These "virtual" radios provide the needed spatial diversity to localize an object of interest. However, multipath is often unreliable [6] in many environments and can lead to localization failure and poor user experience. Second, many works [25,67,88,89] look at fusing TWR, TDoA, and AoA information to provide single-anchor localization solutions. However, some systems cannot furnish the few-cm accurate localization requirement or rely on TWR measurements, increasing the system's latency. Finally, some works develop switched-beam antennas [27,32], which selectively sense signals approaching the anchor from different directions. However, these systems lack the required angular resolution to provide localization accuracy of a few cm.

Discussion and Future work
XRLoc overcomes the fundamental challenges arising from geometric dilution of precision to deliver cm-level accurate localization by developing an easy-to-deploy and low-latency localization module. Through this development, we are one step closer to achieving immersive XR experiences. However, a few limitations and possibilities for future work can be explored to build upon XRLoc.
Extensions to 3D: XRLoc focuses on localizing people and assets on a 2D floor plane, which is required in various XR applications. However, these ideas can be extended to the 3D domain by incorporating a vertical array of antennas in conjunction with the current horizontal linear array. These 3D-compliant antenna arrays can be retrofitted with television screens or paintings to allow cm-accurate 3D localization.
Improving power efficiency of XRLoc's localization module: Various works [9] have noted the 10× higher power consumption of UWB reception than transmission. Keeping this in mind, we designed a system that requires only a single transmission from the tag for localization to ensure long battery life. However, the 6 receivers on XRLoc's wall-powered localization module are power inefficient. To rectify this, antenna switching schemes [33] can be employed, or multiple antennas can be combined to connect to a single receiver [13] to reduce the number of receivers. However, unlike XRLoc's system, these alternatives will not be FiRa compliant [16].
Miniaturized tag design: We prototype our tag from off-the-shelf EVB1000 [63] and LoRa [72] evaluation boards. Future work can look towards miniaturizing these tag designs. Since the radios we employ are centered at 3.4 GHz and 930 MHz, we can place these radio modules in close proximity with limited RF interference.

Figure 1: XRLoc enables users to play a life-size chess game with everyday objects. XRLoc localizes mugs retrofitted with off-the-shelf UWB tags from a single vantage point with a few cm of location accuracy, which are then translated to chess pieces in the virtual world.

Figure 2: (a) Spatially diverse placement of UWB anchors (red diamonds) near the walls provides a median accuracy of 2.9 cm with TWR. (b) When receivers are constrained near the bottom wall, median accuracy degrades by 8× when using TWR. (c) Fusion of TDoA, TWR, and AoA does not help in these scenarios either, providing a median accuracy of 23.3 cm. (d) XRLoc solves the challenges associated with dilution of precision, achieving a median accuracy of 3.3 cm. (e) Summary of errors when leveraging various UWB measurements and XRLoc.

Figure 3: Log-likelihood heat map of PDoA and TDoA when changing the number of antennas.

Figure 4: (a) Localization error vs. PDoA error standard deviation, with TDoA error standard deviations as each line in the legend. For few-cm level localization, the threshold, per the red line, is a PDoA error standard deviation of 5° and a TDoA error standard deviation of 150 ps. (b) Phase measurements (green) deviate from ideal (black) measurements. Performing appropriate calibration fixes these deviations (red).

Figure 5: Implementation: (a) Block diagram showcasing interconnections between the 6 UWB receivers [63], the clock synchronization scheme (blue), "SYNC" implementation (red), and data back-haul via USB (black). (b) Real-world implementation of the block diagram; inset: external modification to the UWB receiver. (c) Block diagram for the tag showcasing the UWB and LoRa radios, the interrupt line (blue) to schedule UWB transmission, and LoRa clock-sync broadcasts (dotted green). (d) Real-world implementation of the tag. (e) UWB/LoRa radio parameters.

Figure 8: Dynamic testing scenario: (a, b) Scatter plot of XRLoc's predictions (EST), ground truth (GND), and antenna locations (ANT). More examples can be found in the demo video (footnote 1). (c) CDF of errors compared to AoA-based localization when three anchors [101] are placed diversely around the room (AoA-D) and constrained to a single 1 m line (AoA-L). (d) Time-series errors of the movement in (b), with the inset showcasing particle filter convergence within five packets.

Figure 10: Microbenchmarks: (a) Using TDoA or PDoA only as opposed to a fusion (XRLoc). (b) Reducing aperture from 1 m (XRLoc). (c) Reducing the number of antennas while keeping the aperture at 1 m. (d) Leveraging a co-prime antenna array as opposed to a uniform linear array (ULA).

Figure 11: (a) Localization error with and without bias calibration; (b) measured PDoA and TDoA errors.
appropriate bias calibration. In Fig. 11(b), we also observe an average TDoA error of 180.7 ps and PDoA error of 8.2°.

Table 1: Existing technologies do not satisfy the 3 key requirements for an XR localization/tracking system.
3. Measurement bias-aware localization: However, as we push the envelope on cm-accurate location predictions,
Mm-wave radars near the 60 GHz and 77 GHz bands have gained recent interest.Many works