MI-Poser

Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods utilizing handheld controllers or incorporating densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interactions, while the latter is plagued by inadequate accuracy and jittering. We introduce a novel body tracking system, MI-Poser, which employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system demonstrates a minimal error (6.6 cm mean joint position error) with real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser presents solutions to counteract the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation effectively showcases the successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications.

Fig. 1. MI-Poser takes the Visual-Inertial Odometry (VIO) tracking data from AR glasses and two EMF sensors on the wrists as input and generates 3D body shapes through a machine learning model for IK. To tackle the well-known magnetic metal interference issue, we propose Metal Interference Mitigation (MIM), which actively detects and corrects metal interference with EMF-IMU sensor fusion. As a result, the output body movements become steady and more faithful.

INTRODUCTION
Human motion tracking is integral to augmented reality (AR) and virtual reality (VR). Existing VR devices [20, 35, 54] primarily use cameras in head-mounted displays to track head pose and hand-held controllers for spatial input. However, cameras are power-intensive and impractical for AR glasses. Moreover, their limited field of view can lose track of controllers or hands, constraining user interaction. To achieve occlusion-robust, hands-free tracking, researchers have explored wearable-IMU-based solutions [21, 68-70]. However, these approaches typically require more sensors attached to the body (e.g., on the waist to track torso movement), and errors can accumulate over time because an IMU sensor does not directly provide 3D position information. As a result, these solutions often suffer from imprecise tracking results.
In this paper, we introduce MI-Poser, an upper-body pose-tracking system utilizing magnetic tracking in wristbands and AR glasses, as depicted in Figure 1. MI-Poser incorporates an electromagnetic field (EMF) source in AR glasses and two wrist-worn EMF sensors, enabling 6-DoF wrist tracking relative to the head. Combined with the AR glasses' Visual-Inertial Odometry (VIO) tracking, MI-Poser can track 6-DoF poses for the head and wrists. We trained deep neural networks for human pose inverse kinematics (IK) on a large dataset (AMASS [34]) to reconstruct upper-body pose from sparse signals. Importantly, EMF sensing often suffers from interference from nearby metallic objects. Thus, we propose metal interference mitigation (MIM) to enhance the input data of pose reconstruction by utilizing a collocated IMU sensor. MIM detects metal interference on EMF sensors and actively corrects the measured values. MI-Poser's IK and MIM incur little latency, and the pipeline runs efficiently at 60 Hz.
We first evaluated MI-Poser's feasibility in body pose tracking by comparing its output with Microsoft Kinect [36] tracking. The study involved ten participants performing various upper-body movements, including out-of-sight hand motions. Results indicated that MI-Poser tracks upper-body pose with a mean joint position error of 6.6 cm. Additionally, we found that MI-Poser outperforms prior IMU-based work with the same sensor placement, thanks to the precise 6-DoF information EMF tracking provides.
Next, we assessed our approach to addressing the inherent metal interference issue by collecting a dataset of synchronized EMF sensor data, IMU sensor data, and ground truth pose data. Data were collected under three conditions with varying metal profiles: open-space, standard, and extreme cases. We first quantified errors between EMF and ground truth pose in each condition, finding that significant tracking errors exist only briefly when the EMF sensor passes by a metal object. Our EMF-IMU fusion approach successfully detects such short-period metal interference (0.62 MCC in standard and 0.56 in extreme cases). Moreover, our correction approach reduces tracking error under interference by 11.6° max rotation error per session and 3.3 cm max position error per session in standard cases.

RELATED WORK
To situate our work, we first review existing wearable systems for body pose tracking and discuss the need to address the limitations of current vision- and IMU-based systems. Then, we examine existing research employing magnetic tracking to provide background for our EMF-based pose tracking system.

Wearable Systems for Body Pose Tracking
Researchers have explored wearable body pose estimation systems as a means to achieve portable and flexible interactions. Prior work has involved attaching an array of sensors on the body [51, 60] or using exoskeletons [71]. Recent developments in machine learning approaches have enabled systems with lower costs using sparse sensors, reducing the burden of wearing numerous sensors on the body. Vision-based approaches are the most popular in this context [2-4, 22, 39, 50, 58, 63, 66]. For example, Ahuja et al. [4] attached additional cameras to the Meta Quest 2 VR controllers to create an inside-out body capture system. Magic Leap 2 [33] employs a similar technique. Some researchers opted for wrist-worn vision sensors instead of VR controllers, using a spherical camera [8] or an array of small cameras [30]. Furthermore, several IK models have been developed to achieve high-fidelity body reconstruction from sparse sensor inputs. These models use the poses of the head and two hands as inputs to estimate the full body [5, 15, 22]. For instance, Jiang et al.
[22] proposed a full-body pose tracking system using the HTC VIVE (note that it necessitates an additional base station in the environment). To train and evaluate these IK models, researchers utilized the extensive human motion database AMASS [34], which comprises a collection of high-precision MoCap datasets. Although these vision-based systems minimize error within the dataset, they unavoidably face challenges such as heavy computation (e.g., model inference) and sensing costs, which can be critical in resource-constrained devices like AR glasses. Moreover, they depend on line of sight; therefore, trained IK models may not function correctly if a user's hand moves out of the camera's view, limiting the possible tracking range of human body movement. Unlike existing VR headsets (e.g., Meta Quest Pro [35]) that use cameras for tracking hand-held controllers in 3D space, AR glasses lack sufficient space to accommodate multiple cameras, resulting in a limited field of view.
As alternatives, researchers have investigated IMU-based pose tracking [21, 37, 49, 56, 61, 64, 68-70]. Shen et al. [49] proposed a method for reconstructing arm movement from a single smartwatch by using its embedded IMU sensors. Similarly, Tautges et al. [56] proposed an approach to reconstructing full-body animation from four acceleration sensors attached to the wrists and ankles. In the context of IMU-based pose tracking, recent works leverage more sophisticated machine/deep learning models using the AMASS dataset [34]. For example, Sparse Inertial Poser [61] allows 3D human pose estimation using six IMU sensors attached to the wrists, lower legs, back, and head. Deep Inertial Poser [21] improves the approach by incorporating temporal pose priors through deep learning. TransPose [70], LoBSTr [68], and Physical Inertial Poser [69] build upon those works and further advance the performance of IMU-based body pose tracking. However, these approaches still require a fair number of sensor-instrumented joints (typically six) and exhibit a certain degree of error.
Considering previous research, there is a demand for hands-free, occlusion-robust pose tracking systems with minimal errors, particularly in AR contexts. As a result, we developed a body pose tracking system utilizing a practically sparse sensor input (AR glasses and two wrist-worn sensors) based on a different sensing modality: EMF sensing.

Magnetic Tracking for Interaction
Magnetic field sensing-based tracking has a long history [46, 47], with a comprehensive review available in [44]. Several HCI applications have been proposed [9, 11, 18, 43, 52], leveraging occlusion-free tracking. For example, Abracadabra [18] tracks finger radial position relative to a watch using an attached magnet, and Nenya [9] measures magnetic field changes with a magnetometer-equipped smartwatch and a ring with two permanent magnets. Chen et al. [11] proposed uTrack, which tracks finger movements using a pair of magnetometers on the back of the fingers and a permanent magnet attached to the back of the thumb. While precise in short-range (centimeter-scale) tracking, these approaches cannot be extended to long range (e.g., body scale) since the Earth's geomagnetic field easily influences the tracking. Razer Hydra [52] extends the sensing range of the controller by using a base station that generates a weak magnetic field.
Electromagnetic field (EMF) tracking involves oscillating magnetic fields [25, 29] and has gained attention for its precision in medium-range 6-DoF tracking. For a detailed review, see [17]. Several HCI applications have been proposed using EMF tracking, such as Finexus [12], which advances uTrack [11] by tracking multiple fingertips in real time. AuraRing [42] offers 5-DoF finger tracking for VR/AR applications with low power consumption (i.e., around 2.3 mW for a sensor in a ring and 73.3 mW for a transmitter in a wristband). Whitmire et al. [62] extended the tracking range to body scale, enabling VR controller tracking by embedding three coils in an HMD and a set of orthogonal receivers in hand-held devices with reasonable power consumption (i.e., around 45 mW for a sensor in a controller and 224 mW for a transmitter in an HMD, without the wireless communication module). Similar work to ours is EM-Pose [23], a wearable EMF-based body tracking system that uses 6 or 12 on-body sensors and requires users to wear an additional EMF source on their back.
A known drawback of EMF tracking is its susceptibility to magnetic field distortion by environmental metals [42, 62], particularly in dynamic environments like VR/AR. Interference becomes more significant as the tracking range increases, such as at body scale [62]. Some work has attempted offline calibration to account for magnetic interference [26, 27], but online calibration is desirable. Although previous work [23, 62] recognized the issue, no work to date has quantified the interference effect in different user environments or devised solutions to it. Therefore, we propose approaches to addressing the metal interference issue in our EMF-based body pose tracking system.

PROPOSED METHOD
We design MI-Poser, a hands-free, wearable upper-body pose tracking system with an extensive tracking range. The system pipeline is illustrated in Figure 2. Unlike existing VR tracking systems that employ cameras and handheld controllers, MI-Poser utilizes wrist-worn EMF sensors for tracking. As depicted in Figure 1, users wear an EMF receiver on each wrist while an EMF source is mounted to the AR glasses. The EMF tracking captures the wrist poses relative to the EMF source. Simultaneously, the AR glasses track the user's head position in the world coordinate frame using Visual-Inertial Odometry [16]. These sparse measurements are input into our IK model to reconstruct a high-fidelity upper-body pose. We prioritized upper-body pose tracking due to its applicability to various AR applications. While our sensor setup could estimate full-body pose in a data-driven manner for specific motions through hallucination, like walking as shown in [37], we focus on the fidelity of the reconstructed pose in general conditions.

Metal interference is inevitable when using EMF sensors for body tracking in dynamic user environments. Therefore, we propose Metal Interference Mitigation (MIM) methods that operate online with minimal latency. While previous work [23, 62] acknowledged the issue, no concrete solutions have been proposed, as discussed in Section 2.2. Our solution incorporates an IMU sensor embedded with the EMF receiver. The fusion of EMF and IMU sensors has been studied to enhance EMF tracking performance in ideal, metal-free environments using a static filter such as a Kalman filter [47]. However, in practical situations, the tracking algorithm should actively detect the presence of interference in real time and dynamically correct the trajectory.

MIM Overview
Initially, we examined the behavior of our EMF tracking under metal interference. Within a 1.5 m range from the EMF source, metal effects are seldom present in open-space environments (e.g., outdoors), leading to accurate tracking performance. However, in typical spaces with some metal objects (e.g., a desk with a laptop), interference emerges when the EMF receiver approaches a metal object, confirming previous observations [62]. Even over a short period, interference significantly impacts the position and rotation of the EMF sensor, potentially degrading IK performance. We identified two types of metal interference based on how the sensor moves around metal objects. On one hand, when the sensor passes by a static metal object, a spike-like short-period error occurs in the tracking. On the other hand, when a metal object and the EMF receiver move together, the error persists as long as they remain in close range. Both cases can occur in end-user scenarios, such as users moving their arm near metal objects or holding a metal can while interacting with AR content.
To address this, we divided the problem into two parts: interference detection and interference correction. The overview of our MIM approach is presented in Figure 3. The detection part aims to identify moments when tracking errors arise due to metal interference, while the correction part seeks to mitigate errors by adjusting the interfered EMF sensor values. In this paper, we focus on correcting the first type of interference, the short-period error that occurs when metal objects are placed statically in an environment and the sensor encounters them occasionally (e.g., users swinging their arms). This is due to the challenge of tracking pose over a long period (more than a few seconds) using EMF and IMU sensors under metal interference. However, our detection method can also address the second type of interference, informing users of the reasons for degraded body tracking performance and improving user experience [6]. For instance, if a user holds a smartphone that causes interference to the tracking of the corresponding wrist, MI-Poser can notify the user via the AR glasses that the tracking performance is low because a metal object is close to the hand.

Interference Detection
The first part is interference detection. Two values describe the rotation of the sensor based on different principles: the angular velocity ΔΦ_gyro(t) measured by the gyroscope and the orientation Φ_EMF(t) measured by the EMF sensor. Let I(t) denote the interference state at time t (1 under interference, 0 otherwise). We can then introduce an error threshold ΔΦ_th to estimate the interference state Î(t + Δt) by comparing ΔΦ_th with the distance between Φ_EMF(t + Δt) and Φ_EMF(t) + ΔΦ_gyro(t) × Δt, where the distance is the intrinsic geodesic distance between two rotations. If the distance is larger than the threshold, the system predicts Î(t + Δt) = 1. Otherwise, it predicts Î(t + Δt) = 0.
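As a concrete illustration, this detection rule can be sketched as below, assuming unit quaternions for the EMF orientation and angular velocity in rad/s from the gyroscope. The quaternion helpers and the 5° threshold are illustrative stand-ins, not the paper's tuned implementation:

```python
import math

def quat_mul(a, b):
    # Hamilton product of two (w, x, y, z) quaternions
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def quat_from_gyro(omega, dt):
    # rotation accumulated over dt from angular velocity omega (rad/s)
    angle = math.sqrt(sum(w * w for w in omega)) * dt
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    ax = [w * dt / angle for w in omega]
    s = math.sin(angle / 2)
    return (math.cos(angle / 2), ax[0] * s, ax[1] * s, ax[2] * s)

def geodesic_deg(q1, q2):
    # intrinsic geodesic distance between two rotations, in degrees
    dot = min(1.0, abs(sum(a * b for a, b in zip(q1, q2))))
    return math.degrees(2 * math.acos(dot))

def detect_interference(q_emf, omega_gyro, q_emf_next, dt, thresh_deg=5.0):
    # Predict the next orientation by integrating the gyro, then compare
    # against the next EMF reading; a large mismatch flags interference.
    q_pred = quat_mul(q_emf, quat_from_gyro(omega_gyro, dt))
    return 1 if geodesic_deg(q_pred, q_emf_next) > thresh_deg else 0
```

The intuition is that the gyroscope is immune to metal, so whenever the EMF orientation jumps away from the gyro-predicted one, the frame is flagged as interfered.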

Interference Correction
The second part is interference correction, which dynamically adjusts the measured value from the EMF sensor based on the detection result. It is essential to perform this correction online, as MI-Poser aims to be a real-time body pose tracking system. This means that if Î(t) = 1, we need to correct the current position P_EMF(t) and rotation Φ_EMF(t) using past tracking and sensor data up to time t. If interference persists, i.e., Î(t + Δt) = 1, we must correct them using the past data up to t + Δt. As noted, we apply this correction only as long as the detected interference is of short duration, to avoid drift error.
For the rotation correction, we replace the interfered EMF orientation with the gyroscope-integrated estimate, Φ_EMF(t) + ΔΦ_gyro(t) × Δt, propagated from the last interference-free measurement. Given that IMU-based rotation tracking is fairly feasible, we expected this simple solution to work well. Meanwhile, IMU-based position tracking is known to be a challenging problem, and we prepared three methods.
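The rotation correction above can be sketched in a deliberately simplified, single-axis form (real orientations are 3-D, but the integration logic is the same); the function name and degree units are illustrative:

```python
def correct_rotation_1d(phi_last_good_deg, gyro_deg_per_s, dt):
    # Propagate the last interference-free EMF orientation forward by
    # integrating the gyroscope's angular velocity samples over time.
    phi = phi_last_good_deg
    for omega in gyro_deg_per_s:
        phi += omega * dt
    return phi
```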

IMU Odometry Model.
This physics-based method uses the initial velocity and a time series of acceleration from the IMU sensor to calculate position through double integration:

p(t) = p(t_0) + v(t_0)(t − t_0) + ∫_{t_0}^{t} ∫_{t_0}^{τ} a(s) ds dτ,

where p(t), v(t), and a(t) represent position, velocity, and acceleration at time t, and t_0 represents the initial reference time. While straightforward, this method performs poorly when there is noise in a(t) and v(t_0) [57].
Since we use this approach around the interference moments, the EMF position tracking can contain noise during those moments, resulting in a noisy v(t_0). Another limitation is that successful short-time arm tracking based on an IMU sensor [32] requires a high sampling frequency, such as 2000 Hz, indicating that lower frequencies lead to larger errors.
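The double integration can be sketched as below (semi-implicit Euler over per-frame acceleration samples); as the surrounding text notes, any noise in a(t) or v(t_0) compounds quickly:

```python
def imu_odometry(p0, v0, accel_samples, dt):
    # Double-integrate acceleration: v(t) = v(t0) + ∫a dt, p(t) = p(t0) + ∫v dt.
    p, v = list(p0), list(v0)
    trajectory = []
    for a in accel_samples:
        v = [vi + ai * dt for vi, ai in zip(v, a)]
        p = [pi + vi * dt for pi, vi in zip(p, v)]
        trajectory.append(tuple(p))
    return trajectory
```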

Trajectory Forecasting Model.
The IMU odometry method may not account for trajectory trends and seasonality, which are often used in practical time-series forecasting methods [14]. Human body movements, particularly arm movements, include short-duration trends and can be forecasted based on past trajectories [19, 55, 65]. We anticipated that a short-period future trajectory could be forecasted using previous tracking history, which could then be used to correct EMF position data under metal interference. We adopted the N-BEATS method [41], a state-of-the-art deep learning approach using backward and forward residual links and a deep stack of fully-connected layers. The model's input and output can be written as:

[p̂(t + Δt), …, p̂(t + Δt_f)] = f_N-BEATS([p(t − Δt_p), …, p(t)]),

where Δt_f and Δt_p correspond to the amount of future data the model outputs and the amount of previous data the model takes as input, respectively.
In testing this model, we found that significant prediction errors tended to occur when there were large position changes right after the moment the model predicted. This can be understood as the time-series forecasting model estimating the trajectory based on past data without reflecting future acceleration information. This observation led us to introduce the following model.

Fusion Model.
To consider future acceleration while avoiding error due to a noisy v(t_0), we approximate the trajectory as follows:

p(t_0 + Δt) ≈ p̂(t_0 + Δt) + (1/2) a(t_0) Δt²,

where Δt is small enough (at least, Δt < Δt_f). We iteratively use the same N-BEATS model. In detail, while the N-BEATS model outputs an estimation for Δt_f seconds, we use the single prediction value corresponding to time t_0 + Δt. After adding the acceleration component through integration, we use this value as the input for the next N-BEATS inference for the next frame (t_0 + 2Δt) if metal interference still exists (i.e., Î(t_0 + 2Δt) = 1). In this way, we can adjust the N-BEATS prediction by adding the acceleration component, which further influences the subsequent trajectory forecasting.
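One fusion iteration can be sketched as follows in one dimension. Here `forecast_one` is a hypothetical stand-in for a single next-frame N-BEATS prediction, and the ½·a·Δt² term is our reading of "adding the acceleration component through integration":

```python
def fused_step(history, forecast_one, accel, dt):
    # Take the forecaster's next-frame prediction, adjust it with the
    # measured acceleration, and feed the corrected value back as input
    # for the next inference while interference persists.
    p_next = forecast_one(history) + 0.5 * accel * dt * dt
    return history[1:] + [p_next]
```

Repeatedly calling `fused_step` while Î = 1 mirrors the iterative correction described above: each corrected frame becomes part of the input window for the next forecast.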

IMPLEMENTATION AND SYSTEM PERFORMANCE

Hardware
Our EMF tracking system has an EMF transmitter (source) and two EMF receivers (sensors) using off-the-shelf 3D coils (see Figure 4). These components communicate with AR glasses through Bluetooth Low Energy (BLE) using the Enhanced ShockBurst protocol [40] to minimize latency. Designing an EMF tracking system for human body tracking involves multiple trade-offs, such as source coil size, electric current, sensor coil size, and tracking range/error requirements. Our final sensor configuration meets our 1.5 m range requirement (typical arm reach) with a position RMS error of 0.9 mm and an angle RMS error of 0.5° within a 1 m range in an ideal, metal-interference-free lab environment. Additionally, an IMU sensor is integrated into the receiver, which we leverage for a sensor-fusion approach to MIM. The EMF and IMU data streams are synchronized and accessible via a BLE connection. We run the tracking at 120 fps with a latency of around 15 ms, and the tracking algorithm runs locally on the sensor. The algorithm incorporates a Kalman filter to stabilize long-range tracking and reduce jitter. We use Spectacles [53], commercial AR glasses featuring a precise VIO algorithm running at 60 Hz, which is accessible via a BLE connection.

IK Model for Pose Estimation from Sparse Sensor Inputs
We used the SMPL model [31] to represent and animate the human body pose. We trained our IK model to reconstruct the upper body from a sparse sensor set using the AMASS dataset [34], similarly to the prior work discussed in Section 2.1. For the IK model, we adopted the state-of-the-art model from AvatarPoser [22]. The key difference is that our proposed system has EMF sensors on the wrists, while AvatarPoser assumes hand-held controllers. We expected that this different sensor placement would improve upper-body tracking by avoiding rotation noise from hand movements and help the model infer more plausible arm poses.
In AMASS, we used a subset combination of the CMU, Eyes_Japan, KIT, MPI_HDM05, and TotalCapture datasets as the training set, and MPI_Limits as the validation set. We down-sampled the MoCap data from 120 Hz to 60 Hz and generated windowed segments of 40 frames (i.e., a 2/3-second window) with a stride length of 0.1 seconds to match the original work [22]. We used the Adam optimizer [28] with a batch size of 32 and a starting learning rate of 0.001, which decays by a factor of 0.8 every 20 epochs. We performed the training with PyTorch on Google Cloud Platform with an NVIDIA Tesla V100 GPU.
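The windowing described above (40-frame windows at 60 Hz with a 0.1 s stride, i.e., 6 frames) can be sketched as a simple slicing helper; the function name is ours, not from the original codebase:

```python
def make_windows(frames, win=40, stride=6):
    # 40-frame (2/3 s at 60 Hz) segments with a 6-frame (0.1 s) stride
    return [frames[i:i + win] for i in range(0, len(frames) - win + 1, stride)]
```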
To account for variations in body size and sensor-wearing positions, we calibrated the sensor outputs before inputting them into the IK model. For sensor-position calibration, the user simply maintained the default T-pose (Figure 13 in Appendix) for a few seconds. We used sensor measurements taken during this period for calibration, similar to prior work [21, 70]. We first estimated a scaling factor by comparing the arm span between actual sensor measurements and the SMPL model definition. To compensate for minor sensor offsets, we applied spatial transformations to the sensor output, ensuring alignment with the SMPL model definitions in the default pose. We applied the scaling factor and transformations to each frame throughout the entire body pose tracking session.
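A minimal sketch of the T-pose calibration, assuming wrist positions in meters; in practice the offsets also include rotational alignment, which we omit here for brevity:

```python
import math

def tpose_scale(left_wrist, right_wrist, smpl_arm_span):
    # scaling factor: SMPL model arm span vs. the measured T-pose arm span
    return smpl_arm_span / math.dist(left_wrist, right_wrist)

def apply_calibration(p, scale, offset):
    # scale a measured position, then shift it by a translational offset
    return tuple(scale * pi + oi for pi, oi in zip(p, offset))
```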

Trajectory Forecasting Model for MIM
To train the N-BEATS model for MIM (interference correction), we first collected data by moving the EMF sensor freely in open space without metal objects. Approximately 40 minutes of data were used to train the N-BEATS model with two stacks, each containing two blocks with 512 hidden-layer units. The model takes 120 samples (corresponding to 1 second of Δt_p) as input and outputs 60 samples (corresponding to 0.5 seconds of Δt_f) of position data (recall that the EMF tracking runs at 120 Hz). We used the Adam optimizer [28] with a batch size of 16 for up to 20 epochs. The best model achieved a 1.26 cm mean absolute position error on the validation dataset, meaning the model can forecast a position 0.5 seconds ahead with small errors.

Real-Time System Performance
Currently, the IK model and the MIM process run on a laptop (MacBook Pro with a 2.6 GHz 6-core CPU and 16 GB memory), written in Python, while streaming data in real time. Future work will involve transferring the process to AR glasses using JavaScript. Spectacles are equipped with an octa-core CPU (2 × 2.52 GHz + 6 × 1.7 GHz). Given the limited computational resources, it is crucial to consider power consumption and inference speed, which we report for our current prototype in this section.

Power Consumption.
The power consumption at the source and the sensor is 1.4 W and 0.68 W, respectively, including the communication modules. Further optimization, as demonstrated in [62], is advisable. For instance, replacing the currently used microcontroller (F7), which has more capabilities than necessary, with a lower-power alternative (H7) could reduce power consumption. Nonetheless, the sensor-level power consumption is significantly lower than in existing camera-based research. For example, ControllerPose [4] attaches a camera to each controller to capture upper-body movements, and a single camera's power consumption is approximately 3.3 W, excluding the hand tracking algorithm. While we must consider the power required to run the model on the device, these figures suggest our prototype can operate with reasonable power consumption.

Inference Speed.
The IK model's current latency is 4.2 ms on the laptop, from captured EMF values to the output. Likewise, MIM's detection and correction models incur average latencies of 0.09 ms and 0.50 ms, respectively. These latencies do not significantly impact the body pose tracking pipeline, making MIM a suitable complement to the EMF-based upper-body pose tracking system. Together with MI-Poser's IK model for reconstructing body pose, our pipeline takes approximately 5 ms to process one frame on a laptop. The current MI-Poser pipeline operates efficiently at 60 Hz (recall that the EMF sensor runs at 120 Hz and the VIO tracking runs at 60 Hz). Notably, commercial on-device tracking speeds, such as that of Meta Quest 2 [35], are also 60 Hz. For further comparison, we ran the ControllerPose [4] and IMUPoser [37] systems on the same laptop, with pipeline speeds of approximately 4 Hz and 48 Hz, respectively. Please refer to the Video Figure for a real-time demonstration.
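Per-stage latencies like those above can be measured with a simple wall-clock loop; `step_fn` here is a placeholder for one IK or MIM inference call, not the actual pipeline code:

```python
import time

def mean_latency_ms(step_fn, n=1000):
    # average wall-clock time per call, in milliseconds
    t0 = time.perf_counter()
    for _ in range(n):
        step_fn()
    return (time.perf_counter() - t0) / n * 1e3
```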

USER STUDY 1: UPPER-BODY POSE TRACKING IN THE ABSENCE OF METAL INTERFERENCE
Since MI-Poser is the first upper-body tracking system combining two wrist-worn EMF sensors and AR glasses, we first examined its tracking performance in an open space (without visible metal objects) and compared it with similar setups using IMU sensors. We trained an IK model using the existing AMASS dataset [34] and tested it with real sensor data.

Data Collection
We collected sensor data from AR glasses and EMF sensors to demonstrate system performance using the setup shown in Figure 1. Ground truth data were obtained using Microsoft Kinect [36], following prior work [4]. To ensure the reliability of the ground truth data from Kinect, we filtered out unreliable inferred tracking frames in post-processing, based on the tracking state Kinect logged; these constituted approximately 10% of the data.
To inspect the fine-grained performance of MI-Poser across various upper-body movements, we designed an obstacle course-style setup inspired by previous works [3, 21]. We selected motions that encompassed a diverse range of upper-body movements:
• Punch: the participant alternately punches with both arms in front of their body.
• Wave: the participant randomly raises their arms and waves in the air.
• Swing: the participant alternately swings their arms from side to side.
• Rotate: the participant rotates both arms against each other in front of their chest.
• Walk: the participant walks randomly with their arms swinging naturally.
• Basketball: the participant jumps and performs basketball shooting gestures over their head.
• Tennis: the participant swings their arms from behind their body to the front, with torso rotation.
• Golf: the participant swings both arms together, with torso rotation.
Several of these motions, such as swinging, walking, basketball, tennis, and golf, include moments when the hands move outside the field of view of the cameras on AR glasses. These moments are often challenging to track in conventional camera-based AR systems.
We recruited 10 participants from our institution with diverse genders, ages, weights, and body shapes for data collection. Participants performed each motion for 50 seconds, with 10-second rest periods in between. The entire data collection process, including the initial calibration, took approximately 10 minutes per participant. We obtained approval from our institution to conduct the study.

Fine-Grained Error Metric.
The results across different joints and motions are presented in Figure 5. The overall error (a) without and (b) with sensor-position calibration is 10.4 cm and 6.6 cm, respectively, demonstrating a significant improvement due to the sensor-position calibration. As we align the root (hip) when calculating the error metric, following prior work such as [21, 37], the hip generally has the smallest error, and the error propagates to end-effectors like the wrists, which exhibit the largest error. Still, the overall error is reasonably small after the sensor-position calibration, including when the hands are out of view of the AR glasses.
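The root-aligned error metric can be sketched as below, with joints given as (x, y, z) tuples; treating the hip as joint 0 is our illustrative convention:

```python
import math

def mpjpe_root_aligned(pred, gt, root=0):
    # Translate both skeletons so the root (hip) joint sits at the origin,
    # then average the per-joint Euclidean distances.
    def align(joints):
        rx, ry, rz = joints[root]
        return [(x - rx, y - ry, z - rz) for x, y, z in joints]
    p, g = align(pred), align(gt)
    return sum(math.dist(a, b) for a, b in zip(p, g)) / len(p)
```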
However, there are several performance limitations to consider. First, examining the error by motion in Figure 5, larger errors arise from motions involving fast and extensive torso movements, such as tennis. Next, the EMF tracking method has a minimum working distance of about 10 cm to prevent signal saturation. As a result, gestures involving close proximity to the source, like both hands being close together in golf, can cause significant signal jitter. Lastly, while our sensor-position calibration accounts for body skeleton scale and minor sensor position shifts, it does not adjust for variations in body shape. This can lead to inaccurate hand-body contact, such as the hand penetrating or hovering over the body mesh despite physical contact.
Additionally, we tested a scenario where a user wears a single EMF receiver on their wrist, assuming the sensor is embedded in a smartwatch. We trained a different model with this configuration using the AMASS dataset and evaluated its performance with our dataset, using the data corresponding to one EMF receiver and AR glasses. The overall error without and with sensor-position calibration is 22.0 cm and 15.9 cm, respectively. The largest error comes from the hand without the EMF receiver, while similar performance is maintained for the hand with the receiver. Anecdotally, the error decreases in some movements (e.g., rotating, walking), which can be attributed to hallucination from the training dataset, as indicated by [37]. Thus, depending on the application, utilizing such a pervasive configuration of on-body sensors could be valuable when users employ tracking in their everyday lives, such as in AR applications while walking.

Comparison to Prior Work.
Our system is the first of its kind to use two wrist-worn EMF sensors for upper-body pose tracking. To the best of our knowledge, no prior research has reported the performance of upper-body pose tracking using such a sparse set of real sensors (head and two wrists). Although most IMU-based approaches were tested using the AMASS dataset, several works also reported performance on real sensor data. However, these studies used different datasets and involved more sensors (including leg-worn IMUs) in reporting full-body performance, making direct comparisons difficult. Moreover, the current Spectacles' SDK does not provide raw IMU signals, preventing us from obtaining the IMU signal for the head in our setup, which makes a fair comparison with IMU-based solutions on our dataset impossible. This remains a limitation of the current study.
As a remedy, we adopted IMUPoser's model [37] (a two-layer bi-directional LSTM) as a baseline for IMU-based upper-body pose tracking and tested its error on the DIP-IMU and IMUPoser datasets. Both datasets contain human body poses across motions similar to ours, such as arm raises, arm swings, and walking, involving 10 participants. The DIP-IMU dataset [21] includes 17 IMUs (Xsens sensors), while IMUPoser uses common wearables as sensors, such as smartwatches and earbuds. We trained their model on the AMASS dataset (the same subsets we used for MI-Poser) using the configuration of three IMU sensors corresponding to our setup (i.e., the head and two wrists) and evaluated its performance on the two datasets. As a result, the joint position error for the upper body with sensor-position calibration is 8.3 cm and 10.4 cm for the DIP-IMU and IMUPoser datasets, respectively. Although not a direct comparison, since the evaluations use different datasets, the results suggest that MI-Poser performs better than IMU-based systems, thanks to the high-precision EMF/VIO tracking.

USER STUDY 2: MEASURING AND MITIGATING METAL INTERFERENCE
User Study 1 demonstrated that our EMF-based tracking setup achieves accurate upper-body pose tracking. In this section, we quantify the metal interference on EMF tracking in various environments and examine the effectiveness of MIM in enhancing the input data for our pose-tracking pipeline.

Data Collection
6.1.1 Apparatus. We used the same sensor described in Section 4.1, which streams EMF tracking and IMU tracking data synchronously. To obtain ground truth tracking data, we used Apple ARKit 4 [7], which enables state-of-the-art self-localization in world coordinates based on VIO tracking [24]. We developed a custom iOS application using Xcode 13.3 that records the position, orientation, and UNIX timestamp. We attached the EMF sensor and an iPhone 13 Pro, which can utilize its built-in LiDAR for enhanced ARKit tracking performance, to a non-metal rigid body with 20 cm of spacing between them. This decision was based on the observation by Whitmire et al. [62] that significant interference occurs if an EMF receiver and a smartphone are closer than approximately 5 cm. The apparatus used for data collection is shown in Figure 6 (a). When the apparatus is moved, the sensor streams its EMF pose tracking data along with the IMU data at 120 Hz relative to the EMF source, while the iPhone captures its pose at 60 Hz relative to the reference world coordinate. Since the two sensors are attached to a rigid body and their transformation is constant, we can align their coordinates using a calibration process, which we elaborate on in Appendix B. We did not use a high-end MoCap system such as OptiTrack [38] because we wanted to conduct data collection in multiple, real environments; our apparatus offers a convenient method for such data collection. We tested whether ARKit could work properly when we moved the apparatus naturally to represent arm movements in VR/AR scenarios. ARKit reports its tracking state, and based on that, we observed that the tracking performance worsens if we rotate the apparatus too quickly. Consequently, we could not include such movements in the data collection.
6.1.2 Conditions. We selected three representative cases for data collection concerning the level of metal interference: open-space, standard, and extreme conditions. The open-space condition represents locations with minimal metal interference (see Figure 6 (b)). In contrast, the standard condition includes typical places like desks and rooms with a few metal objects (e.g., a desk with metal supports, Figure 6 (c)). For the open-space condition, we chose two locations in a building where no visible metal objects were present within a 2-meter range, except for the floor. For the standard condition, we selected three locations in the same building: a desk with a few everyday metal objects, such as a laptop and monitor; a meeting space with a metal door; and a crafting room with some metal objects like a hammer. In addition to these two conditions, we added an extreme condition, in which we intentionally moved the apparatus closer to metal objects for an extended period, such as touching a laptop or holding a metal can (Figure 6 (d)). This condition was introduced to examine the extent of errors that could occur in potential end-user environments.
6.1.3 Procedure. We collected four sessions of data for each location, resulting in 8 sessions for the open-space condition, 12 sessions for the standard condition, and 4 sessions for the extreme condition. Five people from our institution were asked to hold the apparatus and move it around freely within a roughly 1.5-meter range from the EMF source for approximately one minute, emulating natural body movements. We informed them that this data would be used for tracking body pose in VR/AR applications and asked them to move freely as if they were playing with VR/AR content, such as exercising or manipulating imaginary objects. Note that the EMF source is fixed to the environment; as detailed in Appendix A, we can compute the sensor values with respect to the source using the HMD's VIO tracking, so we considered the metal interference problem in the EMF coordinate.
Before each session, we initialized the ARKit world tracking coordinate and scanned the environment, preventing it from entering the "extending map" status during the recording to avoid degraded tracking performance. In this step, we also placed some non-metal objects in the environment if there were not many visually striking objects (e.g., a white wall) to aid ARKit's VIO tracking. In every condition, we used the first 20 seconds for calibration; during this time, we were careful not to move the apparatus close to metal objects in the environment, regardless of the condition. Furthermore, the apparatus was placed statically within a range of 1 meter from the EMF transmitter for about 3 seconds within the same 20 seconds. We used the static part to calibrate the IMU sensor to obtain linear acceleration, while the entire 20-second data was used to calibrate the sensor and iPhone coordinates. The calibration was conducted as an offline process after collection. Each session lasted approximately 80 seconds, resulting in 1 minute of data after excluding the calibration part.
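The static-window calibration for linear acceleration can be sketched as follows. The function and parameter names are our own, and this simplified version assumes the sensor orientation stays roughly constant during the static segment; a real pipeline would rotate the gravity estimate with the tracked orientation.

```python
import numpy as np

def calibrate_linear_accel(accel, static_slice):
    """Estimate the gravity/bias vector from a static window and subtract
    it from the raw readings to obtain linear acceleration.

    accel: (N, 3) raw accelerometer samples
    static_slice: slice covering the static calibration frames
    """
    # Average over the static window: motion is (near) zero there, so the
    # mean captures gravity plus any constant sensor bias.
    g = accel[static_slice].mean(axis=0)
    return accel - g
```

This mirrors the offline step described above: the 3-second static segment supplies the gravity estimate, which is then removed from the whole session.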
Table 1 summarizes the dataset we collected. The position and rotation errors were calculated as the distance between the ground truth and the EMF position and rotation, respectively. The table shows that the open-space condition has the smallest error, while the extreme condition has the largest. Figure 7 shows sample error plots within a session for the three different conditions. From these plots, we can observe a trend corresponding to the two types of errors we discussed in Section 3.1. There is almost no error in the open-space condition, supporting the use of MI-Poser in such environments. However, there are spike-like significant errors in the standard condition, indicating that MI-Poser's pose tracking can deteriorate for those short periods. On the other hand, significant errors persist for several seconds in the extreme condition, corresponding to when users intentionally hold or get close to metal objects. This result confirms the two types of interference we discussed in Section 3.1.
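The per-frame error metrics can be computed directly from the paired pose streams. A minimal sketch follows; the function name and the use of SciPy rotations are our own assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_errors(p_gt, q_gt, p_emf, q_emf):
    """Per-frame position error (Euclidean distance) and rotation error
    (geodesic angle, in degrees) between ground truth and EMF poses.

    p_*: (N, 3) positions; q_*: (N, 4) quaternions in (x, y, z, w) order.
    """
    pos_err = np.linalg.norm(p_gt - p_emf, axis=1)
    # Relative rotation between ground truth and the EMF estimate; its
    # rotation-vector magnitude is the geodesic angle.
    rel = R.from_quat(q_gt).inv() * R.from_quat(q_emf)
    rot_err = np.degrees(np.linalg.norm(rel.as_rotvec(), axis=1))
    return pos_err, rot_err
```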
6.1.5 Preliminary Analysis. Figure 7 shows a high correlation between the rotation and position errors. As discussed in Section 3.2, we rely on the gyro error between the IMU and EMF sensor to detect metal interference.
To verify the validity of this approach, we analyzed the correlation between these values. Table 2 summarizes the frame-by-frame correlations across the three conditions. The position and rotation errors between the ground truth and EMF are highly correlated, indicating that the interference causes both errors. This trend is more obvious when there is more metal interference, i.e., in the extreme condition. Moreover, the gyro error correlates with these two errors with high coefficients, suggesting that the gyro error can serve as an indicator to identify moments with interference, supporting our detection approach.
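The frame-by-frame correlation analysis amounts to Pearson coefficients between the three error signals; a hypothetical helper (names are our own):

```python
import numpy as np

def pairwise_correlations(pos_err, rot_err, gyro_err):
    """Pearson correlation coefficients between the three per-frame error
    signals: position error, rotation error, and EMF-IMU gyro error."""
    c = np.corrcoef(np.vstack([pos_err, rot_err, gyro_err]))
    return {"pos-rot": c[0, 1], "pos-gyro": c[0, 2], "rot-gyro": c[1, 2]}
```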

Detection Result
We first evaluated our method's efficacy in detecting interference. Although interference is inherently continuous, simplifying it into a binary state streamlines user notification (e.g., through an HMD) and subsequent trajectory correction, as detailed in Section 3.3. Consequently, we implemented a threshold-based approach to determine frame interference. As previously noted, position and rotation errors correlate and both impact IK models. Thus, we used position error as the interference criterion in the following analyses, though similar results can be achieved with rotation error. We first regarded a frame at time t where the error e(t) between the ground truth and EMF position is larger than a given threshold e_th as a frame with interference. By varying e_th from 0 cm to 50 cm in 0.1 cm increments, we identified the optimal gyro-error threshold ΔΦ_th (refer to Section 3.2) using a naïve full search. Due to the imbalanced label distribution (i.e., few frames with interference), we employed the Matthews correlation coefficient (MCC) [13] as the performance metric.
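The full search described above can be sketched as follows; the MCC is computed from binary confusion counts, and the grid, default values, and label construction are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for boolean label arrays."""
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_gyro_threshold(pos_err, gyro_err, e_th=0.1, n_grid=200):
    """Naive full search: labels come from position error > e_th, and we
    pick the gyro-error threshold whose binary prediction maximizes MCC."""
    labels = pos_err > e_th  # "interference" frames per ground truth
    grid = np.linspace(gyro_err.min(), gyro_err.max(), n_grid)
    scores = [mcc(labels, gyro_err > th) for th in grid]
    return grid[int(np.argmax(scores))], max(scores)
```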
Figure 8 depicts the evaluation outcomes as e_th was altered. The blue line denotes the MCC value, while the orange line signifies the positive (interference) sample ratio. For instance, using e_th = 0.1 as an empirically determined threshold, the positive frame ratios for the open-space, standard, and extreme conditions are 0.13%, 3.14%, and 25.8%, respectively. Table 3 shows an in-depth sample result with the same threshold. The results indicate that our approach, comparing IMU and EMF gyro data, can effectively detect rare interference instances. Moreover, we can develop alternative models, such as high-precision models, to minimize false-positive user notifications. Our current online frame-by-frame method does not utilize future data; however, if developers permit slight delays, employing data from several frames ahead could reduce false positives and enhance detection accuracy.

Fig. 8. Result of the interference detection by condition. Our approach reasonably detects the interference (MCC ∼ 0.6).

Correction Result
Our correction approach's hyperparameter, ΔΦ_th, determines the frequency of system corrections. A smaller ΔΦ_th leads to prolonged IMU-based trajectory corrections, which can cause drift problems and result in incorrect adjustments. Consequently, we varied ΔΦ_th to optimize performance within this trade-off. In the extreme condition, trajectory correction proves challenging due to extended interference periods (see Figure 7). Therefore, we focused on enhancing tracking performance in the open-space and standard conditions, where spike-like interference is prevalent. For the extreme condition, we recommend using the detection model to inform users of the presence of interference, as outlined in Section 3.1.

Rotation Error. Figure 9 shows the best performance of our correction approach as we varied ΔΦ_th. The chosen ΔΦ_th is fixed across sessions. In the open-space condition, the lowest mean and maximum rotation errors per session, 1.17° ± 0.13° and 3.35° ± 0.70°, respectively, are reduced with the correction compared to the raw errors. Similarly, the standard condition exhibits lower mean and maximum rotation errors per session, 1.78° ± 0.52° and 6.59° ± 4.21°, respectively. It is worth noting that the improvement in the mean error per session appears small since the correction only occurred during a short moment in each session. Here, in the same manner as the detection result, we used e_th = 0.1 and calculated the improvement within the interference period instead of per session. As a result, the improvements in the open-space and standard conditions are 71.0% (6.92° → 2.01°) and 57.1% (8.88° → 3.81°), respectively. This result confirms that the large rotation error is significantly reduced thanks to our EMF-IMU fusion approach.
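One way to realize the rotation correction is to propagate the last trusted orientation with integrated gyro rates during detected-interference frames. This is a sketch of the fusion idea under our own notational assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def correct_rotations(emf_rot, gyro, interfered, dt):
    """During detected-interference frames, replace the EMF orientation
    with the previous estimate advanced by the IMU angular rate.

    emf_rot: list of scipy Rotation objects, one per frame
    gyro: (N, 3) angular velocity (rad/s), expressed in the sensor frame
    interfered: (N,) boolean output of the interference detector
    dt: frame interval in seconds
    """
    out = [emf_rot[0]]
    for t in range(1, len(emf_rot)):
        if interfered[t]:
            # Small-angle gyro increment composed onto the last estimate.
            out.append(out[-1] * R.from_rotvec(gyro[t] * dt))
        else:
            out.append(emf_rot[t])  # trust EMF when no interference
    return out
```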

Position Error. Similar to the rotation correction, the performance of our position correction depends on ΔΦ_th, so we varied it and computed the total improvement achieved by the model. Figure 10 compares the results across different conditions and methods. Without correction, the position error in the open-space condition was 0.72 cm (SD = 0.18 cm). Notably, vision-based markerless fingertip tracking on the Meta Quest 2 achieved approximately 1.0 cm static positional error in a similar setting [1], indicating that EMF tracking in open-space conditions performs reasonably accurately.
Comparing different correction approaches, the IMU odometry model produced inferior results due to drift error and sensor noise, as anticipated in Section 3.3. Conversely, our fusion model effectively leverages both IMU odometry and trajectory forecasting. Specifically, the lowest mean and maximum position errors per session in the open-space condition are 0.71 cm (SD = 0.18 cm) and 5.21 cm (SD = 4.22 cm), respectively. In the standard condition, the lowest mean and maximum position errors per session are 1.98 cm (SD = 0.90 cm) and 22.2 cm (SD = 15.2 cm). It is important to note that the improvement in the mean position error is not readily apparent since only a small portion of a session undergoes correction, while the maximum position error displays a significant reduction.
Despite this, the improvement (error reduction rate) is not as substantial as for the rotation correction, for several reasons. First, the noise profiles of the measured acceleration and gyro values within the IMU sensor are different. Second, while a double integral is applied to the measured acceleration to correct position, only a single integral is applied to the measured gyro to correct rotation, resulting in less drift error.
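The drift asymmetry between single and double integration can be seen numerically by integrating white sensor noise once (gyro-like, giving angle drift) and twice (accelerometer-like, giving position drift). The parameters below are illustrative, not taken from the paper:

```python
import numpy as np

def drift_std(n_frames=1000, dt=1 / 120, sigma=0.01, trials=200, seed=0):
    """Standard deviation of the end-of-window drift when white noise is
    integrated once (gyro -> angle) versus twice (accel -> position)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(trials, n_frames))
    # Single integration: cumulative sum of noise * dt.
    single = np.cumsum(noise * dt, axis=1)[:, -1]
    # Double integration: integrate the integrated signal once more.
    double = np.cumsum(np.cumsum(noise * dt, axis=1) * dt, axis=1)[:, -1]
    return single.std(), double.std()
```

With these settings the doubly integrated drift is several times larger than the singly integrated one, matching the intuition above.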

END-TO-END SYSTEM DEMONSTRATION
We now demonstrate how MIM works in MI-Poser's end-to-end pipeline. Figure 11 shows a user moving their hands along with the detected interference level. Only when the hand (i.e., the EMF sensor) goes close to the metal door is there an increased value in the interference level, providing users with a way to identify potential tracking degradation due to metal objects in the environment. Moreover, Figure 12 presents a user moving their hand horizontally over a metal object and illustrates two outputs of the proposed method, with and without MIM. Without MIM, the hand pose exhibits significant variance due to interference. However, with MIM, the hand pose becomes more stable, and the orientation appears more reasonable (i.e., more horizontal). Please refer to the Video Figure for demonstrations.

DISCUSSION
We have demonstrated that MI-Poser can achieve real-time pose reconstruction with minimal errors while maintaining natural form factors for AR scenarios, such as hands-free operation. Furthermore, its robustness against metal interference, enabled by interference detection and correction, makes it a promising solution for AR/VR applications. Several limitations and areas for future work remain, which we discuss in this section.

Limitations
First, the ground truth upper-body pose data in User Study 1 was obtained using Kinect rather than a high-end MoCap system with external cameras, such as OptiTrack [38] or Vicon [59]. Consequently, there is uncertainty in the ground truth data. For instance, Kinect is known to lag OptiTrack by 50 ms [10], which may have led to overestimated errors for fast motions (e.g., tennis) in our results. While our setup is sufficient to demonstrate the proof-of-concept (as done in [4]), further investigation with high-end MoCap systems is necessary to assess more precise performance.
Similarly, in User Study 2, we used ARKit on an iPhone with LiDAR to collect ground truth pose data, aiming for convenient data collection in multiple environments with different metal profiles. Although state-of-the-art for VIO, this approach is less accurate than MoCap systems. Nevertheless, the difference between EMF and ARKit tracking is small (0.72 cm) in the open-space condition (see Table 1). Considering that the error of EMF tracking in an ideal metal-free environment is 0.9 mm, we conjecture that ARKit's tracking performance was reasonable, at least for analyzing the significantly larger errors due to interference. We should note that, as discussed in Section 6.1.1, our data collection setup could not include excessively fast arm movements; capturing such movements necessitates an alternative setup.
In addition, our system relies on VIO tracking on AR glasses, causing it to suffer from degraded performance under low-light conditions. Therefore, applications should allow users to disable global body tracking in such situations, as implemented in conventional VR headsets.

Future Work
8.2.1 Further Study about the Efficacy of MIM. While MIM effectively mitigated metal interference at the sensor tracking level, and we demonstrated its qualitative improvement in body pose tracking, its quantitative improvement remains to be examined. In the future, we plan to emulate a variety of user environments (e.g., a living room with some metal objects), collect in-situ ground truth body pose data using a system such as OptiTrack, and evaluate MI-Poser. Such a study would reveal when metal interference occurs in actual scenes and how much MIM contributes to pose estimation, and it remains important future work for this research. Furthermore, as outlined in Section 3.1, MI-Poser can notify users about potentially degraded tracking performance based on interference detection. Examining how users appreciate such feedback, from the perspective of maintaining trust between users and systems [6], would be insightful.

8.2.2 Algorithm Refinement for MIM. The solutions we tested for MIM are simple yet effective. While simple solutions are preferred for their ease of deployment and maintenance, more sophisticated algorithms can be explored. For instance, we employed an algorithm that detects interference by analyzing the data of a single frame. However, other unsupervised anomaly detection methods [67] can leverage time-series trends to improve precision. Furthermore, collecting more data in various environments to increase the number of samples with interference would make it possible to train models in a supervised manner for both detection and correction.
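As one simple example of leveraging time-series trends, an unsupervised rolling z-score detector flags frames whose gyro error deviates strongly from a trailing window's statistics. The window size and threshold below are illustrative choices, not tuned values from this work:

```python
import numpy as np

def rolling_zscore_detect(gyro_err, window=30, z_th=4.0):
    """Flag frame t as interference if its gyro error is more than z_th
    standard deviations away from the mean of the preceding window."""
    flags = np.zeros(len(gyro_err), dtype=bool)
    for t in range(window, len(gyro_err)):
        w = gyro_err[t - window:t]
        mu, sd = w.mean(), w.std()
        if sd > 0 and abs(gyro_err[t] - mu) > z_th * sd:
            flags[t] = True
    return flags
```

Unlike the per-frame threshold, this adapts to slowly varying baselines, at the cost of a warm-up window and sensitivity to the window/threshold choice.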

8.2.3 Evaluation of Full-Body Tracking Performance. As we mentioned in Section 3, prior work showed the possibility of full-body tracking with a setup similar to ours (i.e., sensors on the head and wrists) by training a model with a large dataset. The IK model we used [22] can also estimate the poses of lower-body joints. However, this is essentially a hallucination from the data and is known not to be robust against motions absent from the training dataset [37]. Thus, we focused on upper-body tracking in our proof-of-concept. It is worth testing the current performance of full-body tracking with various motions.

8.2.4 Integration of the EMF Sensor into a Smartwatch. Given the form factor of MI-Poser, integrating an EMF sensor into a smartwatch is a promising direction. In this way, we can use MI-Poser with an even sparser sensor setup, namely, AR glasses and a smartwatch, enabling more ubiquitous scenarios such as video calling outdoors with rich upper-body expression through a 3D avatar. Undoubtedly, this setup loses information about the wrist without the watch, leading to lower fidelity as demonstrated in Section 5.2; hence, the benefit to applications needs to be tested. Note that a shield must be added to the electronics inside the watch to prevent interference with the EMF sensor.

CONCLUSION
We proposed MI-Poser, a body pose tracking system using an EMF transmitter attached to AR glasses and two wrist-worn EMF sensors, which offers a hands-free, wide-range solution. In User Study 1, we demonstrated that MI-Poser can reconstruct upper-body poses with small errors across various movements, including cases where the hands are out of sight of the AR glasses, highlighting the benefit of combining EMF tracking and IK. As is critical when utilizing body-scale EMF sensing with a sparse sensor setup, we also dealt with the metal interference issue to improve MI-Poser's robustness in end-user environments, proposing metal interference mitigation (MIM). In User Study 2, using a newly collected dataset, we quantified the errors due to interference under different metal conditions and proposed solutions based on an EMF-IMU fusion approach. The effectiveness of the proposed interference detection and correction was demonstrated, which is the first of its kind for EMF tracking. While future work remains, the results suggest that MI-Poser offers developers a practical body pose tracking system, especially for enabling many interesting everyday AR applications.

A DETAILED IMPLEMENTATION OF IK
Our hardware setup has two subsystems for measuring the pose of certain body parts: AR glasses that use VIO tracking and wrist-worn EMF sensors. Accordingly, we define the coordinates for our system as follows:
• World Global Coordinate: The coordinate in which the AR glasses provide their absolute positions P^W_G ∈ R^{1×3} and orientations in axis-angle representation Φ^W_G ∈ R^{1×3}, where G denotes the glasses and W denotes the world coordinate.

Fig. 3. Overview of the proposed metal interference mitigation (MIM) according to the two types of errors. When a user encounters a metal object for a short period, the interference is detected and the trajectory is corrected (left). When the interference lasts longer, e.g., when the user holds a metal object, it is detected in order to notify the user (right).
in axis-angle representation Φ^E_S(t) ∈ R^{1×3} at a given time t when there is no metal interference. We use I(t) as a binary index to represent the presence or absence of interference; I(t) = 0 in this case. Simultaneously, an angular momentum ΔΦ^IMU_S(t) is measured in the same coordinate as Φ^E_S(t). At time t + Δt, there is rotation information from the EMF sensor as Φ^E_S(t + Δt). If no interference occurs at t + Δt, an approximation holds:

Fig. 4. EMF tracking hardware. The source (left) is integrated into the AR glasses, and the sensors (right) are attached to the user's wrists.

Fig. 5. MI-Poser's performance (joint position error) on real sensor data across body regions and motions. The error is overall small thanks to the precise EMF tracking. The sensor-position calibration significantly reduces the error. Error accumulates from the hip (alignment root) to the end-effectors. The largest error is observed during the tennis motion, which involves rapid and extensive torso movements.

Fig. 6. Setup for data collection in User Study 2. (a) An EMF sensor and iPhone 13 Pro are tightly attached to a non-metal object. The iPhone tracks its pose based on ARKit's world tracking function. (b) Open space with few metal objects. We intentionally chose locations with visually striking objects to aid ARKit's VIO tracking. (c) Standard condition with common metal objects such as a monitor. (d) Extreme condition where we intentionally move the sensor close to common metal objects such as a can. Note that the apparatus is not visible in (c) and (d).

Fig. 7. Example plots of the position and rotation errors between the ground truth and EMF sensor values in the three different conditions.

Fig. 10. Result of the interference correction in position error by condition and method. Mean position error per session (left). Max position error per session (right). Our position correction approach with the fusion model reduces the error, but the improvement is relatively smaller than for the rotation error. The error bars are standard error.

Fig. 11. Demonstration of the interference detection. The interference level is estimated on a frame-by-frame basis and displayed on the laptop (red line). When the sensor passes near the metal object (left figure), the interference level increases, while it remains unchanged in other situations (right figure).

Fig. 12.

• HMD Local Coordinate: The local coordinate of the AR glasses, with its origin at the center of the AR glasses. We represent sensor positions and rotations in the HMD local coordinate as P^H_S ∈ R^{1×3} and Φ^H_S ∈ R^{1×3}, where S denotes the sensor and H denotes the HMD coordinate.
• EMF Local Coordinate: The coordinate in which the two wrist-worn EMF sensors are tracked relative to the EMF source on the head. We represent their positions and rotations as P^E_S ∈ R^{1×3} and Φ^E_S ∈ R^{1×3}, where S denotes the sensor and E denotes the EMF coordinate.
• Body Local Coordinate: The human body pose can be represented by each joint position P^B_J ∈ R^{1×3} and orientation Φ^B_J ∈ R^{1×3} in the body local coordinate, where J denotes a body joint and B denotes the body coordinate.

Table 2. Correlation between the position error between ground truth and EMF, the rotation error between ground truth and EMF, and the gyro error between EMF and IMU.

Table 3. Example result of our interference detection when we regard e_th = 0.1 as the criterion for interference.

Fig. 9. Result of the interference correction in rotation error by condition. Mean rotation error per session (left). Max rotation error per session (right). Our rotation correction approach reduces the error, especially suppressing large errors. The error bars are standard error.