MUSE-Fi: Contactless MUti-person SEnsing Exploiting Near-field Wi-Fi Channel Variation

Having been studied for more than a decade, Wi-Fi human sensing still faces a major challenge in the presence of multiple persons, simply because the limited bandwidth of Wi-Fi fails to provide sufficient range resolution to physically separate multiple subjects. Existing solutions mostly avoid this challenge by switching to radars with GHz bandwidth, at the cost of cumbersome deployments. Therefore, whether Wi-Fi human sensing can handle multiple subjects remains an open question. This paper presents MUSE-Fi, the first Wi-Fi multi-person sensing system with physical separability. The principle behind MUSE-Fi is that, given a Wi-Fi device (e.g., a smartphone) very close to a subject, the near-field channel variation caused by the subject significantly overwhelms the variations caused by other, distant subjects. Consequently, focusing on the channel state information (CSI) carried by the traffic in and out of this device naturally allows for physically separating multiple subjects. Based on this principle, we propose three sensing strategies for MUSE-Fi: i) uplink CSI, ii) downlink CSI, and iii) downlink beamforming feedback, where we specifically tackle signal recovery from sparse (per-user) traffic under realistic multi-user communication scenarios. Our extensive evaluations demonstrate that MUSE-Fi successfully handles multi-person sensing in three typical applications: respiration monitoring, gesture detection, and activity recognition.


INTRODUCTION
Since we were first able to obtain CSI (channel state information) in certain Wi-Fi devices [21], Wi-Fi human sensing [25,35,44,56,58,59,67,72] has been attracting significant attention from both academia and industry. During the past decade or so, many applications of Wi-Fi human sensing have been developed, notably including vital signs monitoring [35,67], gesture detection [56,72], activity recognition [15,25], as well as localization and motion tracking [14,44,58,68]. Whereas such sensing applications have a promising potential to be integrated with the ubiquitously deployed Wi-Fi communication infrastructures, they all face a major obstacle in conducting realistic multi-person sensing: the limited Wi-Fi bandwidth fails to offer sufficient range resolution to distinguish different sensing subjects.
Because Wi-Fi communication is unlikely to embrace a super-wide bandwidth owing to its contention-based multi-access nature, existing sensing proposals often sidestep this limitation by resorting to radars with GHz-level bandwidth [3,70], yet radar sensing is somewhat inferior to Wi-Fi sensing as it demands extra deployments. In order to continue exploiting Wi-Fi's potential in integrated sensing and communication (ISAC) [12,22], two workarounds are often adopted. On one hand, many distributed antennas can be used to achieve enhanced spatial resolution for separating subjects [44], at the cost of interfering with the Wi-Fi communication functions. On the other hand, signal processing techniques for separating five subjects at the CSI level have been attempted [67], without offering guaranteed separability in general [69,70]. In reality (as in Figure 1), each person often has their own wearable Wi-Fi devices, typically a smartphone or even a smartwatch. Although the communication link between such a personal device and the nearby Wi-Fi access point (AP) has been regarded as the basic sensing medium by earlier proposals on Wi-Fi human sensing, those proposals leverage either a single link to perform sensing [35,44,59,67] or multiple links to offer a slightly improved spatial resolution [25,43,58,72]. They all neglect two fundamental factors in the realistic multi-user communication setting shown in Figure 1: i) each personal-AP link uniquely identifies the human subject to be sensed, and ii) since the subject is within the near field (less than 0.2 m in range) of its own Wi-Fi device, the channel variation caused by its motions on its personal-AP link can be strong enough to push the interference from other subjects down to the noise floor. In other words, the default multi-user communication setting of Wi-Fi does have the potential to be naturally extended to multi-person sensing, if sensing can be properly integrated into communication.
From an application perspective, such near-field multi-person sensing naturally supports various functions under the pervasive deployment of Wi-Fi infrastructure. As these functions include sensing vital signs, gestures, activities, and locations, they are especially applicable to eXtended Reality (XR). In particular, integrating gesture and activity recognition into Wi-Fi communication reduces the need for peripheral sensors, leading to lighter and less power-hungry virtual reality (VR) and merged reality (MR) headsets that are more comfortable for long-time wearing [4,54]. Furthermore, the environmental and human sensing results provide key contextual and localization information about nearby human and object motions; overlaying such information on top of real-world views facilitates augmented reality (AR) and MR applications in intrusion detection, patient monitoring, and machine status assessment [10,34].
Nonetheless, integrating multi-person sensing with multi-user communication is highly non-trivial, as existing practices, which exploit only artificial Wi-Fi traffic for sensing, offer little relevant experience. In practice, multi-user scenarios typically cause a much lower and very irregular frame arrival rate per link, owing to the contention-based multi-access nature of Wi-Fi. Since the CSI carried by each frame is a critical channel state sample for Wi-Fi sensing, a lower and irregular frame rate implies a lower and irregular sampling rate, which may significantly limit the usability of Wi-Fi sensing. As most Wi-Fi sensing applications have been developed assuming a high and regular frame rate (up to 1000 frames/s [25,44,67]), this challenge, crucial to the seamless integration of multi-person sensing with multi-user communication in Wi-Fi, has never been seriously tackled.
To address these challenges, we propose MUSE-Fi, a novel MUlti-person SEnsing system leveraging Wi-Fi. To motivate MUSE-Fi, we first theoretically characterize and experimentally verify the dominating effect in near-field Wi-Fi sensing, upon which we develop criteria for the physical separability of multiple subjects. Based on these theoretical characterizations, we propose three sensing strategies for MUSE-Fi that integrate with the traffic across each personal-AP link, namely exploiting i) uplink (to AP) CSI, ii) downlink (from AP) CSI, and iii) downlink BFI (beamforming feedback information) [9]. For all strategies, we propose a sparse recovery algorithm (SRA) to mask the potential variance in frame rate; it regulates the input samples so as to deliver a unified data flow to the subsequent processing pipelines of the respective sensing functions. In addition, we study the sensing effectiveness of these strategies by contrasting BFI-enabled compressive sensing with conventional CSI-based Wi-Fi sensing. Our key contributions can be summarized as follows:
• We propose MUSE-Fi as the first true multi-person Wi-Fi sensing system; it integrates multi-person sensing with multi-user communication in a seamless manner.
• We, for the first time, expose the dominating effect of near-field Wi-Fi sensing; MUSE-Fi exploits it to achieve physical separation of multiple subjects.
• We design three sensing strategies for MUSE-Fi and equip them with an SRA to mask the variance in frame rate.
• We reveal the pros and cons of BFI-enabled Wi-Fi sensing against the conventional CSI-enabled one.
• We implement a MUSE-Fi prototype and evaluate it with extensive experiments. The promising results confirm that MUSE-Fi indeed supports multi-person Wi-Fi sensing under realistic scenarios.
The rest of the paper is organized as follows. Section 2 discusses the dominating effect of near-field sensing both theoretically and experimentally. Section 3 presents the sensing strategies for MUSE-Fi, along with the crucial SRA that regulates the frame rate. Section 4 specifies how the MUSE-Fi prototype is implemented and how the application scenarios for the case studies are set up. Section 5 reports the evaluation results of three case studies. Related works are briefly discussed in Section 6. Finally, Section 7 concludes our paper.

SENSING BY NEAR-FIELD DOMINATION
In this section, we introduce the basics of Wi-Fi human sensing, and then systematically study and validate the dominating signal variations in near-field sensing. Compared with conventional antenna near-field and capacitive coupling analyses [8,20], which were not developed for practical multi-person sensing, our theoretical analyses allow us to characterize the feasible region of near-field sensing and shed light on the upper/lower bounds of subject number and spacing.

Wi-Fi Human Sensing Basics
We start by introducing a Wi-Fi sensing system with an AP and user equipment (UE) pair that aims to sense the physical motion of a human subject denoted by S. At time $t$, denote the distance between the AP and S by $d_{A,S}(t)$ and the distance between S and the UE by $d_{S,U}(t)$. Focusing on the influence of S, we model the wireless channel gain between the AP and the UE as:
$$h_{A,U}(t) = h^{s}_{A,U} + h^{d}_{A,U}(t) + h_{A,S,U}(t), \quad (1)$$
where $h^{s}_{A,U}$ and $h^{d}_{A,U}(t)$ represent the static and dynamic channel gains between the AP and the UE due to, respectively, the direct communication path and interfering motions along it, and $h_{A,S,U}(t)$ indicates the channel gain from the AP to the UE via the reflection of S, which can be expressed as a function of $d_{A,S}(t)$ and $d_{S,U}(t)$ (Eqn. (2)), where $\lambda$ is the carrier wavelength, $G_{A,S,U}$ represents the product of the Tx and Rx antenna gains and the reflection coefficient of S, and $\gamma$ is the path loss exponent [18]. Typically, $\gamma \in [2,4]$, with $\gamma \approx 4$ for indoor environments [47]. Wi-Fi human sensing can therefore be described as follows: the physical motion of a human subject changes $d_{A,S}(t)$ and $d_{S,U}(t)$, which in turn changes the channel gain $h_{A,S,U}(t)$ over time. By analyzing the time series of $h_{A,S,U}(t)$ obtained from the CSI of Wi-Fi frames, both the AP and the UE are able to sense the motion of S.

Feasible Region for Near-field Sensing
Consider a more general scenario where two persons are present in the Wi-Fi sensing system. Without loss of generality, we let one of them be the subject S and refer to the other as the interferer, denoted by I. The channel gain between the AP and the UE then becomes:
$$h_{A,U}(t) = h^{s}_{A,U} + h^{d}_{A,U}(t) + h_{A,S,U}(t) + h_{A,I,U}(t), \quad (3)$$
where $h_{A,I,U}(t)$ is the channel gain from the AP to the UE via the reflection of I; it can be modeled in the same manner as Eqn. (2). Eqn. (3) seems to suggest that it is hard to separate the channel influences imposed by S and I, since their channel gains are mixed together. Nevertheless, we point out that in Wi-Fi sensing scenarios where S is close to or in the near field of the UE (i.e., at a distance below 0.2 m, empirically), the variation of the channel gain is dominated by S's physical motion; in other words, $|\partial h_{A,S,U}(t)/\partial t| \gg |\partial h_{A,I,U}(t)/\partial t|$. We term this phenomenon the near-field domination effect and provide its theoretical analysis as follows.
First, to quantify the variation of $h_{A,S,U}(t)$, we evaluate the squared amplitude of its partial derivative with respect to $t$, which we refer to as the power of channel variation. To simplify the analysis, we assume $\partial d_{A,S}(t)/\partial t = \partial d_{S,U}(t)/\partial t = v_S$; the value of $v_S$ can be interpreted as the intensity of S's motion in terms of speed. The power of channel variation of S, $P_S$, can then be calculated as in Eqn. (4), where we omit the symbol $t$ in the distance notations and let $\bar{G}_{A,S,U} = (\lambda/4\pi)^2 G_{A,S,U}$ for brevity. In the second row of Eqn. (4), the first term inside the bracket is caused by the amplitude variation of the channel gain, and the second term results from its phase variation. Approximation (★) holds because, in typical 5 GHz Wi-Fi sensing systems with S in the near field of the UE (e.g., $d_{A,S} \sim 3$ m, $d_{S,U} \sim 0.1$ m, and $\lambda \sim 0.06$ m), the first term in the bracket is much smaller than the second and can thus be omitted; that is, the channel variation is mainly due to S's motion changing the phase of the channel. The power of variation of $h_{A,I,U}(t)$ can be derived similarly for I as $P_I = \bar{G}_{A,I,U} v_I^2 (d_{A,I} d_{I,U})^{-\gamma}$, with $d_{A,I}$ and $d_{I,U}$ being the distances between the AP and I and between I and the UE, respectively, and $v_I$ being I's motion intensity. Consequently, the near-field domination effect can be interpreted as $P_S$ being significantly larger than $P_I$, owing to $d_{S,U}$ being much smaller than $d_{I,U}$, given $d_{A,S} \approx d_{A,I}$ and $v_S \approx v_I$. It is worth noting that assuming $v_S \approx v_I$ may not be practical, because sensing different targets may involve motions of very different intensities (e.g., respiration monitoring versus gesture detection). Although the following derivation sticks to this symmetric assumption for ease of exposition, we will experimentally validate that near-field domination still holds with asymmetric intensities, as long as there is a sufficient discrepancy between $d_{S,U}$ and $d_{I,U}$.
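To make the magnitudes behind (★) and the near-field domination argument concrete, the Python sketch below evaluates both with the example distances quoted above. It assumes, as our own illustration rather than a reproduction of Eqn. (2), a reflected-path gain whose amplitude scales as $(d_{A,S}d_{S,U})^{-\gamma/2}$ and whose phase is $-2\pi(d_{A,S}+d_{S,U})/\lambda$; differentiating it with both distances changing at rate $v_S$ gives, per unit $v_S^2$, an amplitude term $(\gamma/2)^2(1/d_{A,S}+1/d_{S,U})^2$ and a phase term $(4\pi/\lambda)^2$. The domination ratio uses the $P = \bar{G} v^2 (d_{A} d_{U})^{-\gamma}$ form stated for $P_I$, with equal $\bar{G}$ and $v$ for S and I. The function names are hypothetical.

```python
import math


def bracket_terms(d_as, d_su, wavelength, gamma=4.0):
    """Amplitude- and phase-variation terms of |dh/dt|^2 per unit v^2, under the assumed
    model |h| ~ (d_as * d_su)^(-gamma/2), phase(h) = -2*pi*(d_as + d_su) / wavelength,
    with both distances changing at the same rate v."""
    amplitude = (gamma / 2.0 * (1.0 / d_as + 1.0 / d_su)) ** 2
    phase = (4.0 * math.pi / wavelength) ** 2
    return amplitude, phase


def domination_ratio(d_as, d_su, d_ai, d_iu, gamma=4.0):
    """P_S / P_I for P = G_bar * v^2 * (d_A * d_U)^(-gamma) with equal G_bar and v."""
    return ((d_ai * d_iu) / (d_as * d_su)) ** gamma


amp, ph = bracket_terms(d_as=3.0, d_su=0.1, wavelength=0.06)
print(f"amplitude term / phase term = {amp / ph:.4f}")

ratio = domination_ratio(d_as=3.0, d_su=0.15, d_ai=3.0, d_iu=2.0)
print(f"P_S / P_I = {ratio:.3g} ({10 * math.log10(ratio):.1f} dB)")
```

With $d_{A,S} = 3$ m, $d_{S,U} = 0.1$ m, and $\lambda = 0.06$ m, the amplitude term comes out at roughly 1% of the phase term. For the second call (hypothetical distances), an interferer 2 m from the UE versus a subject 0.15 m away gives $P_S/P_I = (2/0.15)^4 \approx 3.2 \times 10^4$, about 45 dB.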
To concretely characterize the interferer's feasible region that preserves the domination of $P_S$ at the UE, we propose a novel metric, the variation-to-interference ratio (VIR); it evaluates the ratio between the variation power of $h_{A,S,U}(t)$ and the sum of the variation powers of $h_{A,I,U}(t)$ and the dynamic channel $h^{d}_{A,U}(t)$. Following [59], $h^{d}_{A,U}(t)$ can also be treated as interference, whose power $P_d$ is linearly proportional to that of the static channel gain. Therefore, assuming a LoS path between the AP and the UE, we have $P_d = \alpha\lambda^2 d_{A,U}^{-\gamma} + \beta$, where $\alpha$ and $\beta$ are fixed for a given pair of AP and UE. Then, we have:
$$\mathrm{VIR}_S = \frac{P_S}{P_I + P_d}. \quad (5)$$
Intuitively, the feasible region of I is where $\mathrm{VIR}_S$ exceeds a threshold $V_{\text{th}}$.
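The VIR test behind Eqn. (5), $\mathrm{VIR}_S = P_S/(P_I + P_d)$, can be sketched as a point-wise feasibility check. This Python sketch is illustrative only: it uses the $P = \bar{G} v^2 (d_A d_U)^{-\gamma}$ form with $\bar{G}$ and $v$ normalized to 1, treats $P_d$ as a given constant (it is fixed for a given AP-UE pair), and uses a made-up normalized 2D layout; `vir_s` and `feasible` are hypothetical names. Note that I interferes with S's link along the path AP → I → S's UE.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def variation_power(d_ap, d_ue, g_bar=1.0, v=1.0, gamma=4.0):
    """P = G_bar * v^2 * (d_AP * d_UE)^(-gamma); G_bar and v default to 1 (normalized)."""
    return g_bar * v ** 2 * (d_ap * d_ue) ** (-gamma)


def vir_s(ap, s, ue_s, i, p_d, gamma=4.0):
    """VIR_S = P_S / (P_I + P_d) on S's AP-UE link; I reflects along AP -> I -> UE_S."""
    p_s = variation_power(dist(ap, s), dist(s, ue_s), gamma=gamma)
    p_i = variation_power(dist(ap, i), dist(i, ue_s), gamma=gamma)
    return p_s / (p_i + p_d)


def feasible(ap, s, ue_s, i, p_d, v_th=50.0):
    """Interferer position i is feasible for S if VIR_S exceeds the threshold v_th."""
    return vir_s(ap, s, ue_s, i, p_d) > v_th


# Hypothetical normalized layout: AP at the origin, S 1 unit away, S's UE 0.1 unit from S.
ap, s, ue_s = (0.0, 0.0), (1.0, 0.0), (1.1, 0.0)
p_d = 1.0  # hypothetical normalized P_d (fixed for this AP-UE pair)
for i in [(1.0, 1.0), (1.1, 0.05), (0.05, 0.0)]:
    print(i, f"VIR_S = {vir_s(ap, s, ue_s, i, p_d):.3g}", feasible(ap, s, ue_s, i, p_d))
```

Evaluating `feasible` over a grid of interferer positions would give a map in the spirit of Figure 2, with small infeasible pockets around the AP and around S's UE; the layout and constants here are placeholders rather than the settings used for that figure.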
To provide more visual insight, we illustrate the feasible region of I for $V_{\text{th}} = 50$ in Figure 2, with $v_S$, $v_I$, $\alpha$, $\beta$, $\bar{G}_{A,S,U}$, and $\bar{G}_{A,I,U}$ normalized to 1 and $\gamma = 4$. It can be observed (e.g., from the small infeasible circular regions around S and the AP) that the separation distance between S and I can be fairly short without resulting in a poor VIR for S, provided that I is not close to the AP either. Moreover, as I is also a potential subject of the Wi-Fi sensing system, it must be treated symmetrically to S, i.e., I is also interfered with by S, so $\mathrm{VIR}_I$, characterized in the same way as $\mathrm{VIR}_S$ in Eqn. (5), must also be dominating at I's UE; positions where $\mathrm{VIR}_I < V_{\text{th}}$, which occurs when $d_{A,I}$ is large, are therefore infeasible too. Hence, besides the small regions mentioned earlier, a larger region around the whole system marks the outer boundary of I's feasible region.
Although these results are obtained numerically and serve indicative purposes only, the resulting feasible region for a single interferer I establishes a physical guarantee that the channel variation due to S can be well separated from that due to I, and vice versa. It can also be observed from Figure 2 that the region boundaries resemble Cassini ovals around the AP and S, which may appear similar to the experimental results in [59]. Nevertheless, our VIR is derived from the perspective of channel variation rather than from the SNR metric adopted in [59], which evaluates the ratio between the power of the signals reflected by the subject and the noise. We believe our channel-variation analysis is more relevant to sensing applications, as the physical information of a subject is carried by the channel variation rather than by the signal strength.

Insights into Multi-person Scenarios
Now we are ready to extend the analysis in Section 2.2 to multi-person scenarios, where $N$ subjects ($N \geq 3$) use the Wi-Fi sensing system and each of them stays in the near field of its UE. Though we could extend Eqn. (5) to the multi-person case by putting $(N-1)$-fold interference in the denominator, the resulting characterization would be too general to shed immediate light on sensing performance, because the spatial distributions of these subjects have infinite possibilities. To this end, we consider two symmetric distribution cases, aiming to answer two important questions separately: i) how many subjects can a Wi-Fi sensing system support? and ii) how close can adjacent subjects be? To simplify the presentation, we no longer distinguish between the position of a subject and that of its UE.
To answer question i), we analyze the case where $N$ subjects stay at distance $r$ from the AP and are uniformly spaced, as shown in Figure 3(a). Owing to the radial symmetry of their positions, we can focus on any one of them, which is again named S. Extending Eqn. (5), we obtain Eqn. (6), where $G$ represents the identical values of $\bar{G}_{A,S,U}$ and $\bar{G}_{A,I_n,U}$ ($\forall n \in \{1, \ldots, N-1\}$), and $\Delta$ denotes the short distance from S to its UE. Generally, the series summation in Eqn. (6), … Moreover, based on Eqn. (7), we can also derive the minimum and maximum distances (resp. $r_{\min}$ and $r_{\max}$) between S and the AP for the considered case to be feasible, by solving the inequality $N_{\max} \geq 3$ over the real numbers. As shown …
Figure 3(b): Mirror symmetry case.

Proof-of-Concept Pre-Experiments
We first conduct two preliminary experiments to briefly validate the theoretical analysis in Eqn. (7). We deploy a Wi-Fi 5 AP in the middle of a table, serving 4 users spaced 2 m apart around the table, as depicted in Figure 4(a). Each user in this paper consists of a UE and a subject placed 15 cm apart from each other. We use the iPerf3 tool [55] to make the 4 UEs and the AP constantly exchange data frames for 40 seconds, during which the four subjects take turns holding their breath. Their irregular CSI sequences (blue points) are obtained from the uplink traffic from the UEs to the AP, and the outputs after processing by MUSE-Fi are shown as red (dotted) curves in Figure 4.
Our results in Figure 4(b) show that the respiration signals of the 4 subjects never interfere with each other, indicating effective multi-person sensing. Though the theoretical results in Figure 3(c) suggest that up to 25 users can be supported with $r = 1.41$ m, our preliminary experiments are conducted more conservatively, as the theoretical results are obtained under ideal conditions without user asymmetry. This setting also leaves room to validate asymmetric sensing, where three subjects breathe normally while Subject D performs the activity of standing up and sitting down. For brevity, we only show the sensing results for one of the breathing subjects and for Subject D in Figures 4(c) and 4(d), respectively; these results clearly demonstrate that multi-person sensing can still be achieved in asymmetric scenarios. In our later case studies, more users are involved to further confirm the effectiveness of near-field domination for MUSE-Fi.
We conduct another experiment to validate the case of close proximity between users in Eqn. (9). We let three subjects sit side by side with 40 cm between the centers of gravity of neighboring subjects, as shown in Figure 4(e). Our results in Figure 4(f) focus on the middle subject (the most interfered one), whose respiration (held or not) can be clearly sensed regardless of the behaviors of the others. These results validate the feasibility of near-field domination for Wi-Fi multi-person sensing, even under close proximity between neighboring subjects.

SHAPING-UP MUSE-FI
Given the physical guarantees on multi-person separation through near-field domination, we now introduce the detailed components and sensing strategies of MUSE-Fi. In hardware, MUSE-Fi comprises a commodity AP and multiple UEs owned by subjects requesting sensing services from the system; all Wi-Fi devices follow prevalent Wi-Fi standards, and their CSIs can be readily obtained from the received frames [19,26]. In the following, we first introduce the three sensing strategies adopted by MUSE-Fi, along with their practical issues. We then specifically investigate two critical issues faced by these strategies.

Three Sensing Strategies for MUSE-Fi
Since CSIs are carried by Wi-Fi frames, MUSE-Fi has three sensing strategies based on the traffic direction and how CSIs are carried. In particular, they are:
• UL-CSI Sensing: The uplink (UL) traffic and the CSI, obtained from the long training sequence (LTS) carried in the preamble of a frame, are adopted as sensing primitives.
• DL-CSI Sensing: The downlink (DL) traffic and the carried CSI are adopted as sensing primitives.
• UL-BFI Sensing: The UL traffic and the carried BFI are adopted as sensing primitives.
Clearly, the entity that handles sensing (data processing) depends on the direction of data flow: both UL-CSI and UL-BFI sensing require processing on the AP side, while DL-CSI sensing requires the UE to handle it. Here, BFI is essentially a compressed version of the DL-CSI, carried by UL traffic so that the AP is aware of the DL channel conditions and can fine-tune its MIMO precoding; it has been available only since the IEEE 802.11ac standard [37]. As BFI is transmitted in plain form within action frames (part of the UL traffic), sensing can also take place at any Wi-Fi device capable of overhearing the traffic; the resulting privacy issue will be further discussed in one of our companion works.
3.1.1 User Registration. When a user (a subject with its UE) requests access to MUSE-Fi, it should first announce its presence along with the sensing application it requires. This user registration process is necessary for three reasons: i) it makes MUSE-Fi aware of the number of users and their respective motion types, so as to coordinate users more accurately (e.g., rejecting users once system capacity is reached); ii) it provides MUSE-Fi with prior information to fine-tune its processing pipelines, such as selecting different filters according to motion intensity or enabling algorithmic modules to handle excessive interference when the average motion intensity is too high; and iii) it preserves privacy for normal Wi-Fi users with no intention to access MUSE-Fi. In the following, we focus only on registered users and leave the detailed registration procedure to our extended report.

Practical Issues.
In stark contrast to existing Wi-Fi sensing systems, which work with artificially generated continuous traffic and mostly with only a single link, the most prominent challenge for MUSE-Fi is to handle the bursty and intermittent traffic encountered in practice. Specifically, due to the multi-user communication infrastructure and the contention-based medium access mechanism on which MUSE-Fi relies, both UL and DL traffic exhibit bursty and intermittent characteristics, leading to sparse CSI time series with many discontinuities. To illustrate this, Figure 5 depicts the frame arrival rates for both UL and DL when one or two users watch 1080p videos on their respective UEs. Both UL and DL traffic already exhibit bursty and intermittent patterns for a single user, caused by upper-layer protocols' data caching and rate control [62]; these patterns are exacerbated by channel contention even with only one additional user, as shown in Figure 5(b). Moreover, BFI is contained in only a small portion of the UL traffic, and its sampling rate is only about 1/10 of the DL frame rate, peaking at roughly 10 frames per second.

Sparse Recovery Algorithm
We propose an SRA for MUSE-Fi to recover continuous channel variation from the intermittently sparse samples caused by realistic traffic. The SRA comprises two components: a data transformation pipeline that pre-processes a sparse CSI sequence sampled under realistic traffic, and a self-supervised data recovery network that recovers the densely sampled CSI sequence. The core novelty of the SRA lies in eliminating extensive label collection by exploiting the correlations between sparse and non-sparse data slices. Without loss of generality (across the three sensing strategies), we denote the input data to the SRA by a 1D time series $\{\phi_t\}$ drawn from the CSI of a specific antenna pair and subcarrier, where $t$ is the sampling time and $\phi_t \in \mathbb{R}$ is the phase of the corresponding CSI sample. The SRA outputs $\{\tilde{\mathbf{y}}_t\}$, an evenly and densely sampled multi-channel sequence, where $\tilde{\mathbf{y}}_t \in \mathbb{R}^{N_F}$ represents the $N_F$ frequency components of the CSI at time $t$. This output can be used directly as the sensing result, or be further processed to recognize the activity or gesture of the subject. We elaborate on the two SRA components in the following.

Data Transformation Pipeline
This four-step pipeline, shown in Figure 6, transforms $\{\phi_t\}$ into an evenly resampled output sequence $\{\bar{\mathbf{x}}_i \in [-1,1]^{N_F}\}_{1 \le i \le N_s}$, whose length $N_s$ is the total number of resampled time instants.
Segmentation. We first segment the time series into two types of slices, i.e., sparse slices, where the samples are sparse in time, and non-sparse slices, so that they can be treated differently. This segmentation uses a sliding window of length $\Delta T$ to check whether each window contains a sufficient number of samples: time slices with more than $N_{\text{nsp}}$ samples are marked as non-sparse, and otherwise as sparse. Here, $\Delta T$ and $N_{\text{nsp}}$ are application-specific parameters that are set empirically in Section 4.
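A minimal NumPy sketch of the segmentation step follows. For simplicity it uses consecutive, non-overlapping windows of length $\Delta T$ (`window`) rather than a sliding window with an arbitrary stride, and the values of `window` and `n_nsp` in the usage are placeholders, not the settings of Section 4.

```python
import numpy as np


def segment(timestamps, window, n_nsp):
    """Mark consecutive windows of length `window` as non-sparse (> n_nsp samples) or sparse.

    Returns the window start times, the sample count per window, and a boolean
    array that is True for non-sparse windows.
    """
    t = np.sort(np.asarray(timestamps, dtype=float))
    n_win = max(1, int(np.ceil((t[-1] - t[0]) / window)))
    edges = t[0] + window * np.arange(n_win + 1)
    edges[-1] = max(edges[-1], t[-1])   # make sure the last sample is covered
    counts, _ = np.histogram(t, bins=edges)
    return edges[:-1], counts, counts > n_nsp


# Usage with placeholder values: a dense 1 s burst followed by three isolated frames.
t = np.concatenate([np.arange(100) / 100, [1.2, 1.6, 1.9]])
starts, counts, non_sparse = segment(t, window=0.5, n_nsp=10)
print(starts, counts, non_sparse)
```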
Resampling. The time series is resampled so that the samples are evenly spaced in time, facilitating further denoising and sparse recovery. Within each non-sparse slice, outliers are removed, and interpolation is performed to meet a resampling frequency $f_{\text{rs}}$. Each sparse slice is also resampled at frequency $f_{\text{rs}}$, with samples moved to their nearest resampled time instants; time instants without data are tagged as "no-data" and filled by linear interpolation. The result is denoised through a low-pass filter with cut-off frequency $f_{\text{cut}}$. Both $f_{\text{rs}}$ and $f_{\text{cut}}$ are set empirically in Section 4.
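The sparse-slice resampling path and the denoising step can be sketched as follows. Assumptions: samples are snapped to the nearest instant of a uniform $f_{\text{rs}}$ grid (averaging collisions), no-data instants are filled by linear interpolation with `np.interp`, outlier removal for non-sparse slices is omitted, and the low-pass filter is an ideal FFT-domain filter chosen for brevity, since the filter type is not specified above. Both function names are hypothetical.

```python
import numpy as np


def resample_to_grid(t, x, f_rs):
    """Snap samples to the nearest instant of a uniform grid at f_rs (averaging collisions);
    fill instants without data by linear interpolation and report which ones had data."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    n = int(np.floor((t.max() - t.min()) * f_rs + 0.5)) + 1
    grid = t.min() + np.arange(n) / f_rs
    idx = np.clip(np.rint((t - grid[0]) * f_rs).astype(int), 0, n - 1)
    sums = np.bincount(idx, weights=x, minlength=n)
    counts = np.bincount(idx, minlength=n)
    has_data = counts > 0
    y = np.empty(n)
    y[has_data] = sums[has_data] / counts[has_data]
    y[~has_data] = np.interp(grid[~has_data], grid[has_data], y[has_data])  # "no-data" fill
    return grid, y, has_data


def ideal_lowpass(y, f_rs, f_cut):
    """Zero all FFT bins above f_cut (an ideal low-pass filter, used here for brevity)."""
    spec = np.fft.rfft(y)
    spec[np.fft.rfftfreq(len(y), d=1.0 / f_rs) > f_cut] = 0.0
    return np.fft.irfft(spec, n=len(y))


grid, y, has_data = resample_to_grid([0.0, 0.1, 0.5, 0.52, 1.0], [0, 1, 5, 5.2, 10], f_rs=10)
print(np.round(y, 3), has_data.sum())
```

In practice a causal or zero-phase IIR/FIR design would typically replace `ideal_lowpass`; the ideal filter simply keeps the sketch dependency-free.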
Transformation. This step transforms the resampled time series into its spectrogram $\{\tilde{\mathbf{x}}_i \in \mathbb{R}^{N_F}\}_{1 \le i \le N_s}$ with $N_F$ frequency components. The reason is that subjects' motions generally lead to channel variations whose patterns are environment- and subject-specific and hardly recognizable in the time domain. In a spectrogram, the impact of a subject's motion on the channel becomes more apparent, facilitating effective sparse recovery.
Normalization. Mapping each $\tilde{\mathbf{x}}_i$ into $\bar{\mathbf{x}}_i \in [0,1]^{N_F}$ via min-max normalization allows us to focus on the relative variation pattern while eliminating differences in the magnitude of channel variation potentially caused by subject positions. In addition, the frequency components of time instants with no-data tags are assigned the value $-1$, making them distinct from those with data and explicitly indicating the data sparsity.
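The transformation and normalization steps might look like the sketch below. The Hann window, the hop size of one sample (so that spectrogram column $i$ lines up with resampled instant $i$), reflect padding, and keeping the lowest $N_F$ bins are all our assumptions; the STFT settings are not stated above.

```python
import numpy as np


def spectrogram(y, win_len, n_f):
    """Magnitude STFT with a Hann window and hop size 1, so that column i is centred on
    resampled instant i; keeps the lowest n_f frequency bins. Returns shape (n_f, len(y))."""
    half = win_len // 2
    padded = np.pad(y, (half, win_len - half - 1), mode="reflect")
    frames = np.lib.stride_tricks.sliding_window_view(padded, win_len)  # (len(y), win_len)
    mag = np.abs(np.fft.rfft(frames * np.hanning(win_len), axis=1))
    return mag[:, :n_f].T


def normalize(spec, has_data, eps=1e-12):
    """Per-instant min-max normalization to [0, 1]; instants tagged no-data are set to -1."""
    lo = spec.min(axis=0, keepdims=True)
    hi = spec.max(axis=0, keepdims=True)
    out = (spec - lo) / np.maximum(hi - lo, eps)
    out[:, ~has_data] = -1.0
    return out


rng = np.random.default_rng(0)
y = rng.standard_normal(300)            # stand-in for a resampled, denoised series
has_data = rng.random(300) > 0.2        # stand-in for the no-data tags
x_bar = normalize(spectrogram(y, win_len=64, n_f=20), has_data)
print(x_bar.shape)
```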

TCN-based Sparse Slice Recovering
Rather than employing a heavy neural network such as the U-Net used for audio inpainting [28], we adopt a temporal convolutional network (TCN)-based autoencoder (AE) for sparse recovery; it involves fewer parameters, making it efficient to train and deploy on resource-limited devices such as UEs and APs. TCNs compare favorably with other types of neural networks (e.g., LSTMs), as they exploit convolutional layers with dilated kernels to capture long-range dependencies in samples while keeping the number of parameters manageable [7].
Network Structure. As shown in Figure 7, the designed TCN-based AE consists of an input layer, 4 TCN blocks, a 1D convolutional AE module, and an output layer. The core of the network is the TCN blocks, each featuring dilated convolutional layers and a residual connection. Taking the first dilated convolutional layer as an example, it takes $\{\bar{\mathbf{x}}_i\}_{1 \le i \le N_s}$ as input and applies dilated convolution with $N_{\text{ch}}$ 1D kernels to obtain an $N_{\text{ch}}$-channel output $\{\mathbf{z}_i\}_{1 \le i \le N_s}$ for the next layer. For the $c$-th channel of the output ($c = 1, \ldots, N_{\text{ch}}$), given the 1D kernel $\mathbf{w}_c = (\mathbf{w}_{c,1}, \ldots, \mathbf{w}_{c,K})$ with $\mathbf{w}_{c,k} \in \mathbb{R}^{N_F}$ ($k = 1, \ldots, K$) and $K$ being the kernel size, the dilated convolution can be expressed as:
$$z_{i,c} = \sum_{k=1}^{K} \mathbf{w}_{c,k}^{\top}\, \bar{\mathbf{x}}_{i-(K-k)d}, \quad (10)$$
where $\bar{\mathbf{x}}_i$ is zero-padded for $i < 1$, $(\cdot)^{\top}$ is the transpose operator, and $d \in \mathbb{Z}^{+}$ denotes the dilation factor used to expand the receptive field of each output element. According to Eqn. (10), the operation with a small dilation factor (e.g., $d = 1$) degenerates to a traditional convolution that extracts features of the local context around each input element. As $d$ increases, the mutual dependency of local features can be captured by the kernel, and each output element can represent local features of the input over a wider range. Therefore, by stacking dilated convolutional layers with exponentially increasing $d$, as shown in Figure 7, local features are gradually extracted and aggregated, enabling each node of the output layer to take into account well-represented local features from almost the entire spectrogram. Finally, with the results of the TCN blocks, the AE can effectively predict and recover the missing data. Representing the TCN-AE network as a $\theta$-parameterized function $\mathcal{F}_{\theta}$, re-arranging $\{\bar{\mathbf{x}}_i\}_{1 \le i \le N_s}$ into the matrix $\bar{\mathbf{X}} \in \mathbb{R}^{N_F \times N_s}$, and denoting the output by $\tilde{\mathbf{Y}} \in \mathbb{R}^{N_F \times N_s}$, the recovery process can be represented as $\mathcal{F}_{\theta}: \bar{\mathbf{X}} \to \tilde{\mathbf{Y}}$.
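A single causal dilated convolutional layer, $z_{i,c} = \sum_{k=1}^{K} \mathbf{w}_{c,k}^{\top} \bar{\mathbf{x}}_{i-(K-k)d}$ with zeros before the first instant, can be written directly in NumPy, which helps check the indexing. The sketch omits bias, activation, and the residual connection. Indices are 0-based in the code, so the 1-based offset $(K-k)d$ becomes `(K-1-k)*d`; the helper `receptive_field` and the example kernel size are illustrative assumptions, not the configuration of MUSE-Fi's network.

```python
import numpy as np


def dilated_conv(x_bar, w, d):
    """Causal dilated convolution.
    x_bar: (N_s, N_F) input vectors; w: (N_ch, K, N_F) kernels; d: dilation factor.
    Returns z: (N_s, N_ch) with z[i, c] = sum_k w[c, k] @ x_bar[i - (K-1-k)*d] (0-indexed),
    treating indices before the first sample as zeros."""
    n_s, n_f = x_bar.shape
    n_ch, k_size, _ = w.shape
    pad = (k_size - 1) * d
    padded = np.vstack([np.zeros((pad, n_f)), x_bar])   # zero-padding before the first sample
    z = np.zeros((n_s, n_ch))
    for k in range(k_size):
        shift = (k_size - 1 - k) * d
        z += padded[pad - shift: pad - shift + n_s] @ w[:, k, :].T
    return z


def receptive_field(k_size, dilations):
    """Input span seen by one output after stacking causal dilated layers."""
    return 1 + (k_size - 1) * sum(dilations)


rng = np.random.default_rng(1)
x_bar = rng.standard_normal((20, 5))    # N_s = 20 instants, N_F = 5 components
w = rng.standard_normal((3, 3, 5))      # N_ch = 3 channels, K = 3
z = dilated_conv(x_bar, w, d=2)
print(z.shape, receptive_field(3, [1, 2, 4, 8]))
```

For instance, with $K = 3$ and dilations $1, 2, 4, 8$ across four stacked layers, each output element sees $1 + 2(1+2+4+8) = 31$ input instants, which illustrates why exponentially increasing $d$ covers long spectrogram spans with few parameters.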
Self-supervised Training. Generally, a pre-collected training dataset must contain data-label pairs: spectrograms with sparse slices as the data and the corresponding ground truth without sparsity as the labels. Unfortunately, collecting the ground truth directly is almost impossible, as sparse slices are caused by the lack of frames (and hence of the ground-truth samples they would carry) during certain periods. To overcome this, we propose a self-supervised training method; it leverages only the non-sparse slices to train the TCN-AE, aiming to restore the hypothetical non-sparse data that facilitate various downstream sensing tasks. Specifically, we collect the spectrograms of non-sparse time slices to form the training label set, and obtain the corresponding input data by randomly assigning no-data tags to elements, creating artificial data sparsity. Note that this tag assignment needs to preserve the bursty and random patterns in the occurrence of no-data elements.
Moreover, to augment the training dataset, each non-sparse spectrogram in the label set is reused multiple times with different random tag assignments. Consequently, we obtain a fully labeled training dataset, denoted by $\mathcal{D}_{\text{train}} = \{(\mathcal{T}(\mathbf{Y}), \mathbf{Y})\}$, without resorting to the impossible ground-truth collection process. Here, $\mathbf{Y} \in \mathbb{R}^{N_F \times N_s}$ represents the spectrogram data of a non-sparse time slice, and $\mathcal{T}(\cdot)$ is the random tag assignment. Finally, we can express the training process as the following optimization problem, which minimizes the expected mean-squared error (MSE) between the recovered spectrogram and its label:
$$\theta^{*} = \arg\min_{\theta}\; \mathbb{E}_{(\mathcal{T}(\mathbf{Y}),\,\mathbf{Y}) \in \mathcal{D}_{\text{train}}} \left[ \frac{1}{N_F N_s} \left\| \mathcal{F}_{\theta}\big(\mathcal{T}(\mathbf{Y})\big) - \mathbf{Y} \right\|_F^{2} \right]. \quad (11)$$
The training overhead is minor because no online labeling is needed; MUSE-Fi can thus collect the dataset automatically and conduct the training offline without incurring any real-time overhead.
Figure panels: spectrogram with sparse time-slices; spectrogram with sparsity recovered.
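The construction of the self-supervised training set and the per-sample MSE objective can be sketched as below. The burst model for $\mathcal{T}(\cdot)$ (bursts starting with probability `p_start` at each instant, with geometrically distributed lengths of mean `mean_burst`) is a hypothetical choice that preserves bursty, random no-data patterns; the actual distribution is not specified above. Masking whole spectrogram columns with $-1$ mirrors the Normalization step, and the random `labels` are stand-ins for real non-sparse spectrograms.

```python
import numpy as np


def bursty_no_data_mask(n_s, p_start, mean_burst, rng):
    """Hypothetical tag assignment T(.): at each instant a no-data burst starts with
    probability p_start; its length is geometric with mean `mean_burst` instants."""
    mask = np.zeros(n_s, dtype=bool)
    i = 0
    while i < n_s:
        if rng.random() < p_start:
            length = int(rng.geometric(1.0 / mean_burst))
            mask[i:i + length] = True
            i += length
        else:
            i += 1
    return mask


def make_training_pairs(labels, n_repeats, p_start, mean_burst, rng):
    """Reuse each non-sparse spectrogram Y (N_F x N_s, values in [0, 1]) n_repeats times,
    each time with a fresh random mask; masked columns are set to -1."""
    pairs = []
    for y in labels:
        for _ in range(n_repeats):
            mask = bursty_no_data_mask(y.shape[1], p_start, mean_burst, rng)
            x = y.copy()
            x[:, mask] = -1.0
            pairs.append((x, y))
    return pairs


def mse(pred, label):
    """(1 / (N_F * N_s)) * squared Frobenius norm of the recovery error."""
    return float(np.mean((pred - label) ** 2))


rng = np.random.default_rng(2)
labels = [rng.random((8, 50)) for _ in range(3)]          # stand-in non-sparse spectrograms
pairs = make_training_pairs(labels, n_repeats=4, p_start=0.05, mean_burst=5, rng=rng)
identity_loss = np.mean([mse(x, y) for x, y in pairs])     # loss of a do-nothing "model"
print(len(pairs), round(identity_loss, 3))
```

A trainable $\mathcal{F}_{\theta}$ would replace the identity baseline and be optimized to drive this loss down over the generated pairs.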

To Compress or Not to Compress?
In this section, we specifically study the effectiveness of BFI-enabled sensing. Consider a conventional CSI matrix $\mathbf{H} \in \mathbb{C}^{N_{\text{rx}} \times N_{\text{tx}}}$ for a given subcarrier of a DL link with $N_{\text{tx}}$ Tx antennas at the AP and $N_{\text{rx}}$ Rx antennas at the UE. Instead of directly feeding back $\mathbf{H}$ to the AP, the UE piggybacks a compressed form of $\mathbf{H}$ (i.e., BFI) onto the UL traffic, containing only the information necessary for Tx beamforming. Given a channel state represented by $\mathbf{H}_0$, the UE first obtains the Tx beamforming matrix $\mathbf{V}$ by performing the singular value decomposition (SVD) of $\mathbf{H}_0$, i.e., $\mathbf{H}_0 = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{*}$, where $\mathbf{U} \in \mathbb{C}^{N_{\text{rx}} \times N_{\text{rx}}}$ and $\mathbf{V} \in \mathbb{C}^{N_{\text{tx}} \times N_{\text{tx}}}$ are unitary matrices, $\mathbf{\Sigma} \in \mathbb{R}^{N_{\text{rx}} \times N_{\text{tx}}}$ is a rectangular diagonal matrix with non-negative real values on the diagonal, and $(\cdot)^{*}$ denotes the conjugate transpose. The UE then compresses the channel state by converting $\mathbf{V}$ into BFI, which is represented by a series of real angles, and sends it to the AP. From the received BFI, the AP obtains a reconstructed beamforming matrix $\tilde{\mathbf{V}}$, whose column vectors approximate those of $\mathbf{V}$ except for column-wise phase shifts that make the elements in the last row real-valued [16].
Based on the premise above, BFI-enabled sensing in MUSE-Fi's context can be analyzed following the illustration in Figure 8, where a subject is in the near field of the UE but far from the AP. After the subject is displaced, the altered channel condition changes the CSI matrix to $\mathbf{H}_1$ and also affects the SVD result, through terms involving ℓ, the distance between adjacent Tx antennas, and the amplitude ratio between the two channel gains from the subject at its new and original positions to each Rx antenna. We can observe that the new beamforming matrix $\mathbf{V}'$ is related to $\mathbf{V}$ only through a Tx-side term. Thus, based on the relationship between $\mathbf{V}'$ and $\mathbf{V}$, the BFI variation from $\tilde{\mathbf{V}}$ to the reconstructed beamforming matrix after motion, $\tilde{\mathbf{V}}'$, depends only on the change in the relative direction from the subject to the AP, which does not involve the UE at all. As shown in Figure 9, this compression stabilizes the sensing signal, but at the cost of reduced sensitivity compared with CSI-based sensing, because BFI sensing is almost insensitive to the relative motion between the subject and the UE. When the subject moves rapidly, as in Figures 9(b) and 9(c), BFI sensing is preferable because it produces more stable results, whereas CSI sensing often exhibits drastic changes blended with noise and outliers. However, BFI sensing can be rather insensitive to micro-motions (albeit still viable), as demonstrated by Figure 9(a). Therefore, whether to use CSI or compressed BFI depends on the specific application and the trade-off between stability and sensitivity. Note that our analysis assumes the subject is off the LoS path, so it does not cover cases where, for example, the subject's hands are operating a (smartphone) UE.
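The claim that BFI tracks only the subject-to-AP direction can be illustrated with a deliberately simplified model that is not the derivation above: assume a single dominant path, so that $\mathbf{H} = \mathbf{a}\mathbf{b}^{T}$ is rank one, where $\mathbf{a}$ holds hypothetical per-Rx-antenna gains and $\mathbf{b}$ is the steering vector of a uniform linear Tx array with spacing ℓ = λ/2. All numerical values are illustrative.

```python
import numpy as np


def steering(n_tx: int, spacing_over_wavelength: float, theta_rad: float) -> np.ndarray:
    """Tx-array response toward direction theta for a uniform linear array."""
    n = np.arange(n_tx)
    return np.exp(-2j * np.pi * spacing_over_wavelength * n * np.sin(theta_rad))


def dominant_bf_column(H: np.ndarray) -> np.ndarray:
    """First column of V (from H = U Sigma V^*), with its last entry made real >= 0."""
    _, _, Vh = np.linalg.svd(H)
    v = Vh.conj().T[:, 0]
    return v * np.exp(-1j * np.angle(v[-1]))


# Toy rank-1 channel H = a b^T: a = per-Rx-antenna gains, b = Tx steering vector.
rx_gains = np.array([0.8 + 0.1j, 0.5 - 0.3j])
b_before = steering(4, 0.5, np.deg2rad(20.0))
v_before = dominant_bf_column(np.outer(rx_gains, b_before))

# Near-field change at the UE: arbitrary complex ratios applied per Rx antenna.
ratios = np.array([1.4, 0.7]) * np.exp(1j * np.array([0.3, -1.1]))
v_rx_change = dominant_bf_column(np.outer(rx_gains * ratios, b_before))

# Change of the subject-to-AP direction from 20 to 25 degrees.
v_dir_change = dominant_bf_column(np.outer(rx_gains, steering(4, 0.5, np.deg2rad(25.0))))

print(np.allclose(v_before, v_rx_change))   # expected: True (Rx-side change is invisible)
print(np.allclose(v_before, v_dir_change))  # expected: False (direction change is visible)
```

For a rank-one channel, the dominant right-singular vector is proportional to the element-wise conjugate of $\mathbf{b}$, so any Rx-side rescaling of $\mathbf{a}$ is absorbed into the phase that the last-row normalization removes.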

PROTOTYPING & EXPERIMENT SETUP
In this section, we first elaborate on MUSE-Fi's implementation and then introduce the experiment setup.

Implementing MUSE-Fi
MUSE-Fi consists of an AP and multiple UEs owned by subjects seeking sensing services. The AP is a Netgear Nighthawk X10 router [38], and the UEs include smartphones such as the iPhone 13 [5] and OnePlus 10T [40], as well as Acer TravelMate laptops [1]. The Wi-Fi NICs adopted by MUSE-Fi employ 802.11b/g/n/ac for both UL-CSI and DL-CSI sensing, but only 802.11ac for UL-BFI sensing (as BFI is available only there). CSI is retrieved via both Nexmon [19] and PicoScenes [26], while Wireshark [41] suffices to obtain cleartext BFI from Action No-ACK frames. The obtained CSI and BFI are analyzed in Matlab.
For training the SRA, after the sensing signals pass through the pipeline, non-sparse slices of the spectrogram lasting longer than 4 s are selected for self-supervised training. Specifically, the non-sparse slices serve as ground truth (labels), and we perform a random no-data tag assignment on them to generate sparse slices as the corresponding training inputs. We use 70% of the slices for training the TCN-AE and the remaining 30% for testing. The parameters for sparse recovery are set as follows: Δ = 0.1 s, nsp = 2,  = 5, F = 32, ch = 64, rs = 64 Hz, and cut = 1 Hz for respiration monitoring or cut = 20 Hz for the other cases.
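A possible way to pick the non-sparse training slices and split them 70/30 is sketched below. It assumes that frame arrivals have already been binned into one boolean "has data" flag per bin and that the bin width equals Δ = 0.1 s; both the binning and this reading of Δ are our assumptions, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

import numpy as np


def non_sparse_runs(has_data: Sequence[bool], bin_s: float,
                    min_duration_s: float) -> List[Tuple[int, int]]:
    """[start, stop) indices of maximal runs of non-empty bins lasting > min_duration_s."""
    min_bins = int(round(min_duration_s / bin_s))
    runs, start = [], None
    for i, flag in enumerate(has_data):
        if flag and start is None:
            start = i                    # a run of non-empty bins begins
        elif not flag and start is not None:
            runs.append((start, i))      # the run ends at the first empty bin
            start = None
    if start is not None:
        runs.append((start, len(has_data)))
    return [(s, e) for s, e in runs if e - s > min_bins]


def split_train_test(items: Sequence, train_frac: float, rng: np.random.Generator):
    """Random split, e.g. 70% of the slices for training and 30% for testing."""
    order = rng.permutation(len(items))
    n_train = int(round(train_frac * len(items)))
    train = [items[i] for i in order[:n_train]]
    test = [items[i] for i in order[n_train:]]
    return train, test


if __name__ == "__main__":
    has_data = [True] * 45 + [False] * 3 + [True] * 20 + [False] + [True] * 41
    runs = non_sparse_runs(has_data, bin_s=0.1, min_duration_s=4.0)
    print(runs)  # only runs longer than 40 bins (4 s) survive: [(0, 45), (69, 110)]
    train, test = split_train_test(runs, 0.7, np.random.default_rng(0))
```

Counting in integer bins rather than comparing floating-point durations avoids edge cases where, for example, 40 × 0.1 s is not represented exactly.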

Experiment Setup
We first conduct micro-benchmark studies with a real-time video conference application, and then perform three case studies of realistic sensing applications. The case-study setups share three commonalities: i) each UE is placed in the near field of its associated subject and connects to the AP, continuously streaming 1080p videos to emulate daily network usage; ii) all subjects perform the specified activities simultaneously to test MUSE-Fi's multi-person sensing ability; and iii) each experiment is conducted in a typical indoor meeting room with a different interior furniture arrangement. We also compare MUSE-Fi with a non-near-field baseline that employs another Wi-Fi device, placed on the LoS path of the AP but not in the near field of any subject, to collect CSI and BFI. Figure 10(a) shows the experiment scene for the case studies, and Figure 10(b) annotates the positions of the AP, subjects, UEs, and baseline device.
Respiration Monitoring. We let 8 subjects breathe simultaneously and use NeuLog chest belts [39] to obtain the ground truth. The total respiration recording lasts 80 minutes. In the Transformation step of the SRA, the short-time Fourier transform (STFT) is employed to focus on the low-frequency components of respiration. We employ a 3-layer convolutional neural network (CNN) to extract the respiratory rate from the spectrogram.
Gesture Detection. We let 8 subjects simultaneously perform six gestures, namely circle (CR), front-back (FB), slide (SL), star (ST), wave (WV), and zig-zag (ZZ). Each gesture is performed 500 times, resulting in 24,000 CSI time series, each containing 256 samples. We adopt the wavelet synchrosqueezed transform (WSST) [2] in the Transformation step, as it is highly effective in interpreting gesture signals that are non-stationary and contain complex frequency components. We employ the same classifier as in Widar3.0 [72] to detect gestures from the spectrogram.
Activity Recognition. We let 8 subjects simultaneously perform six common daily activities: bending (BD), jumping (JM), rotating (RT), sitting down (SD), standing up (SU), and walking (WL). Each activity is performed 200 times, resulting in 9,600 CSI time series, each containing 256 samples. As in gesture detection, we employ WSST to transform a time series into a spectrogram. To classify these activities, we use the same classifier as in RF-Net [15].
All evaluations focus on demonstrating MUSE-Fi's capability of multi-person sensing with commodity Wi-Fi devices; they are by no means intended to show competitive performance against existing single-person monitoring systems. Instead, our objective is to validate MUSE-Fi's physical separability and quantify its benefits over non-near-field sensing. We compare the two by contrasting MUSE-Fi's sensing accuracy for an arbitrary subject against that of the non-near-field baseline. Our experiments strictly followed the IRB requirements of our institution.

EVALUATIONS
In this section, we begin with two micro-benchmark studies that verify the effectiveness of SRA and examine the differences between sensing via BFI and via CSI. These are followed by the three case studies specified in Section 4.2.

Micro-benchmark Studies
5.1.1 Effectiveness of Sparse Recovery. To demonstrate the effectiveness of SRA, we collect CSI time series for one subject performing respiration, gestures, and activities. We use the MSE loss between the recovered and ground-truth spectrograms, as in Eqn. (11), to evaluate the performance of SRA, with the results shown in Figure 11. Upon further inspection, gestures induce the largest MSE loss in recovery. This is likely because the subject's hands are closer to the UE than the subject's body, making the sensing results more sensitive to hand movements. This higher sensitivity to gestures introduces more complicated time-frequency patterns into the input spectrogram, naturally lowering the accuracy of sparse recovery. In contrast, respiration induces the smallest MSE loss because of its relatively stable and periodic nature, which yields more regular patterns in the spectrogram and hence facilitates sparse recovery.

Comparison between CSI and BFI.
Since UL-CSI and DL-CSI are symmetric, we do not compare them with each other; instead, we combine their results and analysis in the following.
To further analyze the brief observations made in Section 3.3, we perform sensing on a subject who breathes, performs gestures, and carries out activities. Since the time-domain results are consistent with those shown in Figure 9, we omit them for brevity.
We further investigate the fluctuations of the BFI and CSI signals by calculating the standard deviations of the detrended signals over 0.1-second periods, as shown in Figures 12(a) to 12(c). We also examine the power spectrum of the CSI and BFI in Figures 12(d) to 12(f). One may readily observe that while CSI preserves the respiration signal and presents a smooth spectrum, it is too sensitive to rapid and large-scale motions, resulting in excessive high-frequency power for gestures and activities. In comparison, BFI effectively suppresses high-frequency components, while its response to low-frequency subtle movements (e.g., respiration) is less pronounced.
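The fluctuation metric behind Figures 12(a) to 12(c) can be approximated as sketched below. We assume non-overlapping 0.1 s windows and a least-squares linear detrend inside each window; the detrending method is not specified in the text, and the sampling rate used in the check is hypothetical.

```python
import numpy as np


def windowed_detrended_std(x: np.ndarray, fs: float, window_s: float = 0.1) -> np.ndarray:
    """Standard deviation of a signal in consecutive non-overlapping windows,
    after removing a least-squares linear trend from each window."""
    n = int(round(window_s * fs))
    if n < 3:
        raise ValueError("each window needs at least 3 samples")
    t = np.arange(n)
    n_win = len(x) // n  # trailing samples that do not fill a window are dropped
    out = np.empty(n_win)
    for w in range(n_win):
        seg = np.asarray(x[w * n:(w + 1) * n], dtype=float)
        slope, intercept = np.polyfit(t, seg, 1)
        out[w] = np.std(seg - (slope * t + intercept))
    return out


if __name__ == "__main__":
    fs = 100.0  # assumed sampling rate: 10 samples per 0.1 s window
    ramp = 0.3 * np.arange(1000)
    noisy = ramp + np.random.default_rng(0).normal(0.0, 1.0, ramp.size)
    print(windowed_detrended_std(ramp, fs).max(), windowed_detrended_std(noisy, fs).mean())
```

A pure trend yields near-zero fluctuation, so the metric isolates short-term jitter of the kind that separates the CSI and BFI curves.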
To explain these phenomena, note that the phase of CSI is directly related to relative displacement, making it sensitive to small-scale movements (e.g., respiration). However, large-scale movements cause abrupt phase changes that insufficient sampling cannot capture, resulting in irregularities in the CSI signal. As explained in Section 3.3, the BFI-based sensing strategy only captures the relative directional changes from the subject to the AP: if one regards the conversion from CSI to BFI as "low-pass" filtering, it is natural to expect fewer variations but also lower strength in the resulting signals. This property is particularly beneficial for future studies on subjects carrying smartphones on their bodies for continuous vital signs monitoring [13,71], as BFI sensing may filter out body-movement interference.
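A back-of-the-envelope calculation shows why CSI phase is sensitive to millimeter-scale motion yet becomes ambiguous under large, fast motion. All numbers are illustrative assumptions rather than measurements: a 5.18 GHz carrier, an effective CSI sampling rate of 10 Hz, a reflected path that lengthens by about twice the displacement, a chest moving about 5 mm in 2 s, and a hand moving at 1 m/s. A per-sample phase step larger than π cannot be unwrapped unambiguously.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def phase_change_rad(path_change_m: float, carrier_hz: float) -> float:
    """CSI phase rotation caused by a change in propagation path length."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return 2.0 * math.pi * path_change_m / wavelength


def phase_step_per_sample(speed_m_s: float, sample_rate_hz: float,
                          carrier_hz: float, path_factor: float = 2.0) -> float:
    """Phase change between consecutive CSI samples for a reflector moving at
    speed_m_s; path_factor = 2 assumes the reflected path lengthens by about
    twice the displacement (a simplification)."""
    return phase_change_rad(path_factor * speed_m_s / sample_rate_hz, carrier_hz)


carrier = 5.18e9  # assumed 5 GHz channel
fs = 10.0         # assumed effective CSI sampling rate under sparse traffic

resp = phase_step_per_sample(0.0025, fs, carrier)  # chest: about 5 mm per 2 s (assumed)
hand = phase_step_per_sample(1.0, fs, carrier)     # hand: about 1 m/s (assumed)

for name, step in (("respiration", resp), ("hand", hand)):
    status = "ambiguous (> pi)" if step > math.pi else "trackable"
    print(f"{name}: {step:.3f} rad per sample, {status}")
```

With these values the respiration step is roughly 0.05 rad per sample, while the hand step exceeds 20 rad, far beyond π.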

Case-I: Respiration Monitoring
We conduct experiments to monitor multi-person respiration using the setup described in Section 4.2. After obtaining the sparse recovery results, we use a 3-layer CNN to extract the respiration rate and measure the respiration rate error as the absolute difference between the estimated and the actual respiration rates. Figure 13(a) shows the respiration waveforms obtained by MUSE-Fi and the baseline method. One may clearly observe that MUSE-Fi recovers the respiration waveforms effectively, whereas the baseline method only captures a noisy signal mixture contributed by multiple subjects. We further assess the respiration rate estimation accuracy of both MUSE-Fi and the baseline in Figure 13(b); the results reveal that MUSE-Fi achieves accurate respiration monitoring, with both median and mean respiration rate errors below 1 bpm. In contrast, the baseline exhibits median and mean respiration rate errors of 7 and 8 bpm, respectively, making it almost useless in multi-person scenarios.
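The rate estimation and error metric can be sketched as below. The paper extracts the rate with a 3-layer CNN; here, as a simple stand-in, we pick the strongest frequency of a NumPy-only STFT within an assumed breathing band of 0.1 to 0.7 Hz. The 64 Hz sampling rate, 16 s window, and synthetic 15 bpm signal are assumptions for illustration.

```python
import numpy as np


def stft_power(x: np.ndarray, fs: float, win_len: int, hop: int):
    """Hann-windowed STFT power; returns (freqs, power) with power shaped (n_freq, n_frames)."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return freqs, power.T


def respiration_rate_bpm(x: np.ndarray, fs: float, win_s: float = 16.0,
                         hop_s: float = 1.0, band_hz=(0.1, 0.7)) -> float:
    """Peak-picking stand-in for the rate extractor: strongest in-band frequency
    of the time-averaged spectrogram, converted to breaths per minute."""
    freqs, power = stft_power(x - np.mean(x), fs, int(win_s * fs), int(hop_s * fs))
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    mean_power = power[in_band].mean(axis=1)
    return 60.0 * float(freqs[in_band][np.argmax(mean_power)])


def rate_error_bpm(estimated_bpm: float, actual_bpm: float) -> float:
    """Respiration rate error: |estimated - actual|."""
    return abs(estimated_bpm - actual_bpm)


if __name__ == "__main__":
    fs = 64.0                                  # assumed sampling rate
    t = np.arange(int(120 * fs)) / fs          # 2 minutes of samples
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 0.25 * t) + 0.2 * rng.standard_normal(t.size)  # toy 15 bpm breathing
    est = respiration_rate_bpm(x, fs)
    print(est, rate_error_bpm(est, 15.0))
```

With a 16 s window the frequency grid spacing is 0.0625 Hz (3.75 bpm), so a learned extractor or spectral interpolation would be needed for sub-bpm resolution on arbitrary rates.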
To further understand the performance difference, we present the spectrograms of MUSE-Fi and the baseline method in Figures 14(a)-14(h) and 14(i), respectively. MUSE-Fi recovers a clear signal around the ground-truth frequency thanks to near-field domination. Moreover, we let the subjects sequentially hold their breath for 20 s; the correspondence between the breath-holding periods and the signal interruptions on the spectrograms (whose boundaries are denoted by two triangular markers) confirms that the respiration signals from different subjects are well separated. In contrast, the baseline method fails to distinguish the respiration of multiple subjects, resulting in a noisy spectrogram from which no accurate respiration rate can be obtained. We further employ the average spectral entropy [48] of the normalized spectrogram to measure the residual uncertainty in determining the respiration rate. Specifically, Figure 14(j) indicates an entropy of 2.4 bits for the baseline, much higher than the 1.2 bits of MUSE-Fi. Intuitively, the number of plausible respiration rates represented by a spectrogram grows exponentially with its spectral entropy. Therefore, halving the entropy implies that MUSE-Fi's respiration rate decision can be much more precise than the baseline's, which explains our result in Figure 13(b).
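The average spectral entropy can be computed as sketched below, assuming that each time frame of the spectrogram is normalized into a probability distribution over frequency bins (a common definition; the exact normalization used here may differ). The helper `effective_candidates` expresses the exponential relationship mentioned above as $2^{H}$.

```python
import numpy as np


def average_spectral_entropy_bits(power: np.ndarray) -> float:
    """Mean over time frames of the Shannon entropy (in bits) of each frame's
    power spectrum normalized to sum to one. power: (n_freq, n_frames), >= 0,
    and every frame is assumed to have non-zero total power."""
    p = power / power.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p), 0.0)  # treat 0 * log2(0) as 0
    return float(-terms.sum(axis=0).mean())


def effective_candidates(entropy_bits: float) -> float:
    """2**H: number of equally likely frequency bins with the same entropy."""
    return 2.0 ** entropy_bits


if __name__ == "__main__":
    flat = np.ones((8, 5))       # power spread evenly over 8 bins
    peaked = np.zeros((8, 5))
    peaked[2, :] = 4.0           # all power in a single bin
    print(average_spectral_entropy_bits(flat), average_spectral_entropy_bits(peaked))
    print(effective_candidates(2.4), effective_candidates(1.2))
```

Under this reading, the 2.4-bit and 1.2-bit entropies in Figure 14(j) correspond to roughly 5.3 and 2.3 equally likely frequency bins, respectively.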

Case-II: Gesture Detection
We also conduct experiments on gesture detection and summarize the statistics in Figure 15(a). MUSE-Fi achieves a mean test accuracy of more than 98%, while that of the baseline is only 57%. We further inspect the confusion matrices of MUSE-Fi and the baseline in Figures 15(b) and 15(c), respectively. They reveal that MUSE-Fi correctly classifies most gestures, while the baseline often confuses one gesture with others. Specifically, circle (CR) and slide (SL) are the most confused pair for the baseline, as both involve moving one's hand smoothly over the phone; the two may appear similar to the far-field baseline but are readily differentiable by the near-field MUSE-Fi. The baseline's inferior performance can also be attributed to its inability to disentangle the signal of interest from the interference caused by the moving hands of multiple people, which ultimately yields unacceptable detection accuracy (i.e., 39% lower than MUSE-Fi). These results confirm MUSE-Fi's effectiveness in multi-person gesture detection for real-world applications.

Case-III: Activity Recognition
We further conduct experiments on activity recognition, with statistics summarized in Figure 15(a). MUSE-Fi achieves a mean accuracy of more than 98%, about double that of the baseline, whose accuracy drops by 8% compared with the gesture detection task. This degradation is attributed to the greater interference induced by large-scale and rapid human activities. We further inspect the confusion matrices of MUSE-Fi and the baseline in Figures 15(d) and 15(e), respectively. The accuracy for all 6 activities is above 0.98 for MUSE-Fi, while the baseline's accuracy is below 0.52 for every activity (with SU having the worst accuracy of 0.33). The results of all three case studies demonstrate a great potential to realize a long-standing vision for Wi-Fi human sensing: multiple people sitting around a table (e.g., holding a meeting) while leveraging contactless sensing to accomplish diverse tasks, supported by their respective smartphones and only one Wi-Fi AP.

Extended Experiments and Discussions
To assess the generalizability of MUSE-Fi, we evaluate it in another practical scenario where the subjects carry their smartphones in their pockets, so that the LoS paths between the AP and UEs are blocked. Here we focus on the gesture detection and activity recognition tasks, given their relevance to enabling computer-human interfaces for XR applications. In Figure 16, we compare the sensing accuracy of MUSE-Fi in two scenarios: 1) the on-desk scenario with a LoS condition as in Figure 10, and 2) the in-pocket scenario with a non-LoS (NLoS) condition. MUSE-Fi achieves similar accuracy, above 92%, in both scenarios. This is because Wi-Fi signals can diffract around the body, while the condition for the near-field domination effect still holds.
Based on the above case studies and extended experiments, we discuss MUSE-Fi's generalizability, potential applications, and key factors.
Generalizability. MUSE-Fi can generalize beyond the current experimental setup because, first, environment dynamics have a small impact on MUSE-Fi due to the near-field domination effect; and second, environment layout changes typically manifest as additional biases in the CSI, which can be removed during the normalization step of SRA.
Potential Applications. With the physical separability guaranteed by the near-field domination effect, MUSE-Fi is scalable to ubiquitous Wi-Fi networks in daily life and can provide solutions for XR. Based on the respiration monitoring results in Section 5.2, MUSE-Fi can track subtle motions of subjects near distributed UEs, which can extend to AR/MR solutions for visualizing intrusion, human vital signs, and the operation status of machines. Besides, the results in Sections 5.3, 5.4, and 5.5 indicate its capability to accurately recognize individuals' gestures and activities. This means MUSE-Fi allows gesture detection and activity recognition to be integrated into the Wi-Fi modules naturally carried by each individual, potentially reducing the cost, weight, and power consumption of VR/MR headsets for computer-human interaction.
Key Factor Analysis. The near-field domination effect assumes the existence of LoS UE-AP paths, yet MUSE-Fi is also robust to LoS blockage by the subject's body, as shown in Figure 16. Therefore, MUSE-Fi is effective in common practice where APs are located above the subjects. If LoS paths are blocked and the signals travel via NLoS paths involving reflections in the environment, the near-field domination condition can be naturally extended to the shortest NLoS paths, provided that these paths are clear of environment dynamics (e.g., other irrelevant motions) that would inject interference directly into the received signals. In addition, MUSE-Fi efficiently handles the sparsity of realistic data traffic for online videos, meetings, etc., whereas the highly sparse traffic of idle UEs may be beyond recovery and lead to invalid sensing results. Finally, sensing security [23,36] and coexistence with other co-channel communication systems [32,33,63] remain to be addressed in future work.
While the above proposals mainly focus on single-person scenarios due to the limited range resolution of existing Wi-Fi technology, recent research has explored next-generation Wi-Fi technologies to overcome this challenge. ViMo [57] uses 60 GHz 802.11ad devices [45] with high bandwidth and a 32-element phased array to emulate a radar. Similarly, mmTrack [61] employs the same 802.11ad device for multi-person localization. However, the limited adoption and high cost of 802.11ad devices may hinder their use for multi-person monitoring. Note that MUSE-Fi shares part of its name with MUSE [52], but the two systems have distinct objectives: whereas MUSE focuses on communication scheduling under MU-MIMO, MUSE-Fi targets multi-person sensing.
Techniques other than future Wi-Fi hardware may also help enhance human sensing. Widar2.0 [44] provides partial support for multi-person sensing by using multiple antennas to improve spatial resolution. Karanam et al. [27] use magnitude measurements from an array of receivers to perform multi-person tracking. Lan et al. [31] employ metasurface antennas with varying beam patterns to perform multi-person activity recognition. Liu et al. [35] estimate multi-person respiration rates by analyzing the power spectral density of CSI. PhaseBeat [60] and TR-BREATH [11] leverage root-MUSIC [46] to separate multi-person sensing signals, while Yang et al. [65] optimize transceiver deployment using the Fresnel zone model to reduce interference, but require accurate subject locations and fixed transceiver placement. MultiSense [67] treats multi-person sensing as a blind source separation problem and uses ICA [24] to extract waveforms. Last but not least, SPARCS [42] recovers the micro-Doppler spectrum by exploiting the intrinsic sparsity of wideband mmWave channels; however, it is not suited to narrowband Wi-Fi systems operating in the microwave band.

CONCLUSION
Taking an important step towards ubiquitous human sensing, MUSE-Fi advances Wi-Fi multi-person sensing by addressing the major challenge of physically separating multiple subjects. Leveraging the near-field channel variation caused by a subject in close proximity to a Wi-Fi device, MUSE-Fi successfully handles multi-person sensing for respiration monitoring, gesture detection, and activity recognition. This success also stems from our two technical developments: i) an SRA that copes with realistic (intermittent) Wi-Fi traffic under multi-user scenarios, and ii) a study of the difference between CSI and BFI sensing. Our extensive evaluations confirm that MUSE-Fi is a cost-effective alternative to radar-based systems, which often require extra deployments. Moving forward, we believe that MUSE-Fi has significant potential to be extended to various applications, including healthcare, smart homes, and even security; we also plan to deploy MUSE-Fi at larger scales to evaluate its performance in more diverse scenarios.

Figure 1: Each personal device uniquely identifies a person, and the sensing signal it captures on that person within its near field overwhelms the interference from other persons.

Figure 2: Feasible region of I and the contours of VIR levels for both S and I.

Figure 4: Preliminary experiments.(a) Setting for the first two experiments.(b) Multi-person respiration sensing in action.(c) and (d) Multi-person asymmetric sensing with both respiration and activity.(e) Setting for the 3rd experiment.(f) Respiration (held or not) from the middle subject, i.e., Subject B in (e).

Figure 5: Frame arrival rates in terms of number of frames per 100ms versus the observation time when (a) one or (b) two users stream 1080p videos.

Figure 6: Data transformation pipeline of MUSE-Fi.
This means that the UL-BFI sensing strategy faces the most severe data sparsity. Consequently, MUSE-Fi needs to be capable of recovering continuous channel variation from a CSI time series with sparse samples.

Figure 7: The structure of TCN-AE in MUSE-Fi.

Figure 8: Channel variation due to subject motion: dashed and filled circles respectively represent the original subject position and that after motion.

Figure 9: CSI vs. BFI in the time domain.

Figure 10: Experiment scene (a) and layout of subject arrangement (b) for all three case studies. The 15 cm UE-subject distance only indicates a near-field layout and is not fixed to that value.
Figure 11(a) displays how the MSE losses of the recovery for all three categories vary with the amount of missing slices, showing that the MSE losses for respiration, gesture, and activity stay below $2 \times 10^{-3}$, $5 \times 10^{-3}$, and $7 \times 10^{-3}$, respectively. Given the normalized spectrogram data, these MSE values are sufficiently low to indicate successful recovery, thus validating the effectiveness of SRA. Figure 11(b) provides examples of recovered spectrograms: SRA recovers a significant portion of the input spectrogram, albeit missing a few minor details.
Figures 12(a) to 12(c) reveal that BFI signals are more stable than CSI signals, but this stability comes at the cost of reduced sensitivity.

Figure 12: Comparing CSI and BFI in terms of standard deviations (a)-(c) and power spectrum (d)-(f).

Figure 13: Comparison between MUSE-Fi and the baseline in terms of respiration sensing.

Figure 14: Comparison and analysis on MUSE-Fi and the baseline, in terms of the respiration spectrograms.

Figure 16: Comparison between the sensing accuracy of MUSE-Fi under LoS and NLoS conditions.

ACKNOWLEDGEMENT
This research is supported by the National Research Foundation (NRF) Future Communications Research & Development Programme (FCP) grant FCP-NTU-RG-2022-015.