Using Federated Learning and Channel State Information-Based Sensing for Scalable and Realistic At-Home Healthcare

This paper explores the use of federated learning in a realistic household employing existing infrastructure to add new devices and locations by rotating the role of the transmitter among smart devices in a multi-person scenario. Current solutions employ channel state information-based sensing for health care monitoring in various ways to propagate knowledge efficiently; however, these solutions often (i) consider ideally placed devices, (ii) assume single-participant scenarios, and (iii) do not consider the different roles of these devices in a network. Data is collected from four smart devices in a household in which three participants, one of whom is monitored while the other two function as noise, perform activities to replicate a realistic household scenario. Insights are provided on using federated learning in realistic at-home health care when adding a new activity location and client devices, both transmitter-only and full communication devices. Results indicate that, given extensive pre-training, new devices and locations can quickly be adopted by the federated model with less data and without intensive retraining, even in multi-person environments.


INTRODUCTION
Wi-Fi channel state information is commonly used for unobtrusive human activity recognition and physiological monitoring to facilitate smart cities and good health and well-being [14], alongside more established methods such as computer vision, wearable motion sensors, and acoustic-based methods [22]. Channel state information describes the multipath propagation between a transmitting and a receiving antenna as observed at the receiver side; this propagation is shaped by environmental influences on the signal, such as scattering, absorption, and reflection. These environmental influences include humans, so monitoring the changes in the channel state information over time reveals their activities. This truly unobtrusive approach has proven valuable in numerous healthcare applications, such as recognizing human activities [1,6], monitoring vital signs [2,19], tracking sleep patterns, and localizing individuals [5,23]. Therefore, it is an interesting solution to consider in elderly or nursing homes, where continuous monitoring may be required, but existing solutions may not be a good fit for the target demographic.
However, due to the different propagation paths of radio waves, channel state information-based sensing is highly sensitive to environmental changes and moving obstacles. Due to the unpredictable impact of these obstacles on channel state information, current solutions struggle to adapt to new domains (devices, locations, participants) efficiently. This often results in requiring large training models, artificially generated data [20,21,26], or other approaches that depend on the pre-training of models [4,7,24] for any new domain. Additionally, as these solutions are based on existing data, there may be privacy concerns due to channel state information potentially containing privacy-sensitive data. A possible solution that mitigates the need for data generation and minimizes the need for training new models is federated learning [3,11], as models are aggregated in a distributed fashion while training.
Federated learning could increase security as only model information is shared between domains, meaning the privacy-sensitive information of participants is stored locally. However, it comes with increased training overhead to converge on a global model that new domains can quickly adopt. More specifically, while the application of channel state information-based sensing for human activity recognition in federated learning environments has shown promise for a single participant with activities in similar locations relative to the placed devices [11], several challenges impacting its practical implementation need to be explored before it can be adopted in real, secure healthcare applications:
• Existing infrastructure. Current state-of-the-art considers device locations with similar positions to the monitored activity, which does not accurately reflect real-life scenarios where wireless devices might be found in less ideal locations (in TV cabinets, as fridges, or on coffee tables).
• Multi-person households. Current research has not considered additional humans (such as visiting relatives or informal caregivers), which are essentially moving obstacles.
• Limited data required. New Wi-Fi devices may join the network randomly, and it is crucial to add them as channel state information-based sensors efficiently, which also includes using as little data as possible.
Additionally, exploring different combinations of transmitters and receivers may help prevent potential signal blockages caused by furniture rearrangement [8], additional people, or the monitored individual's movement. Leveraging existing hardware in diverse locations in a household offers unique insight by allowing observations from multiple angles, which can mitigate performance issues related to signal obstruction or increase robustness by focusing on a specific person's location. Furthermore, not all devices may be able to participate in the training, but they could still function as transmitters (data generators). To this end, federated learning is chosen as a valid option, as it combines the desire to explore different combinations of devices with the generalization of the learning process.
Therefore, the main research question is: What is the impact of adding new locations and devices with different data availability using federated learning in multi-person households utilizing existing Wi-Fi infrastructure using channel state information-based sensing for human activity recognition in terms of accuracy and convergence time? To answer this question, experiments were conducted in an actual apartment with three people performing different activities (one designated to be monitored, the potential patient), while the transmitter rotates among four different devices for three locations for human activity recognition. A scenario in Fig. 1 shows how different devices may collaborate in a household to sense a person. Different parameters are considered to test the performance of federated learning in multi-person households: i) the amount of pre-training done, ii) the new locations and networked devices that are excluded from training and added later, and iii) different levels of data availability. The results are aggregated, and their overall $F_1$-scores and convergence times are analyzed. When adding devices, both their role as potential transmitters and as transceivers is explored.
The remainder of this paper is structured as follows: first, the state of the art is outlined and discussed (Section 2). Then, the data acquisition and the methodology to replicate the data analysis and federated implementation are described in Sections 3 and 4, respectively. The results are outlined in Section 5, and an overview of the findings is discussed in Section 6. The paper concludes in Section 7.

STATE OF THE ART

Channel state information and human activity recognition
Figure 1: Short scenario showcasing the ideal use of federated learning to quickly allow new devices to function as sensors.
1) Alice (yellow) is sitting on the sofa watching TV, but there are no devices nearby. This means she is in an uncovered location.
2) Bob (red) visits Alice and puts his phone on the table; the phone connects to the network and synchronizes with the global model.
3) Alice turns off the TV, which breaks a connection, but moves to a new location to converse with Bob, which is then covered by one of the temporary channels.
4) As Bob leaves, Alice picks up her laptop. The laptop can quickly join the federated network, as it is spatially closely related to the phone's location.

Channel state information is logged as the phase and amplitude of the received signal. The channel state information from the individual antenna pairs is collected in a channel state information matrix $\mathbf{H}$ (Equation 1), which has the shape $N_t \times N_r \times N_s$, where $N_r$ is the number of receiving antennas, $N_t$ the number of transmitting antennas, and $N_s$ the number of subcarriers. Every $h \in \mathbf{H}$ is a complex number whose magnitude and argument represent the amplitude and phase, respectively. Due to the sensitivity of radio waves to changes in the environment and humans, the impact of human activities on the amplitude and phase can be used to fingerprint these activities.
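To make the relation between a complex channel state information matrix and its amplitude and phase concrete, the following is a minimal sketch in Python with NumPy. The dimensions follow the 2 × 3 × 30 setup used later in this paper; the function name `amplitude_and_phase` and the randomly generated matrix are illustrative assumptions, not measured data or the authors' code.

```python
import numpy as np

# Dimensions matching the 2 x 3 x 30 setup described in this paper.
N_TX, N_RX, N_SUB = 2, 3, 30


def amplitude_and_phase(csi):
    """Split a complex CSI array into amplitude and phase (in radians)."""
    return np.abs(csi), np.angle(csi)


rng = np.random.default_rng(0)
# Synthetic stand-in for one CSI matrix H (hypothetical, not a measurement).
H = rng.normal(size=(N_TX, N_RX, N_SUB)) + 1j * rng.normal(size=(N_TX, N_RX, N_SUB))

amp, phase = amplitude_and_phase(H)
print(amp.shape, phase.shape)
```

Amplitude and phase together fully describe each element: multiplying the amplitude by the complex exponential of the phase recovers the original matrix.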

Transferring cross-domain CSI knowledge
In recent years, channel state information has been widely used for human activity recognition and physiology monitoring [15,17-19,23], but there are significant challenges in scaling this sensing across different domains. One common approach is transfer learning [4,7,24], where different features (or feature sets) or embedding spaces are extracted from the dataset and are used to generate other models for new locations or participants. However, this oftentimes still requires extensive model generation or finding and extracting relevant features. Another common approach is data generation through generative adversarial networks, or GANs [20,21,26], which use existing datasets or features to generate artificial data that could correlate to different participants or locations and that can then be used to train or fine-tune existing models. However, GANs could be resource-intensive and still require additional training on the models.

Federated learning and channel state information
Federated learning could mitigate some of the downsides in human activity recognition using channel state information by sharing model knowledge. Hernandez et al. [10] achieved promising prediction accuracy while allowing nodes to train with limited data in an indoor environment, possibly classifying activities in unseen locations. The setup is for a fixed Wi-Fi transmitter and receiver pair that may be placed in any location, with the activity being performed in a relatively similar location compared to the devices. The results for this approach seem promising (0.90 in certain locations).
The model has a small memory footprint, meaning it could integrate well with less powerful hardware. However, the setup does not consider different types of devices, existing infrastructure, or multiple persons to test the system.

Challenges
As can be seen from the state of the art, there is a need for more scalable and robust ways to propagate the knowledge of channel state information-based sensing that do not require extensive retraining or artificial data generation. Federated learning allows devices to collaborate without the need to collect, transmit, and/or process a large amount of data. However, current research on federated learning in combination with channel state information ignores realistic scenarios, which limits wide-scale adoption. To progress towards a more wide-scale solution, this work explores how robust federated learning is in a life-like setting where wireless devices are placed in representative positions (e.g., living room TV on its cabinet, smartphone on a table, smart refrigerator in a kitchen) in a multi-person household scenario. In particular, it explores the adoption of new locations and devices in real-life multi-person households for different levels of data availability.

DATA ACQUISITION
The channel state information must first be captured to explore the feasibility of federated learning and channel state information in realistic households. This section describes the hardware and software used for capturing, outlines the experimental setup, participants and activities, and discusses the resulting dataset.

Table 1: Node identifiers and their corresponding locations, with the relevance of presence in the indoor environment.

| Node | Location | Relevance |
|---|---|---|
| $n_{TV}$ | In between the television and the armchair. | Smart TV, home assistant, casting device |
| $n_{table}$ | On the table between the living room and kitchen. | |
| $n_{kitchen}$ | On the kitchen countertop, next to the fridge. | Smart fridge or Wi-Fi-enabled smart kitchen device |
| $n_{AP}$ | In the middle of the eHealth House, elevated (simulating being mounted on the ceiling). | Access point, router |

Hardware and software
The Linux CSI Tool [9] was combined with the Intel Ultimate N Wi-Fi Link 5300 NIC with a centre frequency of 5.32 GHz. Two antennas were used for transmitting and three for receiving (2 × 3 MIMO), with a packet transmission rate of 100 Hz. The rate of 100 Hz was chosen to easily capture all required details of the activities: human movement usually lies between 0 and 20 Hz, with daily activities between 0.3 and 3.5 Hz [12]. Voluntary human movement generally does not exceed a frequency of 10 Hz. The nodes were not connected over a Wi-Fi network: the transmitter was put in injector mode and broadcast random packets, while the receivers were placed in monitor mode to listen to a specific MAC address (conforming to the 802.11n specifications). Due to the limitations of the Linux CSI Tool, 30 subcarriers are captured per measurement, meaning the shape of the channel state matrix $\mathbf{H}$ is 2 × 3 × 30, following $N_t \times N_r \times N_s$. The amplitude is calculated as the absolute value of each complex element of $\mathbf{H}$.

Figure 3: Visualization of the federated dataset as a tree: the data received by all nodes is split into subsets of received data per location, then into the individual sets of all main activities, and finally, for each receiving node $n$, into the subset of data received from the other nodes. A receiver $n$ cannot receive data from itself, so it is excluded from its own subset. All elements at the same level have the same type of branches, each with its respective received elements.

Experimental setup
Experiments were conducted in a replicated, fully functioning apartment (the e-Health House at the University of Twente), where participants performed activities in different realistic locations while nodes placed at different positions inside the apartment received packets from which the channel state information can be extracted. In total, four nodes were used to measure the channel state information over time, and they were placed in four locations where one might expect actual wireless devices. Figure 2 shows a Lidar scan of the eHealth House, with node identifiers indicating the location of devices. Table 1 lists the identifiers and locations of each node, together with its relevance in a realistic environment.

Participants and activities
The experiment involves four activities (sitting/standing, eating/drinking, working, resting) inspired by the Activities of Daily Living (ADL). Three participants were asked to enter the apartment per run, one per location. One of these participants is designated to perform all activities, each for as long as it takes all nodes to have their turn to transmit. The other two participants perform randomly assigned activities (to generate noise), ensuring that each combination of activities occurs only once for each set of three participants to prevent the learning algorithm from recognising activity combinations instead of individual activities. Each run took twelve minutes per activity for all nodes, meaning the whole activity recognition part of the experiment took 48 minutes per set of participants.


Resulting datasets
The resulting federated dataset is visualized in Figure 3. In the resulting data, each individual recording has a shape of 180 × 100 × 2 × 3 × 30, where 180 is the time in seconds for each transmission, 100 is the transmission rate per second, and 2 × 3 × 30 is the aforementioned channel state matrix $\mathbf{H}$, resulting in 3,240,000 elements per recording, with the total being ≈ 466,560,000 elements. Here, each element is a complex number converted into the amplitude by taking its absolute value, as outlined in Section 3.1. In total, two of these datasets were collected and combined into a single dataset. Removing a location branch from the dataset, or removing a node from both the set of receivers and the set of transmitters, represents training without that location or node, respectively. Note that a removed node is excluded both as a transmitter and as a receiver, to properly simulate that no node receives any data from it and that it does not receive any data itself.
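The element counts and run durations reported above can be checked with a few lines of arithmetic. The sketch below is a worked example in Python; the decomposition of the dataset into one recording per (location, activity, receiver, transmitter) combination is an assumption that reproduces the reported total, not a breakdown stated by the authors.

```python
# Shape of one recording, as stated in the text.
SECONDS, RATE_HZ = 180, 100
N_TX, N_RX, N_SUB = 2, 3, 30
elements_per_recording = SECONDS * RATE_HZ * N_TX * N_RX * N_SUB

# Assumption: one recording per (location, activity, receiver, transmitter),
# where a receiver never records its own transmission.
N_LOCATIONS, N_ACTIVITIES, N_NODES = 3, 4, 4
n_recordings = N_LOCATIONS * N_ACTIVITIES * N_NODES * (N_NODES - 1)
total_elements = elements_per_recording * n_recordings

# Timing: every node transmits for 180 s per activity.
minutes_per_activity = N_NODES * SECONDS / 60
minutes_per_run = minutes_per_activity * N_ACTIVITIES

print(elements_per_recording, n_recordings, total_elements)
print(minutes_per_activity, minutes_per_run)
```

Under this assumption, the per-recording count, the total, and the twelve and 48 minute durations mentioned for the experiment are mutually consistent.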

METHODOLOGY

Data preprocessing
While packets were transmitted at 100 Hz, packet loss may occur at the receiver side, resulting in non-uniform inter-packet arrival times.To guarantee a balanced dataset with sufficient information to allow a network to learn to perform accurate classification, the data was first interpolated to account for missing data points [16].This is achieved utilizing linear interpolation.Signals with a lower sampling rate than desired can be replicated appropriately using interpolation if the sampling rate equals the Nyquist rate or higher.For the activities performed in this research, the Nyquist rate is around 5 Hz [13].After interpolation, data was normalized and a rolling filter was used to mitigate the effect of signal interference and environmental noise, which could cause sudden changes in amplitude not caused by the activity [10].
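The preprocessing steps described above (linear interpolation onto a uniform 100 Hz grid, normalization, and a rolling filter) can be sketched as follows in Python with NumPy. The text does not specify the normalization method, the filter type, or the window length, so min-max normalization, a rolling mean, and a window of 5 samples are assumptions made for illustration; the function name `preprocess` is likewise hypothetical.

```python
import numpy as np


def preprocess(timestamps, amplitude, rate_hz=100.0, duration_s=180.0, window=5):
    """Resample to a uniform grid, min-max normalize, and smooth with a rolling mean."""
    n_samples = int(round(duration_s * rate_hz))
    grid = np.arange(n_samples) / rate_hz
    # Linear interpolation fills the gaps left by lost packets.
    uniform = np.interp(grid, timestamps, amplitude)
    # Min-max normalization (assumed; the paper only says "normalized").
    span = uniform.max() - uniform.min()
    normalized = (uniform - uniform.min()) / span if span > 0 else np.zeros_like(uniform)
    # Rolling mean (assumed filter type) to damp sudden amplitude changes.
    kernel = np.ones(window) / window
    smoothed = np.convolve(normalized, kernel, mode="same")
    return grid, smoothed


# Synthetic example: a 1 Hz amplitude variation with roughly 10% packet loss.
rng = np.random.default_rng(0)
t_full = np.arange(18000) / 100.0
keep = rng.random(t_full.size) > 0.1
keep[[0, -1]] = True  # keep the endpoints so the grid is fully covered
signal = 1.0 + 0.5 * np.sin(2 * np.pi * 1.0 * t_full)
grid, clean = preprocess(t_full[keep], signal[keep])
print(grid.shape, clean.shape)
```

Because `mode="same"` pads with zeros, the first and last few samples of the smoothed signal are attenuated; a real pipeline might trim or pad the edges differently.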

Neural network and federated model
The study employs a dense neural network for classification, utilizing a 2D input array of shape 180 × 100, derived from the channel state matrix $\mathbf{H}$ to accommodate a transmission rate of 100 Hz [10]. The network architecture comprises three fully connected layers with 100 hidden units each, using ReLU activations and a dropout of 0.5. The model aims to minimize the mean-squared error loss through stochastic gradient descent. While activity regularization is excluded due to limitations in the used framework, $L_2$ kernel regularization is applied to keep the weights small, and the models are optimized in a federated learning context via FedAvg [3]. The models are aggregated into a global model by combining the weights of the models belonging to the individual nodes. A learning rate of $10^{-5}$ and a server learning rate of 1 are used. For optimisation, a split of 60% of the original data is used, whereas 40% is preserved for testing the model.
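As an illustration of the aggregation step, the sketch below shows FedAvg in its commonly used sample-weighted form in Python with NumPy. It assumes each client model is represented as a list of weight arrays, one per layer; the function name `fed_avg` and the toy weights are hypothetical, and the exact weighting used in this study is not restated here.

```python
import numpy as np


def fed_avg(client_weights, client_sizes):
    """Sample-weighted average of per-client weight lists (standard FedAvg form)."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (size / total) for w, size in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]


# Toy example: two clients with a single "layer" each (hypothetical values).
client_a = [np.array([1.0, 1.0])]
client_b = [np.array([3.0, 3.0])]
global_weights = fed_avg([client_a, client_b], client_sizes=[1, 3])
print(global_weights[0])
```

With equal client sizes, the weighted average reduces to a plain mean of the client weights.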

Introducing new locations and clients
Federated learning is applied to introduce new (unseen) locations and clients to a pre-trained network after different training durations to analyze its performance. The pre-trained model is trained on a subset of locations or nodes (all but the held-out location or node) until an $F_1$-score ≥ 0.8 is reached, to prevent overfitting, allowing spatially distributed devices to more easily fine-tune their model afterwards. The model is trained for different budgets, defined as the product of the number of local epochs and the number of federated rounds.
It should be noted that the data associated with a new location or node was not collected after it was added to the network; rather, it was collected simultaneously with the data used for the pre-trained model. Therefore, the methodology only serves as a proof of concept and represents a real-life environment only up to a certain point.
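The pre-training procedure can be read as a loop that stops when either the training budget is spent or the target $F_1$-score is reached. The following Python sketch illustrates that control flow only; `pretrain`, `train_round`, and `evaluate_f1` are hypothetical names, and the stubbed scores are invented for the example rather than taken from the experiments.

```python
def pretrain(train_round, evaluate_f1, local_epochs, federated_rounds, target_f1=0.8):
    """Run federated rounds until the budget is spent or the F1 target is met."""
    history = []
    for _ in range(federated_rounds):
        train_round(local_epochs)      # each client trains locally, then models are aggregated
        history.append(evaluate_f1())  # F1-score of the aggregated global model
        if history[-1] >= target_f1:
            break
    return {
        "rounds_used": len(history),
        "epochs_used": len(history) * local_epochs,
        "budget": local_epochs * federated_rounds,
        "f1": history[-1] if history else None,
    }


# Stubbed example: training is a no-op and the F1-scores are made up.
scores = iter([0.55, 0.70, 0.82, 0.90])
result = pretrain(lambda epochs: None, lambda: next(scores),
                  local_epochs=5, federated_rounds=10)
print(result)
```

In this toy run the target is reached in the third round, so only 15 of the 50 budgeted epochs are used before a new location or client would be introduced.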
Adding a new location. When the model trained on the subset of existing locations (excluding the new location) reaches $F_1$ ≥ 0.8, the new location joins the federated learning rounds. Retraining involves evaluating models for different numbers of local epochs and federated rounds until all devices using the federated model reach $F_1$ ≥ 0.8 after the new location joins.
Adding a new client. Training on new client data involves training models with varying local epochs and federated rounds on all nodes but one. Removing a node is done by excluding its transmitted and received data from the remaining nodes. Each node is added in two different ways:
• Transmitter only. The global model is not shared with the new device; it functions solely as a data generator for other devices. This suits devices such as smartphones, which contribute valuable insights without being burdened with neural networks or data collection.
• Transceiver. The global model is shared with the new device so that it fully participates in the network; the new device is integrated as a full communication device (e.g., other access points or smart devices), allowing it to train in a federated manner, facilitating communication.
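The difference between removing a node, adding it back as a transmitter only, and adding it back as a transceiver can be expressed in terms of directed (receiver, transmitter) links. The Python sketch below is an illustrative bookkeeping example; the node labels and the helper names `links`, `without_node`, and `add_node` are assumptions and not part of the original implementation.

```python
NODES = ["TV", "table", "kitchen", "AP"]


def links(nodes):
    """All directed (receiver, transmitter) pairs; a node never receives from itself."""
    return [(rx, tx) for rx in nodes for tx in nodes if rx != tx]


def without_node(nodes, removed):
    """Links left for pre-training when `removed` neither transmits nor receives."""
    return links([n for n in nodes if n != removed])


def add_node(nodes, removed, mode):
    """Links that become available when `removed` joins in the given mode."""
    existing = [n for n in nodes if n != removed]
    new_links = [(rx, removed) for rx in existing]  # others now receive from the new node
    if mode == "transceiver":
        new_links += [(removed, tx) for tx in existing]  # the new node also receives and trains
    elif mode != "transmitter":
        raise ValueError("mode must be 'transmitter' or 'transceiver'")
    return new_links


pretrain_links = without_node(NODES, "AP")
tx_only_links = add_node(NODES, "AP", "transmitter")
transceiver_links = add_node(NODES, "AP", "transceiver")
print(len(pretrain_links), len(tx_only_links), len(transceiver_links))
```

With four nodes, six links remain for pre-training; a transmitter-only device contributes three new links at the existing receivers, while a transceiver contributes six and becomes a training client itself.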
Figure 4 shows a schematic overview of using the client set to train on data from new clients. Note that for all scenarios after a new device or node is added, all weights are shared after every epoch.

[...] is slightly longer, as outlined before. As noted before, the access point node $n_{AP}$ scores lowest while taking the longest to converge. This could be because it adds significantly different information into the network: the other three nodes are close to locations where activities are performed (thus more immediately impacting the signal), while no activity was performed close to $n_{AP}$, and it was placed significantly higher than the other three nodes to replicate a ceiling-mounted device.
For all nodes, it appears that every data split ≥ 0.5 results in comparable $F_1$-scores for a budget of 2500, akin to adding a new location. Likewise, convergence takes longer for smaller data splits. For all added nodes but one, a data split of 0.2 results in a comparable $F_1$-score, with the lowest being $F_1$ = 0.77 for $n_{kitchen}$. This indicates that significantly less data is needed for convergence, though the time (in epochs) to convergence increases by a factor of 3 on average. This implies that the local number of epochs and the required additional data could be balanced depending on the available data and training capacity of a node.
Transceiver. When a transceiver is added (Fig. 7), the main observation is that it follows similar patterns to adding a new transmitter, except that the convergence time in epochs scales by a factor of 2.14 on average (averaged over data splits of 0.5, 0.75, and 1), while resulting in a slightly lower restored performance of $F_1$ ≥ 0.75. This is likely because the network needs to adapt simultaneously to both the data newly added to the federated network on other clients and a newly added client, whereas for the transmitter only new data is injected at other clients. This implies that, unlike adding a client as a transmitter only, it may be beneficial to continue federated learning (rather than just training locally after receiving the global model). This also becomes apparent when looking at the restored performance: only for budgets ≥ 5000 are two of the added clients able to restore the performance to $F_1$ ≥ 0.8.
Unlike adding a transmitter, for certain nodes the model seems to be overfitted for budgets ≥ 5000, namely for $n_{kitchen}$ and one other node at data splits of 0.5 and 0.75. The total number of additional epochs ranges between 10 and 20 (+10% to +30% when compared to the last non-overfitted entry). However, it should be noted that, realistically, no pre-training with a budget ≥ 5000 will happen due to the resource consumption and network overhead required.

DISCUSSION
New location. A comparison has been made between the training of existing models on data from various unseen locations. The models' performance is similar for each omitted location for the same data, with a budget of 2500 being the lowest needed to recover the performance. Pre-training the models extensively improves the $F_1$-score as expected, but different locations affect the model's learning curve and may inherently limit its growth. Adding new locations to be recognized within a household can overall be dealt with efficiently by using existing models if these are sufficiently trained beforehand, which could mean either a) an existing network that runs for longer and occasionally updates its weights or b) a small number of more powerful devices to which the training can be off-loaded, so that less powerful devices only have to focus on personalization.
New devices. The location of nodes relative to the activity, their proximity to sharp corners, and their height impact the accuracy: situations where the access point node $n_{AP}$ is added require more convergence time while resulting in lower performance, indicating that the data added and the model tuned by this client are inherently noisier. This implies that in federated learning, the physical location of a new client impacts the network's convergence time and efficiency, with some clients accelerating adaptation to new data because their data is more similar to that of other devices. The findings imply the possibility of using federated learning for new devices, though further investigation is required to optimize client placement or automatic client selection for improved performance outcomes.
Limited data. This study finds that while using more data typically leads to faster convergence in model training, 75% of the total data often suffices to achieve equivalent performance, and in certain cases, even 50% or 25% can be adequate. The balance between data volume, performance, and convergence speed must consider computational and memory constraints and timing needs. Extensive data collection was limited by the practicalities of time, suggesting that an exhaustive exploration of variables (activities, locations, and interference levels) would offer more precise insights but at the cost of significant time investment for each participant.
Device availability. The research presented in this paper assumes that indoor environments contain sufficient computing devices to establish a federated model that may be sufficiently scaled up for other tasks. This study also implies that extensive pre-training is required, which may necessitate either devices that are on for extended periods of time (and can handle training) or a smaller number of more powerful computers (such as servers).

CONCLUSION AND FUTURE WORK
The main research question of this paper was: What is the impact of adding new locations and devices with different data availability using federated learning in multi-person households utilizing existing Wi-Fi infrastructure using channel state information-based sensing for human activity recognition in terms of accuracy and convergence time? While the proposed solution takes longer to pre-train (i) compared to local approaches in the state of the art, the federated approach shows that only a small number of epochs is required when personalising on unseen locations and nodes (ii), allowing for dynamic environments with changing activity locations and device participation, assuming the training can be done on static, more powerful devices (such as computers). Finally, devices can start joining a network sooner, as only a fraction of the data is needed (iii). Overall, after sufficient pre-training, federated learning may be used to efficiently add new devices and locations in environments where one participant needs to be monitored among others, using existing Wi-Fi infrastructure. In the future, research should consider looking into ways to automatically select which devices could be paired to minimize the pre-training needed.

Figure 2: Lidar scan of the apartment (the e-Health House at the University of Twente) with all nodes ($n_{TV}$, $n_{table}$, $n_{kitchen}$, and $n_{AP}$; * means behind object). Locations where activities are performed (TV, table, and kitchen) are in yellow.

Figure 4: Knowledge flow when adding a new client. A global model is trained using all clients but one for different numbers of epochs and federated rounds. The weights of the models are aggregated using FedAvg [25]. When either of the criteria is reached, the new client is added with a specific split of its data. Note that in the second part, all updates are shared immediately after training (one local epoch per federated round) until all nodes report $F_1$ ≥ 0.80 or convergence is reached.

Figure 5: $F_1$-score (top) and convergence time (in epochs, bottom) visualized per budget (local epochs × federated rounds) for different locations (TV, Table, and Kitchen for left, middle, and right, respectively) and different data availability.

Figure 7: $F_1$-score (top) and convergence time (in epochs, bottom) visualized per budget (local epochs × federated rounds) for different transceivers (from left to right: TV, Table, Kitchen, and AP) and different data availability.