Multimodal Indoor Localisation in Parkinson’s Disease for Detecting Medication Use: Observational Pilot Study in a Free-Living Setting

Parkinson’s disease (PD) is a slowly progressive, debilitating neurodegenerative disease which causes motor symptoms including gait dysfunction. Motor fluctuations are alternations between periods with a positive response to levodopa therapy ("on") and periods marked by re-emergence of PD symptoms ("off") as the response to medication wears off. These fluctuations often affect gait speed, and their disabling impact increases as PD progresses. To improve the effectiveness of current indoor localisation methods, we propose a transformer-based approach utilising dual modalities which provide complementary views of movement: Received Signal Strength Indicator (RSSI) and accelerometer data from wearable devices. A sub-objective is to evaluate whether indoor localisation, including its in-home gait speed features (i.e. the time taken to walk between rooms), could be used to evaluate motor fluctuations by detecting whether the person with PD is taking levodopa medications or withholding them. To properly evaluate our proposed method, we use a free-living dataset where the movements and mobility are greatly varied and unstructured, as expected in real-world conditions. Twenty-four participants lived in pairs (consisting of one person with PD and one control) for five days in a smart home with various sensors. Our evaluation on the resulting dataset demonstrates that our proposed network outperforms other methods for indoor localisation. The sub-objective evaluation shows that precise room-level localisation predictions, transformed into in-home gait speed features, produce accurate predictions on whether the PD participant is taking or withholding their medications.


INTRODUCTION
Parkinson's disease (PD) is a debilitating neurodegenerative disease affecting around 6 million people worldwide. It is characterised by a variety of movement-related (motor) symptoms, such as slowness of movement, rigidity and gait dysfunction [23]. A complication of the mainstay medication used to treat PD, levodopa, is that patients start to experience motor symptom fluctuations related to medication timings. When levodopa is first started, patients experience a smooth and prolonged therapeutic response. As the disease progresses (and in a substantial proportion of patients within the first five years), patients start to "wear off" from their medications before the next dose, causing a re-emergence of parkinsonian symptoms including slowness of gait. These symptom fluctuations impair patients' quality of life and often necessitate changes in medication regime. Motor symptoms can become severe enough to hinder the subject's gait and movement around their own house [49]. As a result, the subject may be more likely to stay in one room, and when they do move, they may need more time to transition between rooms. Such outcomes could be used to detect ON and OFF medication motor fluctuations in PD and to inform clinicians and patients of such symptoms. Furthermore, a sensitive and accurate ecologically-validated biomarker of PD progression is currently lacking [15], resulting in multiple failures of clinical trials testing putative neuroprotective therapies in PD [11,21,27]. Gait parameters are sensitive to disease progression in unmedicated early-stage PD [52] and show promise as markers of disease progression [44], making the measurement of gait parameters potentially useful in clinical trials of disease-modifying interventions, which typically recruit recently diagnosed patients [42]. Clinical evaluation of PD is normally undertaken in an artificial clinic or laboratory environment where only a snapshot view of the individual's motor function can be captured. Constant monitoring could capture symptom progression, including motor fluctuations, and sensitively quantify them over time [36].
Although PD symptoms including gait and balance parameters can be measured continuously at home (with varying degrees of reliability and accuracy) by wearable devices containing inertial measurement units (IMUs) or by smartphones [12,14,37,38], these data do not show the context in which the measurements are taken (for example, where someone is at the time of the symptom). Knowing which room someone is in (indoor localisation) could add valuable holistic information to the interpretation of symptoms of PD. For example, symptoms like freezing of gait [2] and turning in gait [18] vary according to the nature of the setting the person is in, so knowing where someone is could help predict such symptoms or interpret their severity. Furthermore, knowing how much time someone spends alone or with others in a room is a step towards understanding their societal participation [25], which affects quality of life in PD [13]. Localisation could also add valuable information to the measurement of other behaviours, including non-motor symptoms such as urinary function [16,20] (e.g. how many times someone visits the toilet room overnight).
To perform indoor localisation in home environments, IoT-based platforms with sensors capturing various modalities of data, combined with machine learning, can be used to provide unobtrusive and continuous localisation [39]. Typically, many of these techniques take advantage of radio-frequency signals, specifically the Received Signal Strength Indicator (RSSI), emitted by wearables and measured at access points (APs) throughout a home. These signals are used to estimate the user's position from the perceived signal strength, thereby creating radio-map features for each room [22]. To provide more accurate localisation, accelerometer data measured by the wearable devices can be used alongside the RSSI measured at the receivers, as it provides a means to distinguish different activities (e.g. walking vs standing). Furthermore, as some activities are tied to particular rooms (e.g. stirring a pan on the hob is very likely to happen in a kitchen), accelerometer data may enrich RSSI in differentiating adjacent rooms, which RSSI alone may struggle with [35].
If accelerometer data are to provide extra features for separating adjacent rooms, greater consideration must be given to data generalisation across different PD patients. As PD is a heterogeneous disease, the symptoms experienced and their severity may vary from one patient to another [19]. Severe symptoms, such as tremor, may affect the generalisation of accelerometer data, which are prone to bias and accumulated errors [34], especially when the devices are worn on the patient's wrists, a common and well-accepted placement location [17]. Naively combining the accelerometer data with the RSSI may impair the performance of indoor localisation due to differing levels of tremor manifesting in the acceleration signal. In this work, we make two main contributions.
(1) We describe the utilisation of RSSI enriched by accelerometer data to perform room-level localisation. Our proposed network intelligently chooses accelerometer features which may improve the RSSI performance in indoor localisation. To properly evaluate our proposed method, we use a free-living (a person living their life freely, without external intervention) dataset created by our group, where the movements and mobility are greatly varied and unstructured, as expected in real-world conditions. Our evaluation on this unique dataset, which includes subjects with and without PD, demonstrates that our proposed network outperforms other approaches in all cross-validation categories.
(2) We also demonstrate how accurate room-level localisation predictions can be transformed into in-home gait speed biomarkers (e.g. number of room-to-room transitions, room-to-room transition duration) which can be used to effectively classify the OFF or ON medication state of a PD patient from this pilot study data.

RELATED WORK
There has been substantial work using home-based passive sensing systems to assess how the activities and behaviour of people with neurological disease (mainly cognitive dysfunction) change over time [33,55]. There is very limited work assessing room use in the home setting in people with Parkinson's. However, gait quantification using wearables or smartphones is an area where a significant amount of work has been done (with several systematic reviews, such as [4,7]). Cameras can also detect parkinsonian gait and some gait features, including step length and average walking speed [46]. Time-of-flight devices (which measure distances between the subject and the camera [24]) have been used to assess medication adherence through gait analysis [45]. For free-living data, one approach to gait and room use evaluation in home settings is to emit and detect radio waves to non-invasively track movement. Gait analysis using radio wave technology shows promise for tracking disease progression, severity and medication response [30]. However, this approach cannot identify who is doing the movement and also suffers from technical issues when the radio waves are occluded by another object. Much of the work done so far using video to track PD symptoms has focused on the performance of structured clinical rating scales during telemedicine consultations as opposed to naturalistic behaviour [41], and there have been some privacy concerns around the use of video data at home [48].
RSSI data produced by wearable devices raise fewer privacy concerns; they can be measured continuously and unobtrusively over long periods of time to capture real-world function and behaviour in a privacy-friendly way. In indoor localisation, fingerprinting using RSSI is the typical technique used to estimate the wearable (user) location, using signal strength data that represent a coarse and noisy estimate of the distance between the access point and the wearable [5,40]. RSSI signals are not stable; they fluctuate randomly due to shadowing, fading and multi-path effects. However, many techniques have been proposed in recent years to tackle these fluctuations and, indirectly, improve the localisation accuracy. Some works [54] utilise deep neural networks (DNNs) to generate coarse positioning estimates from RSSI signals, which are then refined by a hidden Markov model (HMM) to produce a final location estimate. Other works [22] utilise a time series of RSSI data and exploit the temporal connections within each access point to estimate room-level position. A CNN is used to build localisation models to further leverage the temporal dependencies across time-series readings.
It has been suggested that we cannot rely on RSSI alone for indoor localisation in home environments for PD subjects due to shadowing rooms with tight separation [32,35,39]. Sansano-Sansano et al. combine RSSI signals and inertial measurement unit (IMU) data to test the viability of leveraging other sensors in aiding the positioning system to produce a more accurate location estimate [39]. Classic machine learning approaches such as Random Forest (RF), Artificial Neural Network (ANN) and k-Nearest Neighbour (k-NN) are tested, and the results show that RF outperforms the other methods in tracking a person in indoor environments. Poulose et al. combine smartphone IMU sensor data and Wi-Fi RSSI measurements to estimate the exact location (in Euclidean position X, Y) of a person in indoor environments [35]. The proposed sensor fusion framework uses location fingerprinting in combination with a pedestrian dead reckoning (PDR) algorithm to reduce the positioning errors.
Looking at this multi-modality classification / regression problem from a time-series perspective, there has been much exploration of problems where each modality can be categorised as multivariate time-series data [8,28,51]. LSTM and attention layers are often used in parallel to directly transform raw multivariate time-series data into a low-dimensional feature representation for each modality. Further processing is then applied to extract correlations across modalities through the use of various layers (e.g. concatenation, CNN layer, transformer, self-attention) [28,51]. Our work is inspired by Sansano-Sansano et al. [39], but we utilise only accelerometer data to enrich the RSSI, instead of utilising all IMU sensors, in order to reduce battery consumption. In addition, unlike Sansano-Sansano et al., who stop at predicting room locations, we go a step further and use room-to-room transition behaviours as features for a binary classifier predicting whether people with PD are taking their medications or withholding them.

COHORT AND DATASET
Dataset. This dataset was collected using wristband wearable sensors, one on each wrist of all participants, containing tri-axial accelerometers, and 10 Access Points (APs) placed throughout the residential home (see Fig. 1 for house layout and AP location), each measuring the RSSI [26]. The wearable devices wirelessly transmit data using the Bluetooth Low Energy (BLE) standard, which can be received by the 10 APs. Each AP records the transmitted packets from the wearable sensor, which contain the accelerometer readings sampled at 30 Hz, with each AP recording RSSI values sampled at 5 Hz.
The dataset contains 12 spousal/parent-child/friend-friend pairs (24 participants in total) living freely in a smart home for five days. Each pair consists of one person with PD and one healthy control volunteer (HC). This pairing was chosen to enable PD vs HC comparison, for safety reasons, and also to increase the naturalistic social behaviour (particularly amongst the spousal pairs who already lived together). Of the 24 participants, five females and seven males have PD. The average age of the participants is 60.25 (PD 61.25, Control 59.25) and the average time since PD diagnosis for the person with PD is 11.3 years (range 0.5-19).
To measure the accuracy of the machine learning models, wall-mounted cameras were installed on the ground floor of the house, capturing red-green-blue (RGB) and depth data for 2-3 hours daily (during daylight hours at times when participants were at home). The videos were then manually annotated to the nearest millisecond to provide localisation labels. Multiple human labellers used a widely available software package called ELAN [1] to watch up to 4 simultaneously captured video files at a time. The resulting labelled data covered the kitchen, hallway, dining room, living room, stairs, and porch. The duration of labelled data recorded by the cameras for PD and HC is 72.84 and 75.31 hours, respectively, which provides a relatively balanced label set for our room-level classification. Finally, to evaluate the ON/OFF medication state, participants with PD were asked to withhold their dopaminergic medications so that they were in the practically-defined OFF medications state for a temporary period of several hours during the study. Withholding medications removes their mitigating effect on symptoms, leading to mobility deterioration which can include slowing of gait.
Data pre-processing for indoor localisation. The data from the two wearable sensors worn by each participant were combined at each time point, based on their modality: twenty RSSI values (corresponding to 10 APs for each of the two wearable sensors) and accelerometry traces in six spatial directions (corresponding to the three spatial directions (x, y, z) for each wearable) were recorded at each time point. The accelerometer data is resampled to 5 Hz to synchronise it with the RSSI values. With a 5-second time window and a 5 Hz sampling rate, each RSSI data sample has an input of size (25 x 20) and each accelerometer data sample has an input of size (25 x 6). Missing values, which occur in the RSSI data whenever the wearable is out of range of an AP, are imputed with a value that cannot normally occur (i.e. -120 dB). Finally, all time-series measurements of both modalities are normalised.
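The pre-processing steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not the study's pipeline: the function names are hypothetical, the 30 Hz to 5 Hz resampling is assumed to be a block mean, and windows are assumed to be non-overlapping, none of which is specified in the text.

```python
from typing import List, Optional, Sequence

MISSING_RSSI_DB = -120.0  # sentinel from the text for "wearable out of range of the AP"
WINDOW_STEPS = 25         # 5-second window at 5 Hz


def downsample(samples: Sequence[Sequence[float]], src_hz: int = 30, dst_hz: int = 5) -> List[List[float]]:
    """Resample by averaging blocks of src_hz // dst_hz rows (block mean is an assumption)."""
    factor = src_hz // dst_hz
    out = []
    for i in range(len(samples) // factor):
        block = samples[i * factor:(i + 1) * factor]
        out.append([sum(col) / factor for col in zip(*block)])
    return out


def impute_rssi(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Replace missing RSSI readings (None) with the -120 dB sentinel."""
    return [[MISSING_RSSI_DB if v is None else float(v) for v in row] for row in rows]


def make_windows(rows: Sequence[Sequence[float]], steps: int = WINDOW_STEPS) -> List[List[Sequence[float]]]:
    """Cut a 5 Hz stream into non-overlapping windows (overlap policy is an assumption)."""
    return [list(rows[i:i + steps]) for i in range(0, len(rows) - steps + 1, steps)]


if __name__ == "__main__":
    accel_30hz = [[float(i)] * 6 for i in range(300)]     # 10 s, six axes
    rssi_5hz = [[-70.0] * 19 + [None] for _ in range(50)]  # 10 s, twenty AP readings
    accel_windows = make_windows(downsample(accel_30hz))
    rssi_windows = make_windows(impute_rssi(rssi_5hz))
    print(len(accel_windows), len(accel_windows[0]), len(accel_windows[0][0]))
    print(len(rssi_windows), len(rssi_windows[0]), len(rssi_windows[0][0]))
```

Each resulting RSSI window has 25 rows of 20 values and each accelerometer window has 25 rows of 6 values, matching the input sizes given above. Normalisation is omitted here because the text does not state which scheme is used.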
Data pre-processing for medication state. Our main focus is for our neural network to continuously produce room predictions, which are then transformed into in-home gait speed features, particularly for persons with PD. We hypothesise that during their OFF medication state, the deterioration in mobility of a person with PD is exhibited by how they transition between rooms. These features include the 'Room-to-room Transition Duration' and the 'Number of Transitions' between two rooms. 'Number of Transitions' represents how active PD subjects are within a certain period of time, while 'Room-to-room Transition Duration' may provide insight into how severe their disease is through the speed with which they navigate their home environment. In the layout of the house where participants stayed (see Fig. 1), the hallway is a hub connecting all other labelled rooms, and 'Room-to-room Transition Duration' is the transition duration (in seconds) between two rooms connected by the hallway. The transitions between (1) kitchen and living room, (2) kitchen and dining room, and (3) dining room and living room are chosen as the features because they are common to all participants. For these features, we limit the transition duration (i.e. the time spent in the hallway) to 60 seconds, to exclude prolonged transitions that may not be representative of the person's mobility.
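As an illustration of how such transitions could be extracted from a stream of per-time-step room predictions, the sketch below run-length encodes the predictions and keeps every hallway segment that sits between two different rooms and lasts at most 60 seconds. The function name, the 5 Hz prediction rate, and the rule that a return to the same room does not count as a transition are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def hallway_transitions(
    rooms: Sequence[str],
    hz: float = 5.0,
    hub: str = "hallway",
    max_seconds: float = 60.0,
) -> List[Tuple[str, str, float]]:
    """Return (origin, destination, seconds in hub) for each pass through the hub."""
    # Run-length encode the predicted room sequence.
    runs: List[List] = []
    for room in rooms:
        if runs and runs[-1][0] == room:
            runs[-1][1] += 1
        else:
            runs.append([room, 1])

    transitions = []
    for i in range(1, len(runs) - 1):
        room, steps = runs[i]
        if room != hub:
            continue
        origin, destination = runs[i - 1][0], runs[i + 1][0]
        seconds = steps / hz
        # Assumption: leaving and re-entering the same room is not a transition.
        if origin != destination and seconds <= max_seconds:
            transitions.append((origin, destination, seconds))
    return transitions


if __name__ == "__main__":
    predicted = ["kitchen"] * 10 + ["hallway"] * 15 + ["living room"] * 10
    print(hallway_transitions(predicted))
```

With 15 hallway predictions at 5 Hz, the example yields one kitchen to living room transition of 3.0 seconds.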
These in-home gait speed features are produced by feeding RSSI signals and accelerometer data from the 12 PD participants, recorded from 6 a.m. to 10 p.m. daily, into an indoor-localisation model and aggregating the results into 4-hour windows. Each PD participant therefore has 20 data samples (four data samples for each of the five days), each of which contains six features (three for the mean room-to-room transition duration, and three for the number of room-to-room transitions). There is only one 4-hour window during which the person with PD is OFF medications. These samples are then used to train a binary classifier determining whether a person with PD is ON or OFF their medications.
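The six-feature sample for one 4-hour window can be assembled as in the following sketch. It is illustrative only: `window_features` and `ROOM_PAIRS` are hypothetical names, a transition is assumed to count for a room pair regardless of direction, and a mean of 0.0 is assumed when a pair has no transitions in the window.

```python
from typing import List, Sequence, Tuple

ROOM_PAIRS = [
    ("kitchen", "living room"),
    ("kitchen", "dining room"),
    ("dining room", "living room"),
]


def window_features(transitions: Sequence[Tuple[str, str, float]]) -> List[float]:
    """Build [3 mean durations, 3 transition counts] for one 4-hour window."""
    means, counts = [], []
    for a, b in ROOM_PAIRS:
        # Assumption: direction is ignored, so kitchen->living room and back share a feature.
        durations = [s for (o, d, s) in transitions if {o, d} == {a, b}]
        counts.append(float(len(durations)))
        means.append(sum(durations) / len(durations) if durations else 0.0)
    return means + counts


if __name__ == "__main__":
    # 6 a.m. to 10 p.m. in 4-hour windows over five days gives the 20 samples per participant.
    print((22 - 6) // 4 * 5)
    example = [("kitchen", "living room", 4.0), ("living room", "kitchen", 6.0)]
    print(window_features(example))
```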
As a baseline for comparison with the in-home gait speed features, we choose demographic features comprising age, gender, years of PD, and MDS-UPDRS III score (the gold-standard clinical rating scale score used in clinical trials to measure motor disease severity in PD). Two MDS-UPDRS III scores are assigned to each PD participant: one for when the person with PD is ON medications, and the other for when they are OFF medications. For each in-home gait speed feature data sample, there is a corresponding demographic feature data sample, and these demographic samples are used to train a different binary classifier to predict whether a person with PD is ON or OFF medications.
Ethical approval. Full approval from NHS Wales Research Ethics Committee 6 was granted on 17th December 2019, and Health Research Authority and Health and Care Research Wales approval was confirmed on 14th January 2020; the research was conducted in accord with the Helsinki Declaration of 1975; written informed consent was gained from all study participants. In order to protect participant privacy, supporting data is not shared openly. It will be made available to bona fide researchers subject to a data access agreement. If you wish to apply to access this data, please email data-bris@bristol.ac.uk.

METHODOLOGIES AND FRAMEWORK
We introduce Multihead Dual Convolutional Self Attention (MDCSA), a deep neural network that utilises dual modalities for indoor localisation in home environments. The network tackles two challenges that arise from multimodality and time-series data. The MDCSA architecture, shown in Figure 2, addresses these challenges through a series of neural network layers which are described in the following sections.

Modality Positional Embedding
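Due to the different data dimensionality of RSSI and accelerometer data, coupled with the missing temporal information, a linear layer with a positional encoding is added to transform both RSSI and accelerometer data into their respective embeddings. Suppose we have a collection of RSSI signals $x^r = [x^r_1, \dots, x^r_T] \in \mathbb{R}^{T \times r}$ and accelerometer data $x^a = [x^a_1, \dots, x^a_T] \in \mathbb{R}^{T \times a}$ within a time window of $T$ steps. For each modality $u \in \{r, a\}$, the positional embedding at time $t$ is obtained as $h^u_t = W_u x^u_t + b_u + \tau_t$ (1), where $W_u \in \mathbb{R}^{u \times d}$ and $b_u \in \mathbb{R}^d$ are the weight and bias to learn, $d$ is the embedding dimension, and $\tau_t \in \mathbb{R}^d$ is the corresponding position encoding at time $t$.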
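A linear layer followed by an additive position encoding can be sketched as below. This is an illustration under assumptions: the text does not say which position encoding is used, so the sinusoidal encoding of Vaswani et al. is assumed, the weights are stored as an (input size x d) matrix, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def positional_encoding(t: int, d: int) -> List[float]:
    """Sinusoidal position encoding for time step t (the encoding type is an assumption)."""
    enc = []
    for i in range(d):
        if i % 2 == 0:
            enc.append(math.sin(t / 10000 ** (i / d)))
        else:
            enc.append(math.cos(t / 10000 ** ((i - 1) / d)))
    return enc


def embed(x_t: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float], t: int) -> List[float]:
    """Project one time step into d dimensions and add the position encoding."""
    d = len(bias)
    linear = [sum(x_t[j] * weight[j][i] for j in range(len(x_t))) + bias[i] for i in range(d)]
    tau = positional_encoding(t, d)
    return [linear[i] + tau[i] for i in range(d)]


if __name__ == "__main__":
    d = 4
    rssi_t = [-70.0] * 20   # twenty RSSI readings at one time step
    accel_t = [0.1] * 6     # six accelerometer axes at one time step
    w_r = [[0.0] * d for _ in range(20)]
    w_a = [[0.0] * d for _ in range(6)]
    b = [0.0] * d
    # Both modalities end up with the same embedding size d.
    print(len(embed(rssi_t, w_r, b, t=0)), len(embed(accel_t, w_a, b, t=0)))
```

Because both modalities are mapped to the same dimension, the later layers can treat the RSSI and accelerometer embeddings interchangeably as primary or secondary inputs.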

Locality Enhancement with Self-Attention
As it is time-series data, the importance of an RSSI or accelerometer value at each point in time can be identified in relation to its surrounding values, such as cyclical patterns, trends, or fluctuations. By utilising historical context that can capture local patterns on top of point-wise values, attention-based architectures can achieve performance improvements. One straightforward option is to utilise a recurrent neural network such as a long short-term memory (LSTM) approach. However, in LSTM layers, the local context is summarised based on the previous context and the current input. Two similar patterns separated by a long period of time might have different context if they are processed by the LSTM layers [3]. We instead utilise a combination of causal convolution layers and self-attention layers, which we name Dual Convolutional Self-Attention (DCSA). The DCSA takes in a primary input $x_1 \in \mathbb{R}^{T \times d}$ and a secondary input $x_2 \in \mathbb{R}^{T \times d}$ and yields a single integrated embedding. Its components are: $\mathrm{GRN}(\cdot)$, the Gated Residual Network introduced in [28], which integrates the dual inputs into one integrated embedding; $\mathrm{Norm}(\cdot)$, a standard layer normalisation; $\mathrm{SA}(\cdot)$, the scaled dot-product self-attention introduced in [47]; $\Phi_k(\cdot)$, a 1D-convolutional layer with a kernel size $\{1, k\}$ and a stride of 1; and $W_K, W_Q, W_V \in \mathbb{R}^{d \times d}$, the weights for keys, queries and values of the self-attention layer, where $d$ is the embedding dimension. Note that all weights for the GRN are shared across each time step $t$.
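The two building blocks named above, a causal convolution and scaled dot-product self-attention, can be illustrated in plain Python. This is a simplified sketch and not the DCSA layer itself: the convolution shares one hypothetical kernel across all channels, the key, query and value projections are taken to be the identity, and the GRN and layer normalisation are omitted.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def causal_conv1d(x: Sequence[Sequence[float]], kernel: Sequence[float]) -> Matrix:
    """Convolve over time so that output[t] depends only on x[t-k+1..t] (left zero padding)."""
    k, d = len(kernel), len(x[0])
    out = []
    for t in range(len(x)):
        row = [0.0] * d
        for j, w in enumerate(kernel):
            s = t - (k - 1) + j  # kernel[-1] is applied to the current step
            if s >= 0:
                for i in range(d):
                    row[i] += w * x[s][i]
        out.append(row)
    return out


def softmax(values: Sequence[float]) -> List[float]:
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        weights = softmax(scores)
        out.append([sum(weights[j] * v[j][i] for j in range(len(v))) for i in range(len(v[0]))])
    return out


def attend_locally(x: Matrix, kernel: Sequence[float]) -> Matrix:
    """Self-attention over causally convolved features (identity projections assumed)."""
    c = causal_conv1d(x, kernel)
    return self_attention(c, c, c)


if __name__ == "__main__":
    x = [[1.0], [2.0], [3.0], [4.0]]
    print(causal_conv1d(x, [0.5, 0.5]))
    print(attend_locally(x, [0.5, 0.5]))
```

The convolution gives each time step a summary of its recent history before attention compares time steps, which is the locality enhancement the paragraph describes.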

Multihead Dual Convolutional Self-Attention
Our approach employs the self-attention mechanism introduced in [47] to capture global dependencies across time steps. It is embedded as part of the DCSA architecture. Inspired by the multihead self-attention of Vaswani et al. [47], we utilise our DCSA with various kernel lengths with the same aim: allowing asymmetric long-term learning. The multihead DCSA, shown as part of Figure 2, takes in two inputs $x_1, x_2 \in \mathbb{R}^{T \times d}$ and yields a refined embedding. Here $\Phi_k(\cdot)$ is a 1D-convolutional layer with a kernel size $\{1, k\}$ and a stride $k$, $W_K, W_Q, W_V$ are the weights for keys, queries and values of the self-attention layer, and $\Xi(\cdot)$ concatenates the outputs of the individual heads, one per kernel length $k_1, \dots, k_n$, in temporal order. For regularisation, a normalisation layer followed by a dropout layer is added after Equation 4.

Final Layer and Loss Calculation
We apply two different layers to produce two different outputs during training. The room-level predictions are produced via a single conditional random field (CRF) layer in combination with a linear layer applied to the output of Eq. 7 to produce the final predictions $\hat{y}$, where $W_p \in \mathbb{R}^{d \times m}$ and $b_p \in \mathbb{R}^m$ are the weight and bias to learn, $m$ is the number of room locations, and $h = [h_1, \dots, h_T] \in \mathbb{R}^{T \times d}$ is the refined embedding produced by Eq. 7. Even though the transformer can take into account neighbouring information before generating the refined embedding at time step $t$, its decision is independent; it does not take into account the actual decisions made for the other refined embeddings. We use a CRF layer to address this, i.e. to maximise the probability of the refined embeddings of all time steps jointly, so that it can better model cases where refined embeddings closest to one another must be compatible (i.e. minimising the possibility of impossible room transitions). When finding the best sequence of room locations $\hat{y}$, the Viterbi algorithm is used, as is standard for a CRF layer.
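To make the role of the CRF decoding step concrete, the following sketch implements textbook Viterbi decoding over per-step emission scores and a transition score matrix. The room indices, the scores, and the use of a large negative score to forbid kitchen to living room jumps are made-up values for illustration, not learned parameters from the study.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][j] is the score of label j at step t;
    transitions[i][j] is the score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    KITCHEN, HALLWAY, LIVING = 0, 1, 2
    FORBIDDEN = -1e9  # illustrative: no direct kitchen <-> living room move
    transitions = [
        [0.0, 0.0, FORBIDDEN],
        [0.0, 0.0, 0.0],
        [FORBIDDEN, 0.0, 0.0],
    ]
    emissions = [[5.0, 0.0, 0.0], [0.0, 1.0, 2.0], [0.0, 0.0, 5.0]]
    print(viterbi(emissions, transitions))
```

Decoding each step independently would pick kitchen, living room, living room; with the transition constraint the decoded path passes through the hallway instead, which is the behaviour the CRF layer is meant to encourage.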
For the second layer, we choose a particular room as a reference and perform a binary classification at each time step $t$. The binary classification is produced via a linear layer applied to the refined embedding $h$ as $\hat{\beta} = W_\beta h + b_\beta$ (10), where $W_\beta \in \mathbb{R}^{d \times 1}$ and $b_\beta \in \mathbb{R}$ are the weight and bias to learn, and $\hat{\beta} = [\hat{\beta}_1, \dots, \hat{\beta}_T] \in \mathbb{R}^T$ contains the target probabilities for the referenced room within time window $T$. We perform a binary classification against a particular room because we are interested in improving the accuracy of predicting that room. In our application, the chosen room is the hallway, which is used as a hub connecting all other rooms.
Loss Functions. During the training process, the MDCSA network produces two kinds of outputs. Emission outputs (outputs produced by Equation 9 prior to prediction outputs) $\hat{e} = [\Phi(h_1), \dots, \Phi(h_T)]$ are trained to generate the likelihood estimate of room predictions, while the binary classification output $\hat{\beta} = [\hat{\beta}_1, \dots, \hat{\beta}_T]$ is used to train the probability estimate of a particular room. The final loss function is formulated as a combination of the negative log-likelihood $\mathcal{L}_{LL}(\cdot)$ and the binary cross entropy $\mathcal{L}_{BCE}(\cdot)$, where $y = [y_1, \dots, y_T] \in \mathbb{R}^T$ is the actual room locations and $\beta = [\beta_1, \dots, \beta_T] \in \mathbb{R}^T$ is the binary value indicating whether the room at time $t$ is the referenced room or not. In the likelihood term, $P(\cdot \mid \cdot)$ denotes the conditional probability, and $T(y_t \mid y_{t-1})$ denotes the transition matrix cost of having transitioned from $y_{t-1}$ to $y_t$.
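A combined loss of this kind can be sketched as follows. The sketch assumes the standard linear-chain CRF negative log-likelihood (gold path score minus the log partition function from the forward algorithm), a sigmoid applied to the reference-room outputs, and equal weighting of the two terms; these are assumptions for illustration and all function names are hypothetical.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_nll(emissions: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Negative log-likelihood of label sequence y under a linear-chain CRF."""
    n_labels = len(emissions[0])
    # Score of the annotated path.
    gold = emissions[0][y[0]]
    for t in range(1, len(emissions)):
        gold += transitions[y[t - 1]][y[t]] + emissions[t][y[t]]
    # Log partition function via the forward algorithm.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_labels)]) + emissions[t][j]
            for j in range(n_labels)
        ]
    return logsumexp(alpha) - gold


def binary_cross_entropy(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross entropy on sigmoid(logit) (the sigmoid is an assumption)."""
    total = 0.0
    for z, b in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return total / len(logits)


def total_loss(emissions, transitions, y, ref_logits, ref_targets) -> float:
    """Sum of the two terms (equal weighting is an assumption)."""
    return crf_nll(emissions, transitions, y) + binary_cross_entropy(ref_logits, ref_targets)


if __name__ == "__main__":
    emissions = [[2.0, 0.0], [0.0, 2.0]]
    transitions = [[0.0, 0.0], [0.0, 0.0]]
    print(total_loss(emissions, transitions, [0, 1], [1.5, -1.5], [1, 0]))
```

The second term gives the network a direct training signal for the reference room (the hallway in this application) in addition to the sequence-level likelihood.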

EXPERIMENTS AND RESULTS
We compare our proposed network, MDCSA$_{1,4,7}$ (MDCSA with 3 kernels of size 1, 4, and 7), with:
• Random Forest (RF), a baseline technique which has been shown to work well for indoor localisation [43];
• TENER [50], a modified transformer encoder in combination with a CRF layer, representing a model with the capability to capture global dependency and enforce dependencies in temporal aspects;
• DTML [51], representing the state-of-the-art model for multimodal and multivariate time series, with a transformer encoder to learn asymmetric correlations across modalities;
• Alt DTML, representing DTML with a GRN layer replacing the context aggregation layer and a CRF layer added as the last layer;
• MDCSA$_{1,4,7}$ 4APS, as an ablation study, with our proposed network (i.e. MDCSA$_{1,4,7}$) using 4 access points for the RSSI (instead of 10 access points) and accelerometer data (ACCL) as its input features;
• MDCSA$_{1,4,7}$ RSSI, as an ablation study, with our proposed network using only RSSI, without ACCL, as its input features; and
• MDCSA$_{1,4,7}$ 4APS RSSI, as an ablation study, with our proposed network using only 4 access points for the RSSI as its input features.
For RF, all the time-series features of RSSI and accelerometry are flattened and merged into one feature vector for room-level localisation. For TENER, at each time step $t$, RSSI $x^r_t$ and accelerometer $x^a_t$ features are combined via a linear layer before they are processed by the network. A grid search on the parameters of each network is performed to find the best parameters for each model. The parameters to tune are: the embedding dimension $d$ in {128, 256}, the number of epochs in {200, 300}, and the learning rate in {0.01, 0.0001}. The dropout rate is set to 0.15, and the RAdam optimiser [29] in combination with the Look-Ahead algorithm [53] is used for training, with early stopping based on the validation performance. For the RF, we perform a cross-validated parameter search over the number of trees ({200, 250}), the minimum number of samples in a leaf node ({1, 5}), and whether a warm start is needed ({True, False}). The Gini impurity is used to measure splits.
Evaluation Metrics. We are interested in developing a system to monitor PD motor symptoms in home environments. For example, we consider whether there is any significant difference in the performance of the system when it is trained with PD data compared to being trained with healthy control (HC) data. We tailored our training procedure to test this by performing variations of cross-validation. Apart from training our models on all HC subjects (ALL-HC), we also perform four different kinds of cross-validation: 1) we train our models on one PD subject (LOO-PD), 2) we train our models on one HC subject (LOO-HC), 3) we take one HC subject and use only roughly four minutes' worth of data to train our models (4m-HC), and 4) we take one PD subject and use only roughly four minutes' worth of data to train our models (4m-PD). For all of our experiments, we test our trained models on all PD subjects (excluding the one used as training data for LOO-PD and 4m-PD). For room-level localisation accuracy, we use precision and weighted F1-score, reported as the mean and standard deviation across the test folds.
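For reference, the weighted F1-score averages the per-class F1 values with weights proportional to each class's number of true instances. The sketch below computes it from scratch; the room labels are invented for the example, and `weighted_f1` is a hypothetical helper rather than the evaluation code used in the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1
    return total / len(y_true)


if __name__ == "__main__":
    truth = ["kitchen", "kitchen", "hallway", "living room"]
    preds = ["kitchen", "hallway", "hallway", "living room"]
    print(weighted_f1(truth, preds))
```

Weighting by support matters here because short-lived classes such as the hallway have far fewer labelled time steps than rooms where participants spend long periods.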
To showcase the importance of in-home gait speed features in differentiating the medication state of a person with PD, we first compare how close the 'Room-to-room Transition' duration produced by each network is to the ground truth (i.e. the annotated location). We hypothesise that the closer the transition durations are to the ground truth, the better the mobility features are for medication state classification. For the medication state classification, we then compare two different groups of features with two simple binary classifiers: 1) the baseline demographic features (see Section 3), and 2) the normalised in-home gait speed features. The metrics we use for ON/OFF medication state evaluation are the weighted F1-score and AUROC, reported as the mean and standard deviation across the test folds.
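AUROC can be read as the probability that a randomly chosen positive sample (here taken, as an assumption, to be an OFF-medication window) receives a higher classifier score than a randomly chosen negative one, with ties counting as half. A minimal pairwise implementation, using made-up scores:

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]             # 1 = OFF medications (illustrative)
    scores = [0.1, 0.4, 0.35, 0.8]    # classifier scores (illustrative)
    print(auroc(labels, scores))
```

Because each participant contributes only one OFF window among 20, a threshold-free metric such as AUROC is a useful complement to the weighted F1-score on such imbalanced data.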

Experimental Results
Room-level Accuracy. The first part of Table 1 compares the performance of the MDCSA network and other approaches for room-level classification. For the room-level classification, the MDCSA network outperforms the other networks and RF, with a minimum improvement of 1.3% in F1-score over the second-best network (i.e. Alt DTML) in each cross-validation type, with the exception of the ALL-HC validation. The improvement is more significant on the 4m-HC and 4m-PD validations, when the training data are limited, with an average improvement of almost 9% in F1-score over Alt DTML.
The LOO-HC and LOO-PD validations show that a model that has the ability to capture the temporal dynamics across time steps (e.g. TENER and DTML) will perform better than a standard baseline technique such as a Random Forest. TENER and DTML perform better in those two validations due to their ability to capture asynchronous relations across modalities. However, when the training data becomes limited, as in the 4m-HC and 4m-PD validations, extra capabilities are necessary to further extract temporal information and correlations. As a vanilla transformer requiring a considerable amount of training data, TENER performs worst in these two validations. DTML performs quite well due to its ability to capture local context via an LSTM for each modality. However, in general, DTML's performance suffers in both the LOO-PD and 4m-PD validations, as the accelerometer data (and modality) may be erratic due to PD and should be excluded at times from contributing to room classification. The MDCSA network has all the capabilities that DTML has, with the added ability to suppress the accelerometer modality when needed via the GRN layer embedded in DCSA. Suppressing the noisy modality seems to have a strong impact in maintaining the performance of the network when the training data is limited. This is validated by how Alt DTML (i.e. DTML with added GRN and CRF layers) outperforms the standard DTML by an average of 2.2% in F1-score in the 4m-HC and 4m-PD validations. It is further confirmed by comparing MDCSA$_{1,4,7}$ 4APS against MDCSA$_{1,4,7}$ 4APS RSSI: the latter model, which does not include the accelerometer data, outperforms the former in F1-score by an average of 1.6% in the last three cross-validations. It is worth pointing out that the MDCSA$_{1,4,7}$ 4APS RSSI model performed the best in the 4m-PD validation. However, the omission of accelerometer data affects the model's ability to differentiate rooms that are more likely to have active movement (e.g. the hallway) from rooms that are not (e.g. the living room). It can be seen from Table 2 that the MDCSA$_{1,4,7}$ 4APS RSSI model has low performance in predicting the hallway compared to the full MDCSA$_{1,4,7}$ model. As a consequence, the MDCSA$_{1,4,7}$ 4APS RSSI model cannot produce in-home gait speed features as reliable as the ones produced by MDCSA$_{1,4,7}$.
Room-to-room Transition and Medication Accuracy. We hypothesise that during their OFF medication state, the deterioration in mobility of a person with PD is exhibited by how they transition between rooms. To test this hypothesis, a Wilcoxon signed rank test was used on the annotated data from PD participants undertaking each of the three individual transitions between rooms whilst ON (taking) and OFF (withholding) medications, to assess whether the mean transition duration ON medications was statistically significantly shorter than the mean transition duration for the same transition OFF medications for all transitions studied (see Table 4). From this result, we argue that a model from Table 1 whose mean transition duration is close to the ground truth can capture what the ground truth captures. As mentioned in Section 3, this transition duration for each model is generated by the model continuously performing room-level localisation, focusing on the time a person is predicted to spend in a hallway between rooms. The second part of Table 1 shows the performance of all our networks for medication state classification. The demographic features can be used as a baseline for each type of validation. The MDCSA network, with the exception of the ALL-HC validation, outperforms any other network by a significant margin for the AUROC score. By using in-home gait speed features produced by the MDCSA network, a minimum of 15% improvement over the baseline demographic features can be obtained, with the biggest gain obtained in the 4m-PD validation data. In the 4m-PD validation data, RF, TENER, and DTML could not provide any prediction due to their inability to capture (partly) hall transitions. Furthermore, TENER was unable to provide any medication state prediction in the 4m-HC validation. This is confirmed by Table 3, where TENER failed to capture any transitions between the dining room and living room across all periods that have ground truths. MDCSA networks are able to provide medication state predictions and maintain their performance across all cross-validations thanks to the addition of Eq. 13 in the loss function.
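The way a transition duration can be read off a stream of room-level predictions may be easier to see in code. The following is a minimal sketch, not the authors' implementation: it assumes one predicted room label per fixed time step, counts only hall visits that start in one room and end in a different one, and uses hypothetical room names and a hypothetical function name (`hallway_transitions`). The exact definition in Section 3 may differ.

```python
from typing import List, Tuple


def hallway_transitions(
    rooms: List[str], hall: str = "hall", step_s: float = 1.0
) -> List[Tuple[str, str, float]]:
    """Return (origin, destination, seconds spent in the hall) for every
    hall visit that connects two different rooms.

    `rooms` is a sequence of predicted room labels, one per time step of
    length `step_s` seconds (an assumption made for this sketch).
    """
    transitions = []
    i, n = 0, len(rooms)
    while i < n:
        if rooms[i] != hall:
            i += 1
            continue
        start = i
        # Consume the whole run of consecutive hall predictions.
        while i < n and rooms[i] == hall:
            i += 1
        # A transition needs a known room both before and after the hall run.
        if start > 0 and i < n:
            origin, destination = rooms[start - 1], rooms[i]
            if origin != destination:
                transitions.append((origin, destination, (i - start) * step_s))
    return transitions


# Hypothetical per-second predictions: one real transition, one return to the
# same room (ignored), and one hall visit cut off by the end of the recording.
predicted = (["living"] * 3 + ["hall"] * 4 + ["dining"] * 2
             + ["hall"] * 2 + ["dining"] + ["hall"] * 3)
print(hallway_transitions(predicted))  # [('living', 'dining', 4.0)]
```

Under these assumptions, a model that never predicts the hall between two rooms yields no transition at all, which corresponds to the 'N/A' entries reported for models that fail to capture a transition.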

CONCLUSION
We have presented the MDCSA model, a new deep learning approach for indoor localisation utilising RSSI and wrist-worn accelerometer data. The evaluation on our unique real-world free-living pilot dataset, which includes subjects with and without PD, shows that MDCSA achieves state-of-the-art accuracy for indoor localisation. The availability of accelerometer data does indeed enrich the RSSI features which, in turn, improves the accuracy of the indoor localisation.
In naturalistic settings, in-home mobility can be measured through the use of indoor localisation models. We have shown, using room transition duration results, that our PD cohort takes longer on average to perform a room transition when they withhold medications. With accurate in-home gait speed features, a classifier model can then accurately differentiate whether a person with PD is in an ON or OFF medication state. Such changes show the promise of these localisation outputs to detect the dopamine-related gait fluctuations in PD that impact on patients' quality of life [31] and are important in clinical decision making [9]. We have also demonstrated that our indoor localisation system provides precise in-home gait speed features in PD with a minimal average offset to the ground truth. The network also outperforms other models in the production of in-home gait speed features, which are used to differentiate the medication state of a person with PD.
Limitations and future research. One limitation of this study is the relatively small sample size (which was planned as this is an exploratory pilot study). We believe our sample size is ample to show proof of concept. This is also the first such work with unobtrusive ground truth validation from embedded cameras. Future work should validate our approach further on a large cohort of people with PD and consider stratifying for sub-groups within PD (e.g. akinetic-rigid or tremor-dominant phenotypes), which would also increase the generalisability of the results to the wider population. Future work in this matter could also include the construction of a semi-synthetic dataset based on collected data to facilitate a parallel and large-scale evaluation.
This smart home's layout and parameters remain constant for all the participants, and we acknowledge that transfer of this deep learning model to other varied home settings may introduce variations in localisation accuracy. For future ecological validation and based on our current results, we anticipate the need for pre-training (e.g. a brief labelled walkaround) for each home, and also suggest that some small amount of ground-truth data will need to be collected (e.g. researcher prompting of study participants to undertake scripted activities such as moving from room to room) to fully validate the performance of our approach in other settings.
Accurate room localisation using these data modalities has a wide range of potential applications within healthcare. This could include tracking of gait speed during rehabilitation from orthopaedic surgery, monitoring wandering behaviour in dementia or triggering an alert for a possible fall (and long lie on the floor) if someone is in one room for an unusual length of time. Furthermore, accurate room use and room-to-room transfer statistics could be used in occupational settings, e.g. to check factory worker location.

A STATISTICAL SIGNIFICANCE TEST
It could be argued that the localisation models compared in Table 1 might not be statistically different, given the fairly high standard deviation across all types of cross-validation, which is caused by the relatively small number of participants. In order to compare multiple models over cross-validation sets and show the statistical significance of our proposed model, we first performed the Friedman test to reject the null hypothesis [10]. We then performed a pairwise statistical comparison: the Wilcoxon signed-rank test with Holm's alpha correction (α = 5%). Finally, we used a critical difference diagram [6] to visualise the results of these statistical tests projected onto the average rank axis, with a thick horizontal line showing a clique of localisation models that are not significantly different (see Figures 3 and 4).
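Two ingredients of this procedure, the average rank per model and Holm's step-down correction of the pairwise p-values, can be sketched in a few lines. This is an illustrative sketch only: the function names (`average_ranks`, `holm_reject`), the scores and the p-values are hypothetical, and the Friedman and Wilcoxon tests that would produce the p-values are not reimplemented here.

```python
from typing import Dict, List


def average_ranks(scores: List[List[float]]) -> List[float]:
    """scores[i][j] is the score of model j on validation set i (higher is
    better). Returns the mean rank of each model, where rank 1 is best and
    tied models share the mean of the ranks they occupy."""
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=lambda j: -row[j])
        pos = 0
        while pos < n_models:
            end = pos
            while end + 1 < n_models and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]


def holm_reject(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm's step-down procedure: compare the k-th smallest p-value
    (k = 0, 1, ...) with alpha / (m - k) and stop at the first failure."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    decisions = {}
    still_rejecting = True
    for k, (name, p) in enumerate(ordered):
        if still_rejecting and p <= alpha / (m - k):
            decisions[name] = True
        else:
            still_rejecting = False
            decisions[name] = False
    return decisions


# Hypothetical F1-scores of three models on two validation sets.
print(average_ranks([[0.9, 0.8, 0.7], [0.85, 0.85, 0.6]]))  # [1.25, 1.75, 3.0]

# Hypothetical pairwise Wilcoxon p-values.
print(holm_reject({"A vs B": 0.01, "A vs C": 0.04, "B vs C": 0.03}))
# {'A vs B': True, 'B vs C': False, 'A vs C': False}
```

In a critical difference diagram, the models would be placed on the axis at their average ranks, and pairs for which the corrected test does not reject would be joined by the thick horizontal line.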

Figure 1: Layout of the residential home setting.

Figure 3: Critical difference diagram for the precision of room-level localisation showing the pairwise statistical comparison of all localisation models across different cross-validation techniques.

Figure 4: Critical difference diagram for the F1-score of room-level localisation showing the pairwise statistical comparison of all localisation models across different cross-validation techniques.
and accelerometer data $x^a = [x^a_1, \ldots, x^a_T] \in \mathbb{R}^{T \times a}$ within $T$ time units, where $x^r_t = [x^r_{t1}, \ldots, x^r_{tr}]$ represents RSSI signals from $r$ access points, and $x^a_t = [x^a_{t1}, \ldots, x^a_{ta}]$ represents accelerometer data from $a$ spatial directions at time $t$ with $t \le T$. Given feature vectors $x^u_t = [x^u_{t1}, \ldots, x^u_{tu}]$ with $u \in \{r, a\}$ representing RSSI or accelerometer data at time $t$, and $t \le T$ representing the time index, a positional embedding $h^u_t$ for RSSI or accelerometer can be obtained by:

Table 1: Room-level and medication state accuracy of all models. Standard deviation is shown in (.), the best performer is bold, while the second best is italicized. Note that our proposed model is the one named MDCSA 1,4,7.

Table 2: Hallway prediction on limited training data.

Table 3: Room-to-room transition accuracy (in seconds) of all models compared to the ground truth. Standard deviation is shown in (.), the best performer is bold, while the second best is italicized. A model that fails to capture a transition between particular rooms within a period that has the ground truth is assigned an 'N/A' score.

Table 4: PD participant room transition duration with ON and OFF medications comparison using Wilcoxon signed rank tests.

We show, in Table 3, that the mean transition duration for all transitions studied produced by the MDCSA 1,4,7 model is the closest to the ground truth, improving over the second best by around 1.25 seconds across all hall transitions and validations.