The Effect of Predictive Formal Modelling at Runtime on Performance in Human-Swarm Interaction

Formal modelling is often used as part of the design and testing process of software development to ensure that components operate within suitable bounds even in unexpected circumstances. In this paper, we use predictive formal modelling (PFM) at runtime in a human-swarm mission and show that this integration can be used to improve the performance of human-swarm teams. We recruited 60 participants to operate a simulated aerial swarm to deliver parcels to target locations. In the PFM condition, operators were informed of the estimated completion times given the number of drones deployed, whereas in the No-PFM condition, operators did not have this information. The operators could control the mission by adding or removing drones, thereby increasing or decreasing the overall mission cost. The evaluation of human-swarm performance relied on four key metrics: the time taken to complete tasks, the number of agents involved, the total number of tasks accomplished, and the overall cost associated with the human-swarm task. Our results show that PFM at runtime improves mission performance without significantly affecting the operator's workload or the system's usability.


INTRODUCTION
Aerial swarms amplify our ability to observe and engage with areas that are challenging for us to reach or oversee. One of the promising applications of aerial swarms is in search and rescue missions to locate and identify casualties on time or to deliver essential life-saving supplies to remote and difficult-to-reach areas in the aftermath of a natural disaster [1]. Such applications are often accompanied by several challenges, ranging from design and deployment [14] to issues related to safety [17], regulations [19], and the operator's mental workload [2]. Other challenges include performance, shared control and degree of automation, as well as determining the appropriate timing for presenting necessary information to the human operator. Prior research on human-swarm interaction (HSI) has identified essential prerequisites for the successful operation of aerial swarms [9]. For systems relying on human supervision and intervention, a critical requirement for the smooth operation of the swarm is the efficient timing and selection of relevant data provided to the operator.
Gu et al. [12] proposed a predictive formal modelling (PFM) technique to estimate the mission success at runtime. PFM can be used to inform the swarm operator about the crucial information required to make appropriate decisions in order for the mission to be successful. This capability allows the swarm operator to make informed decisions promptly, increasing the overall efficiency and adaptability of the human-swarm collaborative efforts. With PFM providing crucial insights, the interaction becomes more streamlined and responsive, ensuring that decisions are aligned with the mission's success criteria. This enhanced decision support contributes to a more seamless and effective collaboration between humans and robots in achieving mission objectives. Our hypothesis posits that incorporating predictive formal modelling (PFM) into the real-time execution of human-swarm tasks can empower swarm operators to make more informed decisions, thereby optimising the utilisation of swarm capabilities and resources.
In this paper, we integrate PFM into the 'Human And Robot Interactive Swarm' (HARIS) simulator [15] to provide human swarm operators with real-time mission and swarm status updates, along with predictions of mission success. Following a within-subject design, we recruited 60 participants to complete a human-swarm task of delivering packages to different areas with two conditions (with and without PFM). We assess the impact of PFM on the performance and required mental workload during the completion of a task as a human-swarm team. Results of the study showed that the PFM condition was able to significantly improve mission performance without having a significant effect on the operator's workload.

RELATED WORK
In Abioye et al. [1], the authors evaluated the effect of adding an extra feature to the human-swarm interface, i.e. operator option to request high-quality images of a search area. They found that this led to higher trust perception but did not enhance the overall human-swarm performance. Schneiders et al. [24] indicated the demand for studying non-dyadic human-in-the-loop system configurations, such as that presented in this work. Hunt et al. [15] proposed a method of dynamic re-tasking and triage based on operator feedback. Wilson et al. [27] identified key challenges for the deployment and use of robot swarms, which included how humans understand, monitor, control, and interact with swarms.
Kouvaros and Lomuscio [16] show a formal model for swarms and determine whether the emergent behaviour is satisfied; Boureanu et al. [6] analyse unbounded swarm systems with respect to security requirements by verifying a parameterised model; Lomuscio and Pirovano [18] outline a verification procedure to reason about the fault-tolerance in probabilistic swarm systems. However, none of these approaches can give guarantees after deployment. A close approach to ours is runtime monitoring [3], where pre-constructed monitors are used to analyse the system execution traces that are generated at runtime against formal specifications [11]. These monitors can evolve with system dynamics, such as the size and topology, but cannot reason about mission-level specifications, like the human-swarm interactions, where finite observations are not sufficient. Instead, Gu et al. [12] propose a framework that integrates runtime modelling [5], which has been deployed in system reasoning for unforeseen situations during execution [4], with formal methods, and focus on formal runtime modelling. Quantitative formal models can provide predictions, and this has been used at design time, e.g., for predicting failures and service availability of components [8]. In this work, we adopt an existing model [12] implementing PFM at runtime to predict the feasibility of a human-swarm mission succeeding, and check whether presenting PFM output in the user interface can support human operators in their decision-making during the human-swarm task.

STUDY
We conducted a within-subject user study with 60 participants divided into two counterbalanced groups as shown in Table 1, in order to directly compare usability, workload, and performance between the PFM and No-PFM conditions.

Participants
We recruited 60 participants (38 female, 22 male, average age: 34.6, age range 18-64) through Prolific [22]. 70% of participants had at least a bachelor's degree, 50% were above-average computer users, and 47% were familiar with UAV or swarm robotics. Participants were recruited from the US and the UK. Participants were randomly allocated into groups and received £9. The average study duration was 33.6 minutes.
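The random allocation into two counterbalanced groups could be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name, the equal 30/30 split, and the condition orders are assumptions, since the contents of Table 1 are not reproduced here.

```python
import random


def allocate_counterbalanced(participant_ids, seed=None):
    """Randomly split participants into two halves with opposite condition orders.

    Assumption: an equal split, with one half doing PFM first and the
    other half doing No-PFM first.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for pid in ids[:half]:
        orders[pid] = ("PFM", "No-PFM")
    for pid in ids[half:]:
        orders[pid] = ("No-PFM", "PFM")
    return orders


if __name__ == "__main__":
    orders = allocate_counterbalanced(range(1, 61), seed=0)
    pfm_first = sum(1 for order in orders.values() if order[0] == "PFM")
    print(pfm_first, len(orders) - pfm_first)
```

Each participant still experiences both conditions; only the order differs between the two groups.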

Study task
To investigate the impact of PFM in a human-swarm interaction context, we developed a drone delivery mission scenario which was presented to participants through two scenarios: the PFM and No-PFM scenarios. To constrain the operators' strategies, we added a cost (£2,000) and time (6 minutes) limit. These two constraints meant that the participants were occupied trying to meet the time limit while staying within the mission budget. Users could control the swarm and the mission by performing two operations: adding or removing UAVs from the mission. The more UAVs they add, the faster they complete the mission, but they incur a higher mission cost. The reverse also applies: the more UAVs they remove from the mission, the longer the mission completion time and the lower the overall mission cost. We set the minimum and maximum number of UAVs allowed in the mission to be 4 and 10 respectively. The final mission cost was a cumulative sum of the upkeep cost per second. We implemented a non-linear per-second upkeep cost function that makes the per-second upkeep cost higher each time a new UAV is added. This reduces the participant's ability to predict the cost of adding or removing a UAV, especially for the No-PFM condition without the predictive model. We added 40 delivery tasks to each scenario. PFM was used to predict the probability of completing all deliveries with the given number of UAVs. This prediction was presented to the operator as an estimated completion time, as shown inside the circle in the upper right corner (see Figure 1). The colour of the circle changes from green to yellow and red depending on the estimated time of completion (green for finishing well below the given time, yellow for finishing near the given time, and red for exceeding it).
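To illustrate how a cumulative mission cost with a non-linear per-second upkeep might behave, the sketch below uses a power-law upkeep function. The functional form and the `base` and `exponent` values are hypothetical; the actual cost function used in the study is not specified here. Only the UAV bounds (4 to 10) and the budget (£2,000) come from the text.

```python
MIN_UAVS = 4      # minimum number of UAVs allowed (from the study task)
MAX_UAVS = 10     # maximum number of UAVs allowed (from the study task)
BUDGET = 2000.0   # mission budget in pounds (from the study task)


def upkeep_per_second(n_uavs, base=0.5, exponent=1.5):
    """Hypothetical non-linear upkeep: each extra UAV costs more than the last."""
    if not MIN_UAVS <= n_uavs <= MAX_UAVS:
        raise ValueError("number of UAVs must be between 4 and 10")
    return base * n_uavs ** exponent


def mission_cost(uavs_per_second):
    """Cumulative cost: the sum of the per-second upkeep over the mission."""
    return sum(upkeep_per_second(n) for n in uavs_per_second)


if __name__ == "__main__":
    # Example timeline: 4 UAVs for 120 s, then 8 UAVs for 180 s.
    timeline = [4] * 120 + [8] * 180
    cost = mission_cost(timeline)
    print(f"cost = {cost:.2f}, within budget: {cost <= BUDGET}")
```

Because the exponent is greater than 1, the marginal cost of the sixth UAV exceeds that of the fifth, which is the property that makes the cost hard to anticipate without a prediction.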

HARIS Model
The HARIS simulator is a browser-based platform that was specifically designed for human-in-the-loop multi-agent and swarm robotics experimentation. HARIS is a successor of HutSim [23], which was designed with a specific focus on usability by consulting with industry experts to model not only their typical command structure but also make it operable with real-life or simulated agents. Building on its predecessor, HARIS was further tailored to its use case derived from interviews with drone pilots [21] and swarm experts [20] to make the platform as usable and realistic as possible while maximising the ease of use for multiple human operators [25], making this simulator a useful tool for the investigations on human-swarm simulations.
We use the model from [12] with slight modifications to better reflect the scenario. For example, the background failure in each region is removed to relieve participants' stress from the unexpected loss of UAVs; Erlang-k law [10] is implemented to represent a smooth transition delay in CTMCs. The integration with the HARIS simulator follows a similar process to [12], but with the Sim2PRISM middleware directly embedded in HARIS. Instead of showing the probability of success directly, which might be difficult for participants to interpret, we consider the feasibility over different time intervals and give an estimated completion time as the time when the probability of success reaches 0.99. Additionally, we implement What-if scenarios to give the participants extra information on the effect of adding/removing a UAV before making a decision.
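The step from a success probability to an estimated completion time can be sketched as below. This is an illustration under stated assumptions, not the Sim2PRISM implementation: an Erlang-k cumulative distribution stands in for the probability curve that the formal model would return, and the one-second scan step, the 30-second `margin` for the yellow band, and all function names are hypothetical. The 0.99 threshold and the 6-minute limit come from the text.

```python
import math


def erlang_cdf(t, k, rate):
    """P(T <= t) for an Erlang-k distribution (sum of k exponential delays)."""
    if t <= 0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * sum(x ** n / math.factorial(n) for n in range(k))


def estimated_completion_time(success_prob, horizon, threshold=0.99):
    """First whole second at which the success probability reaches the threshold.

    Returns None if the threshold is not reached within the horizon.
    """
    for t in range(horizon + 1):
        if success_prob(t) >= threshold:
            return t
    return None


def indicator_colour(estimate, limit=360, margin=30):
    """Map an estimate to the traffic-light indicator (margin is an assumption)."""
    if estimate is None or estimate > limit:
        return "red"
    if estimate > limit - margin:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Stand-in curve: 40 Erlang stages at 0.2 stages per second (illustrative only).
    def curve(t):
        return erlang_cdf(t, k=40, rate=0.2)

    estimate = estimated_completion_time(curve, horizon=600)
    print(estimate, indicator_colour(estimate))
```

A What-if query would amount to evaluating the same function on the curve predicted for one more or one fewer UAV and comparing the two estimates.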

Procedure
Following recruitment, participants were presented with the participant information sheet and consent form, after which they completed a brief demographics survey which collected data on gender, age group, education level, self-rated computer expertise, as well as self-rated UAV or Swarm robotics knowledge. Subsequently, participants were asked to watch a short study briefing video and asked to answer three questions to test their preliminary understanding of the task. To ensure that participants understood the study task, two of these three had to be answered correctly in order to proceed with the study. Participants were required to perform a short tutorial scenario which allowed them to experiment with all the provided functionality and experience the interface prior to the actual data collection. Participants then proceeded to their first scenario. Following its completion, they completed the post-task survey which included the 6-item NASA-TLX [13] questionnaire and the 10-item System Usability Scale (SUS) [7]. They continued with the second scenario, followed by the same set of questionnaires. Finally, participants were asked to complete a short survey in relation to their preferred scenario condition, before returning to Prolific. These questions were related to (a) the perceived accuracy of the time estimation feature provided (for the PFM condition), (b) their preferred scenario, (c) a selection of reasons for perceived success during task completion, (d) the primary reason for their success, and (e) a binary selection if they used the estimated completion time. Each participant's performance was measured and recorded in real-time during the scenario tasks as HARIS generated log files.
The survey questionnaires and HARIS simulator were dockerised and deployed online (footnote 2) on an AWS EC2 c5.4xlarge (32GB RAM, 16 vCPUs) instance running the Ubuntu 22.04 operating system. The dockerisation was necessary for a scalable deployment due to the high computing resource requirement of the prediction model in the simulator.

RESULTS AND ANALYSIS
The result of the participants' performance over time is presented in Figure 2. Figure 2a shows the mean mission cost of each scenario over time. The No-PFM scenario incurred a lower cost over time than the PFM scenario. Figure 2b shows the mean number of agents used over time. The No-PFM group started with the fewest agents but finished with the most. Since this group did not have the estimated completion time displayed, it is possible that they realised very late that they may not finish, and therefore started adding more agents towards the end. This might indicate that participants in the No-PFM condition found it more difficult to balance the number of agents with the two constraints defined. Figure 2c compares the mean number of completed tasks over time and shows that participants completed more tasks on average over time in their PFM scenario compared to their No-PFM scenario.
Footnote 2: Online HARIS simulator: https://uos-haris.online/
To determine the impact of our formal model on the performance of the human-swarm teaming, we analysed the results to understand whether the prediction feature contributes to participants completing tasks more efficiently, with minimal influence on the overall mission cost. We evaluated four dependent variables: a) Time Completion: refers to the mission completion time i.e. the time taken to complete 40 delivery tasks. b) No. of Agents: refers to the mean number of agents deployed by each participant to complete the delivery task. c) Completed Tasks: We considered a delivery task to be successfully completed when the UAV reaches the target coordinate. After this, the UAV returns to the hub to collect parcels for the next delivery. d) Cost per Task: This was computed as a ratio of the mean total cost incurred over the mean number of tasks completed per study scenario. In terms of data analysis, our dataset meets the prerequisites necessary for conducting one-way ANOVA testing. Additionally, we performed a G*Power analysis to verify that our sample size aligns with the required criteria. Specifically, we have 60 participants across experimental groups, with an assumed effect size of 0.2 and a significance level of 0.05. The G*Power analysis helps to confirm that our study is adequately powered to detect the expected effects.
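For reference, the F statistic of a one-way ANOVA can be computed from first principles as in the sketch below. This is a generic textbook computation, not the authors' analysis script; the function name and example data are illustrative. With two conditions of 60 observations each, it yields the degrees of freedom (1, 118) that appear in the reported results.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Illustrative data only, not taken from the study.
    f_stat, df_b, df_w = one_way_anova([1, 2, 3], [2, 3, 4])
    print(f_stat, df_b, df_w)
```

The p-value then follows from the upper tail of the F distribution with these degrees of freedom, which a statistics package would normally supply.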
For workload, participants had a mean score of 4.77 (SD = 1.50) in the PFM scenario and 4.74 (SD = 1.56) in the No-PFM scenario. One-way ANOVA for workload revealed no significant main effect (F(1, 118) = 0.009, p = 0.924). This suggests that the PFM feature did not add extra workload to the participants. Regarding usability, we used the System Usability Scale (SUS) to compare the mean values for the two conditions. In line with the guidelines [7,26], interfaces with a value of 68 or above are considered good. Mean SUS scores for the PFM and No-PFM scenarios were 70.75 (SD = 17.52) and 74.38 (SD = 15.15) respectively. This shows that the usability of both systems was good. One-way ANOVA yielded no significant effect on usability (F(1, 118) = 1.470, p = 0.228). This suggests that the PFM feature did not make the system more or less usable than without it.
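The SUS values above are on the usual 0 to 100 scale. A minimal sketch of the standard SUS scoring procedure is given below, assuming the conventional 10-item questionnaire with responses from 1 to 5; the function name is illustrative and the study's own scoring script is not shown in the text.

```python
def sus_score(responses):
    """Standard SUS scoring for ten responses on a 1-5 scale.

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("responses must be between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1
        else:
            total += 5 - response
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

A participant's score would then be compared against the threshold of 68 mentioned above.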
As depicted in Table 2, the PFM condition led to enhanced task completion rates and reduced time requirements when compared to the No-PFM scenario where no prediction was presented to participants. Specifically, participants, on average, completed 39.55 tasks (SD = 1.30) within 314 seconds (SD = 34.77) in the PFM condition. This performance contrasted with the No-PFM condition, where participants completed an average of
"I found the presence of the estimated completion time feature [PFM] helped me decide whether to add or remove agents, whereas in the first scenario [No-PFM] I was trying to estimate it myself based on the remaining time and the percentage completion of the task." -P46
indicating the usefulness of the additional information provided to complete the task successfully. A similar sentiment was presented by P20, who describes the use of the PFM feature as a guiding mark for optimising the addition and removal of drones.
"I used the estimated time to allow me to hover around the 6-minute mark, adding and taking away planes where necessary" -P20

DISCUSSION AND FUTURE WORKS
We found that there was no significant change in workload between the two conditions and both the PFM and No-PFM interfaces were found to be usable based on the System Usability Scale. Although our results show that there is a performance gain when using the predictive formal modelling feature at runtime, our analysis does not take into account how the accuracy of the prediction could affect the users' performance or trust in the system. Furthermore, this study did not investigate whether the PFM feature increases explainability and hence trust in the human-swarm interaction. A follow-up study could collect data on trust, acceptability, and user preferences in order to evaluate these measures. In future work, we may also consider embedding recommendations for the operators to help in controlling the swarm, i.e., when to add or remove drones. In order to expand on the assessment of mental workload, different data streams about the users' interaction, such as neurophysiological responses (e.g., error potentials), might be useful. This could indicate the operator's cognitive workload and situational awareness as they operate the swarm in a disaster response scenario.

CONCLUSION
Building on previous work in human-swarm interaction on deploying predictive formal modelling (PFM) at runtime, we conducted a within-subject user study to determine its impact on performance and mental workload. We recruited 60 participants to perform the role of a UAV swarm operator facilitating the delivery of parcels to target locations in a simulation environment. The role required the participant to add or remove agents as needed to complete the mission within the given time. We find that participants using PFM were able to complete more tasks in less time compared to the No-PFM scenario. This increase did not result in a higher demand on mental workload, showing the potential benefit of predictive formal modelling in a human-swarm interaction scenario.

Figure 1: HARIS simulator interface showing the predictive formal modelling based estimated completion time feature in the PFM scenario.

Figure 2: Comparing the mean performance of all study conditions over time.

Table 1: Counterbalanced distribution of the 60 recruited participants.