A Robot-Administered ICU Confusion Assessment with Brain-Computer Interface Control

Evaluation of patient delirium in hospital Intensive Care Units (ICU) is a crucial and challenging task, with conventional assessments like the CAM-ICU relying largely on verbal and physical communication, making them difficult for patients with limited physical abilities. To address this, we propose a system that integrates Brain-Computer Interface (BCI) technology and a Socially Assistive Robot (SAR) through brain-controlled mental commands. In a pilot user study, we demonstrate how our system could successfully administer a version of the CAM-ICU to 13 medical professionals and students roleplaying various levels of delirium severity. Our work reveals early usability and workload insights, and next steps to improve upon assessment classification accuracy and interaction design.


INTRODUCTION
Patient delirium is a common occurrence in hospital intensive care units (ICU) caused by factors such as underlying conditions or trauma, medication, and sleep deprivation. Delirium is characterized as an acute cognitive disturbance that affects attention, perception, memory, awareness, and behavior [20,23]. Its prevalence ranges from 11% to 42% in medical wards, with incidence rates exceeding 70% in ICU patients on ventilators [20]. Delirium has been shown to have a significant impact on healthcare, leading to increased mortality rates, extended hospital stays, and increased risks of falls, infections, and other complications [20].
To detect delirium, ICU staff will frequently administer confusion assessments, such as the widely accepted Confusion Assessment Method - Intensive Care Unit (CAM-ICU) [10]. However, these assessments generally rely on verbal and physical communication, creating challenges for patients with limited movement or speech [8]. Furthermore, in large ICUs these assessments may be administered as often as every eight to twelve hours, or as needed, an added strain that exacerbates the existing staff burden on many struggling healthcare services [8].
Brain-Computer Interface (BCI) technology utilizes electroencephalography (EEG) to capture user brainwave data for passive monitoring of user states and active control tasks. In healthcare scenarios, BCI offers a non-invasive solution for non-verbal and non-physical communication [18] and has proven useful for cognitive assessments [1,3]. Moreover, Socially Assistive Robots (SARs) have emerged as a promising technology for personalized, cost-effective monitoring and assessment in healthcare settings [5]. Their unique social capabilities can facilitate roles traditionally undertaken by humans, such as the administration of healthcare assessments [19,21]. Limited work has been published in the realm of confusion assessment using SARs, with Jeffcock et al. providing one of the only known works that investigates this [15]. Specifically, the authors used a SAR platform to administer the CAM-ICU to roleplaying students, utilizing camera vision to determine agitation level [15]. The authors were able to achieve above 80% accuracy for the three tested features of the CAM-ICU. However, the system was dependent on verbal and physical communication to complete the assessment.
Towards more autonomous administration of delirium assessments in the ICU, we developed and evaluated, in a pilot study, a SAR which administered the CAM-ICU to medical professionals and students roleplaying as patients with varying levels of delirium. Additionally, we integrated a BCI headset to facilitate brain-controlled mental commands, serving as an accessible interface for users unable to communicate verbally or physically. Our goal was to evaluate the feasibility and practicality of this technology in a low-stakes roleplaying scenario and gain initial insights into this novel area of delirium assessments.

SYSTEM DESIGN
Our system utilizes a SAR to administer the CAM-ICU verbally, assessing different aspects of delirium through input from mental commands sensed through a BCI headset. We chose ARI from PAL Robotics [6] as our humanoid social robot. ARI has been used considerably across multiple healthcare-related applications [4,7] and satisfied our requirements of an embedded tablet for graphic display and a robust voice module for verbal expression. For BCI, we used the Emotiv Insight 2, a 5-channel EEG headset that has shown impressive results in a range of control tasks [9,17,24], with minimal setup compared to more elaborate headsets.
Figure 2 presents a high-level overview of our system design. A local machine monitors a live data stream from the BCI headset connected through Bluetooth. The data stream captures: 1) mental commands from EEG and 2) facial expressions from onboard motion sensors. Processing is performed using the Emotiv Cortex API service. To enable the classification of mental commands, an initial training step is required. The local machine also coordinates the web pages on the ARI's GUI as the HRI sequence progresses, integrating Python Flask and AJAX to communicate the live BCI stream to drive visual components on the ARI's screen. The full code repository is accessible through GitHub.
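The web layer described above can be sketched as follows. This is a minimal illustration, not the actual repository code: the endpoint name `/command` and the variable `latest_command` are assumptions, standing in for whatever the real system exposes to the ARI's on-screen GUI via its AJAX polling loop.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# In the real system, a background consumer of the Emotiv Cortex data
# stream would overwrite this with the latest classified mental command.
latest_command = {"action": "neutral", "power": 0.0}

@app.route("/command")
def get_command():
    # The GUI's AJAX loop polls this endpoint and moves the on-screen
    # marker left, right, or not at all based on the returned action.
    return jsonify(latest_command)
```

A plain JSON-over-HTTP design like this keeps the BCI processing on the local machine while the robot's embedded tablet only needs a browser-based front end.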

Administering the CAM-ICU
The CAM-ICU focuses on four features present in delirium, as summarized in Table 1. For our implementation, we considered acute change out of scope, as it would require longer-term interaction and knowledge of baseline mental state. Thus, we focus on the last three categories of the CAM-ICU.

Inattention.
To align our implementation with the existing assessment, we developed an interaction mimicking the letter reading task [8]. Specifically, the robot would verbalize a sequence of 10 letters, and the user would signal when the letter "A" was verbalized by selecting the options displayed by the robot.

Altered level of consciousness.
The CAM-ICU uses the Richmond Agitation-Sedation Scale (RASS) [10] to assess the patient's level of consciousness. For this, the clinician usually provides a subjective rating based on the patient's physical movements and visible emotions. Previous work has already sought to achieve this through computer vision [15]. We harnessed facial expression data captured through the Emotiv headset to minimize the need for a more complex setup. Specifically, our headset streamed passive data on five facial characteristics: neutral, surprise, frown, smile, and clenched teeth, as well as eye movement.

Disorganized thinking.
Again, we mimicked the CAM-ICU by including a sequence of four logic-based questions to assess this aspect of delirium. Users were required to answer yes or no to these questions: "Will a stone float on water?"; "Are there fish in the sea?"; "Does one pound weigh more than two?"; "Can you use a hammer to pound a nail?". The two logic tasks based on physical movement were not included, an omission that would be standard procedure in the ICU for patients with limited functionality [8].

Mental Commands for Answer Selection
Responses for questions asked during the assessment's inattention and disorganized thinking phases would be sensed through the BCI headset. For this, it can be beneficial for users to feel like they are controlling some physical action within the real world. Therefore, we implemented the basic interface seen in Figure 1, which allows users to drag a marker left to answer no and right for yes. A neutral state (when the user is not focusing on any answer) would result in no action from the marker.
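The marker behaviour described above can be sketched as a small update rule. The step size, travel limit, and function name below are illustrative assumptions, not values taken from the actual interface.

```python
def step_marker(position, command, step=1, limit=5):
    """Advance the on-screen marker one step per classified command.

    Returns (new_position, selected), where selected is "no" or "yes"
    once the marker reaches either end of its travel, else None.
    """
    if command == "left":
        position -= step
    elif command == "right":
        position += step
    # A "neutral" classification leaves the marker where it is.
    if position <= -limit:
        return -limit, "no"
    if position >= limit:
        return limit, "yes"
    return position, None
```

Requiring several consecutive directional classifications before an answer locks in (rather than acting on a single one) gives some robustness against momentary misclassifications of the mental command.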

USER STUDY
We recruited 15 participants to evaluate our system. Seven of these were medical students from the University of Edinburgh, and four of them were practicing professionals (two nurses, two doctors) who work within Scotland's National Health Service. The average medical experience among participants was 6.5 years, and 80% of the participants had experience with delirium or delirious patients.
The study was carried out at the University of Edinburgh over five days, with each session lasting approximately an hour. All participants provided informed consent prior to taking part in the study. Participants were asked to sit opposite the ARI robot for the duration of the study, as depicted in Figure 3, seated in a quiet room facing away from any interior windows, with the research facilitator positioned behind them during the training and roleplaying phases of the evaluation. Table 2 gives an overview of the experiment setup.

BCI Setup and Training
This phase involved the facilitator fitting the BCI headset to the participant's head, using a saline solution to improve conductivity. Once fitted, the participants were asked to perform three mental commands: move left, move right, and neutral, to manipulate the marker shown on the interface in Figure 1. Each command was trained between 8 and 12 times, for 8 seconds per repetition, alternating between them.

Delirium Roleplaying
A patient's CAM-ICU score is calculated based on the results of the four assessment features shown in Table 1. Feature one is scored 0 or 1, while features two, three, and four are scored between 0 and 2 [16]. No delirium is assigned to patients with a total score of 0-2, moderate for 3-5, and severe for 6-7. To investigate our system's interaction with users across this range, we assigned none, moderate, and severe classifications to n=5, n=4, and n=4 participants respectively.
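The scoring bins above can be expressed directly; the function name is illustrative, but the thresholds are those of the CAM-ICU scoring scheme described here [16].

```python
def severity(total_score):
    """Map a CAM-ICU total score (0-7) to a delirium severity group."""
    if not 0 <= total_score <= 7:
        raise ValueError("CAM-ICU total score must be between 0 and 7")
    if total_score <= 2:
        return "none"       # total score 0-2
    if total_score <= 5:
        return "moderate"   # total score 3-5
    return "severe"         # total score 6-7
```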
For administration of the assessment, the SAR would execute the inattention phase followed by the disorganized thinking phase, while participants would roleplay answering using mental commands (e.g., participants assigned a severe delirium level would purposefully get more questions wrong and exhibit more exaggerated facial expressions to show agitation). The robot would move on to the next question when the mental command moved a marker to the selected option or after 10 seconds. The system recorded the participant's response as the option selected by the marker or as the most frequently sensed command during the 10-second time frame. The facial expression data was captured every 0.5 seconds.

Questionnaires
To determine the BCI error rate, participants were shown the recorded responses from the BCI headset for each question and asked to indicate whether the recorded response was accurate with regard to their roleplaying. The NASA TLX [12,13] and SUS [2] questionnaires were also administered to gain insight into workload and to understand how usable the system was found to be by participants with experience of delirious patients in clinical settings.

RESULTS

Delirium Roleplaying
We measured CAM-ICU response accuracy across all questions based on answer classification from the BCI mental command data against the ground truth gathered from the post-interaction questionnaire. Overall, we found a 65% ± 3.9% accuracy, with a 70% average for the none group, 63.1% for moderate, and 57.4% for severe. The overall accuracy during the inattention feature questions was 68.5%, while the accuracy during the disorganized thinking feature was 55.8%. Results can be seen in Figure 4. We then used the assessment data gathered by our system (including facial expression data for the Altered Level of Consciousness feature) as input into the CAM-ICU scoring system [10,16], with a default value of 1 for the Acute Change feature. We found that the system could correctly classify 4 participants into their respective delirium severity groups (all within the moderate group).

Workload and Usability
The average weighted total workload for all participants, calculated using the NASA TLX, was 54.46 ± 5.37, placing the workload requirement in the 60th to 70th percentile (40th to 80th with error rate) [11]. Physical workload was the lowest demand, with an average weighted rating of 0.18, while mental demand was the highest, with an average weighted rating of 16.56. The average SUS score for all participants was 60.58 ± 4.52. The accepted benchmark for SUS scores is 68 [14]; therefore, the system falls short of average usability.

DISCUSSION
The accuracy of our system based on total correctly classified answers showed average performance, yet not close to what would be expected in a real-world deployment. We saw low results for the final CAM-ICU classifications compared to participants' expected delirium severity levels. This was likely caused by weak classifications of the mental commands, possibly from participants concentrating on answering the assessment while simultaneously mimicking facial expressions. Participants also experienced high levels of mental demand, which was expected, in line with other cognitive assessments using BCI [1]. Nevertheless, a lower workload may be achievable through more efficient interface design, sensitivity of marker movement, or adaptations to questioning. With regard to usability, a lower-than-expected score could be attributed to the interaction exchange between the mental command and the marker movement on the screen. Further investigation with stakeholders would be required to understand more desirable aspects of the system design.
Although headset problems resulted in the omission of two participants, the overall experience with the system was positive. The SAR, driven by input from mental commands, was successfully able to autonomously administer our version of the CAM-ICU test to roleplaying participants. Importantly, we experienced a quick setup time for the headset and were able to train the classifier to a working level with a relatively low number of example training commands.

Limitations
Firstly, a more reliable mental command training could have been achieved with better headset contact, yet this is a compromising factor in low-cost devices. Furthermore, if working with the target population of acute patients in the ICU, using such a headset for mental commands would likely be a significant jump from our roleplaying scenario, with more complex considerations such as cognition level, physical practicality, and ability to train commands, despite BCI/EEG technologies having a strong footprint in existing healthcare research [22].
For our study, participants were asked to provide a 'ground truth' of their deliberate responses to the assessment to compare against the system's performance. This information was gathered after the roleplaying phase; therefore, some discrepancies may exist. This was a conscious decision to limit the prescripting of the interaction and to make the roleplaying as authentic as possible. In addition, frequent classification of the neutral command could also indicate confusion or inattention in real patients, as they cannot focus on a specific answer. However, in an initial pilot it would be difficult to differentiate genuinely roleplayed neutral commands from those incorrectly classified by the system, hence this was not implemented.

CONCLUSION AND FUTURE WORK
There is strong motivation for more accessible and automated confusion assessments in the ICU. This short work has shown how our socially assistive robot was able to autonomously deliver key phases of the popular CAM-ICU to 13 medical professionals and students roleplaying varying levels of delirium. Although the accuracy of our system did not meet expectations, this initial pilot provides useful information about lessons learned and the next steps.
Specifically, future work should include consideration of low-quality mental commands as possible identifiers of inattention; co-design with stakeholders to refine interaction; further pilot testing to refine command sensitivity for reduced workload; and investigation of other potential input technologies, such as eye tracking, for assessment answering.

Figure 2 :
Figure 2: A high-level overview of system architecture

Figure 4 :
Figure 4: % accuracy of CAM-ICU question answering among the various roleplaying severity groups, as evaluated by our system