Effects of Urgency and Cognitive Load on Interaction in Highly Automated Vehicles

In highly automated vehicles, passengers can engage in non-driving-related activities. Additionally, technical advances enable novel interaction possibilities such as voice, gesture, gaze, touch, or multimodal interaction, both for referring to in-vehicle and outside objects (e.g., the thermostat or a restaurant). This interaction can be characterized by levels of urgency (e.g., due to late detection of objects) and cognitive load (e.g., because of watching a movie or working). Therefore, we implemented a Virtual Reality simulation and conducted a within-subjects study with N=11 participants evaluating the effects of urgency and cognitive load on modality usage in automated vehicles. We found that while all modalities could be used, participants relied on touch the most, followed by gaze, especially for external referencing. This work helps to further understand multimodal interaction and the requirements it poses on natural interaction in (automated) vehicles.


INTRODUCTION
Interaction is expected to alter with the introduction of highly automated vehicles (AVs) [44]. In AVs, a user is not required to intervene in the driving task anymore (i.e., SAE Level 4 or 5). A passenger no longer involved in the primary driving task (i.e., steering, accelerating, braking) can engage in non-driving related tasks such as reading, watching a movie, or even sleeping [17,44]. Additionally, with the advance of multimodal sensors in the vehicle, such as cameras, microphones, or radars, novel interaction techniques and metaphors become feasible [13,30,31,54]. For these interactions, the characteristics of naturalness, intuitiveness, and human likeness become crucial [4,56]. Additionally, such technology enables novel use cases such as addressing outside objects (i.e., objects outside the AV such as buildings or other road users) with direction-based commands like "I know back there, there is this little café" [51, p. 7] or simple questions like "What is that?", while still enabling the user to indicate preferences regarding the driving task (e.g., determining the perfect parking spot [46], modifying the route, specifying a destination, or changing the traveling speed [20,39]). These types of commands can be understood by a multimodal system without requiring unimodal approaches to be very precise (e.g., "What is that in 20 m on the right side close to the tree?"). Previous works have already evaluated new interaction modalities focusing on a limited number of inputs and specific contexts. Overall, previous studies [2,25,50,51] based in a human-driven car context showed the relevance of new interaction modalities such as eye gaze and hand tracking to improve the quality and effectiveness of communication.
While these technologies and interaction cases are highly probable or can even be seen in current semi-automated vehicles, the way passengers will interact with an AV given all possible interaction techniques has not yet been explored.
Therefore, we implemented voice, gaze, pointing, and touch as input modalities in a Virtual Reality (VR) environment and conducted a within-subjects study with N=11 participants. We designed two scenarios: driving through a city with the objective of referencing different objects or user interfaces in and outside the AV, and a scenario in which the effect of urgency (e.g., present when referencing objects outside the vehicle while driving past them) was evaluated. For this, the participant had to reference a building in the line of sight or provide information about the driving direction (left or right) within 3, 7, or 10 s.
We found that touch, speech, and gaze + voice were preferred and used most frequently. Other multimodal approaches were used less, both for inside and outside referencing. Participants especially highlighted the need for (visual) feedback if pointing or gaze is used.
In this paper, we first outline related work, describe the experiment investigating multimodal interaction, define our interpretation procedure, and report our quantitative and qualitative results. Finally, we discuss our results.
Contribution Statement: Our work provides insights into two scenarios regarding (multimodal) interaction with AVs. The scenarios include referencing objects and user interfaces in and outside the AV and the effects of urgency. Results of a VR study with N=11 participants showed that touch, voice, and gaze + voice were preferred and that, especially for gaze and pointing, (visual) feedback is required. Our work helps guide developers and researchers towards useful and usable multimodal interactions in AVs.

RELATED WORK
This work builds on research on interaction modalities in manual and automated vehicles. An interaction modality should satisfy four requisites: spatial accuracy, intuitiveness, a wide range of possible maneuvers, and feasibility [56]. Similar parameters were defined by Ataya et al. [4]: easy to use, satisfactory, naturalistic, controlling, and useful. Especially in a dynamic context such as a driving vehicle, the modalities should allow real-time interaction with the external environment [25].
Voice was used to support driver-vehicle cooperation and to select maneuvers [4]. Others studied voice input for the selection of in-vehicle objects [41,50,53]. However, voice recognition does not necessarily work adequately in noisy environments, and drivers may be confused about possible commands [7,18]. For example, Qian et al. [46] report that gestures were favored over voice because they are less affected by recognition failures, noise, and confusion between commands and dialogue context.
Multimodal approaches combined, for example, gaze to localize and hand gestures to coordinate pointing [33,48]. The advantages of such fusion, such as compensating for the drawbacks of the individual inputs, motivated the work of Gomaa et al. [25], who analyzed participants' pointing and gaze behavior during the drive. They found that in automated driving (compared to manual driving), gaze accuracy was significantly higher, while pointing accuracy showed no significant differences between driving modes. They also found that, in general, gaze accuracy was significantly higher than pointing accuracy. Gomaa et al. [25] acknowledged that a real vehicle could influence these metrics based on vehicle boundaries and movements. Therefore, we employed a vehicle mockup but could not simulate movement. In a later work, Gomaa et al. [26] proposed ML-PersRef, a personalized machine learning technique to reference objects inside and outside of a moving vehicle. In line with this work, Aftab and von der Beeck [1] use multiple modalities (eye gaze, head, and finger, as well as a voice command to separate interactions) to reference inside and outside objects. They showed that while finger (91.6%) and eye gaze (96.3%) alone achieve high accuracy, finger plus eye gaze (98.1%) and eye gaze plus head plus finger (98.6%) outperform these [1].
Multimodal approaches, however, are still affected by numerous challenges such as alignment, translation, representation, and co-learning [5,6].
While multimodal approaches showed benefits, it is still unclear how passengers will use the available modalities. The online video-based survey of Ataya et al. [4] addressed this issue, observing which interactions users chose while performing different non-driving related tasks and which factors could influence this choice. Voice, touch, hand gesture, gaze, and their combinations were proposed to resolve maneuver-based interventions (lateral control, longitudinal control, stopping the vehicle, rerouting the navigation) and non-maneuver-based interventions (requesting information and playing a video) [4] under different cognitive and physical workloads such as relaxing, eating, working on a laptop, and watching a video. Voice was rated best using the ranking parameters easy to use, satisfaction, naturalness, and usefulness. However, participants of the online study could only provide their suggestions and impressions based on the online videos. Therefore, the study lacks external validity.

Study Design
Our study focused on interaction in an AV that performs the driving task. To evaluate which modalities are used and how, we designed and implemented a within-subjects study with two scenarios: the "Driving" and the "Urgency" scenario.
In the driving scenario, the participants could freely use every available interaction modality while driving through a city. In this scenario, the cognitive load was altered by displaying a video. Thus, this scenario was experienced twice. The driving scenario aimed at the exploratory research question (RQ): RQ1: What impact does the independent variable "cognitive load" have on passengers in terms of (1) modality usage, (2) task load, (3) usability, and (4) trust?
In the urgency scenario, we altered the urgency of the interaction by displaying a countdown timer with 3, 7, and 10 seconds duration. We define urgency as the feeling that a given task must be carried out quickly. These timings were chosen as they represent realistic distances for AV users to interact with objects (e.g., assuming 50 km/h leads to ≈ 41 m in 3 s, 97 m in 7 s, and 138 m in 10 s) and induced sufficient stress in internal pre-trials. In this task, the vehicle remained stationary to study the effect of urgency independently of the driving. Additionally, we altered the task: participants either had to ask for the height of a building or indicate where the AV should head. Therefore, the urgency scenario followed a 3 × 2 design. The urgency experiment was performed as the last condition, when the users had gained more experience with the system. Before each task, the vehicle's windshield was obscured (see Figure 3) to minimize the participant's possibility of preparation. The urgency scenario aimed at the exploratory RQ: RQ2: What impact do the independent variables "urgency" and "task" have on passengers in terms of modality usage?
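For reference, these distances follow from simple arithmetic, d = v · t, at the assumed constant speed of 50 km/h:

$$ d = v \cdot t,\qquad v = \frac{50\ \text{km/h}}{3.6} \approx 13.9\ \text{m/s} \;\Rightarrow\; d(3\,\text{s}) \approx 41.7\ \text{m},\quad d(7\,\text{s}) \approx 97.2\ \text{m},\quad d(10\,\text{s}) \approx 138.9\ \text{m}. $$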

Materials
To investigate the four input modalities, i.e., eye-gaze, pointing, touch, and voice, we implemented a driving course in Unity version 2020.3.19f1.

3.2.1 Interaction Concepts.
Our multimodal system provides the users with five input possibilities: touch, voice, voice + gaze, voice + pointing, and voice + pointing + gaze. For voice interaction, we used the Google Speech Recognition Asset [24] in version 4.1.
When pointing or gaze was used, voice served to confirm the selection of the object (i.e., as a trigger for the user's activity). When the system detected the combined use of gaze and pointing, we measured the angle between the ray originating from the eye gaze and the pointing direction line (see Figure 4).
Fig. 4. If the system detects input from both the gaze and pointing modalities and the direction vectors diverge by more than 10°, the pointed object is selected for the commands.
Based on the findings of Gomaa et al. [25], we considered an offset of 10° between the target POI and the user's gaze. If the angle between the gaze and pointing directions is greater than 10°, the pointed object is selected; otherwise, the input is defined by the user's eye gaze. This behavior is in line with the algorithm of Roider et al. [50], which defines that if the distance between two objects of interest is more than one element's position, the gaze does not refer to the target, and the pointing should be taken into consideration.
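The resulting selection logic can be sketched as follows. This is a minimal illustrative example in Python rather than the actual Unity implementation; the helper raycast_first_hit and the vector inputs are assumptions for illustration.

```python
import numpy as np

GAZE_POINTING_THRESHOLD_DEG = 10.0  # offset threshold adopted from Gomaa et al. [25]


def angle_deg(a, b):
    """Angle in degrees between two direction vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))


def resolve_reference(gaze_dir, pointing_dir, raycast_first_hit):
    """Resolve the referenced object once a voice command triggers a selection.

    If the gaze ray and the pointing ray diverge by more than the threshold,
    the pointed object is selected; otherwise the gazed-at object is used.
    raycast_first_hit maps a direction vector to the first object it hits.
    """
    if angle_deg(gaze_dir, pointing_dir) > GAZE_POINTING_THRESHOLD_DEG:
        return raycast_first_hit(pointing_dir)
    return raycast_first_hit(gaze_dir)
```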
We also integrated a display into the used model of a Mercedes-Benz F015 with all logos removed and added a simulated touch screen at the passenger's door. For the cognitive load condition, a second screen was added on the driver's side. To enable touch input on these screens, we augmented the virtual hands with an additional collider placed on the tip of the right index finger. Thus, no real touch interaction with a physical surface was necessary. While we aligned the virtual dashboard with the physical chassis of the mockup (see Figure 5), this alignment was not always perfect. Thus, haptic feedback was not always possible. It should be noted that for pointing, our implementation only detected right-handed inputs defined with the index finger.
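Conceptually, the touch detection via the fingertip collider reduces to testing whether the virtual fingertip enters the region of a display element. A simplified 2D sketch, assuming hypothetical names such as Button and a fingertip position already projected into the display's local coordinates (the study itself used Unity colliders):

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Button:
    """Axis-aligned button region in the touch display's local 2D coordinates (meters)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def hit_test(buttons: Sequence[Button], fingertip: Tuple[float, float]) -> Optional[Button]:
    """Return the first button whose region contains the projected fingertip position."""
    x, y = fingertip
    for button in buttons:
        if button.contains(x, y):
            return button
    return None
```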
3.2.2 Apparatus. We used the HTC Vive Pro Eye VR headset with its included Tobii eye tracker. The participant's hands, needed to perform the pointing action, were reproduced in the VR environment using a Leap Motion Controller [57] (version 5.0.0) mounted to the Vive headset. Finally, to enhance the perceived realism of a drive in a real vehicle and to accommodate the physical boundaries that limit gesture interaction, we conducted our study in the driving simulator of the Human Factors department at Ulm University. As this vehicle mockup does not contain physical doors, we added a plastic panel as the door (see Figure 5).

Scenarios
To analyze the users' preferences regarding the interactions, participants drove in an AV along the streets of Ulm, generated using CityGen3D [55] (version 1.05), which builds the environment on real-world data. Figure 6 shows the reproduction of Ulm compared to the existing city; the yellow path, about 2 km long, represents the route used for the learning drive. Figure 7 shows the scene used for the baseline and cognitive load conditions, covering about 3 km in the real world.

3.3.1 Driving Scenario.
In the cognitive load condition, participants watched a video because it induces a moderate yet externally valid mental workload [4]. We decided to reproduce a neutral video provided by a German national news channel. The video was played on the left display of the car. The same route was traveled in the control condition, but with 12 different points of interest. The tasks were displayed on the right screen with an icon and a short text, showing no reference to any interaction modalities (to avoid priming participants to use specific modalities; see Figure 9). Additionally, the text was read aloud by the voice assistant of the car. Table 6 and Table 7 give an overview of the task descriptions presented in the cognitive load and control conditions. In both scenarios, the interaction choice was based on the user's preferences. At the end of each driving experience, users were questioned about the studied input modalities. In the cognitive load condition, to verify the user's attention, we asked the following questions: "Which vaccine did prime minister Ramelow receive?" and "What was the color of the jacket worn by the journalist presenting the news program?"
3.3.2 Urgency Scenario. Finally, subjects performed the urgency scenario, including the direction decision and building information tasks (see Table 4), which had to be performed within 3, 7, or 10 seconds.

Measurements
Objective Measurements: We logged the eye-tracking features of the Vive Pro Eye and the Leap Motion gesture controller. For each task, we logged the input modality, the selected object, the execution time, and the employed voice command. Further, in the "Urgency" conditions, we considered the gap between the correct position of the point of interest (i.e., the building) and the location selected by the participant. Finally, we recorded videos of the participants.
Subjective Measurements ("Driving" scenario only): Regarding task load, we employed the NASA TLX [27] with the six subscales Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration Level. The Post-Study System Usability Questionnaire (PSSUQ) [36] was employed for user satisfaction. The PSSUQ is divided into the overall score, System Usefulness (SYSUSE), Information Quality (INFOQUAL), and Interface Quality (INTERQUAL). The system's usability was assessed with the System Usability Scale (SUS) [9]. Trust in the developed AV was evaluated with the Trust in Automation questionnaire of Körber [34]. This is divided into the subscales Reliability, Understanding, Familiarity, Intention of Developers, Trust, and Propensity to Trust. Finally, participants commented on the tested system and filled out a basic demographic questionnaire, including questions about their driving and VR experience. Moreover, users were asked to rank their preferences, from most preferred (4) to least preferred (1), for the input modalities based on the question "Which of the input modalities provided for the system interactions do you prefer?". They were also asked which combinations they preferred.
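For illustration, a per-task log entry could be structured as follows; this is a sketch only, and the field names are assumptions rather than the exact logging format used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskLogEntry:
    participant_id: int
    scenario: str                       # "driving" or "urgency"
    cognitive_load: bool                # True if the video condition was active
    task_id: int
    reference_type: str                 # "internal" or "external"
    input_modality: str                 # e.g., "touch", "voice", "gaze+voice", "pointing+voice"
    selected_object: Optional[str]
    voice_command: Optional[str]
    execution_time_s: float
    error_distance_m: Optional[float]   # gap to the correct POI, urgency scenario only
```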

Procedure
First, participants were introduced to the study procedure and the VR scene. They were informed about the studied conditions and possible interaction modalities. They then signed informed consent and could adjust the seat. The study commenced with a training session, allowing the participants to familiarize themselves with the vehicle and the proposed interactions for as long as necessary (see Table 5). In this session, participants received feedback: in the case of a correct selection, a bullet point was colored green, otherwise red (see Figure 2). This was avoided in the later test runs, as such an indication would require an augmented reality windshield, which we did not want to presuppose.
Then, following a counterbalanced order, subjects took part in the "Driving" scenario, including 12 tasks with an equal number of POIs inside and outside the AV. After the 12 tasks in the "Driving" scenario, participants performed the "Urgency" scenario with the six conditions, each done once in counterbalanced order. Participants answered the stated questionnaires after both conditions (with and without mental workload) in the "Driving" scenario.
The study took approximately 1 h. Participants were compensated with 10 €. We conducted the study in German. The hygiene concept of our university for studies involving human subjects regarding COVID-19 (ventilation, disinfection, wearing masks) was applied.

Participants
The experiment was performed by N=11 participants (6 male, 5 female), on average M=25.18 years old (SD=4.19). One participant did not have a driving license. This was no requirement, as future AVs might not require users to have the know-how or be allowed to drive manually. Seven participants were drivers or passengers for less than 7,000 km in the last year, two for less than 15,000 km, one for about 25,000-33,000 km, and one for more than 33,000 km. The majority of the subjects (5) did not drive themselves, and one subject each reported the categories "every day", "on workdays", "1-3 times per month", and "once per week". Regarding their current occupation, six subjects were students, four were in vocational training, and one was employed.

Data Analysis
We independently analyzed the two scenarios "Driving Scenario" and "Urgency Scenario".

Before every statistical test, we checked the required assumptions (normal distribution and homogeneity of variance). When comparing two conditions, t-tests were used for parametric data and Wilcoxon Signed-Rank tests for non-parametric data. For the factorial analysis of non-parametric data, we used the non-parametric ANOVA (NPAV) by Lüpsen [38]. The alpha level was 0.05. R in version 4.2.3 and RStudio in version 2023.03.0 were used. All packages were up to date in April 2023.

RESULTS
NASA TLX. A Student's t-test found no significant difference in the total NASA TLX score (t(10)=-1.86, p=0.09). Mental Demand: A Wilcoxon Signed-Rank test found a significant difference for mental demand. With the video (M=10.64, SD=6.19), mental demand was significantly higher than without (M=6.27, SD=4.45), as per our study design. Physical Demand: A Wilcoxon Signed-Rank test found no significant difference for physical demand (p=0.14). Temporal Demand: A Wilcoxon Signed-Rank test found no significant difference for temporal demand (p=0.05). Performance: A Wilcoxon Signed-Rank test found no significant difference for performance (p=0.42). Effort: A Wilcoxon Signed-Rank test found no significant difference for effort (p=0.42). Frustration: A Student's t-test found no significant difference for frustration (t(10)=-0.94, p=0.37).

Modality Usage and Task Duration.
We summed all the employed modalities as shown in Table 1. For touch, the references to the "Home" button were removed. Touch was used most both with and without cognitive load. However, for external referencing, gaze was used either more frequently (with no cognitive load) or almost equally often (with cognitive load). We also plotted the usage count per cognitive load condition with reference to internal and external reference points (e.g., a reference to the window as internal or asking for a building as external; see Figure 10). Figure 10 clearly shows that touch or gaze with voice was used most both for internal and external referencing. To use touch for external referencing, one could navigate from the navigation page to the attractions page of the dashboard. Regarding the interaction duration, we logged all interactions (see Figure 11). The tasks were only partially the same in both rides to avoid learning effects. Therefore, we could not include the task in the statistical analysis. Regarding the cognitive load, the NPAV found no significant effects on completion time (with cognitive load: M=21.34, SD=17.10; without: M=19.55, SD=14.74).
Table 2. Employed modalities per task and time interval. In yellow, we mark the instances where no interaction occurred. n stands for the number of occurrences.
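Counts such as those in Table 1 and Figure 10 can be derived directly from the task logs; a minimal sketch using pandas with assumed column names (not our actual analysis scripts):

```python
import pandas as pd


def modality_counts(log: pd.DataFrame) -> pd.DataFrame:
    """Count modality usage per cognitive-load condition and referencing type."""
    return (
        log.groupby(["cognitive_load", "reference_type", "input_modality"])
           .size()
           .rename("count")
           .reset_index()
    )
```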

Modality, Duration, Gap.
A two-way repeated-measures ANOVA was performed to evaluate the effect of the tasks and time intervals on completion time. There was a statistically significant effect of task on completion time (F(1, 1) = 403.76, p=0.03; see Figure 12). For the building task, the duration was significantly shorter (M=0.53, SD=2.70) than for the direction task (M=2.12, SD=3.40). In five trials of the building task and three of the direction task, participants were not able to provide information in time. Six of these happened in the 3 s time interval and two in the 7 s time interval.
The NPAV found no significant effects on error distance (see Figure 13).
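The completion-time analysis above corresponds to a standard two-way repeated-measures ANOVA with the within-subject factors task and time interval. An illustrative sketch in Python with statsmodels is shown below; the actual analysis was run in R, and the column names are assumptions.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM


def rm_anova_completion_time(df: pd.DataFrame):
    """Two-way repeated-measures ANOVA: task (building vs. direction) x interval (3/7/10 s)."""
    return AnovaRM(
        data=df,
        depvar="completion_time_s",
        subject="participant_id",
        within=["task", "interval"],
        aggregate_func="mean",  # average repeated observations per cell and participant
    ).fit()


# Example: print(rm_anova_completion_time(df).anova_table)
```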

Reasonability Assessment and General Remarks
We asked participants about their impressions of using the different modalities. Gesture received the lowest ratings. Voice was rated best, closely followed by touch and gaze (see Table 3).
After all conditions, we asked participants about their assessment of the advantages and disadvantages of each modality and how these could be improved. Most improvement proposals focused on the technical limitations of the systems, that is, gesture, voice, and eye gaze detection. As we employed state-of-the-art software and hardware, this indicates that further improvements seem necessary here. Regarding the interaction modalities' general applicability, most agreed that besides voice (which was rated best) and touch, gestures and eye gaze could become viable options (e.g., "if you look at the building any way you can get information directly", "[gesture] easy to use"). However, the participants stressed that visualization is necessary ("When looking or pointing, show a point on, e.g., the target, which the system has recognized as focus."). As one participant stated, "It is difficult to see if the system correctly recognizes the gesture and when pointing, recognizes the correct, e.g., building."

DISCUSSION
We designed two scenarios to evaluate (1) the effect of cognitive load on interaction modalities in AVs and (2) the effect of urgency. In a within-subjects study with N=11 participants, we found, in line with previous work [4,19], that touch and voice-only input were preferred, with gaze being used for external referencing. However, while previous work suggested and made a case for multimodal approaches, our data suggest that besides gaze + voice, multimodal approaches are less desired and will be used less frequently.

Modality Usage
In line with previous studies [4,19], voice-only input was confirmed to be the most preferred method for in-car activities. However, as already noted by Qian et al. [46], drawbacks of voice are recognition failures, confusion between specified instructions, and the influence of ambient noise. As noted in our work, these issues can be compensated by fusion with other input modalities. We agree that providing users with a multimodal interface will support a more comfortable and less stressful driving experience, increasing the system's reliability, robustness, and performance [1,25]. This seems especially relevant when AVs enable passengers to reference external objects (see Figure 10). The touch menu was designed hierarchically, as hierarchical menus are still state of the art despite attempts from vehicle manufacturers (e.g., Mercedes-Benz's adaptive Hyperscreen [8]) and academia [22] to provide contextual data. Despite this downside, touch was still used most. We assume that this is due to a habituation effect. As noted by Detjen et al. [19], touch seems more reliable and easier to understand, but it implies a specific hardware location. Gestures imply a higher physical load and effort [19] that could negatively influence the interaction quality compared to voice or touch input. Our data support the lower usage of this modality. Another possible reason could be that participants were only allowed to point with the index finger of their right hand, which could limit their preferences. However, we believe this to be, at most, a minor reason, as the index finger, while not used universally [61], seems most useful for pointing. In contrast with other studies [2,49,51,56], which describe pointing as a useful, intuitive, natural, and simple interaction modality applicable without training, our data and open feedback suggest that pointing is less relevant.
Gomaa et al. [25] demonstrated that pointing does not suffer from significant effects of distance and environment density. However, in our cognitive load experiment, we could not exclude an influence on the system's performance and the interaction choice. To address this, we analyzed the interaction preferences in a static situation, i.e., stopping at an intersection. Our data suggest that users need at least 7 seconds to complete such activities.

Effects on NASA TLX, Trust, and Usability Evaluations in the "Driving" Scenario
As planned, the induced workload led to increased mental demand, showing the appropriateness of our method. Interestingly, this did not influence the other demands in a significant fashion. Additionally, we found no significant differences for trust. We also found no significant differences for the SUS and only a barely significant difference (p=0.04) for information quality. We did find differences regarding the usage of touch for referencing outside objects: with cognitive load, touch was used more than twice as often as without cognitive load. Our data suggest that the interaction modalities used are not significantly affected by the presence of cognitive load, but that users default to known interaction modalities such as touch.

Practical Implications
Our scenarios resembled potential real-world applications of (multi)modal referencing interactions. We opted for high external validity by allowing users to employ several modalities and their combinations. Our data suggest that vehicle manufacturers should focus on adaptive touch interfaces and improved voice recognition. While referencing via gaze was also well received, (visual) feedback seems crucial for understanding the currently selected object, despite advances in multimodal referencing accuracy [1]. This could be done via augmented reality windshield displays, which were suggested in numerous previous works [11,12,15,28,37,52,60,62]. However, challenges such as parallax effects remain.

Limitations
One limitation is the moderate number of participants in the study (N=11). As mostly younger participants, who were probably more proficient with novel technologies, took part, the transferability of the study findings to other age groups is also unclear. Future work should also consider the demographics and characteristics (quantity of experience, perceptual sensitivity, situation selectivity, reflexivity, and willingness) proposed by Rainer and Wohlin [47] for the selection of additional participants. For example, older participants or participants with little or much experience with AR should be recruited. Additionally, the study setting limited the possibility of simulating the influence of vection on interactions [14]. While we used VR to add immersion, vection has already been shown to influence the perceived usefulness of interaction modalities [14,29]. Regarding technical aspects, we implemented the interactions with state-of-the-art hardware and software. Nonetheless, there were technical limitations in the detection of interactions. However, this is also likely to happen in a moving vehicle. Therefore, these aspects, while reducing internal validity, improved external validity. Additionally, wearing masks will most likely have negatively influenced recognition rates for voice and potentially eye tracking. Finally, the interaction tasks in both scenarios were chosen carefully and resemble relevant and realistic future tasks. However, this is not an exhaustive list, and the interactions are not necessarily comparable (e.g., because of the size of the objects).

CONCLUSION
This work evaluated a multimodal interactive system for interacting with objects in and outside an AV. In a within-subjects study with N=11 participants and two scenarios, we investigated interaction preferences considering differences depending on the user's mental workload and the effect of perceived urgency and interaction task. Despite the availability of other modalities and combinations, we found that touch and voice were mostly used and preferred. Our work helps determine the relevance of multimodal interaction in a more ecologically valid setting, as the use of modalities was not enforced. Therefore, it helps developers and designers in future interaction design.
Task Description | Translation
 | Open the window of the left front door, using the voice command (e.g., say: "Open the window") and pointing or looking toward the window.
 | Close the window, using the voice commands (e.g., say: "Close the window of the left front door").
Bitte mach das Fenster auf. Benutze dafür den im Display angezeigten Fensterschalter. | Open the window. Use the window switch shown on the display.
Bitte ändere die Temperatur auf 19°. Benutze dafür die Sprachsteuerung oder das Touchdisplay (sage z.B.: "Setze die Temperatur auf 19°"). | Set the temperature to 19°, using the voice commands or the touch display (e.g., say: "Set the temperature to 19°").
Benutze das Touchdisplay, um Zugriff auf die Speisekarte des nächsten Restaurants zu bekommen. | Use the touch display to access the menu of the nearest restaurant.
Lass dir die Speisekarte des nächsten Restaurants anzeigen. Verwende dafür die Sprachsteuerung (sage z.B.: "Speisekarte des nächsten Restaurants"). | Ask for the menu of the nearest restaurant (e.g., say: "Menu of the nearest restaurant").
Die Kirche, die im Display erscheint, wird bald zu sehen sein. Schau hin und frage über einen Sprachbefehl was das ist (frage z.B.: "Wie heißt die Kirche?" oder "Was ist das?"). | Soon you will see the church displayed on the screen. Look at it and ask what it is using your voice (e.g., ask: "What's the name of the church?" or "What is that?").
Erkundige dich wozu der Knopf auf der Türe dient. Zeige oder schaue gezielt in Richtung des Knopfes und frage über einen Sprachbefehl was das ist (frage z.B.: "Was ist das?"). | Find out what the button on the door does. Point or look at the button and ask what it is using the voice commands (e.g., ask: "What is that?").
Biege bitte nach rechts ab, sobald die Brücke überquert ist. Verwende dafür die Sprachsteuerung (sage z.B.: "Gehe nach rechts"). | Turn right once you have crossed the bridge, using the voice control (e.g., say: "Go to the right").
Biege bitte nach rechts ab, sobald die Brücke überquert ist. Betätige dafür den rechten Blinker. | Turn right once you have crossed the bridge. Use the right turn button to do this.
Informiere dich über die Öffnungszeiten des nächsten Parkhauses. Zeige mit dem Finger auf das Parkhaus und frage über einen Sprachbefehl nach den Öffnungszeiten. | Find out the opening hours of the nearest parking garage. Point your finger at the parking garage and ask for the opening hours using the voice commands.
Schalte die Innenlichtfarbe auf rot. Dafür kannst du unter "Einstellungen" die Lichterseite aufrufen. Um sie zurückzusetzen, schalte die Farbe durch einen Sprachbefehl auf weiß. | Switch the ambient lighting color to red. To do this, navigate to the lights view in the car's settings. To reset it, use a voice command to turn the color to white.
Informiere dich über die nächste Kirche. Zeige mit dem Finger oder schau auf das Gebäude und frag über einen Sprachbefehl wie sie heißt. | Discover information about the nearest church. Point or look at the building and ask its name using a voice command.
Informiere dich über die nächste Kirche. Dafür kannst du auf die "Attraktionen" Seite des Navigationssystems navigieren. | Discover the nearest church. For this, you can navigate to the "Attractions" page of the navigation system.
Benutze die Sprachsteuerung, um dich über das Wetter zu informieren (frage z.B.: "Wie wird das Wetter?"). | Ask about the weather using the voice commands (e.g., ask: "What's the weather like?").
Benutze das Touchdisplay, um dich über das Wetter zu informieren. | Get some weather information using the touch display.
Lass dich nach München fahren. Benutze dafür das Touchdisplay. | Get a ride to Munich. Use the touch display for this.
Lass dich nach München fahren. Definiere das Ziel über Sprachbefehl (sage z.B.: "Fahre mich nach München."). | Get a ride to Munich. Use the voice command to set the destination (e.g., say: "Navigate to Munich").
Frag wie hoch ein zufälliges Wohngebäude ist. Zeige oder schaue gezielt in Richtung des Gebäudes und frage über einen Sprachbefehl nach der Höhe (frage z.B.: "Wie hoch ist das Gebäude?"). | Choose a residential building randomly and ask how tall it is. Point or look at the building and use a voice command to ask for its height (e.g., ask: "How tall is the building?").
Table 5. Overview of the interaction activities presented during the training condition.

Task Description | Translation
Finde heraus wozu der Knopf an der Türe dient. | Find out what the button on the door is used for.
Informiere dich über die Öffnungszeiten der nächsten Bank. | Find out the opening hours of the nearest bank.
Erkundige dich nach der Höhe des Ulmer Maritim Hotels. | Ask about the height of the Ulm Maritim Hotel.
Biege bitte nach links, vor der nächsten Kreuzung ab. | Turn left before the next intersection.
Lasse dir die Speisekarte des nächsten Restaurants anzeigen. | Ask for the menu of the nearest restaurant.
Bitte mache das Fenster auf. | Please open the window.
 | Inform about the tower displayed on the screen.
Informiere dich über das Wetter. | Ask about the weather.
 | Inform yourself about the next building.
Schalte die Farbe des Innenlichtes auf rot. | Switch the color of the ambient light to red.
Informiere dich über die Preise der nächsten Elektroladestation. | Ask about the prices of the nearest electric charging station.
Lass dich nach München fahren. | Get a ride to Munich.

Task Description | Translation
Der Brunnen, der im Display erscheint, wird bald zu sehen sein. Informiere dich über den Namen. | The fountain shown on the display will soon be visible. Find out its name.
Bitte ändere die Temperatur auf 19°. | Please change the temperature to 19°.
Biege bitte nach rechts, vor der nächsten Kreuzung ab. | Turn right before the next intersection.
Informiere dich über das Wetter. | Ask about the weather.
Informiere dich über das Ibis Hotel. | Ask about the Ibis Hotel.
 | Ask about the name of the next train station.
Frage, wozu dient der Knopf an der Türe. | Find out what the button on the door is used for.
Informiere dich über die Öffnungszeiten des nächsten Parkhauses. | Ask about the opening hours of the nearest parking garage.
Schalte die Farbe des Innenlichtes auf blau. | Set the color of the ambient light to blue.
Informiere dich über das nächste Gebäude. | Inform yourself about the next building.
Lass dich nach München fahren. | Get a ride to Munich.
Table 7. Overview of the interaction activities presented during the control condition.

Fig. 2. With correct pointing or selection via eye gaze, the bullet point is green, otherwise red.

Fig. 3. Obscured windshield during the switch between runs in the urgency scenario.

Fig. 5. Participant pointing to a building. Driving simulator of the Human Factors department at Ulm University with a transparent plastic panel.

Fig. 6. Comparison between the real and virtual city of Ulm used for the familiarization drive.

Fig. 7. Comparison between the real and virtual city of Ulm used for the baseline and cognitive load drives.

Fig. 8. Example of buildings considered for the interactive tasks.

Fig. 9. The visual and textual task description was shown at the bottom of the right display.

Fig. 12. Completion time. Negative values indicate a command entered after the end of the interval but prior to the next task.

Table 3. Reasonability of the interaction modalities. n stands for the number of participants rating each interaction modality.

Table 6. Overview of the interaction activities presented during the cognitive load condition.