Effects of 3D Displays on Mental Workload, Situation Awareness, Trust, and Performance Assessment in Automated Vehicles

Novel display technologies, such as lightfield displays, have become increasingly available. In the automotive domain, they are already in use or planned for future use. However, their effect on the user is yet to be explored. We therefore conducted a within-subject study (N=15) comparing a baseline visualization of information about automated vehicle functionality with the same information shown on a LumePad or a LookingGlass lightfield display. Interestingly, we found almost no significant differences, indicating that the display technology is of minor relevance for conveying automated vehicle functionality.


INTRODUCTION
Novel display technologies focus on improved color-gamut performance [6], reduced power consumption [22], better contrast ratios [50], reduced display thickness, larger sizes, and new visualization capabilities, including 3D display technology [31]. While the first 3D displays are commercially available (e.g., [21, 28, 37]), they have not seen widespread adoption. For example, the 3D display market in the USA in 2021 is estimated at around $162 million [47], while the overall display market in the USA in 2021 is approximately $43.4 billion [48], a 3D share of only 0.37%. Displays are used to visualize various information, including spatially distributed data such as maps, for which 3D displays could provide unique benefits.
Previous work using 3D displays has shown benefits for surgeries [5, 53], air traffic control [35], spatial ability [42], and target search [54]. In the automotive context, Broy et al. showed that 3D displays do not increase mental workload compared to 2D displays [4] and can even improve task completion times and interface attractiveness [3]. While these task-focused measurements were important in the era of manual driving, the focus of relevant dependent variables shifts with fully automated vehicles (AVs).
With AVs, users will no longer need to take over the vehicle. However, for AVs to positively impact mobility, it is crucial that the technology is accepted and trusted. Previous studies have shown both overtrust and distrust in this technology; either could lead to undesired adoption of automation. Therefore, calibrated trust [41] should be the main goal [11, 15]. Previous work has shown that visualizing uncertainty can lead to a better understanding of AV capabilities [11, 15]. These visualizations were tested either on (simulated) 2D displays in virtual reality (VR) [10] or on simulated augmented reality windshield displays in remote online monitor-based studies [11, 15]. However, current high-end vehicles such as the Mercedes-Benz S-Class already integrate 3D displays 1. With advanced displays, the so-called Halo Effect [43], i.e., attributing capabilities based on unrelated characteristics, could come into play: these displays, with their "shiny" effects, could lead to higher trust in the AV despite not contributing to the actually relevant capabilities. Previous research has explored the Halo Effect [43] in automotive user interface design and found correlations with perceived vehicle capability and trust [9, 15, 23]; however, some results were inconclusive [10]. According to the Halo Effect [43], trust and other capability assessments can be influenced by unrelated characteristics. In the automotive domain, these could include, for example, the design of the chassis but also the visualization technology within the AV.
To investigate this phenomenon, we conducted a within-subject study with N=15 participants, evaluating the effect of three displays on mental workload, situation awareness, trust, and performance assessment (e.g., recognizing other vehicles or pedestrians) in AVs.
The displays included (1) a baseline, where the LumePad acts as a standard 2D display, and lightfield technology as implemented by (2) the LumePad 1 or (3) the LookingGlass LKG-2K-05780 (LookingGlass 8.9 Gen1). We visualized the predictions for other vehicles and the planning of the own trajectory in Unity on the three display types. Based on the Halo Effect [43], we hypothesized that the display type affects mental workload, situation awareness, trust, and performance assessment. This work is only possible with actual 3D displays and cannot be done, for example, in VR due to missing 3D display implementations for this device type; such displays have only recently become widely available. Interestingly, we only found a significant difference in the assessment of whether the AV recognized other vehicles. We employed Bayes factor analysis to test whether the non-significant differences actually stem from a lack of difference between the conditions and found moderate evidence for this regarding trust and the AV's capability to drive.
Contribution Statement: Our main contributions are (1) the definition and implementation of visualizations regarding vehicle and pedestrian recognition and intention on two 3D displays and (2) the results of a user study with N=15 participants regarding the effects of 3D displays on mental workload, situation awareness, trust, and performance assessment in AVs.

RELATED WORK
This work builds on prior work on general 3D display technology and its advantages, especially with a focus on automotive applications. Additionally, we report the results of previous work on general information visualization in the automotive context.

Information Visualization in Automated Vehicles
While the display technology is of particular interest in this work, the displayed information is also relevant. Therefore, we introduce prior work on information visualization with a particular focus on mental workload, trust, and situation awareness. Previous research has evaluated various methods for communicating decisions, detections, destinations, regulations, and navigation information to users in the context of AVs. For example, Löcken et al. [36] proposed using ambient light to inform users of AV decisions, and Wilbrink et al. [59] suggested using light strips to indicate the AV's intentions or perceptions. Lindemann et al. [34] employed an augmented reality (AR) head-up display (HUD) to highlight potential hazards such as pedestrians, which they found to result in higher situation awareness in both low- and high-visibility scenarios than showing only the basic elements of speed and navigation information.
The concept of "calibrated trust" [41] refers to a state where the user's trust in an automated system is appropriate to the system's capabilities, which can help prevent the issues associated with over- and under-trust.
Koo et al. [32] investigated the effect of providing "how" and "why" information on trust in semi-autonomous vehicles and found that explanatory information led to the highest trust, although providing information on the vehicle's behavior could lead to cognitive overload [32]. However, combining both types of information resulted in the safest driving behavior. Häuslschmid et al. [25] evaluated the impact of visualizing the vehicle's current situation interpretation using a world in miniature or a simulated chauffeur avatar and found that the world in miniature led to the greatest increase in trust, although participants had varying opinions on whether such a visualization was necessary. In VR, Colley et al. [10] compared the cognitive load of a tablet-based version with an AR version displaying pedestrian intention and found that the tablet version was rated significantly higher in cognitive load. Currano et al. [16] also tested an AR HUD and found that the results varied depending on the dynamic nature of the scene and the reported driving style of the participants, and therefore concluded that the HUD should be adaptable to these factors. Schneider et al. [49] evaluated explanations given via an AR HUD and an LED strip and found that user experience increased with the explanations. However, the combination with a post-explanation via a smartphone app did not increase the user experience further. Finally, Colley et al. [13] evaluated an abstract representation of the perceived objects and showed that no futuristic visualization is necessary but that current HUDs could already provide appropriate information.
Recent research by Colley et al. [11] examined the impact of visualizing the semantic segmentation task. The authors argued that prior studies in this domain relied on abstract representations that do not help the user comprehend the sources of uncertainty. In their study, they found that the simulated AR windshield display (AR WSD) had no significant effect on trust or cognitive load. However, the subjective measure of situation awareness improved, and users rated the recognition-related attributes better.
Finally, Colley et al. [15] evaluated the visualization of scene detection, prediction, and maneuver planning. This combined previous work [10, 11] and added the visualization of other vehicles' predicted trajectories and the own maneuver via trajectories. The study evaluated the impact on trust, mental workload, situation awareness, and perceived safety. The results showed that visualizations related to Situation Prediction were perceived as inferior to the other levels.
However, what is currently missing is an investigation of whether such visualizations of Situation Understanding and Prediction could benefit from novel display types. We thus evaluate how display technologies affect trust, mental workload, and situation awareness. We visualize other road users, their intentions, and the AV's own intention to indicate what the AV recognizes, predicts, and plans (see Figure 3b).

Effects of 3D Displays in the Automotive Context
3D displays can be categorized by technology into lightfield displays, integral imaging, and volumetric or holographic 3D displays [46]. These differ in various performance metrics such as color presentation, parallax effects, and interaction possibilities (for a review, see Ren et al. [46]). 3D displays have been investigated in various contexts, including surgeries [5, 53], air traffic control [35], spatial ability [42], target search [54], and navigation [7]. In their survey, McIntire et al. [39] found that "41 (58%) of the experiments showed 3D to be better than 2D regarding the performance measurements of interest; ten (14%) showed mixed results; and 20 (28%) of the experiments showed no benefit to using 3D" [39, p. 4].
The use of 3D dashboards in the automotive industry has been the subject of considerable research assessing their potential benefits and drawbacks. While it has been established that excessive parallaxes in 3D displays can cause discomfort, as reported by Broy et al. [2], they do not appear to increase visual load or driver distraction [1]. Weidner and Broll [55] evaluated the zone of comfort of stereoscopic 3D dashboards for vehicles, finding a "budget ranging from 23.8 ± 2.9cm in front of the dashboard to 22.7 ± 5.5cm behind it" [55, p. 1].
In a driving simulation study, Weidner and Broll [56] instructed participants to perform two simplified cognitive tasks: a change detection task and a list selection task. The results indicate that the presentation of stereoscopic 3D content has no statistically significant impact on the performance of secondary cognitive tasks or on driving performance, as determined by analyzing the entire dashboard and its individual subareas.
However, there were also positive effects. Studies such as that by Pitts et al. [45] have demonstrated that 3D displays can enable shorter glance times and faster object recognition. Furthermore, in partially automated vehicles, where drivers must occasionally take over control, quickly gaining awareness of the situation is crucial. In such scenarios, stereoscopic 3D dashboards have been shown to benefit situation assessment and performance, as demonstrated by Weidner and Broll [57, 58].
Broy [1] conducted extensive research on 3D displays in automotive contexts, aiming to evaluate the potential advantages and disadvantages of these displays and investigating the discomfort they can induce in users as well as the risks they may pose in in-car environments. Furthermore, Broy et al. [4] found no significant difference in workload between 2D and 3D displays and that the latter did not negatively impact the driving scenario. Additionally, they showed that 3D displays can increase accuracy for expected events, decrease task completion times, and make the interface more attractive [3]. In take-over scenarios, 3D displays have also been shown to improve driver re-engagement [1]. However, Broy et al. [4] only visualized an abstract representation of a street and upcoming events such as navigation cues, traffic signs, and traffic information (e.g., traffic jams), but not spatially more complex information such as the maneuver planning trajectories that we examine in our study.
Another effect, the so-called Halo Effect [43], i.e., attributing capabilities based on other (unrelated) characteristics, has been shown in the general HCI context [8, 30]. However, the literature in the transportation and, more specifically, the automotive domain is inconclusive.
Frison et al. [23] have shown that how a vehicle agent interacts with its users influences their expectations of the vehicle's capabilities. Colley et al. [9] also showed that AVs communicating with and answering pedestrians were perceived as more intelligent and evoked higher trust. Work that visualized different parts of the inner workings of AVs, such as scene detection, situation prediction, and maneuver planning [15], also discusses whether the relatively error-prone assessment of intentions led to the significantly lower ratings of lateral control, thereby representing a Halo Effect.
Despite these demonstrations of the Halo Effect, Colley et al. [10] report that while trust was increased by the AR visualization, the assessment of longitudinal and lateral control was not. While lateral and longitudinal control remained on the same level, Colley et al. [13] showed that any visualization led to better critical-situation perception and significantly higher perceived intelligence and intention of developers. Finally, in urban air mobility, Colley, Meinhardt et al. [14] also evaluated different visualizations under the assumption that better visual aesthetics could influence trust in urban air mobility. However, they do not discuss this effect further.
Based on these results, we were interested in whether the display technology would lead to altered trust and capability assessments.

EXPERIMENT

Research Question and Hypotheses
The following research question (RQ) guided our study: RQ: How do different display technologies influence (1) mental workload, (2) trust, (3) situation awareness, and (4) the attributed capabilities of the AV?
Based on previous work on information visualizations, 3D displays, and the Halo Effect [43], we hypothesized that the display technology could lead to higher trust, situation awareness, and attributed capabilities, and to lower mental workload. We used the Windridge City asset for Unity and recreated the ego trajectory and the predicted trajectories of the other vehicles from Colley et al. [15]. Unlike in previous work, these visualizations were shown not on a simulated AR WSD but on an additional display (see Figure 1). While this arrangement was shown to increase mental workload [10], it is both currently feasible to study and could easily be implemented in vehicles. The additional display did not show the entire scene but a reduced visualization including only the road, its users, pedestrians, and the predicted vehicle trajectories both for the own vehicle and for others (see Figure 3b). The detection of pedestrians and vehicles was included implicitly, as the additional display can only show what the simulated AV has detected [11]. For the different displays, shown in Figure 2, we used the official SDKs.

Materials
The driving scene was shown to the participants on a Samsung QLED 4K Q60A 75" TV. The viewing distance was chosen to both allow immersion in the scene and prevent individual pixels from being noticeable.
The 3D displays used are both lightfield displays but differ in some characteristics. The LumePad 1 is a 10.8-inch tablet; as a front sensor array tracks the user's face, it is only viewable from one angle at a time. The LookingGlass 8.9 Gen1 has a diagonal of 8.9 inches and a viewing cone of 50°, within which it is viewable from different angles simultaneously.

Measurements
Based on our research question, we employed the mental workload subscale of the raw NASA-TLX [24]. This uses a 20-point scale ("How much mental and perceptual activity was required? Was the task easy or demanding, simple or complex?"; 1=Very Low to 20=Very High). Furthermore, we used the subscales Predictability/Understandability (Understandability from here on) and Trust of the Trust in Automation questionnaire by Körber [33]. This questionnaire measures Understandability via agreement with four statements ("The system state was always clear to me.", "I was able to understand why things happened."; two inverse-coded: "The system reacts unpredictably.", "It's difficult to identify what the system will do next.") on 5-point Likert scales (1=Strongly disagree to 5=Strongly agree).
Trust is measured via agreement with two statements ("I trust the system." and "I can rely on the system.") on the same 5-point Likert scales. Participants also stated their level of perceived safety via four 7-point semantic differentials from -3 (anxious/agitated/unsafe/timid) to +3 (relaxed/calm/safe/confident) [20]. We additionally measured the perceived quality of situation awareness with the situation awareness rating technique (SART) [52]. Situation awareness [19] may be a predictor of "how a person will choose to act on that SA" [19, p. 86]. Also, we employed the subscales Performance ("I would have performed better than the automated vehicle in this situation"), Judgement ("The automated vehicle made an unsafe judgment in this situation"), and Reaction ("The automated vehicle reacted appropriately to the environment") from the Situational Trust Scale for Automated Driving [27].
Furthermore, the following 7-point Likert scale questions were asked based on the visualizations of object recognition, prediction, and maneuver planning (in brackets, we give the abbreviation used for reporting):
• "The automated vehicle recognizes all vehicles in every situation perfectly" (recognized vehicle)
• "The automated vehicle predicts all vehicle paths in every scene perfectly" (predict paths)
• "My vehicle recognized where the other vehicles will drive to very well." (certain drive to)
• "The automated vehicle has perfect longitudinal guidance (braking, acceleration, ...)" (longitudinal guidance)
• "The automated vehicle has perfect lateral guidance (keeping track, staying on the road)" (lateral guidance)
• "It was always clear what the automated vehicle will do next" (clear AV will do next)
• "The board computer has helped me to better recognize road users" (perceive road users)
• "The board computer helped me perceive distances better" (perceive distances better)
• "The board computer helped me to perceive my surroundings better" (perceive surroundings better)
• "How would you rate the driving style of the automated vehicle?" (1 = completely safe to 7 = completely dangerous) (driving style)
Finally, general questions about 3D displays and their usefulness, as well as feedback regarding concerns, improvement proposals, and other thoughts, were asked qualitatively (with open-ended text answers).
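For analysis, the inverse-coded questionnaire statements must be reverse-scored before subscale means are computed. The following is a minimal sketch of this step; the item keys and scoring routine are illustrative assumptions, not the questionnaire's own tooling:

```python
def score_subscale(responses, inverse_items=(), scale_max=5):
    """Mean of Likert responses after reverse-coding inverse items.

    responses: dict mapping item key -> rating in 1..scale_max.
    inverse_items: keys of inverse-coded items (reversed before averaging).
    """
    scored = [
        (scale_max + 1 - rating) if item in inverse_items else rating
        for item, rating in responses.items()
    ]
    return sum(scored) / len(scored)

# Example with the four Understandability statements (keys are illustrative):
ratings = {
    "state_always_clear": 4,
    "understood_why": 5,
    "reacts_unpredictably": 2,   # inverse-coded
    "hard_to_identify_next": 1,  # inverse-coded
}
score = score_subscale(
    ratings, inverse_items={"reacts_unpredictably", "hard_to_identify_next"}
)
# (4 + 5 + (6-2) + (6-1)) / 4 = 4.5
```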

Procedure
Prior to the initiation of each session, participants were required to sign a consent form and receive an explanation of the simulated scenario. Each participant underwent the simulated ride with the AV three times (constituting a within-subjects design). Each iteration of the course lasted approximately 4 minutes and 30 seconds. To control for order effects, the displays were presented in a counterbalanced order across the three rides. The scenario was introduced as follows: You will take a ride in a highly automated vehicle through a town. The vehicle automatically steers, brakes, and accelerates (lateral and longitudinal guidance). The automated vehicle will try to predict the other vehicles' future maneuvers (=intention) via its onboard sensors and determine its trajectory accordingly. Additionally, there will be a board computer on which all recognized traffic participants will be displayed. While participating, you are supposed to imagine sitting in such an automated vehicle, following the entire journey attentively, and then assessing it.
Therefore, the participant's task was solely to observe the simulated automated ride. The only difference between the conditions was how the additional visualization of the AV was presented (i.e., conventional display, LookingGlass, or LumePad in lightfield mode). The duration of the study, including all three rides, was approximately 45 minutes per participant. Participants received a monetary compensation of 8€.
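The counterbalancing of the three display conditions can be sketched as follows; the full-permutation assignment shown here is an illustrative assumption, as the exact scheme used is not specified in the text:

```python
from itertools import permutations

# The three display conditions of the within-subjects design.
conditions = ["baseline", "LumePad 3D", "LookingGlass"]

# Full counterbalancing over all 3! = 6 possible orders; cycling through
# them assigns each of the 15 participants one order, balancing order
# effects as far as N allows.
orders = list(permutations(conditions))
assignment = {participant: orders[participant % len(orders)]
              for participant in range(15)}
```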

Data Analysis
Prior to every statistical test, we checked the required assumptions (normality and homogeneity of variance). Due to the within-subject design, we employed either Friedman tests (non-parametric data) or one-way repeated-measures ANOVAs (parametric data).
For post-hoc tests, we used Holm correction and employed Dunn's test if not stated otherwise. Additionally, we used the BayesFactor package [40] with Jeffreys-Zellner-Siow (JZS) priors to compute Bayes factors. Interpretations were made according to Jeffreys [29]. R version 4.3.1 and RStudio version 2023.06.1 were employed. All packages were up to date in August 2023.
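As an illustration of this pipeline, the following Python sketch mirrors the assumption check and the non-parametric route on synthetic data (the study itself used R; the ratings here are random placeholders, not study data):

```python
import numpy as np
from scipy import stats

# Synthetic within-subject ratings: 15 participants x 3 display conditions
# (baseline, LumePad 3D, LookingGlass) on a 7-point scale.
rng = np.random.default_rng(42)
ratings = rng.integers(1, 8, size=(15, 3))

# Assumption check: Shapiro-Wilk normality test per condition.
normal = all(stats.shapiro(ratings[:, c]).pvalue > .05 for c in range(3))

# Non-parametric route: Friedman test for k related samples. (The
# parametric route would be a repeated-measures ANOVA, available in
# e.g. statsmodels' AnovaRM.)
chi2, p = stats.friedmanchisquare(ratings[:, 0], ratings[:, 1], ratings[:, 2])
```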

Participants
The age ranged from 22 to 60 years, with an average of M=28.73 years (SD=9.03). Five participants reported their gender as female and ten as male. Of the 15 participants, nine were still studying, four were employed, and one did not specify their employment status. The participants had held a driving license for an average of M=10.10 (SD=6.90) years. Seven participants indicated they drive "daily", one participant chose "on working days", three picked "3-4 times a week", one "one time a week", two picked "1-3 times a month", and the last participant drives "less than 1 time a month". One participant drove "25000 - 32999 km", seven drove "15000 - 24999 km", four drove "7000 - 14999 km", and three chose "less than 7000 km" per year.

Statistical Analysis
Descriptive data can be found in Table 2.
First, we evaluated every dependent variable with the appropriate inferential test (Friedman or repeated-measures one-way ANOVA). We found only one significant difference: a Friedman test showed a significant difference in the assessment of recognized vehicles (question: "The automated vehicle perfectly recognizes all vehicles in any situation"; χ²(2)=6.06, p=0.05). However, post-hoc tests showed no significant differences between the conditions.
Then, we used the BayesFactor package [40] to determine whether there was evidence for no differences between the conditions (see Table 1).
Table 1: BayesFactor analysis to evaluate whether there is evidence that there is no difference between the conditions. If the User model is favored, this implies evidence against an influence of the conditions. If the User + Condition model is favored, this implies that the conditions have an influence. Bayes factors provide a continuous quantification of relative evidence: a Bayes factor exceeding 1 indicates evidence supporting one of the models (usually referred to as the numerator), while a Bayes factor less than 1 indicates evidence in favor of the other model (the denominator). For example, a value of 2 means the data are "2 times more probable under the null compared to the alternative hypothesis" (see the website of [38]).

Dependent Variable        Evidence    BF              Favoured Model
Mental Workload [24]      Anecdotal   1/2.99 = 0.33   User model
Understanding [52]        Anecdotal   1/1.55 = 0.65   User model
Demand [52]               Anecdotal   1.27            User + Condition model
Supply [52]               Moderate    1/5.87 = 0.17   User model
Situation Awareness [52]  Anecdotal   1/1.52 = 0.66   User model
Trust [33]                Moderate    1/5.07 = 0.20   User model
Understandability [33]    Anecdotal   1/1.43 = 0.70   User model
Perceived Safety [20]     Anecdotal

We found that for most dependent variables, there is anecdotal evidence supporting no difference. For Supply, Trust, and longitudinal and lateral guidance, there is moderate evidence (see Table 1). Participants also ranked the display technologies ("Please rate the display technologies from best to worst"). A Friedman ANOVA found a significant difference (see Figure 4). The post-hoc test showed that the LookingGlass was ranked significantly better than the LumePad, both in 2D and in 3D mode. The qualitative feedback indicates that this is mostly due to the lower eye strain induced by the LookingGlass.
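The evidence labels in Table 1 follow conventional Bayes factor thresholds; a small helper illustrating the mapping (the cut-offs at 3, 10, 30, and 100 are the commonly used Jeffreys-style boundaries, assumed here to match the paper's interpretation):

```python
def evidence_label(bf):
    """Verbal evidence category for a Bayes factor, using
    Jeffreys-style cut-offs at 3, 10, 30, and 100."""
    strength = bf if bf >= 1 else 1 / bf  # strength is symmetric in BF vs 1/BF
    if strength < 3:
        return "anecdotal"
    if strength < 10:
        return "moderate"
    if strength < 30:
        return "strong"
    if strength < 100:
        return "very strong"
    return "extreme"

evidence_label(1 / 5.07)  # Trust: "moderate" evidence favouring the User model
evidence_label(1.27)      # Demand: "anecdotal"
```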

Qualitative Feedback
The qualitative feedback, on the one hand, showed that two participants did not find the 3D display to offer relevant advantages (e.g., "Not useful! 2D would be enough for me because you don't necessarily have a better view through 3D." or "Interesting but I'm not sure if they are useful. A great feature or addition in a car to stand out, but not really needed IMO [in my opinion]."). On the other hand, two other participants stated that the 3D displays enhanced their spatial understanding: "Unlike 2D, 3D gives you a slightly better feel for ranges and the spatial environment. The Looking Glass was more pleasant to look at and less strenuous on the eyes, unlike the LumePad." and "The[y] are very useful. Its easier to predict distances on the 3D displays." Only two participants provided feedback regarding concerns, improvements, or other thoughts, with one respondent requesting higher compensation and another suggesting the visualization of navigation information.

DISCUSSION
This work investigated the effect of display technology on mental workload, situation awareness, trust, and performance assessment in AVs. Using Bayes factors, we were able to show moderate evidence that trust and the subjective assessment of lateral and longitudinal guidance are not affected by the display technology. In line with prior work [4], we found no significant differences in mental workload when comparing 2D and 3D displays. In the following, we discuss our results with regard to the larger literature.

3D Displays -Necessity, Gimmick, Nice-to-Have, or Distraction?
There has been some media hype about 3D displays. While most articles tend to lean to the positive side (e.g., 2, 3), there are also negative commentaries (e.g., "Mercedes-Benz doesn't say what this technology might be used for, or what kinds of benefits it could provide - which is disconcerting considering it's an idea that failed to provide any usefulness [in] the case of those two doomed smartphones" 4). While the 3D displays received qualitative feedback suggesting positive effects on user acceptance and appearance, our data suggests that they have little to no impact on users in AVs. The question arises whether such displays, as they are already available to consumers 5, are a necessity, a gimmick, nice-to-have, or even a distraction. Previous work on scenarios other than automated driving confirmed that they have no significant effect on mental workload [1]. Other positive effects, such as improved situation assessment and performance [57, 58] or shorter glance times [45], become less important with the proliferation of automation. Based on our data, we suggest that 3D displays in the context of AVs and the visualization of information about their inner workings could be considered a nice-to-have feature. However, they offer a sense of pleasantness and hint at advanced technological capabilities. In addition to emphasizing calibrated trust and acceptance [44], the automotive industry is increasingly focusing on user experience [51] due to its influence on users' engagement with vehicle interfaces and interiors [23] and its recognized value as a unique selling point [17]. Thus, it remains to be investigated how 3D display technologies can enhance user experience and assist users in non-driving-related tasks.
Our data suggests that there are either no significant differences or even evidence for no difference. Therefore, we suggest that the development of 3D displays has to be properly adjusted to the specific use case. For the automotive use case, we assume that there is a negligible effect for information visualizations regarding the AV's situation understanding and prediction, therefore rendering current advances in the industry moot (e.g., 6). The currently used 3D displays are, however, expected to be safe to use.

Methodical Reflections
Due to the unavailability of consumer-grade vehicles capable of automated driving that incorporate 3D displays, we conducted a lab-based study. The driving scene was visualized on a 75" TV; therefore, the driving scene itself contained no actual depth. This might have altered the mapping between the driving scene and the 3D display. While a real driving scene captured via dashcams could have been used and post-processed to display relevant information (e.g., where pedestrians are), to the best of our knowledge, there is no solely RGB-based prediction of other vehicles' intention. Therefore, a Unity scene was required in which the future position of all objects was known.
Based on these methodical constraints, we believe that our approach was valid to elicit first results regarding 3D displays in AVs. Nonetheless, this should be re-evaluated in a real-world scenario once possible.
We also applied Bayes factors to draw conclusions about the absence of effects. This is a "coherent approach to determining whether non-significant results support a null hypothesis over a theory, or whether the data are just insensitive" [18, p. 1].

Limitations
There are two main limitations: The sample size of this study was rather small, with only 15 participants, which limits the generalizability of the findings to a larger population. The study was also conducted in a laboratory setting, which does not fully capture the complexities of real-world environments. Specifically, the setting was not in a vehicle, which could have a significant impact on the results [12, 26]. Therefore, these findings may not be directly transferable to real-world settings.

CONCLUSION
Overall, this work provides additional insights into the effects of 3D displays in an automotive context. In a user study with N=15 participants, we found evidence that 3D displays possibly do not alter trust and perceived AV capabilities. Nonetheless, participants highlighted that they found them interesting and sometimes even useful. They also clearly ranked the LookingGlass as their favorite technology representative. This work helps to focus on relevant novel technologies other than 3D displays and on their use cases in AVs.

OPEN SCIENCE
The implementation will be made available to interested third parties.

Figure 1: A participant sitting in front of the study setup with the LumePad as the onboard computer (a).

Figure 2: The two devices used in the study up close. The distortion in the figure stems from the 3D effect.

Figure 3: The participants' scene view (a, left: recording of the passenger view) and the onboard computer view (b, right: recording of the onboard computer).

Figure 4: Ranking of the devices. The plot shows the individual data points, the bar charts, the density curve, the mean values, and the significant post-hoc tests.