Investigating the Effects of External Communication and Platoon Behavior on Manual Drivers at Highway Access

Automated vehicles are expected to improve traffic safety and efficiency. One approach to achieve this is platooning, in which (automated) vehicles drive behind each other in very close proximity to reduce air resistance. However, this behavior could lead to difficulties in mixed traffic, for example, when manual drivers try to enter a highway. Therefore, we report the results of a within-subject Virtual Reality study (N=29) evaluating different platoon behaviors (single vs. multiple, i.e., four, gaps) and communication strategies (HUD, AR, attached displays). Results show that AR communication reduced mental workload and improved perceived safety, and that a single big gap led to the safest merging behavior. Our work helps to better incorporate novel behavior enabled by automation into general traffic.


INTRODUCTION
Automated vehicles (AVs) will change traffic both for vulnerable road users (VRUs) such as pedestrians [17,30,37] and for manual drivers [11,58,59]. Anticipated benefits such as fewer accidents [68] and improved fuel usage, either through improved driving [1] or via novel behaviors such as platooning [1], drive the development of these systems. Platooning in particular, that is, a system in which several (automated) vehicles drive behind each other at a very close distance of 10m or less [1,3,38,41] with the aid of a technical control system and without compromising road safety, has been researched extensively [1] and has been evaluated with trucks on real streets since 2018.
While numerous works investigated the crossing process of VRUs (e.g., the impact of vehicle shape [25], modality [20], or scenario [16]), the potential encounters of AVs with manually driven vehicles ("manual drivers") have been investigated less. These encounters can involve uncertainty, for example, when the road is partially blocked [45,58]. Another source of uncertainty for manual drivers is unfamiliar driving behavior by AVs. One particularly well-evaluated approach for AVs to improve road efficiency is platooning. However, this introduces "a new cooperative driving paradigm for drivers" [1, p. 1]. In the very likely transitional phase during which AVs and manual drivers co-exist on the roads, AV approaches must account for human behavior and vice versa. With platoons in mind, there could be two difficulties: merging into a potentially long platoon at highway entries and traveling in a platoon as a manual driver. In this work, we focus on the highway entry with an approaching platoon. Driving onto a highway is already a complex scenario today [71]. With an approaching platoon, this could become infeasible; therefore, the platoon's behavior must be adapted. While the platoon could dissolve prior to every entry, this would make it very inefficient. Therefore, different approaches are required. With real-world testing under way since 2018 and with legislation catching up with AV technology (e.g., in Germany), this scenario could emerge within the next decade and has to be designed in a safe way.
Therefore, we investigated a highway merging scenario with an approaching car platoon in a within-subjects virtual reality (VR) study with N=29 participants. To address the complexity of highway entry, two alterations are possible: altering the platoon's behavior and altering the information available to the manual driver. Therefore, we altered the platoon behavior by providing a single gap (35 or 70m long) or multiple gaps (30 or 45m long each) and provided three communication designs alongside a baseline: Augmented Reality (AR)-based, using a Head-Up Display (HUD), or attached displays on the AVs. We found that the large single gap and the AR communication were clearly preferred. The AR concept also significantly increased trust and predictability. The large single gap also led to significantly fewer crashes.
Contribution Statement: This work provides two platooning behaviors usable for the merging of manual drivers onto automated highways. Three communication concepts (AR, attached displays, and a HUD) are also defined. Findings of a VR study with N=29 participants show that the large single gap and the AR communication were clearly preferred and led to higher trust and fewer crashes.

RELATED WORK
This work is based on research in the field of external communication of AVs toward manual drivers. However, we also provide general insights from studies on external communication of AVs toward pedestrians, as these can also guide the design for other road users.

External Communication Towards Vulnerable Road Users
Vulnerable road users such as pedestrians or bicyclists, that is, road users not protected by an outside shield [37], communicate in traffic scenarios with ambiguous right of way via eye gaze or gestures. Recent research has looked into how the communication of a human driver with vulnerable road users (e.g., by waving or nodding), which is missing in an AV, could be substituted or whether this communication process could even be enhanced [20]. The design of these concepts varies regarding modality, message type, and communication location [17]. Situation characteristics also have to be considered. While several modalities have been evaluated, visual communication concepts were used most often [21]. Current research focuses on scalability [10,22], accessibility [21], and social aspects [62]. Potential challenges regarding passenger actions, such as pedestrian mode confusion, are also evaluated [13,23].

External Communication Towards Manual Drivers
While communication to VRUs has gained significant traction in recent research, communication toward manual drivers has been investigated less. Therefore, we recapitulate the findings of related work on any communication toward manual drivers to derive our communication concepts. Rettenmaier et al. [58] investigated the communication between an AV and a manual driver in a bottleneck situation. In their task, two parked automobiles block the road, forcing both the manually driven vehicle (MV) and the AV to use the oncoming traffic lane to pass the parked vehicles. Both cars must therefore cede the right-of-way to the oncoming vehicle in an uncontrolled situation, which makes communication essential. The communication is accomplished using displays on the AV's radiator grill or road projections. The screens depicted German road signs, which are commonly used to control traffic at bottlenecks. The study found that displays outperformed projections in terms of visibility, especially at larger distances [58]. Because the driver's gaze is largely oriented toward the approaching AV, the display is detected earlier. Furthermore, the projections are transparent and hence less apparent than the screens. Nonetheless, Miller et al. [50] also found that perceivable implicit communication, that is, clearly visible movement, can suffice for bottleneck scenarios.
Also in a bottleneck scenario, Rettenmaier et al. [57] varied the visual signals used to communicate the AV's intention to the manual driver. This study yielded the best design for communication visualization in the form of a red or green colored arrow.
Klaus et al. [42] discussed various forms of communication. They differentiate between four external human-machine interfaces (eHMIs): "marking a vehicle as an automated vehicle", "light strips", "display," which shows texts or symbols, and "projection". They believe that the latter two eHMIs are more beneficial for communicating with other road users since displays and projections can present a large quantity of information (textual or pictorial).
Avsar et al. [2] investigated an intersection (i.e., a T-junction) scenario. Here, an AV had the right of way but still yielded to a manual driver, communicating this via a "360° LED light-band" [2]. They found that the eHMIs improved perceived safety and trust in the AV and that handling was assessed as simpler. This reinforces the notion that eHMIs improve traffic and communication with manual drivers.
Schlackl et al. [65] used the whole visible car body. They investigated the zipper procedure at a road extension (i.e., a construction site). The zipper procedure is also performed when merging onto a highway, as is done in our experiment. The eHMI had three parts: the hood, the rear, and the driver's side. The side indicated, in the shape of a dot, the location of the detected vehicle driving next to the AV. This was to compensate for the lack of eye contact. The traffic status was depicted as a symbol on the vehicle's back. Triangles pointing and moving in the direction the participants were intended to turn in were visible on the front of the AV. Pictograms and triangles were "much better than the ordinary headlight flasher" [65, p. 82].
Colley et al. [11] investigated two scenarios. First, they evaluated a scenario with emergency vehicles where the AV would communicate the situation or block the intersection for vehicles other than the emergency vehicle by driving into it. While external communication aided in understanding the behavior, participants were still confused by this novel behavior. In the second scenario, a four-way intersection was investigated. In this deadlock scenario, external communication led to a lower mental workload and fewer accidents, showing the potential of this communication.
These works show that eHMIs have the potential to communicate with manual drivers but that implicit communication can also suffice. We draw from these previous works in designing our communication concepts and further the understanding of how such communication can affect drivers in a novel situation, i.e., merging into a platoon. While one zipper scenario was evaluated, there currently is a knowledge gap in understanding merging scenarios with platoons.

Platooning of Automated Vehicles
Due to different objectives and motivations, various approaches exist for platooning strategies. While some strategies seek to establish a commercial fleet or implement platooning as a service [48], others are interested in increasing safety, reducing congestion, or simply compensating for the lack of skilled drivers [5]. These strategies can be distinguished according to several criteria. In this study, we focused on a homogeneous platoon where longitudinal control is exercised and no additional infrastructure is needed.
We opted for a homogeneous platoon as this reduces unintended side effects such as having to merge in front of vehicles with variable sizes or different colors, which might influence attributions of speed [24]. Only longitudinal control was simulated, as swaying to different lanes is potentially dangerous if other vehicles are present.
The SARTRE [5,60,63] strategy is an example of such a solution.
The idea is to develop and implement traffic solutions so that platooning becomes possible without having to modify the infrastructure. It defines a platoon as a heterogeneous combination of vehicles led by a heavy manual vehicle. All following vehicles orient themselves to the lead vehicle. Using V2V communication, the specified distances and paths are communicated to the following vehicles. Therefore, longitudinal and lateral control is needed in this strategy. Importantly, platoons in this strategy do not have a fixed size but behave dynamically: vehicles can join or separate dynamically. Moreover, in an emergency or with loss of communication, the local system, i.e., the vehicle itself, will take over control again. Kühn et al. [44] have already investigated human reactions to potential AVs on highways in various scenarios. They conducted a simulation study to determine whether human drivers could, in general, detect an AV based on its driving behavior, finding that human drivers could recognize this quite well. Participants already had appropriate ideas about potential interaction scenarios and about AVs and the way they would behave. Various scenarios were investigated for this purpose, including one where the subject had to enter the highway and line up in front of an AV. It was found that it was more pleasant for the subject when the AV kept a relatively large safety distance after the vehicle had entered the lane [44]. However, Kühn et al. [44] only investigated single-vehicle scenarios.
van Loon and Martens [69] have collected different concepts that address the compatibility problems during this transition phase of mixed traffic. Optimal driving patterns are often assumed in theory. However, this cannot always be guaranteed, especially in mixed-traffic situations [69]. For instance, on the highway at an on-ramp, one cannot always switch to the left lane to ease merging for entering vehicles, as human drivers are also present and usually want to pass a slower vehicle. Therefore, AVs in mixed traffic will need to be able to co-exist and cooperate with manually driven vehicles (i.e., achieve so-called "backward compatibility"). Thus, AVs require the ability to anticipate the behavior of vehicles unequipped with the latest technology based on measurable indicators such as their speed or acceleration and to react accordingly. Clearly, it would also be important to have forward compatibility, i.e., unequipped vehicles interacting with an AV and not noticing any difference from a human driver. However, this will be harder to achieve due to the nature of AVs, as they will be designed to drive efficiently and safely, which may contrast with human driving behavior [69]. Woodman et al. [74] previously studied low-speed platoons of AVs in inner-city environments via focus groups. They used low-speed autonomous transport systems (L-SATS), which were equipped with eHMIs to communicate with pedestrians. They assumed a platoon that does not operate completely autonomously. At the start of the platoon, the AV is supervised by a human who can intervene in an emergency. Within this context, it was interesting that the participants had more confidence in the stopping capabilities of the AVs than in those of normal vehicles. One of the aspects the participants were concerned about was the spacing of the individual AVs. On the one hand, they were concerned about safety and whether the AVs would crash into each other. On the other hand, they were concerned that people or animals could walk between them if the distance became too great. Gilbert et al. [33] investigated the public perception of the number, affiliation, and purpose of AVs in shared lanes. In their survey, they found that fleets, and especially fleets associated with a private agency, were perceived less positively than single or public agency-associated AV fleets. Therefore, we refrained from defining the association with a private or public agency. Finally, Razmi Rad et al. [54] evaluated the impact of a platoon driving in a dedicated lane next to manual drivers. They found that manual drivers accepted shorter gaps when driving next to the dedicated lane. This overview shows that the topic of platooning is challenging both from the technical and from the human factors perspective. While a lot of work has gone into improving the technical capabilities, human factors challenges have so far only been approached theoretically [69], via surveys [33] or focus groups [74], or in simple scenarios [44] or scenarios with dedicated lanes [54].
An empirical evaluation of drivers' behavior in more complex (i.e., multiple vehicles, multiple lanes) scenarios with different platoon behaviors and communication concepts to support the manual driver is currently missing.

EXPERIMENT
We designed and conducted a within-subject study to evaluate the effects of various platoon behaviors and communication concepts. This study was guided by the exploratory research question (RQ): RQ1: What impact do the variable platoon behavior and communication have on manual drivers in terms of (1) driving behavior, (2) mental workload, (3) trust, (4) perceived safety, and (5) communication quality?
These measurements were used in previous literature to assess different behaviors and communication possibilities. We expect that lower mental workload and higher trust, perceived safety, and communication quality are desirable aspects of the merging process. Due to the exploratory nature of this study, we refrain from stating hypotheses. Behavior was measured by eye gaze and driving-related measures regarding the merging, such as distance and speed (see Section 3.2.1). The measured behavior was important to adequately assess objective metrics for the merging process.
Every participant experienced 17 conditions, resembling a 4 × 4 design plus a baseline (no communication and no gap adaptation). The independent variables were platooning behavior (Single Small Gap, Single Large Gap, Multi Small Gaps, Multi Large Gaps) and communication concept (No Communication, AR, eHMI, HUD). In the baseline, the platoon would not alter a gap, so no gap other than the initial 5m between each vehicle was created.
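To make the condition count concrete, the following minimal sketch enumerates the 4 × 4 design plus the baseline. The identifier strings (including the label "NoGap" for the baseline) are illustrative assumptions and not the names used in the study software.

```python
from itertools import product

# Factor levels as named in the text; the exact identifiers are assumptions.
PLATOON_BEHAVIORS = ["SingleSmallGap", "SingleLargeGap",
                     "MultiSmallGaps", "MultiLargeGaps"]
COMMUNICATION_CONCEPTS = ["NoCommunication", "AR", "eHMI", "HUD"]


def build_conditions():
    """Return the baseline plus every communication x behavior pairing."""
    # Baseline: no communication and no gap adaptation ("NoGap" is a
    # hypothetical label for the unaltered 5m spacing).
    conditions = [("NoCommunication", "NoGap")]
    conditions += list(product(COMMUNICATION_CONCEPTS, PLATOON_BEHAVIORS))
    return conditions


if __name__ == "__main__":
    for communication, behavior in build_conditions():
        print(f"{communication:16s} {behavior}")
    print(len(build_conditions()))  # 1 baseline + 4 * 4 = 17
```

Note that "NoCommunication" combined with a gap-forming behavior is a regular cell of the design; only the baseline additionally omits the gap.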

Materials
3.1.1 Virtual Reality Simulator. We modeled the scenarios in Unity version 2021.3.10f1 [67]. We used a Thrustmaster T150 Pro steering wheel with pedals and an HTC VIVE Pro Eye. This allowed the participants to look over their shoulders and, for example, check the blind spot. We used the Simple Traffic System asset [66] with several modifications to generate traffic. Participants drove a vehicle with a length of 5m, a width of 2.05m, and a height of 1.7m. This approximately represents a Volkswagen Passat Limousine.

Platooning Behavior. The platoon operated with a distance of 5m between AVs and consisted of 103 vehicles. This number was chosen to ensure that the participant would encounter the platoon when merging. While previous work often assumes a maximum platoon size of fewer than 10 vehicles [49,75] to achieve high traffic stability, Zhou and Zhu [75] showed that higher numbers increase capacity and are feasible without too great a degradation in traffic stability. The platoon traveled at a speed of 130km/h, which is, for example, the recommended speed on German highways.
We implemented two different types of platooning behavior in the case of merging: forming one gap (see Figure 1a) or multiple (i.e., four consecutive gaps in our study) smaller gaps (see Figure 1b). As we were interested in different required sizes for the gaps, after initial testing, we chose 35 and 70m for the single gap and 30 and 45m for the multiple gaps. To determine the proper gap sizes, we followed the recommended safety distance on German highways. According to German law, the safety distance to the preceding vehicle has to be large enough to ensure that the driver can stop their car without any problems if the preceding vehicle brakes unexpectedly [55]. For this purpose, it is common to use a rule of thumb that states that the distance to the preceding vehicle in meters should be approximately half the current speed in km/h [55]. As all vehicles in our study were traveling at 130km/h on the highway, the correct safety distance would therefore be 65m. We used this recommendation as a starting point for the gap sizes. Initially, we tested gap sizes of 15m (multiple small gaps) and 30m (multiple big gaps). Based on this, we determined that the gap size should be at least 30m to keep the task from being too challenging and to reduce potential crashes. This led to the condition of multiple small gaps with 30m distance. Following the recommended safety distance, we chose 70m as the maximum gap size, which is even slightly larger. We chose 35m for the single small gap size as it is half of the single big gap size and also still higher than the finable distance [55]. As we also wanted to consider efficiency, we could not make the gaps too big, as, for example, a gap size of 50m would result in a total size of 200m with multiple gaps. This would only leave a 50m buffer at the entrance to the highway, which we considered difficult for the driver. Therefore, we again oriented ourselves to the recommended minimum distance and used the gap size of the single big gap as an appropriate buffer. This resulted in a gap size of 45m (250 − 70 = 180, divided by 4) for the multiple big gaps.
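The arithmetic behind these gap sizes can be retraced with a short sketch. The function names are hypothetical, and the 250m total length is inferred from the text (four 50m gaps plus the remaining 50m buffer); the time-gap helper is an added illustration, not a measure reported in the study.

```python
def rule_of_thumb_distance_m(speed_kmh: float) -> float:
    """Rule of thumb: safety distance in meters is half the speed in km/h."""
    return speed_kmh / 2.0


def multi_gap_size_m(total_length_m: float, buffer_m: float, n_gaps: int) -> float:
    """Split the length remaining after the buffer evenly across the gaps."""
    return (total_length_m - buffer_m) / n_gaps


def time_gap_s(gap_m: float, speed_kmh: float) -> float:
    """Time it takes to cover a gap at a constant speed (illustrative only)."""
    return gap_m * 3.6 / speed_kmh


if __name__ == "__main__":
    speed_kmh = 130.0
    print(rule_of_thumb_distance_m(speed_kmh))   # 65.0 m recommended distance
    print(multi_gap_size_m(250.0, 70.0, 4))      # 45.0 m per gap
    for gap in (30.0, 35.0, 45.0, 70.0):
        print(gap, round(time_gap_s(gap, speed_kmh), 2))
```

Under these assumptions, only the 70m single gap exceeds the 65m rule-of-thumb distance, which matches the description of it as "even slightly larger".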
The gap(s) were formed as follows: when the participant drove past a trigger in the curve leading to the acceleration lane (see "gap trigger" in Figure 1), the gap was formed approximately 200m to the left of the start of the acceleration lane (see "wall trigger" in Figure 1). In internal tests, this distance was found to be appropriate to allow the gap to be at the end of the acceleration lane when the participant continued to drive regularly.

Communication Concept.
We implemented three communication concepts based on prior work along with having no communication as the fourth level of the factor communication concept.
AR: The AR visualization (see Figure 2a) employs a simulated AR windshield to directly (i.e., contact analog) display gaps that were created for merging. This concept closely relates to the concept of Lorenz et al. [46], who found that visualizing the permitted/suggested area (via AR in green) for driving after a takeover led to better reactions than visualizing the prohibited/unfeasible area. The simulated AR windshield extends to the side windows as frequent shoulder checking will be required for entering the highway.
eHMI: The attached display (called eHMI; see Figure 2b) refers to concepts currently evaluated in external communication of AVs with manual drivers such as Colley et al. [11] and Rettenmaier et al. [58]. The green arrow signals which gap to join, while the red line indicates an unfeasible gap. The eHMI had high visibility from the driver's perspective.
HUD: The HUD (see Figure 2c) does not directly incorporate information on which gap can be joined but only that there is a feasible gap by displaying the word "Free". This approach requires the least technical adaptation. In the current version, the HUD only contains information about the gap. Therefore, cognitive overload is unlikely despite the textual nature of the information. Additionally, the text is marked in green (see Figure 2c) to facilitate its recognition. We were also inspired by the HMIs incorporating information for the takeover process, which often contained text (e.g., see [12,28,70]). Nonetheless, Charissis and Papanastasiou [6] explain that "HUDs overloaded with information, especially those using textual output, can create the effect known as cognitive capture" [6, p. 43]. Therefore, future interfaces should evaluate the use of pictorial visualization.
AR and eHMI represent localized information, that is, information regarding the merging process is located at the relevant position. The HUD represents aggregated information. This enables us to study the effects of these different kinds of information presentation.

Objective measurements:
We measured position, speed, and eye-tracking data at 66 Hz. We logged several areas of interest: every vehicle (defined by a unique id) and the different visualizations (AR, HUD, eHMIs). For the AR visualization, only gaze on the green area was logged as a fixation (i.e., not the entire window that was simulated as being an AR windshield); for the HUD, the area of the text was defined as the area of interest; and for the eHMI, the displays on the vehicles were the area of interest. Additionally, we logged the time needed overall, time to arrival at the ramp, time on-ramp, the time needed to join, whether the participant joined the platoon in one of the gaps (boolean), whether the participant let the entire platoon pass (boolean), the coordinate of the join (x,y,z), which gap was used to join (int), distance to the front and back car at join (in m), the speed at join (in km/h), join attempts (int), whether the participant crashed with AVs (boolean), and the number of crashes (int). These measures enabled us to answer our RQ with regard to driving behavior.

Subjective Measurements. Participants rated their perceived safety using four 7-point semantic differentials from -3 (anxious/agitated/unsafe/timid) to +3 (relaxed/calm/safe/confident) as used by Faas et al. [29]. We also employed the mental workload subscale of the raw NASA-TLX [34] on a 20-point scale ("How much mental and perceptual activity was required? Was the task easy or demanding, simple or complex?"; 1=Very Low to 20=Very High). Additionally, we used the subscales Predictability/Understandability (Understanding from here) and Trust of the Trust in Automation questionnaire by Körber [43]. Understanding is measured using agreement on four statements ("The system state was always clear to me.", "I was able to understand why things happened."; two inverse: "The system reacts unpredictably.", "It's difficult to identify what the system will do next.") using 5-point Likert scales (1=Strongly disagree to 5=Strongly agree). Trust is measured via agreement on equal 5-point Likert scales on two statements ("I trust the system." and "I can rely on the system."). The system was introduced as the whole platoon. In addition, situation awareness was assessed using a self-defined statement ("It was clear to me at all times what the other road users were doing."). The behavior ("The behavior of the other vehicles was clear.") and the communication quality ("The communication for filtering/merging into traffic was clear.") were also assessed using agreement on a 7-point Likert scale (1=totally disagree to 7=totally agree). After all 17 conditions, participants could provide positive and negative open-ended feedback.
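As an illustration of how such subscale scores can be computed, the sketch below reverse-codes the two inverse Understanding items and averages the item ratings. Averaging the items per subscale is an assumption made here for illustration; the text does not state the aggregation, and all function names are hypothetical.

```python
from statistics import mean


def reverse_code(rating: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Mirror a rating on its scale so that inverse items point the same way."""
    return scale_max + scale_min - rating


def understanding_score(direct_items, inverse_items) -> float:
    """Mean of the direct items and the reverse-coded inverse items (5-point)."""
    recoded = [reverse_code(rating) for rating in inverse_items]
    return mean(list(direct_items) + recoded)


def trust_score(items) -> float:
    """Mean of the two Trust items (5-point); aggregation is an assumption."""
    return mean(items)


def perceived_safety_score(differentials) -> float:
    """Mean of the four semantic differentials, each ranging from -3 to +3."""
    return mean(differentials)


if __name__ == "__main__":
    # Hypothetical answers of one participant in one condition.
    print(understanding_score(direct_items=[4, 5], inverse_items=[2, 1]))  # 4.5
    print(trust_score([4, 3]))                                             # 3.5
    print(perceived_safety_score([2, 1, 3, 2]))                            # 2
```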

Procedure
Each participant began by signing a consent statement. Every participant had to have a valid driver's license. Following that, we gave each participant a brief overview of the study. Afterward, participants could familiarize themselves with the steering wheel and the VR environment by driving onto the highway at least three times. In these trial runs, there was reduced traffic and no platoon approaching. Participants were allowed to drive as long as they wanted; on average, they tried the scenario five times. Following this, the participants encountered all seventeen conditions. We introduced the different visualization and gap concepts to the participants before each condition via printed sheets and encouraged questions. The following texts were combined depending on the condition (translated from German):
No Visualization: There is no visualization.
HUD: The text "Free" is displayed on the windshield as soon as there is a gap between the vehicles.
Visualization on Vehicle: There are screens on the sides of the vehicles that either tell you not to drive up here (red) or show you by arrows which gap is coming up for you.
AR: The gap(s) between vehicles intended for you are displayed in green directly in the world.
Single Gap: There will be one gap between the vehicles.
Multiple Gaps: There will be multiple gaps between the vehicles.
The visualization on the vehicle could be implemented either using AR technology in the own vehicle (e.g., via a HUD) or by adding displays to the AVs, a concept currently explored in academia and industry [4,8]. In our implementation, we envisioned displays attached to the AV.
After each condition, participants filled out a questionnaire with the questions outlined in Section 3.2.2. At the end of the study, participants completed a final questionnaire in which they were able to provide open-ended feedback. The study took approximately 60 min. Participants were compensated with 10€. The study was conducted in German.

Participants
We determined the required sample size via an a-priori power analysis using G*Power in version 3.1.9.7 [31]. To achieve a power of .8 with an alpha level of .05, 28 participants are required to detect an anticipated medium effect size (0.24 [32]) in a within-factors repeated measures ANOVA with seventeen measurements.
N=29 participants (5 female, 24 male), recruited via mailing lists and social media, took part. Unfortunately, the data of one participant had to be discarded because of simulation sickness, leaving us with 28 datasets for the subjective data. We had to discard eye-tracking and other objective data from five participants due to technical issues, leaving us with 24 valid datasets. As each participant experienced each condition once, this leaves us with 24 * 17 = 408 recorded merges.
Participants had a mean age of M=24.64 years (SD=2.15, between 22 and 31 years). Every participant held a valid driver's license. On average, they had held the license for M=4.86 (SD=0.76) years.
Most participants (16) drove less than 7,000 km in the past year, six drove between 7,000 and 24,999 km, four between 25,000 and 32,999 km, one participant drove over 33,000 km, and one did not drive in the last year. One participant uses a vehicle daily, one at least on weekdays, five one to four times a week, five once a week, six one to three times a month, eight less than once a month, and two stated even less.

Data Analysis
Before every statistical test, we checked the required assumptions (e.g., normal distribution). As none of the data were normally distributed, we used the non-parametric aligned rank transform from the ARTool package by Wobbrock et al. [73], as the typical ANOVA is not appropriate for non-normally distributed data, and applied Holm correction for post-hoc tests. The procedure is abbreviated, as in the original publication, as ART. The ART was always conducted as two-way with both communication and platoon behavior as independent variables. We also included the participant as a random factor. The error bars shown in Figure 3, Figure 4, Figure 5, and Figure 6

Eye Gaze Data
We split the eye gaze data into two subsets of areas of interest. Figure 3 shows that participants focused mostly on the AVs that make up the platoon. Additionally, in all scenarios but the baseline without a gap, the rear mirror was looked at more than the left mirror.
Figure 4 shows the areas of interest of the communication concepts. The AR concept was consistently looked at, no matter which platoon behavior was simulated. This was also the case for the eHMI. Interestingly, the HUD received very few fixations in the MultiBigGaps behavior but more than double the fixations in the SingleSmallGap behavior.
We then used a Dirichlet regression. Its usage in this context is motivated by the specific nature of gaze distribution data collected from participants across different scenarios. The data involves non-negative proportions that sum up to one (across the areas of interest), so it qualifies as compositional data. Traditional statistical methods designed for unconstrained data may lead to incorrect inferences when applied to this data type. Dirichlet regression, as suggested by Hijazi and Jernigan [35], provides a more appropriate statistical approach, ensuring that the inherent constraints of the data are considered. This makes it suitable for analyzing the relative contributions of different aspects within the collected gaze distribution data.
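To illustrate what compositional gaze data looks like, the following sketch converts per-area gaze sample counts into proportions that are non-negative and sum to one. The counts and the function name are hypothetical; the regression itself was run with the DirichletReg package and is not reproduced here.

```python
import math


def gaze_proportions(samples_per_aoi: dict[str, int]) -> dict[str, float]:
    """Convert gaze sample counts per area of interest into proportions."""
    if any(count < 0 for count in samples_per_aoi.values()):
        raise ValueError("gaze sample counts must be non-negative")
    total = sum(samples_per_aoi.values())
    if total == 0:
        raise ValueError("no gaze samples recorded")
    return {aoi: count / total for aoi, count in samples_per_aoi.items()}


if __name__ == "__main__":
    # Hypothetical counts for one merge; "Null" is everything outside the AOIs.
    counts = {"Null": 900, "AV": 700, "HUD": 40, "RearMirror": 220, "LeftMirror": 120}
    proportions = gaze_proportions(counts)
    for aoi, share in proportions.items():
        print(f"{aoi:11s} {share:.3f}")
    # The shares form a composition: they always add up to one.
    print(math.isclose(sum(proportions.values()), 1.0))
```

Because the shares are tied together by this sum constraint, an increase in one area necessarily reduces the others, which is the reason methods for unconstrained data are inappropriate here.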
The Dirichlet regression was conducted to examine the relationship between the combined areas of interest (AOIs) and the independent variable Scenario. The analysis revealed significant effects for various scenarios across different categories of gaze distribution including Null (i.e., everything but the AOIs), AV, HUD, Rear Mirror, Left Mirror, AR, and eHMI.
Similar patterns of significant effects were also found in other categories such as HUD, Rear Mirror, and Left Mirror.
The overall model fit was satisfactory, with a log-likelihood of 7668.667 on 119 degrees of freedom, an AIC of -15099.33, and a BIC of -14621.99. The analysis used the DirichletReg package in version 0.7.1 with a log link function and common parametrization.

Interestingly, with multiple small gaps, there were more crashes than with one single gap. Participants in the baseline performed approximately as well as with the communication concepts.
Letting Platoon Pass. In all 384 recorded highway entries excluding the baseline drives, participants never let the entire platoon pass. In the 24 recorded baseline drives, one participant let the entire platoon pass.
Distance to Automated Vehicles. The ART found a significant main effect of Platoon Behavior on Distance to Front Car at Join (F(3, 69) = 7.04, p<0.001). In addition, the ART found a significant interaction effect of Communication × Platoon Behavior on Distance to Front Car at Join (F(9, 207) = 1.97, p=0.044; see Figure 6a). While the large gap behaviors naturally had higher distances, the communication concepts altered these by up to 5m. In general, with AR, these were rather high. However, the HUD led to the highest distances to the front car for the large gaps.
The ART found a significant main effect of Platoon Behavior on Distance to Back Car at Join (F(3, 69) = 78.57, p<0.001). The ART found a significant interaction effect of Communication × Platoon Behavior on Distance to Back Car at Join (F(9, 207) = 2.42, p=0.012; see Figure 6b). For all visualizations, the single large gap led to the highest distances. For the other behaviors, there was no clear pattern discernible. However, the single small gap was always among

We also logged which gap participants joined the platoon at (see Table 1). Most participants joined in the first gap if multiple were available. However, in total, 19 also joined behind the platoon gap(s) (i.e., in the second half) and 3 before the platoon gap(s). This means that they still entered the platoon and did not jump ahead of it or let it pass entirely.
Additionally, we logged the number of attempts to join the platoon (see Table 2). This was defined as the number of times that the participant switched between the acceleration lane and the highway. Most participants joined on the first attempt. However, in total, 19 also joined on the second, 2 on the third, and 1 on the fourth attempt. The ART found a significant main effect of Platoon Behavior on time on acceleration lane (F(3, 69) = 3.11, p=0.032). A post-hoc test found that MultiSmallGaps was significantly higher (M=5.95,

Finally, we logged how many participants remained in the platoon at the end of each condition (see Table 3). Interestingly, most participants remained in the platoon.

Task Load Index
The ART found a significant main effect of Communication on TLX Score (F(3, 81) = 5.87, p=0.016). A post-hoc test, however, found no significant differences in TLX Score. The ART found no significant main effect of Platoon Behavior (p=0.059) nor an interaction effect (p=0.46) on the TLX Score.
Physical Workload. The ART found a significant main effect of Communication on physical workload (F(3, 81) = 6.04, p<0.001). A post-hoc test found no significant differences in the physical workload.
The ART found no significant main effect of Platoon Behavior (p=0.254) nor an interaction effect (p=0.419) on physical workload.
Performance. The ART found no significant effects on performance.
Frustration. The ART found no significant effects on frustration.

Clarity of Behavior and Communication Quality
The

Ranking
After all the conditions, participants ranked the communication (i.e., no communication, the HUD, the display, and AR) and the behavior (no gap, single gap, multiple gaps). Friedman's ANOVAs were conducted to compare the mean rankings (lower means better). For communication, a Friedman's ANOVA showed a significant difference (χ²(3)=29.53, p<.001). Post-hoc tests showed that the AR visualization (M=1.71) was ranked significantly better than all other communication concepts. The eHMI display (M=2.00) was ranked second and significantly better than no communication and the HUD. There was no significant difference between the HUD (M=2.96) and no communication (M=3.32).
For behavior, a Friedman's ANOVA also showed a significant difference (χ²(2)=41.21, p<.001). Post-hoc tests showed that no gap was rated significantly worse than the single gap (M=1.32) and the multiple gaps (M=1.71). There was no significant difference between single and multiple gaps, but the single gap was preferred.
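For rankings without ties, the Friedman statistic can be written in terms of the mean ranks alone, so the reported value for the communication ranking can be approximately reconstructed. The Python sketch below does this under one explicit assumption: that 28 complete rankings were analysed. This count is not stated in the ranking results; it is inferred because it is consistent with the rounded mean ranks. The function name is illustrative.

```python
def friedman_chi_square(mean_ranks, n_rankings):
    """Friedman statistic from mean ranks (valid when rankings contain no ties).

    Q = 12 n / (k (k + 1)) * sum_j (mean_rank_j - (k + 1) / 2) ** 2
    """
    k = len(mean_ranks)
    expected_rank = (k + 1) / 2  # mean rank under the null hypothesis
    squared_deviations = sum((r - expected_rank) ** 2 for r in mean_ranks)
    return 12 * n_rankings / (k * (k + 1)) * squared_deviations


# Mean ranks as reported (rounded to two decimals).
communication_mean_ranks = {
    "AR": 1.71,
    "eHMI display": 2.00,
    "HUD": 2.96,
    "No communication": 3.32,
}
n_rankings = 28  # ASSUMPTION: number of complete rankings, not stated in the text

chi_square = friedman_chi_square(
    list(communication_mean_ranks.values()), n_rankings
)
print(f"chi-square({len(communication_mean_ranks) - 1}) = {chi_square:.2f}")
```

With these inputs, the statistic comes out at about 29.54, which agrees with the reported χ²(3)=29.53 up to the rounding of the mean ranks. The behavior ranking is not recomputed here because the mean rank for the no-gap behavior is not reported.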

Open Feedback
Participants, in general, gave positive feedback about communicating gaps. One participant proposed the combination of HUD and AR. The participants highlighted that the HUD requires perfect timing to accurately display when it is safe to merge, which they believed to be difficult. One participant also stated that it was not clear that waiting would have been a possibility for them.

DISCUSSION
In this work, we present an under-evaluated scenario regarding the introduction of AVs and their novel behavior, i.e., platooning. Platooning might occur at highway accesses, making it difficult for manual drivers to enter. Therefore, novel behaviors and their communication are necessary. In line with prior work in adjacent areas such as AV-pedestrian communication [7,16,26,61] and bottleneck scenarios (Colley et al. [11], Rettenmaier et al. [58,59]), we show the positive aspects of external communication. In addition, we found clear evidence that the AR visualization, which was modeled after previous work in takeover requests [46], was preferred.

On the Introduction of Automated Vehicles Into General Traffic
Human factors are often overlooked when designing and implementing AV concepts [1]. However, with the clear possibility of having manual drivers alongside AVs, there is a necessity to introduce concepts that enable manual drivers to be mobile safely.
In this work, we also showed that despite the focus on manual drivers, crashes occurred (see Table 4). However, we showed that a human-centric evaluation can lead to platoon behavior and communication concepts that result in safe human driver performance.
In particular, the Single and Multi Big Gaps behavior with any communication led to no crashes. Instead of only looking at objective values for improving traffic efficiency, the human aspect has to be investigated even more [1]. This also becomes clear with the recent rise in studies on the communication of AVs with other, partially vulnerable road users [21,59].
There remains the question of whether platoon behavior should, for example, automatically adapt to highway ramps by providing additional space. This could alleviate the need for human drivers, who will most likely be present on the roads for at least another 30 years (based on currently bought manual vehicles), to adapt their behavior. However, it would reduce the advantages of platoon behavior regarding driving efficiency. Regarding the frequency of such a "platoon breakup", in Germany, there are approximately 3727 ramps (243 interchanges, more than 2,260 access points, and 94 interchanges (crossings) [27], as well as 430 managed [51] and 700 unmanaged [56] service areas) for 13200 km of highway, that is, on average, one ramp every 3.54 km (with the longest distance being 23.9 km [72]). Therefore, platooning effectiveness would be severely reduced.
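The ramp-frequency estimate above can be retraced with a few lines of Python. The counts and the highway length are taken from the text; the 100 km trip length and the helper `ramps_on_trip` are hypothetical and serve only to illustrate how often a platoon would have to open a gap if it did so at every ramp.

```python
# Ramp-related facilities on German highways, as cited in the text.
ramp_counts = {
    "interchanges": 243,
    "access points": 2260,
    "interchanges (crossings)": 94,
    "managed service areas": 430,
    "unmanaged service areas": 700,
}
highway_length_km = 13200

total_ramps = sum(ramp_counts.values())
mean_spacing_km = highway_length_km / total_ramps


def ramps_on_trip(trip_km):
    """Expected number of ramps passed, assuming evenly spaced ramps."""
    return trip_km / mean_spacing_km


print(f"{total_ramps} ramps, one every {mean_spacing_km:.2f} km on average")
# Hypothetical 100 km trip: roughly 28 potential platoon breakups.
print(f"about {ramps_on_trip(100):.0f} ramps on a 100 km trip")
```

The even-spacing assumption is a simplification, since the text notes that the longest distance between ramps is 23.9 km.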

Balancing Safety and Efficiency
We introduced two different platooning behaviors, each with two different distances to other AVs. Unsurprisingly, the larger distance was perceived better and led to fewer crashes. The large single gap is the worst in terms of traffic efficiency, as the AVs have to slow down the most. However, weighing the benefits and drawbacks should, in our opinion, clearly favor the safer variant. Therefore, while research on making traffic efficient is crucial, the human factor has to be included. Potentially, we tested a too-large gap. Future work could conduct a stepwise refinement to explore the parameter space (i.e., the inter-vehicle distance) to reduce the safer (and larger) distance to find a more efficient yet still safe balance.

On the Visualization Design
Our results show that the localized information via AR or eHMI was rated significantly better in most dependent variables. We assume that this stems from the need to quickly take in and process a lot of information. This preference is also clearly visible in the rankings. Therefore, we assume that the most important latent factor in the study design is the localization of the information. Regarding the implementation of the designs, we envisioned that the necessary technology, for example, for the AR visualization, would be available. In line with previous work in the automotive domain, this AR visualization showed significant improvements [15,18,53], for example, for cognitive load [9]. Nonetheless, this technology is not yet available. Thus, eHMIs or even HUDs could still help make these scenarios safer. Regarding the HUD, the specific design should then be reevaluated, as the text "Free" could be cognitively more demanding than a symbol and does not provide as much information as the localized designs of eHMI and AR.

Practical Implications
This work clearly shows that platooning behavior and platoon-manual driver interaction must be considered in developing automated vehicle platoons. We show that the proposed behaviors lead to fewer crashes and higher trust, which is in line with previous work in adjacent areas [11,16,58]. Traffic analysts, therefore, have to design platooning behavior with the user in mind. A simulation of the proposed behaviors can show which behavior is more efficient, enabling policymakers to determine the best balance between safety and efficiency. Therefore, simulations as presented by Zhou and Zhu [75] have to be updated to incorporate manual drivers.

Limitations and Future Work
A moderate number of participants took part (N=29). As mostly younger participants (on average, 24.62 years old) took part, it is unclear whether this work's findings are transferable to different age groups. Additionally, the participant group skewed male. We assume that in a more general population, the tendency towards less risky behavior, that is, longer gaps and clear communication, would be even stronger [40,47,52]. Our study participants were also all from the same cultural context, i.e., Germany. Therefore, future work should assess cultural differences for the effects of platooning. Also, transferability to real-world scenarios is difficult to assess. While much care was taken to design the scenario realistically and to reduce motion sickness symptoms, one participant had to abort due to motion sickness. The collisions also lead us to believe that participants would have behaved more cautiously in the real world. As the scenarios involved manually driving a vehicle, the experiment could also benefit from using simulators with higher degrees of freedom (e.g., [14] or [36]). Additionally, we assume sophisticated visualization technologies. However, our insights should be confirmed with actual capabilities and with other available technology, such as 3D displays [19]. Future work should also evaluate the effect of different modalities [39] and on-ramp geometries, as we expect these to have a significant effect on the merging scenario. Finally, the study took one hour, which could have led to fatigue. However, we employed counter-balancing to avoid order effects, and Schatz et al. [64] found "that even after 90 minutes of active testing, participants' quality gradings were still reliable despite the presence of measurable signs of fatigue" [64, p. 1] for auditory subjective Quality-of-Experience assessments. Therefore, we believe that the study's results are valid despite the study's length.

CONCLUSION
In this work, we proposed different behavior and communication concepts for platoons of AVs at highway accesses. We modeled the scenario and conducted a within-subjects study with N=29 participants. Our independent variables were platooning behavior (single gap or multiple gaps with different distances) and communication (AR, HUD, displays, no communication). The results of our study show that the proposed behaviors mostly improve objective and subjective measures regarding safety and trust. However, the number of crashes was also low without any communication. With multiple gaps available, these were also used by participants. The Single Large Gap and the AR communication led to the overall best results. To better understand the limits of gap sizes, future work should look more closely at stepwise refinements. This work aids the safe introduction of AVs in general traffic.

(a) Single gap. (b) Multiple gaps, where the platoon creates a total of four consecutive gaps.

Figure 1: Platooning behavior used for the VR study (vehicles not to scale).The solid black rectangle represents the starting position, the striped rectangle an arbitrary position on the acceleration lane.The start point is approximately 100m before the turn, visible on the lower left side.

Figure 2: Communication concepts used in the study.
Figure 6: (a) Interaction effect on Distance to Front Car at Join. (b) Interaction effect on Distance to Back Car at Join.
1) what type of control is exercised (longitudinal such as keep- [...]
[...], represent the bootstrapped confidence interval.
R in version 4.3.2 and RStudio in version 2023.09.1 were employed. All packages were up to date in December 2023. For descriptive data of objective values, see Table 4.

Table 1: Description of where participants joined the platoon. One participant did not join in the condition SSG No comm.

Table 2: Number of attempts required to join the platoon. One participant did not join in the condition SSG No comm.

Table 3: Number of participants remaining in the platoon at the end of the condition.