Improved Situational Awareness and Performance with Dynamic Task-Based Overlays for Teleoperation

Teleoperation remains an essential mode of robot control, especially in hazardous, uncertain environments. However, it presents multiple challenges, such as latency, limited depth perception, and a constrained field of view. In this work, we introduce dynamic 2D GUI designs that leverage psychophysics principles to align with human perception and decision-making. These designs seamlessly integrate depth information, force feedback, and system status by overlaying real-time, task-relevant data onto either 2D or 3D views. To assess their effectiveness, we compare them against a head-mounted display (HMD) interface. The results indicate superior user task performance, marked by fewer "control mode" switches, reduced errors and collisions, and a 100% task completion rate. Furthermore, users reported heightened task confidence and improved situational awareness when using the dynamic 2D GUI designs.


INTRODUCTION
Robotic operation modes range from manual teleoperation to full autonomy, and systems often combine modes. Teleoperation generally refers to the remote control of robotic systems, such as a dual-arm manipulator (Fig. 1), enabling human operators to perform tasks in hazardous environments, such as military and rescue operations [2, 4], or tasks demanding a high degree of precision, such as robotic surgery [14, 18]. While autonomous robots are designed for specific tasks, teleoperated robots offer flexibility, enabling operators to perform a wide range of tasks in diverse environments. Therefore, teleoperation remains an essential element in robotic systems and is often the default mode of operation [21]. It usually involves low-level control of independent joints using cameras and monitors, which provide an indirect means of overseeing the robot's actions.
Despite its advantages, teleoperation presents unique challenges. The level of autonomy inversely correlates with the number of degrees of freedom (DOF) that the user needs to control, demanding greater expertise from the operator when autonomy is reduced [4]. This becomes particularly challenging due to limited sensory information, lack of force feedback, latency, and low visual feedback quality, making tasks that are simple for humans in person difficult to achieve remotely.
Johansson's human grasping study [24] highlights the importance of sensory inputs (kinesthetics, force feedback, haptics, visual feedback) in teleoperating multi-degree-of-freedom systems [18]. External wearable devices, such as head-mounted displays (HMDs) and force/haptic feedback devices, aim to solve these issues [11, 16, 18], but can lead to problems such as cognitive overload and motion sickness [13]. In contrast, operating with a standard 2D monitor, a viable alternative, presents challenges such as limited depth perception, restricted field of view, latency, low video resolution, and lack of force feedback. However, 2D monitors remain the most common, affordable, and widely accepted mechanism for users to receive visual feedback. Their limitations can nonetheless impact the operator's understanding of the environment and task, potentially leading to a Situational Awareness (SA) problem. SA is defined as the perception of environmental elements and events with respect to time or space, the comprehension of their meaning, and the projection of their future status [8].
To address these challenges, designing effective teleoperation interfaces requires a deep understanding of human perception, decision-making, and action. This interdisciplinary research spans psychology, psychophysics, and human factors engineering. As system complexity increases, data integration should be strategic, prioritizing decision support over data overload while respecting the limits of human working memory [3, 22]. In this work, we present a streamlined graphical user interface (GUI) system grounded in these principles to tackle the remote SA challenge. Our prototype design is evaluated in-house at our research laboratory, using a dual-arm mobile manipulator equipped with a range of sensors (Fig. 1). Our objective is to create an intuitive GUI for low-level teleoperation in uncertain environments, without the need for external wearable devices. This GUI adheres to SA theory principles, delivering essential information based on the operational mode (navigation or manipulation) to ease cognitive load. It also ensures stable operation under controlled latency. Moreover, it incorporates visual sensory substitution techniques and 3D-to-2D projection on a standard monitor, eliminating the need for wearable devices.

RELATED WORK
Human-Machine Interfaces (HMIs) have evolved with the rise of data-rich, complex systems. However, SA has not typically been a central consideration in GUI design, particularly when using standard monitors. In recent years, there has been a rapid evolution of head-mounted displays (HMDs), initially designed for gaming and simulators and subsequently adopted for robotic teleoperation [11, 16]. Together with fisheye cameras and cameras mounted on gimbals, HMDs provide enhanced depth perception and spatial awareness through stereo imaging and camera-based views [26]. Yet, HMD use may increase cognitive load and affect performance [1].
In [1], researchers assessed the cognitive impact of AR displays, comparing a standard monitor, spatial AR, and HMD-based AR. The results showed that using an HMD led to reduced performance, longer task completion times, and higher self-reported cognitive load compared to a standard monitor. Overlaying information on screens is common in teleoperation. NASA pioneered this approach with overlays showing forces and predictive displays [5]; however, these suffer from the issues discussed earlier. Augmented and Mixed Reality (AR/MR) technologies are now widely adopted for robotic teleoperation. In [9], the concept of visual haptics allowed users to see indications on remote gripper fingers upon object contact, reducing the cognitive burden of remote manipulation and increasing task success. Furthermore, Virtual Fixtures (VFs), an example of leveraging AR for teleoperation, have been used to overlay sensory data and can significantly boost operator performance [20]. VFs also play a critical role in surgical robotics [14] and can provide trajectory-following guidance [10, 15] rather than relying on forces and torques, further improving performance and enhancing safety. Among such fixtures, incorporating force feedback proves the most effective at mitigating latency-induced, cumulative errors [15].
In the works outlined above, SA is only marginally improved, and at the cost of developing and adopting cumbersome wearable devices. Furthermore, the potential of standard 2D monitors has not been fully explored. Technologies like AR and VFs that use overlays also face open issues, such as gain shaping and scaling of forces and torques, which must be solved before a robust, stable implementation can be achieved [15]. We address these issues by developing novel techniques for 2D monitors while strategically leveraging 3D views. The following sections explore the hardware and software system design details.

HARDWARE SYSTEM DESCRIPTION
A dual-arm mobile manipulator with a suite of sensors is used for this work (Fig. 1). It consists of a Clearpath Husky base integrated with two Universal Robots UR5 arms and a Robotiq 2-finger gripper. Its sensors include a UM6 inertial measurement unit (IMU), a SICK LMS511 2D LiDAR, a 6-DOF NetFT force-torque sensor, an Intel RealSense D435 camera, and two Kodak PixPro SP360 4K cameras. A 3Dconnexion SpaceNav mouse is used as the input device.
The robot operates using two synchronized ROS masters, each running on a separate computer. One master handles the main robot and sensor functions, while the other manages the cameras. This configuration ensures a continuous camera feed in the event of a main-system reboot due to a malfunction. To meet security requirements, the robot is connected via a wired Ethernet tether. Figs. 2 and 3 provide a detailed overview of the network and the designed ROS architecture.
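For illustration, a node can be pinned to either master in ROS 1 by setting ROS_MASTER_URI in its environment. The sketch below assumes hypothetical host names, and the paper does not detail the synchronization mechanism used between the two masters:

```python
import os
import subprocess

# Hypothetical hosts: the main robot computer and the dedicated camera
# computer, each running its own roscore.
MASTERS = {
    "robot":   "http://robot-pc:11311",
    "cameras": "http://camera-pc:11311",
}

def launch_on_master(master, command):
    """Start a ROS 1 node registered against the chosen master by setting
    ROS_MASTER_URI in the child process environment."""
    env = dict(os.environ, ROS_MASTER_URI=MASTERS[master])
    return subprocess.Popen(command, env=env)

# The camera driver survives a main-system reboot because it registers
# with its own, independent master.
launch_on_master("cameras", ["rosrun", "usb_cam", "usb_cam_node"])
```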
The RealSense cameras are mounted on both wrist joints of the arms to provide the closest possible view during remote object manipulation. For a comprehensive 360° view, the two Kodak cameras are positioned back-to-back, approximately 60 cm above the centers of the arm bases, to achieve optimal coverage with minimal blind spots, as seen in Fig. 4. The camera positions were iteratively fine-tuned based on feedback from experienced teleoperators actively participating in remote robot navigation tasks. This approach ensures that the resulting positions not only meet functional requirements but also provide an intuitive, first-person perspective for novice users during remote navigation. While the mounting for the Kodak cameras remains fixed, we empirically investigated the feasibility of a mobile setup. Our findings highlighted two crucial points: 1) additional moving components may increase operational complexity, and 2) altering the point of view could result in orientation loss, negatively impacting SA. The nuanced implications of this decision warrant further investigation, which falls outside the scope of this work. The laser scanner is located at the front-center of the Husky base for collision detection. The NetFT force-torque sensor is positioned between the UR5 wrist joint and the gripper to measure forces and torques externally applied to the end effector (EE).

SOFTWARE IMPLEMENTATION OF SA IN 2D GUI
In this work, the GUI focuses exclusively on presenting information to the user; the design of the user input interface is not covered. The software was developed using Python scripts with OpenCV.

Navigation Mode User Interface
The navigation user interface is activated when the user selects "Navigation mode". This interface presents two screens: the left screen shows two images covering the front and rear of the robot along with robot state information, while the right screen provides a 360° view from the SP360 camera mounted atop the robot. The navigation mode GUI was carefully designed to address key issues with spatial perception, latency, and depth sensing, as discussed below.
4.1.1 Spatial Perception. To enhance spatial perception, we leveraged a specialized rendering technique from the visualization plugin rviz_textured_sphere, which renders panospheric camera outputs as spherical images within the ROS visualization software, RViz [25]. While initially designed for HMD applications, this approach enables users to observe and navigate the environment effectively. Although working with fisheye cameras typically involves image calibration using an intrinsic distortion matrix, our use of the spherical display eliminates this requirement. The fisheye camera model and the mapping equations from the world to the image plane are detailed in Fig. 7 and described by the equidistant projection model in Eq. 1 [12]:

$$r = f\theta \quad (1)$$

The image coordinates $u$ and $v$ of any point in the world plane can be calculated as

$$u = f\theta\cos\varphi, \qquad v = f\theta\sin\varphi \quad (2)$$

where $f = 0.85$ mm is the focal length, $\theta$ is the angle between the incoming ray and the optical axis, and $\varphi$ is the azimuth angle of the point. With a sensor size of 6.08 mm × 4.56 mm and an image resolution of 1440 × 1440 pixels, the size of each pixel in the real world is $s_u = 6.08/1440 \approx 4.2\,\mu\text{m}$ and $s_v = 4.56/1440 \approx 3.2\,\mu\text{m}$. Hence, transferring to the pixel frame is done by

$$u_{px} = u/s_u, \qquad v_{px} = v/s_v \quad (3)$$

4.1.2 Latency. By incorporating the robot model in RViz and finely adjusting the sphere's diameter over the model, we achieved close alignment between the virtual model and the physical robot. This synchronization helps users anticipate real robot motion, since the robot model responds faster than the image updates. Moreover, overlaying user input commands provides instant validation when commands are sent, helping avoid overshooting in the presence of latency. Finally, compressed image topics are used to reduce image lag.

4.1.3 Depth Perception. Since the image is monocular, depth perception remains challenging, especially for obstacle avoidance during navigation. The image's distortion may make objects appear closer than they are on the monitor, and there may be blind spots. To address these issues, we implemented two essential features. The first introduces scaled linear perspective lines, creating a "Ponzo illusion" [19]. This technique is efficient, requiring low computational resources and no specialized measuring devices, and it offers an intuitive solution similar to parking assistance systems. Drawing the perspective lines properly is vital, as not all combinations of straight lines create the desired illusion. The size and placement of these lines depend on three key factors: the horizon line, the vanishing point, and the convergence (orthogonal) lines, as illustrated in Fig. 6. Each line is divided into three segments, distinguished by different colors, which serve as markers indicating distances in increments of 25 cm measured from the mobile robot's base along the surface plane. These markers prove particularly useful when approaching objects, especially if the intention is to manipulate them with the robot's arm, which has a maximum reach of approximately 85 cm.

The second feature integrates a 2D map with the spherical image, combining information from the laser sensor to create complementary data. A 2D map was chosen over a 3D map for its shorter processing time and ease of comprehension. This integration provides the user with real-time awareness of the robot's surroundings. By zooming out, the user can obtain a "bird's eye view" of the robot and inspect potential collisions from various perspectives. While viewing the map, the left screen displays a cropped, consistent view of the robot's front and rear sections, as shown in Fig. 8.
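Returning to the fisheye model above, a minimal Python sketch of the world-to-pixel mapping in Eqs. 1-3 is given below; the function name and the image-center offset are our own, not part of the released system:

```python
import numpy as np

# Sensor parameters from the text: f = 0.85 mm focal length,
# 6.08 mm x 4.56 mm sensor, 1440 x 1440 image.
F_MM = 0.85
PIX_SIZE_U = 6.08 / 1440   # mm per pixel, horizontal
PIX_SIZE_V = 4.56 / 1440   # mm per pixel, vertical

def world_to_fisheye_pixel(point_cam):
    """Project a 3D point (camera frame, optical axis = +z) to pixel
    coordinates using the equidistant model r = f * theta."""
    x, y, z = point_cam
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)                  # azimuth around the axis
    u_mm = F_MM * theta * np.cos(phi)       # Eq. 2: image-plane coords in mm
    v_mm = F_MM * theta * np.sin(phi)
    # Eq. 3: convert mm to pixels, shifting the origin to the image center
    u_px = u_mm / PIX_SIZE_U + 720
    v_px = v_mm / PIX_SIZE_V + 720
    return u_px, v_px

print(world_to_fisheye_pixel((0.5, 0.2, 1.0)))
```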

Manipulation Mode User Interface
Enhanced visual clarity and resolution are needed for effective object handling with the robot's arm. The conventional approach is to mount a gripper camera at the EE link. However, viewing the environment from the gripper camera's perspective can be unintuitive, giving rise to new challenges. When transitioning to manipulation mode for arm control, the left screen adapts accordingly by displaying the gripper view captured by the EE RealSense camera. The manipulation mode GUI has been carefully designed to address the four key issues discussed below.

4.2.1 Latency. The technique of overlaying the robot model in RViz on the image stream, discussed in the navigation mode (Sec. 4.1.2), is leveraged here as well.

4.2.2 Hand-Eye Coordination.
The GUI provides intuitive positioning and orientation of the gripper image relative to the robot base. This helps replicate the human ability to use visual information for spatial perception and simultaneous task execution. Through a point pattern with a link connecting the gripper and the base, along with changes in image perspective, the user can perceive the EE's transformation relative to the base. This information is based on the arm's transformation matrix (Eq. 10). As the user moves the arm laterally or vertically, the image shifts proportionally along the image axes in a scaled manner. When the arm moves toward or away from the base, both the image size and its associated point change, creating the illusion of the image moving closer or farther away. The arm's position along this depth axis is also projected as a line connecting the gripper to the corresponding point on the linear perspective lines on the surface.
$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u_0 \\ v_0 \end{bmatrix} + S\begin{bmatrix} t_x \\ t_y \end{bmatrix} \quad (10)$$

where ${}^{0}T_{6}$ is the UR5 arm transformation matrix, from which the EE translation $(t_x, t_y, t_z)$ is taken, $S$ is a scale-factor matrix, $u_0, v_0$ are the original pixel coordinates (no transformation), and $u, v$ are the new, updated projected pixel coordinates.
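A minimal sketch of this hand-eye mapping is given below, under our own assumptions about the scale factors and about which translation components drive the shift versus the rescaling (the exact form of Eq. 10 depends on the screen layout):

```python
import numpy as np

# Illustrative scale factors (pixels per meter of EE travel) -- assumptions.
S = np.diag([400.0, 400.0])
U0, V0 = 640, 360          # anchor position of the gripper image on screen

def gripper_image_pose(T_0_6, depth_gain=0.5):
    """Given the 4x4 base-to-EE transform, return where to draw the gripper
    image and how to scale it. Lateral/vertical EE motion shifts the image;
    depth motion (toward/away from the base) rescales it."""
    t = T_0_6[:3, 3]                       # EE translation (x, y, z) in meters
    shift = S @ np.array([t[1], t[2]])     # sideways/vertical -> image shift
    u, v = U0 + shift[0], V0 - shift[1]    # image v axis grows downward
    scale = 1.0 + depth_gain * t[0]        # closer EE -> larger image window
    return int(u), int(v), scale

T = np.eye(4); T[:3, 3] = [0.4, 0.1, 0.2]  # example EE pose
print(gripper_image_pose(T))
```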
To convey the EE's orientation, the image plane adjusts its perspective according to the gripper's orientation relative to the base, following the principles of linear perspective. If necessary, the user can zoom in for a closer view. Additionally, an attitude indicator (artificial horizon) overlay was included to help the user maintain awareness of the gripper's orientation.
The manipulation screen, shown in Fig. 9, also features 3D workspace boundaries to help the user avoid singularities and potential collisions with the robot base. These boundaries are essential for alerting the user when approaching the limits. The boundaries are clearly marked laterally and vertically, while the EE projection line covers the depth axis. When the arm extends beyond 75 cm, the line changes color and displays informative text.
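The reach-warning behavior can be sketched as follows; the 75 cm threshold is from the text, while the colors and drawing details are illustrative:

```python
import numpy as np
import cv2

MAX_SAFE_REACH = 0.75   # meters; beyond this the overlay switches to a warning

def draw_reach_line(img, ee_px, ground_px, reach_m):
    """Draw the EE-to-ground projection line, recoloring it and adding text
    once the arm extends past the safe workspace limit."""
    near_limit = reach_m > MAX_SAFE_REACH
    color = (0, 0, 255) if near_limit else (0, 255, 0)   # BGR: red vs. green
    cv2.line(img, ee_px, ground_px, color, 2)
    if near_limit:
        cv2.putText(img, "NEAR REACH LIMIT", (ee_px[0] + 10, ee_px[1]),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
draw_reach_line(frame, (640, 300), (640, 650), reach_m=0.8)
```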
A 6D SpaceNav mouse serves as the input device for both navigation and manipulation. Initially, its frame was linked to the EE frame, but this proved unintuitive: when the gripper was rotated, the frames of the SpaceNav and the EE no longer aligned. For example, after a 90° rotation of the gripper, pushing the mouse up would cause a sideways movement. This was resolved by establishing a new frame that travels with the gripper but keeps a fixed orientation, ensuring consistent and intuitive control.
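One way to realize this is to stop rotating the device axes with the gripper. The sketch below contrasts the original EE-frame mapping with the adopted fixed-orientation mapping (a reconstruction, not the authors' code):

```python
import numpy as np

def ee_frame_mapping(v_input, R_base_ee):
    """Original (unintuitive) mapping: device axes follow the gripper, so a
    rotated gripper turns an 'up' push into sideways motion."""
    return R_base_ee @ np.asarray(v_input)

def fixed_frame_mapping(v_input):
    """Adopted mapping: the control frame travels with the gripper but keeps
    a fixed orientation, so device axes stay aligned with the operator view."""
    return np.asarray(v_input)

# Gripper rotated 90 degrees about one of its axes (example rotation).
R90 = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
up = [0.0, 0.0, 1.0]
print(ee_frame_mapping(up, R90))    # [0, -1, 0]: 'up' becomes sideways
print(fixed_frame_mapping(up))      # [0, 0, 1]: 'up' stays 'up'
```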

4.2.3 Depth Perception (using a grasping assistive overlay).
The primary challenge in tele-manipulation using only 2D images is gauging whether an object is at an appropriate distance for grasping. To address this limitation, we introduce a novel approach that employs MR to convey depth (Algorithm 1), providing depth resolution beyond spatial perception alone. An imaginary grasping plane is superimposed onto the "gripper view" image (Fig. 10a). A 2D polygonal grasping region of interest (ROI) is then defined over the pixels of the imaginary plane and drawn over the image (Algorithm 1, steps 4 and 5). Distances to objects are monitored using the aligned depth image. For each pixel within the ROI, if the distance is less than 30 cm (the actual distance between the grasping plane and the image plane), the overlaid pixel is replaced with the original one (Fig. 10b). Thus, if the object is close enough, it "pops out" of the virtual plane, creating the illusion of the object passing through the grasping plane.
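A minimal OpenCV sketch of this per-pixel substitution, vectorized with NumPy, is shown below; it is our reconstruction of Algorithm 1's core loop, where the 30 cm plane distance comes from the text but the polygon and blending weight are illustrative:

```python
import numpy as np
import cv2

GRASP_PLANE_M = 0.30   # distance from image plane to the virtual grasping plane

def grasping_overlay(color, depth_m, roi_polygon, plane_color=(255, 180, 0)):
    """Draw a translucent grasping plane over the ROI, then 'punch through'
    every ROI pixel whose measured depth is closer than the plane, so near
    objects appear to pop out of the virtual plane."""
    mask = np.zeros(depth_m.shape, dtype=np.uint8)
    cv2.fillPoly(mask, [roi_polygon], 255)          # steps 4-5: ROI over plane
    overlay = color.copy()
    overlay[mask > 0] = plane_color
    blended = cv2.addWeighted(overlay, 0.5, color, 0.5, 0)
    # ROI pixels nearer than the grasping plane keep the original camera
    # pixels -> the object appears to pass through the plane.
    near = (mask > 0) & (depth_m < GRASP_PLANE_M) & (depth_m > 0)
    blended[near] = color[near]
    return blended

# Example with synthetic data in place of RealSense color/aligned-depth frames.
color = np.full((480, 640, 3), 60, np.uint8)
depth = np.full((480, 640), 1.0, np.float32); depth[200:300, 250:400] = 0.25
roi = np.array([[200, 150], [440, 150], [440, 330], [200, 330]], dtype=np.int32)
out = grasping_overlay(color, depth, roi)
```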

4.2.4 Force Feedback (using visual haptics).
To accomplish contact tasks effectively, it is crucial to sense and control applied forces. With the aim of designing an intuitive solution that does not require wearable devices, we leverage visual haptics, a technique that visualizes contact forces as an alternative to tactile feedback [9]. Understanding both the direction and intensity of applied forces is crucial when manipulating an object that is partially or fully fixed to the environment, which often creates a closed kinematic chain, as in operating a ball valve by its handle. Without tactile force sensing, remote operators typically rely on positional control alone, which can lead to unintended force application when the robot and object axes are not perfectly aligned. In our system, visual haptics feedback provides a way to visualize and regulate forces exerted in undesired directions. A 6-axis force sensor located at the EE link records these contact forces, which are then transformed into the EE frame. Fig. 11 illustrates the concept and the implementation of the visual force overlay in our system: the forces are visualized through an overlay that dynamically changes size in proportion to the force magnitude and direction, as depicted in Fig. 11B.
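A sketch of such a force overlay follows; the arrow rendering and the pixels-per-newton gain are our illustrative choices, and the force vector is assumed to be already expressed in the EE frame:

```python
import numpy as np
import cv2

PIXELS_PER_NEWTON = 8.0   # illustrative gain mapping force to overlay size

def draw_force_overlay(img, f_ee, origin=(320, 240)):
    """Render the lateral (x, y) components of the EE-frame contact force as
    an arrow whose length and thickness grow with force magnitude."""
    fx, fy = f_ee[0], f_ee[1]
    mag = float(np.hypot(fx, fy))
    if mag < 1e-3:
        return                                   # no appreciable contact force
    tip = (int(origin[0] + PIXELS_PER_NEWTON * fx),
           int(origin[1] - PIXELS_PER_NEWTON * fy))  # image y points down
    thickness = max(2, int(mag))                 # thicker arrow = larger force
    cv2.arrowedLine(img, origin, tip, (0, 0, 255), thickness, tipLength=0.2)

frame = np.zeros((480, 640, 3), np.uint8)
draw_force_overlay(frame, f_ee=np.array([6.0, -2.5, 0.0]))   # sample reading
```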

EXPERIMENTAL METHOD
In this work, we were interested in quantifying the improvement of our system design over traditional 3D HMDs. We conducted initial, internal prototype evaluations at our research laboratory using two established methods: the Situation Awareness Rating Technique (SART) [23] and the Situation Awareness Global Assessment Technique (SAGAT) [7]. While both methods have their merits and drawbacks [6], combining them allows for a comprehensive system assessment, emphasizing participants' abilities in teleoperating a robot through navigation, manipulation, and grasping tasks. The focus was on assessing the intuitiveness of the user interface and comparing it to teleoperation using an HMD with a 360° view. The HMD's 360° view served as a reference, given its established effectiveness in enhancing situational awareness in prior studies [16, 26]. We posited that if the proposed approach could demonstrate a level of SA at least on par with HMDs, it would substantiate the effectiveness of enhanced visual feedback on 2D monitors leveraging 3D video streams.

Participants
In alignment with Nielsen et al.'s recommendations, our prototype evaluation involved five users (four males, one female), a number considered sufficient for yielding insightful results [17]. These participants, aged between 22 and 30, were chosen for their expertise in robotics and complex systems; nevertheless, they had no specific experience teleoperating the system under consideration. Despite their familiarity with robotics, they encountered the designed 2D teleoperation interface for the first time during the experiments. This deliberate choice allowed us to assess the interface's intuitiveness from relative results. We acknowledge the potential for bias in the results, given users' prior experience with head-mounted display (HMD) interfaces, such as those used in gaming. The experimental protocol was approved by The University of Texas at Austin's Institutional Review Board, ensuring ethical compliance, and informed consent was obtained from each participant.

Experimental Protocol
The experiment employed a remotely controlled robot (Fig. 1) with capabilities for navigation, manipulation, and grasping. The task involved navigating through a tarp-covered, U-shaped mock tunnel (Fig. 12), locating randomly placed aluminum bars on the floor, and depositing them into a designated box. The tarp blocked outside light, mimicking real-world conditions. Arbitrary stops, accompanied by SAGAT questions, were introduced during the tasks without participants' prior knowledge to support SA assessment. After an initial briefing and training session, participants navigated the robot using both the HMD and the designed 2D GUI. Participants performed two trials: one with the HMD providing a 360° view and another with the designed 2D GUI. This setup facilitated a comparative analysis of the designed system against traditional 3D HMDs, with a specific emphasis on user-interface intuitiveness. Post-experiment surveys gathered qualitative feedback on the user interface.

Recorded Metrics
The selected metrics serve as integral parameters for evaluating the novel 2D design's performance in teleoperating the robot. The metrics used were: (1) Detection time, which assesses how well the system helps users locate objects. (2) Number of "control mode" switches, which gauges the system's intuitiveness; the fewer, the better. (3) Number of emergency stops, which assesses the system's safety, addressing uncertainties during teleoperation.

RESULTS AND DISCUSSION

SART Questionnaire Results

The questionnaire results, along with statistical analysis, are presented in Table 1. Fig. 13 provides a statistical overview of all users' results across both operating interfaces. Demand, Supply, and Understanding were calculated based on concepts from Endsley et al.'s work [6]. To capture the role of attentional supply in SA, we calculated the situational awareness score as

$$SA = U - (D - S)$$

where $SA$ is situational awareness, $U$ is understanding, $S$ is attentional supply, and $D$ is attentional demand.
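For reference, the SART combination above reduces to a one-line computation (the values below are illustrative, not participant data):

```python
def sart_score(understanding, demand, supply):
    """SART situational awareness: SA = U - (D - S).
    Each argument is the summed rating of its SART domain."""
    return understanding - (demand - supply)

# Illustrative example: U=18, D=10, S=12 -> SA = 20
print(sart_score(18, 10, 12))
```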
Analysis of the SART test reveals a notably higher median SA level with the proposed user interface than with the HMD (20 vs. 14). User P3 achieved the lowest scores in both experiments, though their score was slightly higher with the 2D interface (12 vs. 11 with the HMD). Table 1 highlights that the most significant change was observed in the supply index, indicating that on-screen information significantly enhances attentional resources.

SAGAT Evaluation Results
The SAGAT analysis employed task-specific queries. The random stops during the experiments prompted users to answer questions gauging their task orientation and comprehension, with results shown in Table 2. Additionally, users self-assessed their confidence levels while executing the tasks, with results illustrated in Figs. 14 and 15.
The results demonstrate superior performance and increased SA with the 2D GUI compared to the HMD interface. Users provided more correct answers while reducing the time required to complete tasks, with the exception of P2. The average completion time was 222 seconds with the HMD versus 192 seconds with the 2D GUI. Furthermore, there were fewer "control mode" switches, errors, and collisions with the 2D GUI. All five users successfully completed their missions using the 2D GUI, whereas only three completed them using the HMD. Finally, most users reported heightened confidence levels.

CONCLUSION AND FUTURE WORK
This work presents an improved 2D interface aimed at addressing some of the SA problems encountered in performing remote tasks, without the need for additional wearable devices; this distinguishes it from HMD-based interfaces. The interface optimizes screen organization, reducing cognitive load and enhancing task-focused concentration. Critical missing information, such as depth, forces, and system status, is intuitively integrated into the interface using psychophysics principles that align with human perception. Comparative testing was conducted against HMD-based interfaces, historically known for their high SA levels. The experiments employed the standardized SAGAT and SART tests, revealing an overall SA improvement for most users.
The preliminary results from the in-house prototype tests affirm the viability of this solution and lay the groundwork for future research in this domain. The proposed interface offers a distinct advantage: it requires neither high-end computing resources nor wearable devices for the operator. While this study focused on a specific robot, the proposed solution holds promise for a broad spectrum of remotely operated robots.
In the future, we plan to extend this effort with a larger, more rigorous user study yielding more in-depth qualitative and quantitative results. Furthermore, we plan to implement a dynamic plugin system for adaptable overlays that respond to operation modes and scene changes.


Figure 6: Conceptual explanation of linear perspective, with the horizon line, the vanishing point, and the convergence lines.

Figure 7: Illustration of the fisheye lens camera projection model.

Figure 8: Navigation mode GUI: linear perspective lines and velocity input overlay (left); driver and bird's eye view (right).

Figure 9: Manipulation mode GUI, hand-eye coordination.

Figure 10: (a) Gripper view, object cannot be grasped; (b) gripper view, object can be grasped.

Figure 11: EE contact force display. A shows a conceptualization of the robot arm, the 6-DOF force sensor, the measured and applied forces, and the task. The applied force vectors are opposite to the measured force vectors. B shows the gripper in our setup, the task, and the designed interaction force overlay, which is displayed when the arm applies forces toward the right.


Figure 12: Experimental setup. A shows the tarp-covered mock tunnel the robot navigates. B shows the control desk with the HMD, 2D monitors, and the SpaceNav device. C shows the aluminum bars to be located. D shows the 2D schematic of the mock tunnel with dimensions, with the white rectangle on the right as the entrance (not drawn to scale). E shows the 3D model of the mock tunnel.

Figure 13: Statistical results of SA scores based on SART.

Figure 14: SAGAT sum of right/wrong answers.