AR-STAR: An Augmented Reality Tool for Online Modification of Robot Point Cloud Data

Robotic solutions are being deployed in industry to perform increasingly challenging maintenance and inspection tasks. In many industrial applications, robots can encounter uncertainties that prevent task completion, potentially resulting in unexpected costs and unsafe working conditions. Supervised autonomy allows personnel to intervene and assist in negotiating these challenging situations. Augmented Reality (AR) allows for the visualization of complex robotic sensor data and provides an opportunity for non-expert users to interact with it in a natural and intuitive manner. We present a comprehensive AR application module, Augmented Reality Situational Task Accept and Repair (AR-STAR), that allows users to visualize LiDAR point cloud data along with sensor images and navigational goals. AR-STAR enables users to interact with LiDAR point clouds and modify them in real time. We developed and evaluated three interaction modalities that let users manipulate point cloud data in situ while observing the data superimposed on the physical environment. We demonstrate AR-STAR in a human-robot teaming scenario in which a robot is tasked with identifying and repairing surface corrosion in a simulated industrial setting, and we evaluate it through pilot studies to determine the preferred interaction modality based on user workload, system usability, and task completion time, with the aim of improving non-expert human-robot interaction in industrial settings.


INTRODUCTION
Uncertainty is ubiquitous in real-world environments. Deployed robots must navigate this uncertainty, which can hinder their efficacy in completing tasks. Supervised autonomy allows objective-aware human supervisors to monitor the performance of robotic systems, intervening when necessary to overcome situations that are challenging under full autonomy. One domain that stands to benefit from supervised autonomy is routine inspection and maintenance in the nuclear, oil & gas, and chemical industries [20, 21]. In highly corrosive environments, such as outdoor offshore manufacturing facilities, operators regularly inspect and repair surface corrosion on critical equipment. Corrosion in these environments accounts for costs equal to 3-4% of global GDP, with unrealized potential savings of 15-35% of this cost through timely management strategies [11]. Recent work within our laboratory¹ enables mobile manipulators to perform surface corrosion inspection and repair: they can autonomously navigate to, identify, and physically repair corroded surfaces by applying a protective spray coating. These environments, however, are often spatially complex, making fully autonomous operation a challenge. Extensive reliance on vision-based cameras and LiDAR sensors leaves robots highly susceptible to the noise and artifacts these sensors frequently generate in such environments. This can hinder the ability of detection systems to identify and address corrosion, presenting challenges that autonomous systems may struggle to surmount without human intervention.
Corrosion is challenging to reliably detect with image-based models due to geometry-based occlusion and changes in ambient lighting conditions. Inaccurate predictions about the location and extent of corroded material are therefore common, and repair plans informed by these inaccurate surface identifications result in incomplete surveys and repairs. Corrosion also often encroaches on valves, gauges, and other functional hardware that would be adversely affected if incidentally coated due to inaccurate predictions or overspray. Human-robot collaboration presents an opportunity to prevent these adverse behaviors. Users should be able to visualize predicted corrosion, confirm or reject detections, and add to or subtract from detections to include missed material and exclude sensitive equipment in the physical space. Crucially, time spent performing these modifications should be minimized to improve efficiency and maximize cost savings. An intuitive, mobile user interface that lets users review, modify, and approve the robot's repair plan in the field before work is performed is needed to ensure repairs are carried out properly. To enable such collaboration, we present Augmented Reality Situational Task Accept and Repair (AR-STAR), a novel AR Head-Mounted Display (HMD)-based system that allows users to a) visualize the point cloud of the identified corroded area, b) tag points that should be included in or excluded from the visualized repair plan using any of three interaction modalities, and c) send approval to the robot to proceed with the reviewed repair plan².

² See system demonstration: https://utnuclearroboticspublic.github.io/ar-star/

RELATED WORK
AR used for Human-Robot Interaction (HRI) has enabled users to communicate with and supervise robotic agents [16, 18], program robotic agents [1, 2], provide navigation [6, 10] and teleoperation [17] commands, visualize robotic workspaces [4, 5, 14], and communicate robot intent [7, 15, 19]. Researchers in [16] enabled users to visualize a 2D point cloud so that those working alongside a robot could see its planned path. The authors of [13] developed an AR tool to visualize radiation survey sensor data aligned with the floor, helping users avoid tracking alpha particles around facilities. Both works emphasize visualization and offer no user modification capabilities. The system in [2] allows users to visualize the position and label of objects for grasping tasks and to interact with virtual hologram representations of those objects via the standard pinch-and-hold and air-tap gestures built into the Microsoft HoloLens 2. The authors of [14] developed an AR tool that lets users pinch-and-hold holographic objects to define manipulator control points and paths, again using the standard built-in HoloLens 2 hand interactions. Neither work provides an interaction mode that is free-form, intuitive, and natural, nor does either enable users to modify real-time sensor data from robotic systems. To our knowledge, no prior work has presented a supervised autonomy AR tool that enables non-expert users to both visualize and modify robot sensor data in real time. This work presents a method to both visualize and modify, in situ, point cloud data generated by a mobile robot to improve task execution in real-world environments.

SYSTEM OVERVIEW
Software packages were developed and deployed in parallel onboard a dual-arm mobile manipulator system and an AR-HMD. The AR-STAR module extends prior work, Augmented Robot Environment (AugRE), which enables AR-HMD users to bilaterally communicate and localize with mobile robots in open environments and in large human-robot teams (50+ agents) [18]. A mobile manipulator uses RGB and 3D LiDAR sensor streams to search for corroded surfaces. The streams are fused, and detection models from the authors' laboratory¹ segment the surface corrosion. Point cloud and image data for each segmentation are extracted and packaged for visualization on AR-HMDs. Location information from the point cloud clusters is used to generate surface repair plans. When a corroded surface is positively identified by the mobile manipulator, AR-HMD users receive an image of the corrosion, a point cloud spatially aligned to the object in the physical world, and a nearby navigation goal pose representing where the robot will position itself to perform a complete surface coating. Fig. 2-a shows one of the test fixtures used to demonstrate this work: a large, mostly corroded wall with uncorroded regions and sensitive equipment to be masked. Fig. 2-b shows a point cloud, identified by the mobile manipulator, corresponding to a positive detection superimposed over the test fixture.
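To make this data flow concrete, the sketch below shows one plausible per-detection payload sent from the robot to the headset. The field names and types are illustrative assumptions, not the authors' actual message definitions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Pose:
    """Position plus orientation in the shared world/map frame."""
    position: Tuple[float, float, float]            # x, y, z (meters)
    orientation: Tuple[float, float, float, float]  # quaternion (x, y, z, w)

@dataclass
class CorrosionDetection:
    """Hypothetical per-detection package received by the AR-HMD user."""
    image_jpeg: bytes                          # RGB snapshot of the detected corrosion
    points: List[Tuple[float, float, float]]   # segmented LiDAR cluster, world frame
    nav_goal: Pose                             # where the robot will stand to spray
```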
After a positive detection, AR-HMD users are prompted with a clickable notification icon in the headset. Opening the notification presents the user with holographic representations of the navigation goal pose and the robot's repair plan, represented by a Cartesian coordinate frame and a cluster of triangular pyramids visualizing the point cloud, respectively. A holographic Detected Item menu containing an image of the corrosion is also presented. "Repair", "Modify", and "Reject" AR buttons permit, edit, or ignore the repair plan, respectively². On "Repair" or "Reject", the robot either makes the surface coating repairs autonomously or disregards the repair and continues searching for more corrosion. "Modify" presents the user with a Modification menu offering three interaction modalities for modifying the visualized point cloud (representing the intended repair plan) by adding or subtracting points. The three modalities developed are shape, highlight, and lasso, based on well-established styles adopted from AR interface work [14] and common 2D sketching interfaces [9, 12], which should enhance intuition for users and reduce the time required to modify data.
The shape modality allows a user to add inclusion or exclusion shapes (rectangular prism, cylinder, sphere) to the environment (Fig. 2-d). These shapes can be manipulated using standard pinch-and-hold hand gestures on the Microsoft HoloLens 2, both at near and far distances. Upon completion of the user modifications, an array of centroid poses and the geometric shape parameters (i.e., length, width, radius, etc.) of each shape is sent to the robot. The robot computes which points in the point cloud are within each 3D shape, modifies the point cloud, and resends the modified point cloud back to the AR-HMD user, who can perform additional modifications if necessary or permit the repair.

The highlight modality allows a user to trace over the visualized points to be included or excluded using the index finger of either hand (Fig. 2-e). Commands to begin and stop tracking finger position are issued verbally and processed onboard the AR-HMD. Holographic spheres are virtually placed atop the index fingertip at a frequency of 30 Hz to visualize and track the area being highlighted. With this method, users can mark any location in the scene in a free-form, natural manner. Upon completion of the highlight modification, an array of poses for each sphere's centroid, along with the sphere radius, is sent to an offboard server. There, the squared distance from each sphere centroid to each point in the point cloud cluster is compared to the squared radius; points whose squared distance falls below the squared radius are tagged and included or excluded accordingly. The modified point cloud is visualized to the AR-HMD user, who can perform additional modifications if necessary or permit the repair.
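The server-side inclusion test for the highlight modality reduces to a point-in-sphere check. Below is a minimal sketch of that test as described above; the function name and the use of NumPy are our own assumptions.

```python
import numpy as np

def tag_highlighted_points(points, centroids, radius):
    """Return a boolean mask of cloud points swept by the fingertip spheres.

    points:    (N, 3) array of point cloud cluster coordinates
    centroids: (M, 3) array of highlight-sphere centers
    radius:    sphere radius shared by all markers

    A point is tagged if its squared distance to any sphere centroid is
    below the squared radius, avoiding a square root per comparison.
    """
    diff = points[:, None, :] - centroids[None, :, :]   # (N, M, 3) pairwise offsets
    sq_dist = np.einsum('nmk,nmk->nm', diff, diff)      # (N, M) squared distances
    return (sq_dist < radius ** 2).any(axis=1)
```

For large clouds, a KD-tree radius query over the centroids would avoid materializing the full N x M distance matrix.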
The lasso modality allows a user to encircle objects and include or exclude any sensor data inside the traced boundary (Fig. 2-f). As in the highlight modality, the user's index finger is tracked, spherical markers are placed at 30 Hz to visualize the generated boundary, and commands to start and stop tracking are issued verbally. Upon completion of the lasso modification, an array of centroid poses for the spheres defining a 3D polygon is sent to an offboard server. There, the centroid poses are downsampled into a sparse 3D polygon to reduce computation time. The sparse polygon is triangulated via ear clipping, and each triangulated surface is extruded a set distance bidirectionally along its normal. A triangulated mesh is wrapped around the extruded surfaces to create a closed triangulated mesh volume. Ray casting is then used to determine whether each point in the cloud lies inside this volume, and the original point cloud is modified accordingly. The modified point cloud is visualized to the AR-HMD user, who can perform additional modifications if necessary or permit the repair. Fig. 2-c shows the positive detection point cloud after modification with each of the three modalities. After user modifications, the robot's repair plan is regenerated.
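The paper does not give the exact point-in-volume routine, so the following is a hedged sketch: a standard even-odd ray-casting test against the triangulated lasso volume, using the Möller-Trumbore ray/triangle intersection (our choice of intersection method is an assumption).

```python
import numpy as np

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore test: does the ray from `origin` along `direction`
    hit triangle (v0, v1, v2) in front of the origin?"""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:                    # ray parallel to the triangle plane
        return False
    inv_det = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:                # outside first barycentric bound
        return False
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:            # outside second barycentric bound
        return False
    return np.dot(e2, q) * inv_det > eps  # hit distance must be positive

def point_inside_volume(point, triangles):
    """Even-odd test: a point is inside a watertight mesh if a ray cast
    from it crosses the surface an odd number of times.

    triangles: iterable of (v0, v1, v2) vertex triples (length-3 arrays)
    forming the closed, extruded lasso volume."""
    direction = np.array([1.0, 0.0, 0.0])  # any fixed ray direction works
    hits = sum(ray_hits_triangle(np.asarray(point), direction, *tri)
               for tri in triangles)
    return hits % 2 == 1
```

Rays that graze a shared edge or vertex can be double-counted; production implementations typically perturb the ray direction or use robust geometric predicates.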

METHODS
Because this tool is intended for use by non-experts to help robots complete corrosion management efforts as efficiently as possible, an initial pilot study with five participants measuring time, user workload, and system usability was conducted. These factors were compared across the three interaction modalities (shape, highlight, lasso) on three different objects (a horizontal pipe segment, a vertical pipe segment, and a flat wall) that mimic commonly corroded items in industry. Participants (aged 23-33) were recruited from the authors' laboratory¹, with experience with Microsoft's HoloLens 2 and similar devices ranging from first-time to expert users. Each participant was tasked with removing points from three point clouds containing noisy and inaccurate data superimposed over the test fixtures using one of the three modalities. Once a participant completed the task for all three test fixtures using one interaction modality, the participant completed NASA-TLX [8] and SUS [3] surveys. This process was repeated for all three interaction modalities, with the order of test fixtures visited and interaction modalities randomly shuffled for each participant, for a total of nine trials per participant. During each trial, participants were free to use the designated modality as many times as desired until the points in focus were removed.

RESULTS AND DISCUSSION

Time Study
The interaction duration for each modality with each test fixture was averaged across all participants to produce the results shown in Fig. 3. The portion of time each user spent observing the scene without actively modifying the point cloud, the time spent actively modifying the point cloud, and the time used by our algorithms for computation and data transfer between the robot and the HMD were also averaged across all participants. These results show that, across all three test fixtures, the shape modality required the most time to complete, followed by the highlight modality, and finally the lasso modality, which was the fastest on average. The wall (3) fixture required the most time of the three fixtures under both the highlight and lasso modalities due to its size, which required each user to cover a large distance with their fingertip; this is reflected in the average modification duration for each of these modalities with the wall fixture. The vertical pipe (2) fixture required the least time for both the highlight and lasso modalities but the most time with the shape modality. This is likely due to the small amount of material users were asked to remove on the vertical pipe, which could be quickly covered using the finger-tracking modalities but required users to spend time shrinking the exclusion volumes to fit the space when using the shape modality. The geometry of the horizontal pipe (1) fixture was more complex than that of the other two fixtures, which led to a longer-than-expected average time spent completing the task with the highlight modality. Overall, these results suggest that the highlight and lasso modalities are more efficient than the shape modality for covering smaller, simpler surfaces, but that this difference in efficiency decreases when larger modifications are required.

User Workload and System Usability
After each interaction modality trial, NASA-TLX and SUS survey metrics were recorded. The NASA-TLX [8] survey measures user workload and is used here to compare mean workload scores for each measure (mental demand, physical demand, temporal demand, performance, effort, and frustration) across all interaction modalities and all objects. The results are shown in Fig. 4. The highlight interaction modality resulted in the lowest mental demand (raw mean score: 7.00 ± 2.55) compared to the lasso and shape modalities (raw mean scores: 8.60 ± 3.21 and 9.20 ± 2.77, respectively). This is sensible, given that the highlight modality requires consideration neither of how to orient shapes in 3D space (shape modality) nor of the geometry of a larger free-form shape created in 3D space (lasso modality). Regarding physical demand, the highlight modality had the highest score (raw mean score: 6.00 ± 4.30) compared to the lasso and shape modalities (raw mean scores: 5.00 ± 3.94 and 5.40 ± 2.51, respectively). This aligns with expectation, as the highlight modality should be the most physically demanding, particularly when covering a large area that forces the user to traverse the entire space with their hand. There were no large differences in the performance and effort dimensions. The lasso modality had the highest frustration score (raw mean score: 7.40 ± 4.51). This may stem from how points inside the free-form lasso shape are determined to be excluded, which has limitations when creating complex combinations of convex and concave shapes that should be investigated further in future work.
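For reference, the raw ("unweighted") NASA-TLX statistics reported above can be computed as below. The function name and the layout of one row per participant are our assumptions, and the rating scale should match whatever form of the survey was administered.

```python
import numpy as np

SUBSCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def raw_tlx_stats(ratings):
    """Mean and sample standard deviation per NASA-TLX subscale.

    ratings: (participants, 6) array of raw subscale ratings, one column
    per entry of SUBSCALES, on the scale used by the administered survey.
    """
    r = np.asarray(ratings, dtype=float)
    return {name: (r[:, i].mean(), r[:, i].std(ddof=1))
            for i, name in enumerate(SUBSCALES)}
```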
A SUS [3] score was calculated using the standard SUS procedure for each user with each modality, and scores were averaged across users to obtain a mean for each modality. For the shape, highlight, and lasso modalities, the mean SUS scores were 69.00 ± 11.40, 74.00 ± 14.85, and 65.00 ± 8.84, respectively. No statistically significant difference was found among the three mean SUS scores, though the mean scores for the shape and highlight modalities were above 68, an accepted benchmark for above-average usability.
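The standard SUS procedure referenced above is well defined: odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is scaled by 2.5 to give a 0-100 score. A minimal implementation:

```python
def sus_score(responses):
    """Standard SUS score for one participant.

    responses: the ten 1-5 Likert ratings in questionnaire order.
    Odd-numbered items (0-based index 0, 2, ...) contribute (rating - 1);
    even-numbered items contribute (5 - rating); the total is scaled by
    2.5 to yield a 0-100 usability score.
    """
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# Example: an all-neutral response sheet scores exactly 50.
assert sus_score([3] * 10) == 50.0
```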

CONCLUSION
In this work, we present AR-STAR, an AR-HMD module built on top of the human-robot teaming AR application AugRE [18], that enables users to visualize and modify autonomously detected corrosion in situ. This supervised autonomy interface allows the robot's intent to be clearly understood and evaluated prior to the execution of a repair task. Using images of the scene and a point cloud of the detected corroded surface, users can assess what the robot intends to repair and can permit repair of accurately detected material, reject problematic repair plans for any reason, or modify repair plans to include missed material or avoid sensitive equipment near corroded material. We theorize that deployment of robots in similarly complex environments can be improved using similar supervised data manipulation, and we intend to extend this work into a general-purpose AR tool for modifying robot sensor data in situ to improve robot task execution and HRI with non-experts.

Figure 1: Personnel using AR-STAR via lasso mode to edit point cloud data collected by a mobile manipulator system.

Figure 2: AR-STAR interaction modality stages captured on a Microsoft HoloLens 2. a) Wall with sensitive equipment and corroded surfaces represented with brown paper. b) Superimposed point cloud depicting what the robot has determined to be corrosion. c) Point cloud following user removal of data. d) Shape modality used to remove points around a gauge. e) Highlight modality used to remove points around a valve and pipe. f) Lasso modality used to remove points around an errant detection.

Figure 3: Interaction durations for each modality, averaged across all participants. Object 1) is a horizontal pipe section, 2) is a vertical pipe section, and 3) is a flat wall.

Figure 4: NASA-TLX mean measurement results over all participants for each modification modality with 95% error bars.