PRogramAR: Augmented Reality End-User Robot Programming

The field of end-user robot programming seeks to develop methods that empower non-expert programmers to task and modify robot operations. In doing so, researchers may enhance robot flexibility and broaden the scope of robot deployments into the real world. We introduce PRogramAR (Programming Robots using Augmented Reality), a novel end-user robot programming system that combines the intuitive visual feedback of augmented reality (AR) with the simple and responsive paradigm of trigger-action programming (TAP) to facilitate human-robot collaboration. Through PRogramAR, users are able to rapidly author task rules and desired reactive robot behaviors, while specifying task constraints and observing program feedback contextualized directly in the real world. PRogramAR provides feedback by simulating the robot's intended behavior and providing instant evaluation of TAP rule executability to help end users better understand and debug their programs during development. In a system validation, 17 end users ranging in age from 18 to 83 used PRogramAR to program a robot to assist them in completing three collaborative tasks. Our results demonstrate how merging the benefits of AR and TAP using elements from prior robot programming research into a single novel system can successfully enhance the robot programming process for non-expert users.


INTRODUCTION
The proliferation of general computing technology necessitated the development of end-user tools, such as spreadsheets, for users who were not professional software developers. The increasing number of robot deployments (over 3 million robots operate in factories today [61]) creates a similar need for end-user robot programming. In pursuit of this goal, prior work in end-user robot programming has offered different programming paradigms (e.g., imperative [40], dataflow [33]) and representations (e.g., Hierarchical Finite State Machines [59], Behavior Trees [64]) to aid non-experts with programming robots. Recent work has begun extending these methods by using mixed reality technologies to improve an end-user's understanding of robot activities in 3D space [7,15,28,31,44,49,62,68,72]. However, these methods do not easily allow for programming complex reactive robot behaviors that are common in real-world robotics applications. Beyond the most common applications (e.g., manufacturing), we envision robots playing a valuable role in assisting individuals with everyday tasks. These tasks may include cleaning dishes or putting away groceries, where reactive robot behaviors are often necessary for coordinating interactions with humans. Such tasks require users to define where objects are placed, triggers for actions, and multiple pick-and-place activities. For instance, when a user washes a dish and places it on a drying rack, a robot reacts by picking up the dish, drying it, and placing it in a user-defined location within a cabinet.
In pursuit of this vision, we combine the rich medium of AR with TAP. TAP, also known as event-driven programming, has gained popularity as a user-friendly approach for end-user programming, allowing users with no prior coding experience to develop successful reactive programs [77]. As a result, TAP has been adopted across a wide range of real-world domains, including project management, security systems, and smart hubs [69,78]. In TAP, users define a set of circumstances known as triggers that initiate actions once the triggering conditions are met. In the context of dish washing, users can create a rule such as "IF a dish is on the drying rack, THEN dry the dish and place the dish in the cabinet." This rule prompts a robot to wait until it detects a dish on the drying rack, picking it up when detected, drying it, then placing it in a pre-defined cabinet location. The simplicity of TAP positions it to be an effective tool for non-expert users seeking to program robots for everyday tasks.
While prior work has explored the general notion of robot event-condition-action rules (e.g., [22,83]), TAP has only recently been investigated for end-user robot programming [48,52,73]. In this context, Leonardi et al., 2019 [48] used TAP to enable non-expert users to craft reactive social robot behavior programs. Alternatively, Senft et al., 2021 [73] utilized TAP to enable non-expert users to program coordinated robot actions for human-robot collaboration (HRC) tasks. However, these existing TAP systems are constrained by a 2D screen development paradigm, which restricts users to defining programs and parameters in a manner disconnected from the actual operating environment of the robot. Such setups have been observed to diminish users' comprehension of the contextual aspects of their task [6,37,55]. Conversely, an augmented-reality head-mounted display (ARHMD) provides hands-free mobility, a wider field of view, and supports users in larger areas by allowing them to view the workspace from various perspectives. ARHMDs also enable more accurate depth estimation of virtual imagery, providing better blending of virtual and physical environments than 2D screens. In all previous studies focusing on TAP programming for robotics, users consistently expressed a desire for visual feedback and debugging support when building their trigger-action rules, as well as support for forming their own mental model of the system [48,52,73]. Our insight is to unlock the untapped synergies that exist between recent developments in AR and TAP, which we actualize in developing PRogramAR as a new end-user robot programming system. To do this, we utilize an ARHMD to contextualize information directly in the user's scene, providing the following benefits: (1) Users can program the robot within the entire 3D workspace in which it operates, rather than being restricted to defining 2D zones on a tablet with a 2D field of view from the robot's camera as in Senft et al., 2021 [73], (2) Users can verify and monitor the correctness of their program by visualizing a simulated version of the robot's behavior via the robot's 3D digital twin, (3) Integrating TAP within AR, rather than displaying it on a separate device (tablet or desktop), presents users with a cohesive and holistic system, reducing potential confusion and frustration from context switching across devices, and (4) Users can freely position the AR TAP interface without physically holding it, enabling them to effortlessly monitor both the physical workspace and TAP rules simultaneously (useful in debugging).

Contributions: We introduce PRogramAR, a system for supporting non-expert programmers with authoring reactive robot behaviors by adopting AR-based contextualization and simulation-based rule evaluation. By integrating known components, AR and TAP, we enhance the user's capability to coordinate actions effectively during collaborative tasks with a robot. To evaluate PRogramAR, we recruited 17 participants who used our system to author robot programs for three HRC tasks. In this study, HRC is used to describe a human-robot team working together towards a shared goal, based on terminology from prior work [7,19,73]. Overall, we contribute: (1) PRogramAR, a system for making reactive programming of robot manipulators easier for non-experts by combining AR and TAP, (2) A validation of the benefits of merging AR and TAP, with data collected from a diverse set of end-users with a wide age range, and (3) An open-source code release of PRogramAR generalizable to various robots and AR headsets to foster reproducibility and encourage future research and extensions by the community, which can be found at https://osf.io/gvxu5.

RELATED WORK
Our work on PRogramAR is inspired by past research in robot programming, trigger-action programming, and augmented reality.

Robot Programming
Prior research has investigated a variety of methods for robot programming. One prominent approach is that of skill demonstration (i.e., learning from demonstration / LFD), in which users define robot actions through kinesthetic teaching, teleoperation, or passive observation (see [10,71] for relevant surveys). One advantage of LFD systems is that programming is directly embedded in the robot's operational context (i.e., how the robot moves in the real 3D environment); however, LFD can be difficult to generalize to new environments and is often used to teach a robot primitive motions, rather than to build coordination mechanisms that enable collaborative human-robot tasks through reactive robot programs. Methods for program specification present an alternative approach, where interfaces let users define and parameterize desired robot actions with varying degrees of abstraction. For example, research has explored visual robot programming tools where users allocate task execution through flow diagrams, behavior trees, or block-based programming interfaces [8,35,38,40,70]. However, in order to use such systems, end-users are often required to know fundamental programming concepts (e.g., variables, conditionals, loops), which may limit system applicability. These systems also adhere to a traditional programming approach, requiring users to specify the whole robot program before execution. Moreover, the feedback provided by these systems is typically visualized on a 2D screen, thereby disconnecting it from the context of program execution, where the physical robot moves through 3D space. Likewise, teach pendants, currently the industry standard for end-user programming of repetitive robot tasks, can be complex and difficult to use for individuals who are not trained professionals [45,66]. Therefore, we have designed a new program specification system that eliminates the need for end-users to possess prior programming knowledge and enables non-traditional programming workflows that cater to users of all backgrounds. Furthermore, we leverage AR to provide immersive, visual feedback in the real operational space, directly connecting program development with program execution.

Trigger-Action Programming
Trigger-Action Programming, which forms the foundation of popular tools such as If-This-Then-That (IFTTT), Zapier, and SmartThings, has been successful in engaging users at all levels of programming proficiency [13,21,23,32]. Research has investigated various ways to refine this programming process by improving support for user mental model formation when designing trigger-action sets [39], understanding common bugs that occur in TAP programs [11,63], and explaining TAP behavior in an understandable manner [84,85]. More recently, researchers have begun to apply TAP to end-user robot programming. For instance, a study by Leonardi et al., 2019 [48] found that TAP can be an effective method for end-users to personalize social behaviors for humanoid robots. Further research was conducted by Manca et al., 2019 [52], who developed a visual analytics tool to understand the rules created by end-users. However, these systems focused solely on creating verbal responses to triggers without considering scenarios where a user may want to coordinate physical tasks with a robot. Senft et al., 2021 [73] built on this work in their Situated Live Programming (SLP) system, which provides a TAP interface for physical task coordination between humans and robots. In SLP, users can define regions in their workspace as zones containing objects and positions relevant to TAP rules. SLP also incorporates live programming, which provides users with the flexibility to program the initial robot actions, then gradually construct the complete program. With each step, users can define new trigger-action pairs based on the current state of the environment, facilitating incremental robot program development [73]. In contrast, traditional programming techniques require users to specify the whole program before execution. Although promising, SLP uses a top-down camera attached to the robot's end-effector to visualize the scene on a tablet. This configuration prohibits users from specifying zones outside the robot's 2D camera view. In addition, the tablet interface poses challenges in debugging TAP rules, as users may find it difficult to simultaneously monitor both the virtual TAP scene and the physical actions performed by the robot in the real world. Notably, a common message shared by users in prior work was that these systems lacked feedback, such as the ability to observe rules in action before they were run on the actual robot or support for rule executability. To address these drawbacks, we leveraged mixed reality technology to provide intuitive visual feedback during programming in the form of a novel TAP system.

Augmented Reality Robot Programming
Augmented Reality (AR) has gained popularity in robotics due to its ability to provide contextual information in situ within a user's environment, potentially improving situational awareness, system usability, and overall user interactions (see [1,50,75,76,81] for recent surveys of mixed reality robotics). Prior research has explored various forms of AR, including 2D overlay displays, projection-based displays, and AR tablets. 2D overlay displays present a fixed view of the robot's workspace on a 2D computer screen, over which contextual information can be drawn [2,73]. Projection-based displays directly project 2D visualizations into the user's workspace, often incorporating interaction mechanisms such as gesture tracking, smart touch tables, or programming wands [3,30,31,53]. Tablets offer mobility by overlaying AR content over the tablet camera feed, enabling users to monitor the scene from any viewpoint [17,26,44,47]. Utilizing the benefits of AR, which stem from overlaying virtual information onto the real world, each of these viewing modalities has enhanced the robot programming process for users. However, 2D overlays have a limited field of view, inhibit the user's ability to communicate depth parameters, and present visual feedback separately from the real world. Projection-based displays are difficult to transfer to new environments and restrict mobility, while tablets occupy the user's hands and constrain the interface to the small size of the screen.
Therefore, augmented reality head-mounted displays (ARHMDs) have been used to alleviate these issues. ARHMDs free users' hands of hardware, promote unrestricted mobility and interaction throughout the whole workspace, and provide information directly contextualized in the user's real world. As a result, ARHMDs have been used to help industrial workers define robot trajectories and action primitives [18,25,68]. ARHMDs have also been used to communicate low-level robot sensor information to experts [19,58] and to facilitate debugging robot programs for expert roboticists [42]. However, such systems have been designed for specific professional workers, rather than for non-expert end-users. Other studies communicate robot motion intent to users via a robot digital twin, but do not offer an easy way to author reactive robot actions for human-robot collaboration tasks [72,80]. Our approach in developing PRogramAR is inspired by that of Kragic et al., 2018 [46], Bambusek et al., 2019 [7], and Gadre et al., 2019 [29], who use AR to assist users in performing collaborative tasks with robots, to which we add the lens of trigger-action programming for defining reactive robot behaviors.

SYSTEM DESIGN
PRogramAR is designed to make programming of robot manipulators easier by adopting AR-based contextualization and simulation-based rule evaluation in combination with the benefits of TAP. Our system, which draws on prototype designs and findings from prior research projects (e.g., [9,15,48,72,73,80]), is composed of seven components: (1) AR Interface, (2) Rule Manager, (3) Object Tracker, (4) Rule Evaluator, (5) Motion Planner, (6) Physical Robot, and (7) Robot Digital Twin (Figure 1b). An example of a full workflow from our system validation (§4) is depicted in Figure 2. In the following sub-sections, we describe each component of our system design.

AR Interface
Users interact with PRogramAR through an AR Interface embedded directly within the human-robot working environment (Figure 1a). This interface helps users create rules that are defined by triggers and paired actions that dictate when and how a robot should perform a task. To ground TAP rules in the real world, users create, move, resize, and delete 3D zones within the real environment to indicate regions relevant to triggers or actions (once created, each zone has a different preset color and a unique zone number displayed above it for ease of reference). By utilizing 3D zones, users gain full expressiveness, as they can communicate depth information in 3D spaces such as shelves. This is in contrast to prior work that relied on 2D zones, which limited users' ability to specify depth information. With our interface, PRogramAR supports both traditional programming processes, where users define their full program before execution, and live programming. In live programming, users can program TAP rules while the robot is planning (regardless of planning time) or executing actions, although edits may require re-planning.
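Because zones ground triggers such as "Box 1 is in Zone 2" in 3D space, the underlying check reduces to testing whether a tracked object's position lies inside a zone's volume. The sketch below illustrates this with an axis-aligned box; the `Zone` class, its fields, and the example dimensions are hypothetical and not PRogramAR's actual internal representation.

```python
from dataclasses import dataclass

@dataclass
class Zone:
    """Axis-aligned 3D zone (illustrative; PRogramAR's internal
    zone type may differ)."""
    center: tuple  # (x, y, z) in workspace coordinates
    size: tuple    # (width, depth, height)

    def contains(self, point: tuple) -> bool:
        # A point is inside the zone if it lies within half the
        # box extent along every axis.
        return all(
            abs(p - c) <= s / 2
            for p, c, s in zip(point, self.center, self.size)
        )

# Hypothetical shelf-sized zone: containment works in all three
# dimensions, including depth, which 2D zones cannot express.
shelf = Zone(center=(0.5, 0.2, 0.3), size=(0.58, 0.4, 0.11))
print(shelf.contains((0.5, 0.2, 0.33)))  # True: inside the shelf volume
print(shelf.contains((0.5, 0.2, 0.5)))   # False: above the shelf
```

The third axis is what lets a user distinguish, for example, the front and back of a shelf when authoring rules.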

Trigger-Action Programming Rules
PRogramAR currently supports combining triggers and actions into two types of TAP rules that support a user's mental model of a program: If-Then rules, as previously supported by Senft et al., 2021 [73], and our own addition of While-Do rules [39]. Triggers are parameterized by objects (recognized items known to the system, tracked by the Object Tracker as described in §3.5), conditions (e.g., "is", "in"), and zones. In our system, the currently supported triggers include when (1) objects are in a zone (e.g., Box 1 is in Zone 2), or (2) objects are not in a zone (e.g., Box 2 is not in Zone 3). Actions are parameterized by a robot action (e.g., "move"), objects, and zones. Due to the limited capabilities of our robot, the only supported action is moving an object from a zone, or from its current location, to another zone (e.g., move Box 2 inside Zone 3). When the defined trigger is true, an If-Then rule executes its actions once before moving to the next rule. A While-Do rule performs its associated actions continuously as long as its trigger remains true before moving to the next rule. For instance, in a scenario where multiple objects need to be transferred from one zone to another, an If-Then rule might move only one object before moving to the next rule. In contrast, a While-Do rule might move all the objects before proceeding to execute the next rule. To summarize, the current rules, triggers, and actions supported by PRogramAR are as follows:
• Rules: If-Then and While-Do
• Trigger: Objects present within a zone
• Trigger: Objects absent within a zone
• Action: Moving objects from one zone to inside another zone
• Action: Moving objects from any location to inside a zone
The simplest rule would consist of a single trigger, containing a single object-zone pair, and a single action, also with a single object-zone pair. For example, a rule could be "If Box 1 is in Zone 2, then move Box 1 inside Zone 3." Users can also specify rules of arbitrary complexity by using additional conditions connected by AND or OR logical operators in the rule triggers. Each trigger can also have multiple actions connected by AND operators. Similar to SLP [73], PRogramAR allows users to define, edit, and delete TAP rules at any time (prior to, during, or after robot execution). This creates a live programming environment to aid with debugging and progressively building reactive robot programs at runtime. In contrast to SLP [73], which prompts users to fix rule priority conflicts, PRogramAR executes TAP rules in a user-specified order that can be adjusted as needed (Fig. 2f). Moreover, PRogramAR leverages AR to make use of all three dimensions of the user's workspace. This enables users to specify programs for placing a box on a shelf or, in future real-world scenarios, moving a plate from a dish rack into a cabinet. Such programs are challenging to express in SLP [73], which relies on a top-down view that lacks depth perception, making it difficult to communicate spatial relationships accurately.
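The If-Then versus While-Do distinction can be sketched as follows, assuming a simplified world model that maps each object name to the zone containing it. The rule encoding, object names, and world model here are illustrative assumptions, not PRogramAR's actual implementation.

```python
def run_rules(rules, world):
    """Execute TAP rules in user-specified order (illustrative sketch).

    `world` maps object name -> zone name.  Each rule is a tuple
    (kind, trigger, action), where trigger = (obj_prefix, zone) and
    action = (target_zone,): move one matching object into target_zone.
    Returns a log of (object, destination) moves.
    """
    log = []
    for kind, (prefix, zone), (target,) in rules:
        def matches():
            return [o for o, z in world.items()
                    if z == zone and o.startswith(prefix)]
        if kind == "if-then":
            hits = matches()
            if hits:                 # fire once on the first match
                world[hits[0]] = target
                log.append((hits[0], target))
        elif kind == "while-do":
            while True:              # keep firing until the trigger is false
                hits = matches()
                if not hits:
                    break
                world[hits[0]] = target
                log.append((hits[0], target))
    return log

world = {"Box 1": "Zone 1", "Box 2": "Zone 1", "Box 3": "Zone 1"}
# A While-Do rule drains the zone; an If-Then rule would move one box.
log = run_rules([("while-do", ("Box", "Zone 1"), ("Zone 2",))], world)
print(len(log))  # 3: all three boxes moved to Zone 2
```

With `"if-then"` in place of `"while-do"`, the same program would move only one box before control passes to the next rule, which is the behavioral difference the text describes.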

Rule Manager and Evaluator
As users create rules, they are maintained in a library within a Rule Manager, which communicates with other system components to manage rule options, available zones, and tracked objects while pushing updates to the AR Interface. One key feature that goes beyond prior TAP systems such as SLP [73] or Leonardi et al., 2019 [48] is the Rule Evaluator. This component continuously checks whether the conditions of a rule are satisfied by the current state of the world. The Rule Manager then pushes updates to the AR Interface to reflect the status of each rule. The purpose of this feedback is to assist users with debugging their created rules by explicitly indicating whether a rule should or should not be executed. If the output of the Rule Evaluator conflicts with a user's expectations, then they may need to either edit their rules or validate the current state of the world or virtual zones. Triggers and actions that evaluate to true, and are therefore in the queue to be executed, are colored green. Rules that evaluate to false, and are therefore not going to execute, are colored red (Fig. 2f). Red and green hues were chosen from a color-blind accessible palette to provide a level of contrast easily differentiable by all users [60].
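Conceptually, the evaluator re-checks each trigger against the current world state and maps the result to the green/red feedback color. A minimal sketch, assuming a clause encoding of `(object, operator, zone)` joined by a single AND or OR connective (an assumption for illustration, not PRogramAR's actual rule schema):

```python
def evaluate(trigger, world):
    """Evaluate a trigger against the current world state and return the
    feedback color shown in the AR interface (illustrative sketch).

    `trigger` = (connective, clauses): connective is "AND" or "OR";
    each clause is (object, op, zone) with op "in" or "not in".
    `world` maps object name -> zone name.
    """
    connective, clauses = trigger

    def clause_true(obj, op, zone):
        in_zone = world.get(obj) == zone
        return in_zone if op == "in" else not in_zone

    results = [clause_true(*c) for c in clauses]
    satisfied = all(results) if connective == "AND" else any(results)
    # Satisfied triggers are queued for execution (green); others red.
    return "green" if satisfied else "red"

world = {"Box 1": "Zone 2", "Box 2": "Zone 3"}
print(evaluate(("AND", [("Box 1", "in", "Zone 2"),
                        ("Box 2", "not in", "Zone 3")]), world))  # red
print(evaluate(("OR",  [("Box 1", "in", "Zone 2"),
                        ("Box 2", "not in", "Zone 3")]), world))  # green
```

Re-running this check whenever tracked objects move is what lets the interface flag, in real time, which rules are about to fire.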

Motion Planner and Robot Simulation
PRogramAR leverages the MoveIt! Task Constructor (MTC), an open-source framework for robot manipulator action planning that is compatible with over 150 robot platforms using MoveIt [36]. While our current implementation targets the Fetch robot, a mobile robot with a 7-degree-of-freedom manipulator, other developers interested in utilizing or expanding PRogramAR can easily adapt it to their own MTC-compatible robot by replacing the MTC bindings specific to Fetch (e.g., updating the platform configuration in the launch file) [82]. When users choose to simulate or execute their program, the Rule Manager sends an action query to the Motion Planner. The MTC framework facilitates the planning of robot manipulator actions by solving individual sub-tasks and connecting them into a complete action plan [36]. For instance, a pick-and-place task can be divided into stages such as robot approach, grasp pose, and lift direction. Each stage is solved using a motion planning framework such as OpenRAVE or MoveIt!, and the stage solutions are then connected sequentially to generate the full motion plan [20,24].
If the Motion Planner successfully generates a feasible motion plan for the programmed action, users have the option to simulate individual rules using the Robot Digital Twin. The digital twin is overlaid on top of the physical robot and demonstrates simulated motion plans contextualized in the real world. Once users are satisfied with the simulation, they may execute their program on the physical robot. During this time, the digital twin continues to mirror the robot's actions at twice the robot's speed, enabling users to preview the robot's behavior both before and during execution (Fig. 2b). At times, the Motion Planner may be unable to return a valid trajectory due to unreachable zones or objects, or obstacles that could cause collisions. In such cases, the system notifies users with an error message projected above the robot's head stating, "Error: Zone too far or pick/place position too close to other boxes." Users can then debug their program accordingly. This approach is motivated by prior work emphasizing the importance of error detection and prevention in enhancing programming success [35,39].
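The staged-planning idea, solving sub-tasks independently and chaining their solutions, can be sketched generically as below. This is not the real MoveIt! Task Constructor API; the stage names, solver signatures, and error string are illustrative stand-ins showing how a per-stage failure can surface as the kind of error message PRogramAR projects above the robot.

```python
def plan_pick_and_place(stages, state):
    """Chain sub-task planners into one motion plan (generic sketch of
    staged planning; the actual MTC API differs).

    Each stage is (name, solver); solver(state) returns a list of
    waypoints or None on failure.  On failure, return an error that a
    UI layer could surface to the user.
    """
    plan = []
    for name, solver in stages:
        segment = solver(state)
        if segment is None:
            # e.g. surfaced in AR as "Error: Zone too far or pick/place
            # position too close to other boxes."
            return None, f"stage '{name}' failed"
        plan.extend(segment)
        state = segment[-1]  # next stage starts where this one ended
    return plan, None

# Hypothetical stage solvers over a toy 1-D "state" for illustration.
stages = [
    ("approach", lambda s: [s + 1]),
    ("grasp",    lambda s: [s + 1]),
    ("lift",     lambda s: [s + 1]),
]
plan, err = plan_pick_and_place(stages, 0)
print(plan, err)  # [1, 2, 3] None
```

The key design point mirrored here is that a failure in any one stage (e.g., an unreachable grasp pose) invalidates the whole plan, which is exactly when the system notifies the user rather than executing a partial motion.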

AR Apparatus and Tracking
PRogramAR makes use of an ARHMD to present the main AR Interface and visualizations to users. While our current implementation uses the HoloLens 2, PRogramAR is built on top of the OpenXR application programming interface (API), which allows any compatible mixed reality device, such as the Magic Leap or Meta Quest, to run our application. PRogramAR relies on a single fiducial marker placed in the workspace to align the coordinate frames of the virtual environment with the real world. The Object Tracker currently relies on external markers (in our validation we used four Vive trackers with lighthouse base stations); in the future, this might be performed directly using visual processing of the camera feeds from the ARHMD and/or robot. We derived the translation and rotation matrices necessary for aligning and calibrating our various coordinate systems (ARHMD, object tracking, and robot) as described by Peer et al., 2018 [65], such that user actions and program specifications could be accurately mapped into robot plans and AR visual feedback was appropriately displayed.
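The frame-alignment step can be illustrated with homogeneous transforms: if both the headset and the robot can express the pose of the shared fiducial marker in their own frames, chaining those transforms maps headset-space input into the robot frame. The specific poses below are made-up calibration values for illustration, not measurements from our setup.

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical calibration: each device observes the same fiducial
# marker, yielding the marker's pose in each device's frame.
T_headset_marker = make_T(np.eye(3), [1.0, 0.0, 0.5])
T_robot_marker   = make_T(np.eye(3), [0.2, -0.3, 0.1])

# Chain transforms so headset-space points map into the robot frame:
# p_robot = T_robot_marker @ inv(T_headset_marker) @ p_headset
T_robot_headset = T_robot_marker @ np.linalg.inv(T_headset_marker)

p_headset = np.array([1.0, 0.0, 0.5, 1.0])  # the marker, seen by the headset
p_robot = T_robot_headset @ p_headset
print(p_robot[:3])  # the marker's position expressed in the robot frame
```

A single shared marker suffices because one rigid transform per device pair fully relates the frames, assuming each device's pose estimate of the marker is accurate.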

SYSTEM VALIDATION
To evaluate PRogramAR, we designed and conducted a validation study in which participants programmed three collaborative tasks with a Fetch robot.

Environment
Participants used PRogramAR in a controlled laboratory setting with a 1.2 × 0.75 m table as the shared workspace. On one side of the table there was a 0.58 × 0.4 × 0.11 m shelf, on which the robot could be programmed to place objects. Unlike previous TAP systems that were limited to a 2D top-down view of the workspace, we demonstrate the benefits of an ARHMD by requiring 3D placements of objects on the shelf on the left side of the robot's workspace (Figure 3). This poses a challenge for 2D interfaces when communicating depth information for TAP rules, especially when the top-down camera view may be occluded by the roof of the shelf. To align with real-world safety standards for human-robot shared workspaces [54,79], the task space was divided into three areas: (1) Robot workspace, where only the robot was allowed to work during execution, (2) Exchange area, where both the participant and the robot could enter while working, and (3) Participant workspace, where only the participant could work (Figure 3).

Programming Tasks
For this study, participants programmed a Fetch robot to perform three tasks that grew in complexity and therefore increased the potential for working in parallel towards a shared goal: (1) Kitting, (2) Assembly-A, and (3) Assembly-B. These tasks were inspired by prior work [73] and real-world use cases [54], and each had a time cap for completion. While kitting and assembly do resemble manufacturing tasks, the coordination they require generalizes to the broader collaborative scenarios we envision. Task 1. Kitting: (20 minute cap) Participants programmed the robot to move the boxes from the exchange area to particular locations within the robot workspace (numbered white squares in Figure 3). Task 2. Assembly-A: (15 minute cap) Participants programmed the robot to move boxes from the robot workspace into the exchange area. Once received, participants assembled four items from pieces in each container, placing them into white bins located nearby. Task 3. Assembly-B: (25 minute cap) Participants programmed the robot to move boxes from the robot workspace into the exchange area. Once received, the participant assembled the pieces from each container, put the assembled object back in the container, and programmed the robot to move the boxes from the exchange area to one of the four initial positions in the robot workspace.
To accomplish these tasks, the robot must be programmed to transfer objects to and from different zones and object locations within the workspace. Participants were tasked with assembling objects that were chosen to be intentionally difficult to handle, with the aim of increasing the probability of success for participants who collaborated in parallel, rather than sequentially, with the robot. The collaborative behavior of working in parallel towards a shared goal was identified when a participant actively engaged in their own physical tasks while the robot simultaneously executed its own user-programmed tasks.

Participants & Procedure
For this study, approved by our university IRB, we recruited a total of 20 participants from our local community through our university's online research recruitment platform. Since PRogramAR is intended for applications beyond manufacturing (e.g., services in the home), we recruited participants across a wide range of ages and experience levels. Three participants had technical difficulties and were unable to continue the study (e.g., failures with object trackers or Wi-Fi connectivity). One participant, aged 83, lacked the dexterity to manually assemble our task objects (screws and fasteners). Instead, this participant performed a modified version of our study, and this participant's data is analyzed separately (see §5). As a result, our primary data set includes 16 participants (4 male, 10 female, 1 other, and 1 prefer not to say), summarized in Table 1. The average age of the participants was 26.88 years (SD = 9.14) across a range of 18-58. Eight participants (50%) reported having no computer programming experience, three (18.75%) reported 1 year or less, and five (31.25%) reported 3 years or more. Seven (43.75%) of the participants indicated they own an IoT device, such as a smart hub, and participants' average familiarity with trigger-action programming was 3.00 (SD = 2.09) on a single seven-point item. Prior to the study, participants reported having little previous experience working with robots (M = 2.18, SD = 1.88) or using virtual or augmented reality technology (M = 2.81, SD = 2.16), both on seven-point scales. Our sample of participants represents a broad distribution of age groups with different levels of experience, which reflects many of our target end-users.
Each participant's session consisted of five phases: (1) Introduction, (2) Kitting, (3) Assembly-A, (4) Assembly-B, and (5) Conclusion. (1) Participants were given time to read and sign a consent form. The researcher then explained what they would be doing and showed the participant a 5-minute tutorial video explaining how to use PRogramAR to program the robot. Next, the HoloLens was calibrated for the participant. Finally, participants were asked if they had any questions before beginning the first task. (2) Participants began the first task, and a timer was started once they acknowledged they could see the interface in the scene. For this task, they were given 20 minutes and were allowed to ask clarifying questions on how the interface worked, but not how to complete the task. Once the robot correctly placed the final object, the task was completed and the timer was stopped. (3) For the Assembly-A task, participants were given 15 minutes and were allowed to reuse the rules and zones created in the first task. The timer was started once the participant verbally confirmed they could see the interface in the scene. Once the robot correctly placed the final object, the task was completed and the timer was stopped. (4) For the Assembly-B task, participants were given 25 minutes and were allowed to reuse the rules and zones created in the first and second tasks. The timer was started once the participant verbally confirmed they could see the interface in the scene. Once the robot correctly placed the final object, the task was completed and the timer was stopped. (5) Following the final task, the researcher conducted a verbal interview with participants to understand their experience using PRogramAR. Participants then completed a questionnaire to gather demographics and to assess the perceived usability of our system via the System Usability Scale (SUS) [14]. Finally, the researcher debriefed participants by explaining the goal of the study and compensated them with a $15 gift card.

Analysis Method
To gather participants' feedback regarding their experience with PRogramAR, we conducted and recorded a semi-structured verbal interview with a pre-defined list of questions focused on the participants' interaction with the robot, the effectiveness of the programming tool, and their overall impressions of PRogramAR. We chose semi-structured interviews because they offer a balance between structure and the flexibility to follow up on unanticipated and interesting responses [27]. To transcribe the interview recordings, we used an intelligent verbatim approach, aligning spoken data with written conventions while preserving the intended meaning and structure of the original speech [56]. Our goal was to convey the key points and ideas in the conversation, rather than how it was said, and to improve readability. Following the transcription of our data, we applied thematic analysis, a method for identifying, organizing, and reporting patterns within a data set, enabling us to systematically summarize key features of the verbal feedback gathered from participants. Thematic analysis is performed in six steps: (1) familiarization with the data, (2) generating initial codes, (3) searching for themes, (4) reviewing potential themes, (5) defining and naming themes, and (6) producing the report [12]. Our analysis revealed three themes, discussed in §5: (1) User-friendly robot programming, (2) Supporting different levels of expression, and (3) Supporting users through in-situ contextualization.

RESULTS
Following the completion of the study, participants gave our system an average SUS rating of 77.81 (SD = 11.79), an above-average score. A number of participants struggled to complete specific tasks within the designated time constraints. We believe these instances are largely due to participants, many of whom had no programming, AR, or robotics experience, having a relatively short learning time (5 minutes) to become familiar with the many novel aspects of our system. Overall, participants averaged 16 minutes, 4 seconds (SD = 2 minutes, 39 seconds) to complete the first task; 12 minutes, 40 seconds (SD = 2 minutes, 16 seconds) to complete the second; and 18 minutes, 15 seconds (SD = 3 minutes, 35 seconds) to complete the third. We believe that with higher time caps, all participants would have eventually completed all tasks. In general, Task 1 took longer than Task 2 because participants needed extra time to understand the interface and the AR interactions. Participants could also re-purpose TAP rules created in Task 1 for Task 2, which saved time. As expected, Task 3 took the longest because it involved moving objects between the participant workspace and the robot workspace twice, rather than once (see Figure 4 for more details).
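For readers unfamiliar with how the SUS ratings above are derived, the standard scoring procedure (Brooke's original formulation) maps ten 1–5 Likert responses onto a 0–100 scale. A minimal sketch:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Standard SUS scoring: odd-numbered (positively worded) items contribute
    (response - 1), even-numbered (negatively worded) items contribute
    (5 - response); the sum is scaled by 2.5 to yield a 0-100 score.
    """
    assert len(responses) == 10, "SUS has exactly ten items"
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5
```

For example, `sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])` yields the maximum score of 100.0, while all-neutral responses (`[3] * 10`) yield 50.0.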
Over the course of the tasks, participants became comfortable creating multiple rules at once and working in parallel with the robot. The number of participants who performed their tasks in parallel with the robot was zero (0%) in Task 1, nine (56.25%) in Task 2, and fourteen (87.5%) in Task 3. While this increase was affected by the task design (i.e., Task 3 was designed to generate more opportunities for parallel task execution), participants stated they became more comfortable using PRogramAR and progressively let the robot perform tasks simultaneously with them. P10: “It was good once I got used to it. I think if you were in this environment, working this way . . . it will seem comfortable, to me once you do it a few times . . . By the third task, I was like, well, I'm going to be doing something while I'm having it do something.” This emphasizes that, given time to become comfortable, PRogramAR allowed users to manage their own tasks simultaneously with the robot's tasks. Below, we discuss other advantages and disadvantages of PRogramAR reported by participants, grouped by the themes that emerged from our analysis.

User-friendly robot programming
Of the sixteen participants, thirteen (81.25%) commented on their positive experience using PRogramAR. These participants particularly appreciated its simplicity, especially for non-experts, as reflected in the following comment: P8: “I don't have to be a genius to be able to do this. It's not super confusing . . . I was actually surprised about that.” Moreover, seven (43.75%) participants reported that using TAP within PRogramAR was less intimidating than typical computer programming, while three (18.75%) participants without a computer science background perceived that TAP was comparable to their current work applications and therefore felt familiar.
P11: “I wasn't really thinking about the coding element that much, which I think is good, probably in the sense of being user friendly. I don't think normal coding is super user friendly.” P16: “I've done this type of if-then work in other database management . . . that's why I feel like I got used to the rule language pretty quickly.” This feedback highlights that TAP is perceived as user-friendly when applied in an AR environment and reinforces the notion that TAP can lower the barrier to entry for robot programming, as found in prior work [67]. When participants were asked whether they believed there was a group of users who would have a hard time learning this interface, four (25%) said that it would be their grandparents. On the contrary, one elderly participant (P7, age 83), who completed a modified version of our study due to difficulties manually assembling task objects, reported a positive experience learning and using PRogramAR. During the first two tasks, this participant received additional guidance from the researcher, who helped them develop successful programs. For the third task, because the participant had difficulty putting objects together, they were instructed to set the objects aside without assembling them and to continue with the task as usual. For this modified task, the participant was able to successfully build their program in the allotted time without further guidance from the researcher. Including this participant's responses, our SUS score rises to 79.12 (SD = 12.57). This experience suggests that users of all ages can adopt this technology if well-designed guidance is provided during the initial familiarization phase. This participant provided the following feedback on PRogramAR: P7: “It was a simple interface to learn and just took a little to get used to . . . I have a 97-year-old friend who refused to use a computer . . . he could have picked up on this, I'm sure.” To further enhance the programming experience, color-coded TAP rules were added to the Rule Manager (Fig. 2f). This feature indicated to users whether a rule could be executed given the current state of the world. Our feedback revealed that four (25%) participants found this feature particularly helpful for verifying the planned execution of created rules, e.g.: P15: “I liked that it showed over here which rule is being executed and the highlighted true false condition, like when it was off or when a box was in a zone it was highlighted true and the if condition column was highlighted green.” P16: “I realized that the conditions weren't true because the box wasn't in that zone. So that helped me move the zone back where the box was.” This indicates that the color-coded TAP rule feedback served as an effective visual cue, enabling users to make informed adjustments to their program. In summary, our user responses reinforce previous findings that the TAP paradigm can be adopted by non-expert users and that proper feedback regarding TAP rules can enhance the programming process [48,73,77]. Furthermore, it is encouraging that these ideas hold when providing TAP in AR to people of varying ages and backgrounds.
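The executability check behind the color coding can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not PRogramAR's implementation: the `Zone`, `Rule`, and `rule_executable` names, the spherical zone geometry, and the world-state dictionary are all illustrative assumptions.

```python
# Illustrative sketch of TAP rule feedback: a rule is highlighted green
# only when its trigger condition holds in the current world state.
# Names and geometry are assumptions, not the system's actual API.
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    center: tuple   # (x, y, z) in workspace coordinates
    radius: float

    def contains(self, position):
        # Point-in-sphere test against the zone's center.
        return sum((a - b) ** 2 for a, b in zip(position, self.center)) <= self.radius ** 2


@dataclass
class Rule:
    obj: str            # tracked object id
    trigger_zone: Zone  # "If <obj> is in <trigger_zone> ..."
    action_zone: Zone   # "... Then place it in <action_zone>"


def rule_executable(rule, world):
    """Evaluate the rule's trigger against current object positions.

    `world` maps object ids to positions; an untracked object means
    the rule cannot fire.
    """
    pos = world.get(rule.obj)
    return pos is not None and rule.trigger_zone.contains(pos)
```

Under this sketch, moving a zone back over an object (as P16 describes) flips the trigger from false to true, and the interface could recolor the rule accordingly on every tracking update.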

Supporting diferent levels of expression
As discussed by Senft et al. [73], our work similarly highlights that TAP supports different levels of expression. In the final task, participants were able to apply their own unique strategies. Of the eleven participants who successfully completed the third task, seven (63.64%) set up multiple zones for each object and placement position for their rules. All but one participant utilized a live programming approach throughout this task, continuously building and editing blocks of rules (Figure 2). The remaining four (36.36%) preferred to use a smaller number of zones that they moved around the workspace as the task progressed, also utilizing a live programming approach. Across all three tasks, we observed a small number of participants adopt a traditional programming process, defining all the necessary rules for completing the task before execution. Specifically, one (6.25%) participant in Task 1, three (18.75%) participants in Task 2, and one (6.25%) participant in all three tasks used a traditional programming approach. Although participants had the freedom to choose their strategies, this did not guarantee their approach would be effective or successful. One challenge five (31.25%) participants encountered was keeping track of multiple low-level rules, because participants often implemented too many zones and If-Then rules, for example: P9: “When I was adding the extra conditions and extra actions, I couldn't remember which ones I had already added . . . there's just a lot going on.” Seven (53.84%) participants recognized that there might have been a more optimal approach than their initial implementation. Participants often attributed this difficulty to the time limit, which created a sense of pressure to quickly incorporate multiple If-Then rules, the perceived easiest path.
P5: “I think if I had more time, I would have experimented . . . For the third test, I started to look at the other command, While-Do, to see what that meant. If I had more time, I would have thought of a more finessed trigger command, but the If-Then was familiar, and I knew it could still execute the task at hand.” Therefore, although participants were given the flexibility to generate TAP rules however they wanted, future research should explore ways to assist users by automatically generating high-quality rules that participants may modify after their creation. This becomes particularly crucial as the complexity of real-world scenarios increases with a larger number of available rules, objects, and zones.

Supporting users through in-situ contextualization
A major characteristic of AR is its ability to merge the virtual and physical worlds. Previous research has examined this property to improve robot programming using ARHMDs [7,16,18]. One benefit of incorporating an ARHMD into robot programming is that users are able to engage with the AR interface from anywhere in their workspace, unencumbered by a physical monitor. This benefit was recognized by two (12.5%) participants, who stated: P16: “I like that no matter where I was sitting, I could kind of engage the interface.” P17: “I like that if you feel like using this at home, you wouldn't have a bunch of, you might have a monitor, but it would just be like this and be pretty simple.” Despite this positive feedback, one participant offered an alternate viewpoint: they would have preferred using a computer, since it was what they were used to, indicating that some users may be resistant to adopting technologies perceived as disruptive to existing workflows. Another motivation for incorporating an ARHMD was to enable users to visualize the entire workspace in which they were operating, rather than being limited to a single camera's point of view. During our study, two (12.5%) participants mentioned they appreciated this feature, for example: P2: “I like that you can see what's actually happening. I used VR glasses . . . and that was one of the ones where it's a video game where you can't really see your surroundings. So it was cool how I have an idea about where you are, but also seeing these other things.” Furthermore, multiple participants successfully completed each task by utilizing PRogramAR to define 3D location parameters on the shelf, which is difficult to do in prior work that utilized 2D screens. Another crucial feature provided by PRogramAR was the simulation component, which displayed motion plans using the Robot Digital Twin. This feature has been shown to improve safety and control in robotics applications when provided via an ARHMD [4,37,72]. Seven (43.75%) participants mentioned using the simulation to build confidence in their program, with comments including: P5: “Once I was able to have the If-Then statements do the simulation and make sure that it did the task as I intended, I felt pretty confident.” P20: “I think it was better when I simulated it because at least I knew there was a pass . . . it made me feel better about it, doing it the way that it was supposed to and also just making sure that whatever I put as the rule was correct.” Another participant envisioned the benefit of using the simulation tool when programming more complex tasks for service robots deployed in the home.
P13: “For even more complicated tasks, like household chores or something robots were capable of, that would be really helpful to see exactly what's going to happen once you program some set path or set of actions.” However, similar to prior observations (e.g., [16]), as participants became familiar with PRogramAR and the capabilities of the robot, eleven (68.75%) participants no longer found it necessary to rely on the simulation tool after the first task. One limitation of our simulation, caused by the motion planner, was that if a goal was located too far for the robot to reach, or if occluding objects prevented the robot from grabbing an object, no motion plan would be generated. In these cases, instead of providing a simulated motion plan, error messages describing the edge case were displayed above the robot. Consequently, four (25%) participants struggled with the final task because they had difficulty deciphering the meaning of the error messages. P6: “I wanted to know what that error meant about the zone. There was no guidance on the panel . . . at one point I realized that I had accidentally moved one of my zones away from where I thought it was. I didn't realize that I had done that.” Therefore, future research should continue to explore methods of integrating error feedback that naturally guides non-expert users toward identifying the source of program errors. Overall, participant feedback provided promising evidence regarding our motivation to integrate TAP into an ARHMD environment. For instance, participants derived confidence from using an ARHMD to verify their program via the simulation tool during the first task. However, more work is necessary to further understand the effectiveness of our system in the context of non-expert robot programming.

DISCUSSION
In this study, we aimed to explore how the benefits of AR may enhance the process of programming reactive robot behaviors using trigger-action programming. First, our work confirms prior findings demonstrating that TAP is a user-friendly approach for enabling non-experts to program robots. We observed PRogramAR accommodate varying levels of expressivity, with some participants working in parallel with the robot while others worked sequentially, and most participants instinctively applied a live programming approach. It is also encouraging that these findings hold when providing TAP in AR to people of varying ages (18–83) and backgrounds, as multiple participants were able to complete the tasks within the given time frame using our system. Second, the inclusion of AR visualizations, particularly the 3D digital twin simulation, gave multiple users confidence in their program execution. Participants also appreciated the ability to freely position the AR interface anywhere in the scene and how AR allowed them to work within the entire human-robot workspace. We believe that by combining AR and TAP, our design of PRogramAR provides a starting point for future AR systems to build upon in order to provide richer expression and visual debugging capabilities to users, thereby continuing to improve the end-user robot programming experience for all.

Limitations and Future Work
Although our validation of PRogramAR shows promise, limitations and future challenges remain. For example, the tasks programmed by users in our study were abstract (pick-and-place, generic assembly, etc.) and limited by the object tracking and manipulation capabilities of our system. Future work should examine the more realistic and complicated tasks that initially motivated our work, such as tidying rooms or putting away dishes. To enable the specification of TAP rules for more real-world tasks, our system needs to incorporate: (1) a more advanced object detection and tracking algorithm, such as YOLO [43]; (2) precise low-level manipulation capabilities that include generalizable object grasping techniques (e.g., Contact-GraspNet [74]) and actions such as twisting, opening, or deictic gestures (e.g., learning from demonstration [71]); and (3) increased TAP rule expressibility, including more object states (such as dirty or clean dishes), social behaviors [48], and additional trigger-action rules (e.g., As-long-as-Do, If-When-Then [39]). However, to utilize these capabilities, PRogramAR will need to assist users with defining more complex robot programs. One way to do this is through virtual kinesthetic teaching, where users directly manipulate a robot's digital twin to fine-tune action plans. Researchers could also utilize the AR headset's egocentric camera to record users physically demonstrating tasks; these demonstrations could then be translated into parameters for robot action programs. Developing such a system also unlocks opportunities to leverage AR's advantages in larger spaces, where users will need to create and keep track of more rules and virtual objects. For example, the Robot Digital Twin could demonstrate complex tasks like cooking and depict changes in object states (e.g., before and after food is cut or cooked) in a larger kitchen environment. In lengthy tasks, a simulation tool could condense robot actions into a shorter timeframe, facilitating quick debugging by users. Studies within larger environments, in which the workspace cannot be covered by a single viewpoint, may lead to greater insight into the benefits of AR over a 2D interface. However, these capabilities currently pose challenges for future AR robot programming research, as they are not yet available in existing AR systems. Another limitation of our study was the long planning times of our mobile manipulator, a limitation of the solver that would cause the robot to collide with objects if planning was done too quickly. To address this issue, future systems could continuously compute and store motion plans to be used when an action is triggered. In addition, reducing planning time will help introduce more aspects of live programming, allowing users to initiate re-planning of robot trajectories more quickly. Also problematic were the task time limits that constrained users, an artificial constraint that prevented some participants from fully exploring the interface and creating desired rule sets. Therefore, future work might increase the task time or incorporate state-of-the-art language models, such as GPT-4, that could quickly provide applicable rules for participants to reduce their mental load [34,41]. For example, participants could describe high-level goals in natural language, which could then be input to a large language model. The model could then generate the robot program rules and provide users with a subset of task-relevant trigger and action parameters for program customization. This approach would provide users with code templates, eliminating the need to build rules from scratch. Moreover, it could reduce the cognitive load for participants as they work with larger rule sets, a consequence of more complex tasks, environments, and advanced robotics systems. Finally, future work should investigate how to continue to incorporate error feedback that seamlessly guides non-expert users to the source of program bugs. For example, PRogramAR could be improved by providing a simulated motion plan that highlights problematic collisions that may occur, rather than describing errors in plain language [5,51,57].
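The idea of continuously computing and storing motion plans ahead of a trigger could be structured as a simple plan cache. This is a hypothetical sketch, not part of PRogramAR; the `PlanCache` class, the planner callable, and its key scheme are all illustrative assumptions.

```python
# Illustrative sketch: precompute and cache motion plans so that a
# triggered action can execute without waiting on a slow planner.
# The class and its interface are assumptions, not an existing API.
class PlanCache:
    """Cache motion plans keyed by (object, goal zone).

    Plans are prefetched in the background while the user edits rules
    and invalidated whenever tracked objects move, since a stale plan
    could cause the collisions described above.
    """

    def __init__(self, planner):
        self.planner = planner  # callable: (obj, goal) -> plan or None
        self._plans = {}

    def prefetch(self, obj, goal):
        # Called continuously before any trigger fires.
        self._plans[(obj, goal)] = self.planner(obj, goal)

    def invalidate(self):
        # Called on any scene change; cached plans may now collide.
        self._plans.clear()

    def get(self, obj, goal):
        # Return a fresh cached plan if available, else plan on demand.
        plan = self._plans.pop((obj, goal), None)
        return plan if plan is not None else self.planner(obj, goal)
```

In this design, the perceived latency at trigger time collapses to a dictionary lookup in the common case, at the cost of wasted planning work whenever the scene changes before a rule fires.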

CONCLUSION
In this work, we present PRogramAR, a novel augmented reality trigger-action robot programming system that empowers non-expert users to create reactive robot behaviors for collaborative tasks. In developing PRogramAR, we integrated concepts from various domains of robot programming and augmented reality interface research into a unified and comprehensive system. Specifically, PRogramAR introduces a unique combination of trigger-action programming, offering a high-level abstraction of robot programming concepts, and augmented reality feedback contextualized directly within the user's environment to facilitate the construction of accurate mental models of programmed behavior. In our system validation, individuals of different ages and levels of experience successfully developed and deployed programs that enabled them to work in collaboration with the robot toward a shared goal. Moreover, the feedback received from participants supports the advantages of merging augmented reality and trigger-action programming in the context of robot programming. We look forward to further explorations of this work in pursuit of a universally user-friendly robot programming system.

Fig. 2 .
Fig. 2. An example workflow for Task 2. Participants (a) started with all objects in the robot's workspace, then (b) created zones for each box and each place position. The goal was to (c) move the boxes to the exchange area so the participant could assemble the parts inside the boxes. Participants used the AR Interface (d-f) to create various TAP rules to be run on the robot.

Fig. 3 .
Fig. 3. The task workspace was divided into three areas, the middle of which both the participant and the robot could work in. The shelf on the left provided a 3D component to the workspace, which 2D interfaces cannot account for.

Fig. 4 .
Fig. 4. The time it took participants to complete each task. The programs created in Task 1 could be reused for Task 2, resulting in quicker completion times for Task 2. Task 3 took the longest because it required more steps. Five participants in Task 1, six in Task 2, and five in Task 3 were unable to complete the tasks within the allotted time; their times are not shown in the graphs.

Table 1 .
A summary of quantitative results from our study.
of defining where objects are placed, triggers for performing actions, and multiple pick-and-place activities are transferable to other service applications (putting away dishes/groceries or tidying rooms) where similar specifications are necessary. Task 1. Kitting: (20-minute cap) Participants put two screws, one metal bar, one grey round fastener, and one black round fastener into four different boxes.