Virtual Reality-based Human-Robot Interaction for Remote Pick-and-Place Tasks

Virtual Reality (VR) has emerged as a promising medium for enhancing human-robot interaction by offering a more immersive and intuitive interface. This paper presents a novel approach that uses VR, specifically the Meta Quest 2 device, as an interface for remote control of a robot performing a pick-and-place task in a proxy data center environment. A user interface was created in VR, enabling users to control the Fetch Mobile Manipulator robot and monitor its pose for secure and safe remote operation. This work integrates virtual reality, computer vision, and human-robot interaction methodologies. Experiments demonstrate the effectiveness of the VR-based human-robot interaction approach, showcasing its potential to enhance productivity and safety in industrial applications involving remote robot control.


INTRODUCTION
The field of human-robot interaction (HRI) has witnessed significant progress, driven by advancements in robotics and the increasing need for seamless collaboration between humans and robots [1, 2]. Recent research highlights the role of Virtual Reality (VR) in enhancing the immersion and realism of these interactions. Despite challenges such as latency and mismatches between virtual and real environments, the benefits of VR in improving communication, increasing safety, and enhancing training are notable [3].
The development of intuitive interaction methods, including head-mounted displays for remote control and gaze-based interaction techniques, underscores the potential of VR in creating engaging robot experiences [4, 5]. These advancements are crucial in settings like hospitals, schools, and factories, where effective human-robot collaboration is essential [6, 7].
Furthermore, mixed-reality-enhanced HRI systems using imitation-based mapping approaches for teleoperation emphasize the integration of VR in enhancing teleoperation efficiency, especially in precision tasks like remote pick-and-place operations [8, 9, 10].
In this paper, we present a comprehensive study of integrating VR into human-robot interaction for remote pick-and-place tasks. Our research introduces an innovative approach using the Meta Quest 2 device, enabling users to remotely control the Fetch Mobile Manipulator robot in a simulated data center environment. We explore the intersection of VR, computer vision, and human-robot interaction methodologies, demonstrating the system's effectiveness in enhancing industrial productivity and safety. Our contributions include a novel VR-based user interface, advanced perception through object detection, and efficient manipulation and navigation techniques. The experimental results and discussion sections provide in-depth analysis and insights into the system's performance, underlining its potential for practical applications in complex and dynamic environments.

RELATED WORK
The Human-Robot Interaction (HRI) field is rapidly evolving, with key contributions across various domains. Sheridan [2] and Goodrich and Schultz [1] provide foundational insights into the current state and challenges in HRI. Steinfeld et al. [11] offer crucial metrics for assessing human-robot interactions, essential for developing user-friendly robotic systems.
Lezoche et al. [9] discuss robotics in agriculture, focusing on productivity and sustainability, while Ruffaldi et al. [6] highlight the importance of clear communication in HRI through augmented reality. Fong et al. [12] and Matarić [13] underscore the significance of social dynamics in HRI, exploring socially interactive and assistive robots [14]. Breazeal [15] traces the commercial journey of social robots, and Hoffman and Ju [16] emphasize the critical role of movement design in robots for improved interaction.
Thrun [10] presents a theoretical HRI framework centered on autonomy and collaboration. Dragan [7] explores incorporating human behavior models in robot planning for better adaptation in shared environments. Broadbent [17] investigates the psychological aspects of human-robot interactions, and Turkle [18] examines the societal impacts of technology on communication. Riek [19] and Kanda & Ishiguro [20] highlight HRI's application in complex fields like mental health and education.

System Workflow
The system workflow comprises three main components: perception, navigation, and manipulation, as illustrated in Figure 1. The primary objective is to enable the robot to navigate to the tool wall, fetch a hammer, and place it in the designated location.
The VR component provides immersive visualization, facilitating path planning and task simulation. It enables operators to remotely control the robot, effectively bridging the gap between human input and robotic action. The navigation component processes this input to direct the robot to the desired location, ensuring it is precisely positioned for task execution, and relays spatial data back to VR for accurate simulation feedback. Concurrently, the manipulation component translates the simulated tasks from the VR space into physical actions, handling objects with dexterity informed by the robot's situational awareness from the navigation component. This synergy ensures that the robot operates with an informed understanding of its environment, guided by human intent and refined by sensory feedback.

Perception
In this system, the perception part comprises two main components: the user interface and object detection. User interaction primarily occurs through a VR device, the Meta Quest 2, enabling interaction within a virtual space that represents the work environment and displays the robot's real-time location. Hammer detection is performed by an object detection and image segmentation algorithm based on the YOLOv8 model, trained with custom data from the makerspace tool wall; the camera feed is processed in real time on the robot's onboard computer [21]. Image segmentation data for real-time hammer identification is obtained by capturing images in the makerspace environment and annotating hammers with Roboflow in preparation for model training. The customized dataset consists of images reshaped to 640 × 640 × 3, labeled with segmentation coordinates for the single class "hammer". Data cleaning removes low-quality images, such as those with blurred hammer shapes or segmentation hindered by the tool wall background, while augmentation applies image rotation and lighting alterations. The dataset comprises 220 valid images, divided into 151 for training, 42 for validation, and 27 for testing.
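For concreteness, the training step described above could be reproduced with a short sketch using the Ultralytics API. The dataset config file name, the choice of the nano variant, and the epoch count are assumptions for illustration; the paper does not specify them.

```python
# Minimal sketch of the YOLOv8 segmentation training step (Ultralytics API).
# "hammer_seg.yaml" is a hypothetical dataset config pointing at the
# 151/42/27 train/val/test split, with the single class "hammer".
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")   # pretrained segmentation checkpoint (variant assumed)
model.train(
    data="hammer_seg.yaml",      # hypothetical dataset config
    imgsz=640,                   # matches the 640 x 640 x 3 input described above
    epochs=100,                  # assumed; not stated in the paper
)
metrics = model.val()            # precision/recall on the validation split
```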
The segmentation model is trained using the state-of-the-art YOLOv8 architecture for image recognition and segmentation, and hammer segmentation for this task consistently yields accurate results: the model achieves over 98% precision and 99% recall on the segmentation task at a confidence threshold of approximately 0.5. Local inference time on the robot is approximately 200-300 ms, supporting a live video stream at 4-5 fps.
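A corresponding on-robot inference sketch might look as follows; the weights path is the Ultralytics default output location and is a hypothetical stand-in.

```python
# Sketch of on-robot inference with the trained weights.
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")  # hypothetical weights path
results = model("frame.jpg", conf=0.5)  # ~0.5 confidence threshold, as above
masks = results[0].masks                # per-instance segmentation masks
boxes = results[0].boxes                # bounding boxes with confidences
```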
The objects' positions in the real world are estimated by computing the centers of the cropped point clouds corresponding to those objects. Because the point cloud camera and the built-in RGB camera share the same base frame, the segmentation results (in the 2D image) are used to crop the point cloud (in the 3D world) and obtain each object's points. The object's center is estimated as the mean of the cropped points and is then published to a specific ROS topic, enabling the manipulation subsystem to estimate the gripper's position for object grasping.
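The following is a minimal sketch of this center estimation, assuming an organized point cloud registered to the RGB camera (so 2D mask pixels index 3D points directly). The topic name "/hammer_center" and the camera frame are hypothetical.

```python
# Sketch: object center from a 2D segmentation mask and an organized point cloud.
import numpy as np
import rospy
from geometry_msgs.msg import PointStamped

rospy.init_node("center_estimator")
# Topic name is hypothetical; the manipulation subsystem subscribes to it.
pub = rospy.Publisher("/hammer_center", PointStamped, queue_size=1)

def object_center(points_hw3, mask_hw):
    """Mean of the 3D points covered by the 2D segmentation mask."""
    obj = points_hw3[mask_hw.astype(bool)]   # crop the point cloud with the mask
    obj = obj[~np.isnan(obj).any(axis=1)]    # drop invalid depth returns
    return obj.mean(axis=0)                  # (x, y, z) center estimate

def publish_center(center, frame_id="head_camera_rgb_optical_frame"):  # frame assumed
    msg = PointStamped()
    msg.header.frame_id = frame_id
    msg.header.stamp = rospy.Time.now()
    msg.point.x, msg.point.y, msg.point.z = center
    pub.publish(msg)
```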
All user interactions in this project take place in virtual reality (VR). VR is chosen as a medium to accommodate remote workers: in principle, users located anywhere can use this system for remote maintenance of tools via their VR device. Data centers are typically sited in specific geographic locations for power and cooling reasons, while tech employees can work from anywhere.
The virtual environment modeled and tested is a makerspace with a fixed layout, modeled in Unity from architectural plans and approximate measurements of the physical space. A C# script in Unity establishes the connection between the headset and a computer running ROS via a WebSocket server and client: the ROS-enabled computer runs the server, while the headset operates as a client. User commands executed with the VR headset's controllers are sent from client to server as a formatted string, which the server parses to execute ROS actions and publish to ROS topics subscribed to by the navigation and manipulation subsystems. In response, the server sends the client a formatted string containing the robot's current position and status for display to the user. Location data is used to transform a model of the robot for display within the virtual environment, while status data is shown directly on the user interface.
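A minimal sketch of the ROS-side server is shown below. The command string format, the reply payload, the topic name, and the port are all assumptions; the paper only states that formatted strings are exchanged.

```python
# Sketch of the ROS-side WebSocket server bridging the VR headset and ROS.
import asyncio
import json
import rospy
import websockets
from std_msgs.msg import String

rospy.init_node("vr_bridge_server")
cmd_pub = rospy.Publisher("/vr/user_command", String, queue_size=10)  # hypothetical topic

async def handle_client(websocket):
    async for message in websocket:
        # Commands arrive as formatted strings, e.g. "NAV:tool_wall" (format assumed).
        cmd_pub.publish(String(data=message))
        # Reply with the robot's current pose and status for display in VR.
        reply = json.dumps({"status": "ok", "pose": [0.0, 0.0, 0.0]})  # placeholder
        await websocket.send(reply)

async def main():
    async with websockets.serve(handle_client, "0.0.0.0", 9090):  # port assumed
        await asyncio.Future()  # run until shutdown

asyncio.run(main())
```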
VR offers a more natural telepresence experience than traditional 2D screen interfaces. Within the virtual environment, users can teleport around the building to monitor various smart building features and current robot statuses, and can assume manual control of a robot at any time with the touch of a button. Upon taking control, the robot's gaze synchronizes with the user's headset orientation, allowing the operator to navigate the robot using a joystick. In designated areas, users instruct the robot to execute tasks with pre-defined affordances: in the demonstration, the robot identifies objects for retrieval, the user selects the desired object, and the robot's manipulation subsystem executes the "pick" command.
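One plausible way to realize the gaze synchronization is to map the headset's yaw and pitch onto Fetch's head joints, as sketched below. The joint names are Fetch's standard head joints; the controller topic and the direct angle mapping are assumptions.

```python
# Hedged sketch: driving Fetch's head from the VR headset orientation.
import rospy
import actionlib
from control_msgs.msg import FollowJointTrajectoryAction, FollowJointTrajectoryGoal
from trajectory_msgs.msg import JointTrajectoryPoint

rospy.init_node("vr_head_sync")
client = actionlib.SimpleActionClient(
    "head_controller/follow_joint_trajectory", FollowJointTrajectoryAction)
client.wait_for_server()

def look(pan, tilt, duration=0.2):
    """Command head pan/tilt (radians) taken from the headset yaw/pitch."""
    goal = FollowJointTrajectoryGoal()
    goal.trajectory.joint_names = ["head_pan_joint", "head_tilt_joint"]
    point = JointTrajectoryPoint()
    point.positions = [pan, tilt]
    point.time_from_start = rospy.Duration(duration)
    goal.trajectory.points = [point]
    client.send_goal(goal)
```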

Navigation
Navigation relies on the ROS Navigation Stack. A map is created via teleoperation, enabling the robot to navigate the area. The navigation system includes key location points such as the start point, end point, bypass stops, and the robot's home location; these points are essential for guiding the robot to the various destinations a task requires. To ensure safe navigation and avoid collisions with objects and humans, a keepout map is developed, delineating areas that are off-limits to the robot. This map, outlining boundaries and potential obstacles, facilitates autonomous navigation while respecting safety zones.
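In the standard Navigation Stack setup, goals such as the tool wall or home location are dispatched to the move_base action server; a minimal sketch follows, with the waypoint coordinates as hypothetical placeholders.

```python
# Sketch of sending a waypoint goal through the ROS Navigation Stack (move_base).
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

rospy.init_node("nav_client")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = "map"
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 2.0   # hypothetical tool-wall waypoint
goal.target_pose.pose.position.y = 1.5
goal.target_pose.pose.orientation.w = 1.0
client.send_goal(goal)
client.wait_for_result()                 # blocks until the robot reaches the goal
```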
Additionally, computer vision techniques are employed to align the robot perpendicular to the tool wall. A camera on the robot's gripper captures images of the wall, and image processing algorithms analyze these images to determine the robot's angle relative to the wall, allowing adjustments until perpendicular alignment is achieved.
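One plausible implementation of this angle estimation is Hough line detection on the gripper-camera image, sketched below; the paper does not specify the exact algorithm, and the Canny/Hough parameters are illustrative.

```python
# Sketch: estimating the robot's tilt relative to the tool wall from an image.
import cv2
import numpy as np

def wall_angle(image_bgr):
    """Median angle (degrees) of detected wall lines; 0 means aligned."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=100, maxLineGap=10)
    if lines is None:
        return None
    angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
              for x1, y1, x2, y2 in lines[:, 0]]
    return float(np.median(angles))  # robust against spurious line detections

# The robot would rotate in place until the apparent tilt approaches zero.
```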

Manipulation
The manipulation component centers on a robust manipulation node for operating the robot arm across various tasks, implemented with the MoveIt motion planning framework. The manipulation node is pivotal to the Fetch robot's manipulation capabilities: it provides precise control of the arm, interfacing with MoveIt for intuitive and efficient arm movement commands. Using a fast inverse kinematics solver, the node computes joint configurations rapidly, ensuring real-time arm movements and enhancing system responsiveness.
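A minimal sketch of this arm control through MoveIt's Python interface is shown below; the planning group name "arm" and the target pose values are assumptions for illustration.

```python
# Sketch of commanding the Fetch arm through MoveIt (moveit_commander).
import sys
import rospy
import moveit_commander
from geometry_msgs.msg import PoseStamped

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("manipulation_node")
arm = moveit_commander.MoveGroupCommander("arm")  # group name assumed

target = PoseStamped()
target.header.frame_id = "base_link"
target.pose.position.x = 0.6          # hypothetical pre-grasp pose
target.pose.position.z = 1.0
target.pose.orientation.w = 1.0
arm.set_pose_target(target)
success = arm.go(wait=True)           # solve IK, plan a trajectory, and execute
arm.stop()
arm.clear_pose_targets()
```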
Furthermore, the node integrates trajectory planning for smooth, collision-free arm movements, generating optimized paths that balance efficiency and safety in complex manipulation tasks. The trajectory planning algorithm considers the arm's kinematic constraints and the environment to create trajectories that avoid obstacles and minimize joint displacement. This results in precise, efficient arm motions, enabling successful operation in cluttered environments.
In addition to basic arm control and trajectory planning, the node includes advanced features such as obstacle avoidance and gripper feedback. The obstacle avoidance algorithm uses sensor data, such as depth images or laser scans, to identify obstacles in the workspace and dynamically adjusts trajectories in response, ensuring safe arm movements. This feature enhances the robot's versatility and capability in unstructured environments with unforeseen obstacles, aligning with real-world application demands.
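In a MoveIt-based pipeline, one common way to make the planner route around a detected obstacle is to insert it into the planning scene, as sketched below; the box dimensions and pose are illustrative placeholders for sensor-derived obstacle estimates.

```python
# Sketch: registering a detected obstacle with the MoveIt planning scene so
# subsequent trajectory planning avoids it.
import rospy
from moveit_commander import PlanningSceneInterface
from geometry_msgs.msg import PoseStamped

rospy.init_node("scene_updater")
scene = PlanningSceneInterface()
rospy.sleep(1.0)                      # allow the scene interface to connect

obstacle = PoseStamped()
obstacle.header.frame_id = "base_link"
obstacle.pose.position.x = 0.8       # hypothetical obstacle from depth/laser data
obstacle.pose.position.z = 0.7
obstacle.pose.orientation.w = 1.0
scene.add_box("obstacle_1", obstacle, size=(0.2, 0.2, 0.2))  # illustrative size
```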
Additionally, the manipulation node includes gripper feedback to evaluate the success of object pickup attempts. Sensory feedback from the robot's gripper is integrated, allowing the system to assess whether an object has been successfully grasped (Figure 6: tool pickup during testing). This information is critical for autonomous manipulation tasks, enabling the robot to make informed decisions based on the outcome of its grasping actions. The gripper feedback algorithm serves as a reliable mechanism for the robot to assess its performance and adjust its actions as needed, thereby enhancing the effectiveness of its manipulation capabilities.
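One way such feedback can be obtained is from Fetch's gripper action result: if the fingers stall before closing fully, something is between them. The sketch below follows this idea; the aperture threshold is an assumption.

```python
# Sketch: inferring grasp success from the gripper action result on Fetch.
import rospy
import actionlib
from control_msgs.msg import GripperCommandAction, GripperCommandGoal

rospy.init_node("grasp_check")
gripper = actionlib.SimpleActionClient("gripper_controller/gripper_action",
                                       GripperCommandAction)
gripper.wait_for_server()

goal = GripperCommandGoal()
goal.command.position = 0.0           # command a full close
goal.command.max_effort = 50.0
gripper.send_goal(goal)
gripper.wait_for_result()

result = gripper.get_result()
# A stall, or a final aperture well above zero, indicates an object in the grip.
grasped = result.stalled or result.position > 0.01  # threshold (meters) assumed
```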

EXPERIMENT, RESULT, AND DISCUSSION

Experiment Setup and Procedure
Environment: The experiments were conducted in a makerspace environment, modeled in Unity from architectural plans and measurements of a physical space. This environment was filled with typical makerspace obstacles such as tables, chairs, and various workstations, creating realistic navigational challenges. The space was expansive, crafted to simulate an active makerspace with dynamic elements, such as pedestrians intermittently blocking routes, adding layers of complexity to the robot's navigation and task execution.

Equipment Used: The Meta Quest 2 VR device served as the primary user interface, allowing participants to immerse themselves in the virtual space and control the Fetch robot. The Fetch robot, integral to the experiment, was equipped with advanced sensors and a camera for object detection and manipulation tasks. This combination of the intuitive VR interface and the versatile capabilities of the Fetch robot enabled a seamless and interactive human-robot interaction experience.
In the experiment, participants equipped with a Meta Quest 2 VR device performed a task within the virtual makerspace: guiding the robot to the tool wall and selecting a hammer. The VR interface allowed users to direct the robot's journey while experiencing the environment through its integrated camera feed. Upon reaching the tool wall, the user selected the hammer, prompting the robot to execute a precise pick-and-place maneuver. The task culminated with the user verifying accurate hammer placement and receiving feedback, illustrating an immersive, interactive human-robot collaboration.

Result and Discussion
Performance Metrics: The average task completion time was 6 minutes. The success rate of correctly placing objects was 96%. The system recorded 4 errors, primarily related to grip misalignment and navigation challenges.

Object Detection and Manipulation: The YOLOv8 model used for hammer detection achieved a precision above 98% and a recall above 99%. Manipulation tasks demonstrated high accuracy, with the robot successfully picking and placing the hammer in the majority of trials.

Navigation and User Interaction: While the system showed effective navigation capabilities, some challenges in obstacle avoidance were noted, resulting in occasional delays. Participants reported a high level of engagement and intuitiveness in controlling the robot, though some faced difficulties in precise manipulation.
The experiments conducted to evaluate the performance of the VR-based human-robot interaction system revealed several significant findings. The use of virtual reality, specifically the Meta Quest 2 device, provided an immersive and intuitive interface for operators to remotely control the robot in a pick-and-place task within a simulated data center environment. VR's visual feedback and spatial awareness significantly enhanced the operator's ability to precisely manipulate the robot's movements and grasp objects.
However, navigation faced challenges related to obstacle avoidance. Although the system demonstrated effective navigation capabilities, there were instances where the robot encountered obstacles, leading to delays while it determined alternative paths. These delays in path planning are attributed to the environmental complexity and the time required for the robot's perception algorithms to accurately detect and analyze obstacles.

Figure 4: The map for testing on GIX Building Level 2; the red highlighted area is the keepout zone, which contains many impassable obstacles.

Figure 5: Aligning the robot to the tool wall based on camera and lidar data.