Hands-On Robotics: Enabling Communication Through Direct Gesture Control

Effective Human-Robot Interaction (HRI) is fundamental to seamlessly integrating robotic systems into our daily lives. However, current communication modes require additional technological interfaces, which can be cumbersome and indirect. This paper presents a novel approach, using direct motion-based communication by moving a robot's end effector. Our strategy enables users to communicate with a robot by using four distinct gestures -- two handshakes ('formal' and 'informal') and two letters ('W' and 'S'). As a proof-of-concept, we conducted a user study with 16 participants, capturing subjective experience ratings and objective data for training machine learning classifiers. Our findings show that the four different gestures performed by moving the robot's end effector can be distinguished with close to 100% accuracy. Our research offers implications for the design of future HRI interfaces, suggesting that motion-based interaction can empower human operators to communicate directly with robots, removing the necessity for additional hardware.


INTRODUCTION
In recent years, rapid advancements in robotic technologies have led to robots being increasingly integrated into our daily lives, where they act as versatile assistants in workplaces and homes [2,28,34]. These robotic companions enhance human capabilities and efficiency and, as such, substantially change how we interact with the world [19]. As robotic solutions evolve and diversify, their capacity for autonomous action increases, with seamless close-contact interactions between humans and robots becoming a reality [15]. However, for successful Human-Robot Collaboration (HRC), effective communication channels must be established for accurate transmission of intent and coordination of respective actions [35]. Traditional modes of HRI, such as voice commands [6] or touch interfaces [3], can effectively convey instructions to robots. Yet, as robots take on more complex tasks and work in proximity to humans, sometimes working hand in hand, the need for a more natural and intuitive communication approach becomes apparent. Motion-based communication, where humans actively manipulate the robot's end effector, can be a viable solution to this challenge. By mimicking how we naturally interact with one another, motion-based communication bridges the gap between humans and robots, promoting a more intuitive and seamless exchange of information and, consequently, improved collaboration.
Motion-based communication therefore depends on tracking and measuring movements. Two types of tracking approaches exist: relative and absolute. Inertial sensors (e.g., accelerometers [7]) provide relative tracking information, while optical tracking methods (e.g., cameras [1]) offer more exact positional data. Previous studies, like those using a Nintendo Wii controller [37], have successfully demonstrated gesture recognition using inertial sensors with limited training samples. In contrast, our approach focuses on mechanical tracking, positioned between relative and absolute tracking, using a robotic arm with seven Degrees of Freedom (DoFs) for precise movement data. Our research uses the robot's joint motion data to enable accurate and efficient motion-based communication.
Gesture recognition has long been regarded as a challenging yet crucial aspect of enabling natural and intuitive communication between humans and machines. Recognizing gestures using pattern recognizers from the $-family has been one of the earlier approaches in the field [4,5,40-42,45], leading to numerous follow-ons by other researchers [17,24,26,27]. These recognizers utilize a predefined set of gesture templates to match and identify user motions. Template-based solutions showed robust performance both in 2D and 3D spaces [13].
Machine Learning (ML) techniques are potent tools for gesture recognition [33], allowing systems to learn and adapt to a wide range of gestures. Although it may seem intuitive that existing ML approaches will work for recognizing gestures directly performed with a robot's end effector, so far, this remains unproven. When designing gestures, affordance plays a significant role in determining the gesture vocabulary. An artificial hand on a robot may prompt gestures like a handshake, while the affordance of a knob encourages other gestures.
This work uses the robot's sensor data to study recognizing and distinguishing gestures performed by direct interaction with a robotic arm. We performed a laboratory study (N=16) to assess the feasibility of four distinct gestures and determine their respective recognition accuracy, resulting in an f1-score of up to 0.99. Here, we demonstrate the feasibility of using direct interaction for robust gesture recognition in HRI.

RELATED WORK
Our research integrates insights from collaborative robots in close-contact interactions and embodied user input for gesture recognition and classification.
In HRC, collaborative robots, known as cobots, are increasingly common in various settings, including domestic care [9,34]. They are categorized based on environment sharing [30,36,38] and types of cooperation [8,21]. Previous research focused on cobots adapting to human movements and behavior, while maintaining appropriate distance [31], avoiding collisions [22], and customizing assistance based on skills [10] and comfort [14]. Supporting this, Drolshagen et al. indicated no adverse effects on collaboration when safety aspects are met [16], while Maurtua et al. surveyed study participants, who anticipated increased interaction with cobots in the future [29]. Efficiency in HRC can be enhanced with suitable techniques and sensors [11]. The embodiment of cobots positively affects perception and trust [47], while touch-based interactions improve non-verbal communication and reduce human stress responses [44]. Interactive perception combines physical and traditional methods but may be limited by occlusion [25].
Gesture recognition plays a pivotal role in HRI. Utilizing the acceleration sensor of the Nintendo Wiimote, Schlömer et al. implemented gesture recognition by employing a hidden Markov model (HMM) for training and recognizing user-selected gestures [37]. Despite the small training set, their evaluation demonstrated an accuracy between 0.85 and 0.95. Wu et al. proposed a similar approach using acceleration-based movement data, achieving an accuracy of almost 0.99 for four gestures using the FDSVM method. Cabrera and Wachs used skeleton data from the Microsoft Kinect sensor for one-shot gesture recognition [12], comparing three classifiers with accuracy between 0.81 and 0.86. Using a WiFi-based approach, gesture recognition for multi-user applications was demonstrated by Venkatnarayan et al. [43]. Their system identifies concurrent gestures, quantifies the total gesture number, and creates virtual samples for different combinations using training data from a single user. Achieving an accuracy of over 0.90, it can recognize up to eight gestures performed simultaneously.
By leveraging knowledge and techniques from collaborative robotics and gesture recognition, advanced systems capable of accurate gesture recognition, while adapting and responding to human movements and behavior, become possible. These capabilities result in smoother and more efficient HRC, fostering overall safer and more productive interactions.

GESTURE INPUT INTERACTIONS
We investigate the use of a robotic arm for effective embodied gesture recognition, specifically focusing on movements that directly manipulate the robot's end effector through physical interaction. In domestic care, assistive robotic arms often use multi-finger end effectors comparable to a human hand. For this kind of gripper, we use two types of handshakes as natural gestures. The robot's flange also supports adding a grasping object (e.g., a protrusion).
Several gestures have been introduced in prior works [23]. Here, we selected two letter-based gestures and two handshake gestures, simulating a natural human-like interaction with the robotic arm (see Figure 1). To accommodate the differing movements and grips between the gestures (letters vs. hand), we attached a spherical knob for the letters, enabling a firmer grip and smoother motion in 3D space. A hand model resembling a human hand was used for more lifelike interaction.
Letter. Using the knob end effector, we investigated two types of motions: circular, represented by the curved letter "S" fostering a smooth transition in performing the gesture, and linear, represented by the sharp-edged letter "W" consisting of several stops for direction changing during the motion.
Handshake. This gesture imitates the traditional human-to-human handshake and, as such, was performed with the hand end effector. Potential applications include an introductory interaction to initiate communication with a cobot to start a procedure. We examined both the formal handshake, involving grasping the hand followed by an up-and-down movement, and the informal one, also known as the G-lock handshake.

STUDY: GESTURE RECOGNITION USING A ROBOTIC ARM
In this study, we introduce a novel approach in human-to-robot communication through gesture-based mechanical manipulation of a robotic arm. Our method centers on extracting movement values from the robotic arm's joints and does not depend on additional tracking requirements. We investigate how mechanical manipulation of a robotic arm can achieve accurate gesture recognition.

Study Design
We conducted a within-subject controlled laboratory study to assess the accuracy of gesture recognition. The independent variable was the gesture, with two pre-selected letter gestures (denoted LS and LW) and two handshake gestures (denoted HS and GL). The gesture recognition accuracy serves as the dependent variable.

Participants and Procedure
We recruited 16 participants (6 females, 10 males), aged between 22 and 35 years (M=27.75, SD=3.96), via mailing lists and social media. All participants were right-handed and reported no motor function limitations or injuries. The study received approval from our institution's ethics committee, and each participant received a 10 Euro remuneration upon task completion. Study sessions for each participant began with the experimenter explaining the study's purpose and demonstrating the interaction process. Subsequently, participants completed demographic and consent forms. At the start of each run, the robot arm end effector was automatically positioned to a predefined starting point to maintain consistency across all participants. The study conductor briefed participants on the interaction pattern, allowing them to start and stop freely. The conductor then started the recording of the robot's movement. The corresponding movement data were recorded while participants were moving the robot arm end effector in the desired gesture. Each gesture was recorded five times consecutively, totaling 20 recordings per participant. Gesture order was counterbalanced using a Latin Square design. Following the completion of all recordings, a post-study questionnaire was administered. On average, participants finished the entire task within 40 minutes.
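To make the counterbalancing concrete, the following is a minimal sketch of how gesture orders could be drawn from a 4×4 Latin square. The cyclic construction, the participant numbering from 1 to 16, and the helper name `latin_square` are assumptions made for illustration; the text does not state which Latin square variant was used.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


GESTURES = ["LS", "LW", "HS", "GL"]  # gesture labels as denoted in the study design
orders = latin_square(GESTURES)

# Hypothetical assignment: participants 1..16 cycle through the four orders,
# so each order is used by four participants.
assignment = {pid: orders[(pid - 1) % len(orders)] for pid in range(1, 17)}

for pid in (1, 2, 3, 4):
    print(pid, assignment[pid])
```

In such a square every gesture appears exactly once in each presentation position, which spreads position-related effects (e.g., familiarization with the arm) evenly across gestures.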

Apparatus
For our study, we used a Franka Emika robot placed on a fixed table at a height of 61 cm [18]. Both types of end effectors were 3D printed by the research team; the models were acquired from an open-source library [32]. Manipulating the robotic arm required setting the robot status to the free guiding mode for a safe interaction. We affixed a clamp to engage the free guiding mode buttons, allowing users to freely execute gestures. This action also deactivated any ongoing autonomous movements by the robot. While performing the gestures, participants did not receive any additional feedback.

Results
In our analysis, we collected objective and subjective measures. We applied non-parametric Friedman tests to detect significant main effects between gestures. Post-hoc, we conducted Wilcoxon signed-rank tests (Bonferroni corrected) for pairwise comparisons. The effect sizes of the Wilcoxon tests are reported as r (r: >0.1 small, >0.3 medium, and >0.5 large effect).
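As a worked illustration of the post-hoc procedure, the sketch below enumerates the pairwise comparisons among the four gestures, applies a Bonferroni correction, and labels an effect size using the thresholds stated above. The base significance level of 0.05 and the convention r = |Z| / sqrt(N) are assumptions; the text reports neither, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

GESTURES = ["LS", "LW", "HS", "GL"]
pairs = list(combinations(GESTURES, 2))  # all pairwise post-hoc comparisons

ALPHA = 0.05  # assumed base significance level (not stated in the text)
corrected_alpha = ALPHA / len(pairs)  # Bonferroni: divide by the number of tests


def effect_size_r(z, n):
    """Effect size for a Wilcoxon test under the assumed convention r = |Z| / sqrt(N)."""
    return abs(z) / sqrt(n)


def effect_label(r):
    """Map r to the labels used in the text (>0.1 small, >0.3 medium, >0.5 large)."""
    if r > 0.5:
        return "large"
    if r > 0.3:
        return "medium"
    if r > 0.1:
        return "small"
    return "negligible"


print(len(pairs), round(corrected_alpha, 4))
print(effect_label(effect_size_r(z=-2.4, n=16)))  # illustrative Z value, not study data
```

With four gestures there are six pairwise comparisons, so each individual test would be evaluated against roughly 0.0083 under these assumptions.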

Objective Measures.
For objective measures, we report the median (interquartile range) of the temporal and spatial dimensions of each gesture in ascending order (see Figure 2 left).

GESTURE RECOGNITION
Applying ML-based gesture recognition, we conducted a cross-validated classification using 80-20 and 20-80 training-testing splits and evaluated a user-independent classification.
Data Analysis. For the training process, we computed four low-level descriptive statistical values (minimum, maximum, mean, and standard deviation) for the position, velocity, and effort values of each of the seven joints. This combination resulted in a set of 84 features (4 statistics × 3 signals × 7 joints) per gesture, per participant. As a classifier, we used the Random Forest (RF) implementation of scikit-learn with default parameters (e.g., 100 estimators) [39].
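The feature computation described above can be sketched as follows. This is an illustrative reconstruction, not the study's code: the recording layout (a dictionary of per-signal time series), the feature ordering, the use of the population standard deviation, and the name `extract_features` are all assumptions.

```python
import statistics

SIGNALS = ("position", "velocity", "effort")  # per-joint signals named in the text
N_JOINTS = 7


def extract_features(recording):
    """Return 84 features for one gesture recording.

    recording: dict mapping each signal name to a list of time samples,
    where every sample is a list of 7 joint values (assumed layout).
    """
    features = []
    for signal in SIGNALS:
        samples = recording[signal]
        for joint in range(N_JOINTS):
            series = [sample[joint] for sample in samples]
            features.extend([
                min(series),
                max(series),
                statistics.fmean(series),
                statistics.pstdev(series),  # population std; an assumption
            ])
    return features


# Tiny synthetic recording: 5 time steps, joint j at time t has value t + j.
demo = {s: [[float(t + j) for j in range(N_JOINTS)] for t in range(5)] for s in SIGNALS}
demo_features = extract_features(demo)
print(len(demo_features))
```

A feature vector of this shape, one per recording, could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`, whose default `n_estimators` is 100.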
Cross-Validation Classification. To analyze the effect of differences between gestures, we calculated the mean accuracy for each individual gesture using k-fold cross-validation (k=5). We performed the classical 80-20 split of training and test set, and achieved an average f1-score of 0.99. To further test the robustness of distinguishing gestures across participants, we performed an inverse classification, where the train and test sets were 20% and 80%, respectively. An average f1-score of 0.96 was achieved. The confusion matrix of the 20-80 classifier is illustrated in Figure 2 (right).
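One way to read the 80-20 and 20-80 setups is as the same five folds used in two directions: train on four folds and test on one, or train on one fold and test on four. The sketch below generates such index splits for the 320 recordings (16 participants × 20 recordings). The shuffling, the seed, the lack of stratification, and the name `cv_splits` are assumptions; the text does not describe how folds were formed.

```python
import random

N_SAMPLES = 16 * 20  # 16 participants, 20 recordings each


def cv_splits(n_samples, k=5, inverse=False, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    inverse=False: train on k-1 folds, test on 1 (80-20 for k=5).
    inverse=True:  train on 1 fold, test on k-1 (20-80 for k=5).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield (held_out, rest) if inverse else (rest, held_out)


for train, test in cv_splits(N_SAMPLES):
    print(len(train), len(test))
for train, test in cv_splits(N_SAMPLES, inverse=True):
    print(len(train), len(test))
```

The per-fold f1-scores would then be averaged to obtain a single figure per setup.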
User-Independent Classification. To test the generalizability of our approach, i.e., its performance against new and unknown users, the data set was equally divided into disjoint training and test sets (i.e., with no overlap of participants). Classifying the two folds achieved a mean f1-score of 0.91 (0.85 and 0.97, respectively).
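A participant-disjoint split of this kind can be sketched as below. Which participants fall into which half is not reported, so splitting the sorted participant IDs down the middle is an assumption, as are the sample tuple layout and the function names.

```python
import statistics

GESTURES = ["LS", "LW", "HS", "GL"]

# Hypothetical sample index: (participant_id, gesture, repetition).
samples = [(pid, g, rep) for pid in range(1, 17) for g in GESTURES for rep in range(5)]


def user_independent_folds(participant_ids):
    """Split participants into two disjoint halves and use each half once for testing."""
    ids = sorted(set(participant_ids))
    half = len(ids) // 2
    group_a, group_b = set(ids[:half]), set(ids[half:])
    return [(group_a, group_b), (group_b, group_a)]  # (train group, test group)


def select(all_samples, group):
    """Keep only the samples recorded by participants in the given group."""
    return [s for s in all_samples if s[0] in group]


folds = user_independent_folds(s[0] for s in samples)
for train_group, test_group in folds:
    print(len(select(samples, train_group)), len(select(samples, test_group)))

# Mean of the two per-fold f1-scores reported in the text.
mean_f1 = statistics.fmean([0.85, 0.97])
print(round(mean_f1, 2))
```

Because no participant contributes to both sides of a fold, the scores reflect performance on users the classifier has never seen.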

DISCUSSION
We investigated gesture recognition using motion data of a robotic arm manipulated by the user, incorporating relative and absolute tracking data of the robot's end effector. A critical consideration is the robot's inverse kinematics (IK), which may deviate from natural human motion due to joint angle constraints.
Validity of Mechanical Interaction with a Robot. Cross-validation produced promising results, with varying train and test set sizes achieving f1-scores between 0.96 and 0.99. These outcomes highlight the potential for solid performance even with limited training data. Furthermore, user-independent classification demonstrated robustness in recognizing gestures, irrespective of users, achieving an f1-score of 0.91. These findings underscore the practicality of our solution across diverse users and scenarios, particularly in systems requiring gesture recognition from different individuals.
Subjective Feedback. Participants perceived the gestures as neither mentally nor physically demanding, indicating a convenient and natural interaction. However, this perception might be influenced by the limited number of repetitions (N=5/gesture). Possible applications of this method include interrupting an undesired ongoing task or altering direction through a single, clear, and unambiguous command.
Limitations & Future Work. We selected and assessed various gestures with different handles for the robot's end effector, but did not exhaustively explore every combination of gesture and handle. Furthermore, our repertoire of gestures was limited to just four distinct ones.
Future work will involve expanding the number of repetitions per gesture and incorporating additional gestures. We anticipate conducting further studies based on this work, potentially involving a more diverse range of user groups. This may also encompass studies exploring the impact of altering robot tasks through mechanical input.

CONCLUSION
We highlight mechanical-based gestures in HRI by manipulating two robot end effectors (knob/hand) with four gestures (two letters / two handshakes). Classification achieved an f1-score of 0.91 in the user-independent setting and 0.99 when using 80% of the collected data for training. This robust gesture recognition establishes mechanical interaction with the robotic arm as a feasible, immediate, and intuitive user input.