Unveiling the Dynamics of Human Decision-Making: From Strategies to False Beliefs in Collaborative Human-Robot Co-Learning Tasks

As robots become more integrated into humans' daily activities, it is essential to understand how human decision-making varies during co-learning with robots in real-world scenarios. Despite great advances in developing humanoid robots, which aim to foster a seamless collaborative world where humans and robots coexist, a gap remains in the social bond between humans and robots, particularly in tasks demanding optimal teamwork. In line with current pioneering efforts in the human-robot collaboration field, this paper presents an experimental study leading to a rationale analysis and classification of human behavioral dynamics during a joint collaborative pick-and-place task with a robotic arm. Our post-experimental analysis categorized human behavioral dynamics into three broad, distinct categories: "strategic explorers and decoders", "reactive navigators and dynamic responders", and "score maximizers and ideal collaborators". We provide an in-depth analysis of each group, exploring potential reasons for their observed behavioral patterns and irrational decisions, supported by insights from psychology and behavioral game theory, including the concepts of false belief and strategy development.


INTRODUCTION
At least once in our lives, we have all experienced walking through a narrow corridor or hallway when another person approaches from the opposite direction, as illustrated in Figure 1 (a). Although both parties intend to avoid each other and proceed on their respective paths, they accidentally both move to the right, then to the left, and again to the right, in unintended synchronization. This undesirable and awkward situation is commonly known as the sidewalk shuffle, hallway dancing, or the footpath foxtrot. A similar situation can occur in robot-robot or human-robot navigation, as shown in Figure 1 (b). This can significantly hinder smooth human-robot collaboration, especially now that ground and aerial robots serve humans in various applications, including warehouse manipulation, delivery, rescue operations, and entertainment [5, 11, 17, 19, 22].
To avoid such oscillation in human-robot collaboration, Young et al. developed a dual expert algorithm (DEA) and a human-aware dual expert algorithm (HADEA), which they tested in simulations of a double-feedback, closed-loop human-robot co-learning system [27]. The authors use the term "chatter" to refer to instances where the human and the robot make the same choice. They used the well-known Rescorla-Wagner model [18] from psychology to simulate human choices based on the robot's past choices [2, 15, 16, 20]. However, in joint human-robot real-world tasks, human decision-making behavior does not strictly obey the Rescorla-Wagner model [4, 6, 8, 10, 25]; it may be more rational, strategic, irrational, random, or a combination of all of these [3, 7, 13, 21].
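For concreteness, a minimal Python sketch of the Rescorla-Wagner update rule is given below. The learning rate, outcome coding, and initial value are illustrative assumptions on our part, not parameters reported in [27].

```python
def rescorla_wagner_update(v: float, outcome: float, alpha: float = 0.3) -> float:
    """One Rescorla-Wagner step: v <- v + alpha * (outcome - v),
    where alpha is the learning rate and outcome is the observed reward."""
    return v + alpha * (outcome - v)

# Hypothetical trace: associative strength of choosing "red" after a
# sequence of outcomes (1 = choice matched, 0 = choice did not match).
v_red = 0.5
for outcome in [1, 1, 0, 1, 0]:
    v_red = rescorla_wagner_update(v_red, outcome)
    print(f"v_red = {v_red:.3f}")
```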
A brain-monitoring study revealed that a specific region of the brain linked to social cognition is more active in human-human interactions than in human-robot interactions [23], suggesting that the lack of social bonding in the latter could hinder effective collaboration in tasks requiring high cooperation from both sides, such as joint pick-and-place tasks. In line with these findings, it is essential to address the core challenge of accurately identifying human decision-making models during co-learning tasks with robots, in order to bridge the existing cognitive and social gaps between humans and robots [1, 9, 14, 24, 26]. Therefore, this paper presents an experimental study on a joint, concurrent pick-and-place task performed by a human and a robot, involving 11 human participants (N = 11). Additionally, we have classified the observed behaviors of the participants into three groups and provided comprehensive reasoning for their behavioral choices and actions during the game.

EXPERIMENTAL FRAMEWORK AND PROCEDURE
In contrast to the hallway dancing problem, the term "chatter" in this paper denotes, for the pick-and-place task, the misalignment of choices between the human and the robot while simultaneously selecting between a red and a blue marker, as shown in Figure 2. The co-learning experiment detailed in Section 2.2 was open to all and did not target any specific demographic, except for age: all human subjects were university students possessing a developed theory of mind [12].

Experimental Scenario and Setup
We leveraged a 6-DOF collaborative Ned2 robotic arm as a counterpart to the human subjects. For six subjects, the robot operated using the DEA algorithm, while for the other five, the HADEA algorithm was used [27]. To further challenge the human decision-making process, we subjected some participants to eight initial random choices by the robot, as discussed in Section 4.
The participants were unaware that the robot followed these algorithms to avoid chatter and that it would try to be collaborative. Our intent was to observe whether or not they would make significant efforts to figure out if the robot was following a pre-set algorithm or predictable pattern. The observations of this experiment are discussed in Section 3. The implementation of the chatter-avoiding DEA and HADEA algorithms in the context of this pick-and-place task is shown in Algorithm 1, where w_r and w_b represent the weights for choosing the red and blue markers, and c_r and c_h denote the robot's and the human's choice at each iteration.
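Algorithm 1 itself is not reproduced here; purely for illustration, the following minimal Python sketch shows one plausible form of a chatter-avoiding weight update in the spirit of DEA. The function names, learning rate lr, and tie-breaking rule are our assumptions, not details taken from [27].

```python
import random

def robot_pick(w_r: float, w_b: float) -> str:
    """Choose the marker with the larger weight; break ties at random."""
    if w_r == w_b:
        return random.choice(["red", "blue"])
    return "red" if w_r > w_b else "blue"

def chatter_avoiding_update(w_r: float, w_b: float,
                            robot: str, human: str, lr: float = 0.25):
    """After a chatter event (mismatched picks), shift weight toward the
    human's last choice so the robot re-aligns over later iterations."""
    if robot != human:  # chatter: the two choices were misaligned
        if human == "red":
            w_r, w_b = w_r + lr, w_b - lr
        else:
            w_r, w_b = w_r - lr, w_b + lr
    return w_r, w_b
```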

Co-learning Experimental Workflow
At the beginning of the experiment, each participant was motivated to score as high as possible over 20 iterations of selecting a marker, either red or blue, to match the robot's choice every time.The step-by-step experimental procedure is described in the following.
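For clarity, the session protocol can be sketched as a short loop, reusing the illustrative robot_pick and chatter_avoiding_update helpers from the sketch in Section 2.1. Here get_human_choice is a hypothetical placeholder for the participant's physical selection, and the symmetric initial weights are an assumption.

```python
def run_session(get_human_choice, n_iter: int = 20) -> int:
    """Run one co-learning session: the participant scores a point each
    time their marker choice matches the robot's."""
    w_r, w_b = 0.5, 0.5  # assumed symmetric initial weights
    score = 0
    for t in range(n_iter):
        robot = robot_pick(w_r, w_b)
        human = get_human_choice(t)      # participant's pick at iteration t
        if robot == human:
            score += 1                   # aligned choices earn a point
        w_r, w_b = chatter_avoiding_update(w_r, w_b, robot, human)
    return score
```

Note that under this sketch the final score equals 20 minus the number of chatter instances, which is consistent with the scores and chatter counts reported in Section 3.2.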

BEHAVIORAL CLASSIFICATION AND RATIONALE ANALYSIS
It is clear from our observations that human behavior exhibits considerable diversity, with decisions varying significantly from individual to individual. This variability makes it challenging to come up with a single robust, accurate model that encapsulates all aspects of human behavior. However, it is practical, and of real-world applicability, to categorize behaviors into distinct groups, allowing the development of targeted behavioral models for each classified group. Subjects 1, 2, 5, and 6 were grouped as outlined in Section 3.1; Subjects 3 and 4 in Section 3.2; and Subject 7 in Section 3.3.

Strategic Explorers and Decoders
In this group, participants showed a common sequence of behaviors: initial exploration and pattern decoding, followed by strategy adjustment and one or two instances of false belief towards the end. The behavior of Subjects 1, 2, 5, and 6 closely mirrors this sequence, with slight variations in the occurrence of these events, as shown in Figure 3 (a), (b), and (c). Figure 4 explains the behavior logic and response rationales of Subject 1. From Subject 1's choices depicted in Figure 3 (a), we can see that the subject started by selecting blue and succeeded in the first 3 trials. After 3 trials, however, the subject switched to red, possibly for two primary reasons. First, and more probably, the subject wanted to test the robot, especially since the game had just started: the participant might have been curious about how the robot would react to a change in their choice, and whether the robot's actions were genuinely influenced by their decisions. Second, and less probably, the subject held a false belief about the robot's choice; given that it was early in the game and the robot had not switched colors before, the probability of the participant assuming the robot would choose red was relatively low. In either case, the assumption that choosing red carried a higher probability of matching was itself aberrant. At this point, the subject also began making efforts to decode the robot's pattern, and it seems the subject perceived the robot to have a pre-defined pattern of changing its choice from red to blue (or vice versa) after every 3 or 4 iterations. With this perceived logic, the subject adjusted their choice from blue to red in the 11th iteration. At this point, it became evident to the subject that the robot was adapting to their choices, suggesting that the human was leading the robot in decision making.
As time passed, the subject became more motivated to achieve a higher score, prompting a significant change in their previously developed strategy. After three consecutive red choices, the subject decided to test their previous strategy in the 14th iteration by sticking to red. Towards the end, the subject appears to have switched to blue, driven primarily by a false belief about the robot's action: the subject might have assumed that, since they had previously chosen blue and several iterations had passed without selecting it, the robot might anticipate a blue choice. This led the subject to choose blue in the 19th iteration. However, the robot, operating on the chatter-avoiding DEA algorithm, required a few iterations to reduce the weight for the red choice, and therefore chose red for the last two iterations.

Reactive Navigators and Dynamic Responders
The chatter frequency plots in Figure 3 (d) and (e) for Subjects 3 and 4 show that they exhibited the highest chatter frequencies among all participants, with 11 and 9 chatter spikes, respectively. Consequently, they achieved the lowest scores in the game, 9 and 11 points out of 20, respectively. These plots suggest that the behavior of these subjects is a mixture of reactive and predictive decision-making. Throughout the experiment, they often gave an immediate counter-response to the robot's selections, indicating an inclination to outsmart the robot on a move-by-move basis instead of sticking to a previously developed strategy or attempting to decode a pattern in the robot's decisions.
Unlike the other participants, these subjects did not maintain a consistent color choice for extended periods to discover the robot's reactions; instead, their decisions were noticeably more dynamic. Modeling this small subset of individuals poses a significant challenge due to these uncertainties and the frequent irrationalities in their behavior.

Score Maximizers and Ideal Collaborators
In this group, we classify participants who were intensely focused on achieving a high score and who aimed to collaborate effectively with the robot to complete the task as soon as possible, without testing or tricking the robot much. Figure 5 explains Subject 7's behavioral logic and response rationales.
From the decision plot shown in Figure 3 (f), we observe that Subject 7 started with a blue choice and, perhaps out of curiosity to test the robot, quickly switched to red to see the robot's reaction. Subsequently, the subject reverted to the initial blue choice. This rapid alternation between blue and red at one-trial intervals suggests score-centric behavior, indicating that the subject was not keen on spending time testing or outsmarting the robot; their main focus was on achieving a high score.
The subject maintained this blue choice until the 6th iteration, with an impressive score of 4 out of 6. At this point, however, the subject developed the false belief that the robot would now select red. This assumption was possibly based on the subject's switch from blue to red during the 2nd iteration, after which the robot, operating on the DEA algorithm, also changed to red in the 3rd iteration to avoid chatter with the human's choice. This sudden false realization prompted the subject to shift from blue to red in the 6th iteration. After recognizing that they were leading and guiding the robot's decisions, the subject confidently pursued their ultimate goal of maximizing their score and consistently chose red until the end.

NON-DETERMINISTIC ROBOT CHOICES
To increase the cognitive load on human decision-making, we made the robot's initial 8 choices random; from the 9th iteration, the robot started following the DEA algorithm. Figures 3 (g) and (h) show two notable observations. Subject 10 achieved the highest score: luckily, during their initial 8 trials, there was only 1 instance of chatter. On the contrary, Subject 8 encountered chatter in 4 of the initial eight trials. This analysis suggests that, despite having a logical and sequential model for human behavior, unpredictable and contrasting outcomes can occur at any time.
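This variant can be sketched as a small change to the robot's choice policy from the earlier illustrative code; the switch point after 8 random trials follows the protocol described above, while the policy details remain our assumptions.

```python
import random

def robot_pick_with_warmup(t: int, w_r: float, w_b: float,
                           n_random: int = 8) -> str:
    """First n_random iterations are uniformly random; from the 9th
    iteration onward the robot follows its (DEA-style) weights."""
    if t < n_random:
        return random.choice(["red", "blue"])
    return robot_pick(w_r, w_b)  # illustrative helper from Section 2.1
```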

CONCLUSION AND FUTURE WORK
In this paper, we have explored the chaotic nature of the human decision-making process during a joint co-learning pick-and-place task between a human and a collaborative robotic arm. The arm operates on a Dual-Expert Algorithm or a Human-Aware Dual-Expert Algorithm to avoid chatter and remain adaptive, quickly adjusting to match human decisions after a chatter occurrence. Through the execution of human-robot co-learning experimental tasks, we collected the decision choices (red/blue) of both the human participants and the robot, and inferred common trends and outliers in human decisions. We have classified human behavior into three broad, distinct categories: first, subjects who believed the robot followed a pre-defined pattern and so attempted to decode it initially; second, subjects who exhibited more antagonistic and reactive behaviors in response to the robot's choices; and third, subjects who focused on maximizing their performance by aiming to score highly during the game. Across all groups, we observed sudden episodes of false belief about the robot's next action, leading participants to deviate from their current choice or strategy.
Acknowledging the imperative need to bridge the existing cognitive and social gaps, and thereby make collaborative tasks more efficient and effective, we expect our work to serve as a foundational framework for developing robust mathematical models of human decision-making. Ultimately, these models could assist in developing more robust, chatter-free collaborative expert algorithms for robots in various real-world applications, such as warehouse robots and ground and aerial delivery robots that face human-robot or robot-robot adversarial problems.

Figure 2: Experimental setup showing a no-chatter instance

Figure 3: Experiment data: (a), (b), and (c) as Strategic Explorers; (d) and (e) as Dynamic Responders; (f) as Score Maximizers; and (g) and (h) with 8 random robot choices at the beginning

Figure 4: Rationale analysis of Subject 1's decision behavior

Figure 5: Rationale analysis of Subject 7's decision behavior