Recognition and Identification of Intentional Blocking in Social Navigation

One of the most studied interactions in social navigation is a collision between a human and a robot. An overwhelming majority of these studies focus on collision avoidance: shifting away from such situations or staying still until the conflict is resolved. However, to act socially, avoidance is not always the desired behavior. Consider a staff member in a hospital blocking a delivery robot's path to type in a new delivery request. The robot should not steer away but rather stay put or even get closer to the person. This research provides a novel perspective on obstructions in social navigation. It does so by providing a vocabulary to distinguish intentional obstructions from unintentional blockings, and by designing a general obstruction-handling solution that can be integrated into robots both in academia and in industry. This solution is named NIMBLE: Navigational Intentions Model for BLocking Estimation, and it provides a pipeline for handling intentional obstructions that is general enough to allow for varying implementations while maintaining a clear inference process. NIMBLE is evaluated using a case study of a robot navigating in a hospital. The paper provides a statistical analysis based on generated data and an exploratory evaluation using inputs from the robot's sensors. Both illustrate NIMBLE's ability to accurately distinguish between various intention types.


INTRODUCTION
Service and assistive mobile robots will soon become integral to our daily tasks. However, there are still many challenges to overcome. One such challenge is obstruction, a situation in which a pedestrian blocks the path of a robot and prevents it from reaching its goal. In most cases, this interaction is handled as a by-product of collision avoidance, the common policy of navigation algorithms that assume obstruction should be avoided: the robot should pass smoothly around people, usually without interacting with them directly [30]. However, obstruction happens regularly in human interactions, where people often wish to interact with each other rather than merely avoid collisions. Similarly, this interaction should be acknowledged and addressed in human-robot interactions. Consider a delivery robot in a hospital on a collision course with a nurse. If the nurse looks directly at the robot, it is more likely that the nurse wishes to type a delivery request into the robot's interface than to collide with it. In this case, the robot should not avoid the situation; it could even help by steering towards the nurse. Additionally, the robot must recognize different intentions behind obstructions to determine the appropriate response. In contrast to the nurse example, hospitals are also environments known to be prone to aggressive behaviors [16]. When someone approaches the robot with the intention to damage it, the robot should move away quickly or sound an alarm. However, if the person is looking to take a selfie with the robot, it should probably act in a friendlier manner, either stopping and posing for the selfie or kindly ignoring it and moving on with its task.
This paper presents a novel approach for incorporating obstruction deliberation that could provide new system-level functionality to various robotic applications. It offers a formal definition of obstruction and suggests a coping mechanism that involves three stages: detecting a conflict, classifying it as intentional or not, and identifying the intention behind the obstruction. The proposed mechanism, Navigational Intentions Model for BLocking Estimation (NIMBLE), is demonstrated using a Boston Dynamics Spot robot, where three types of possible goals are tested: Benign, a passerby that initiates a casual interaction with the robot; Authorized, a staff member that interacts with the robot to get help from it; and Malicious, a person who tries to intentionally harm the robot. Each type is characterized based on attributes that can be extracted from visual sources, such as speed and fiducial markers. To conclude, the contributions of this work are threefold: (1) A formal definition of obstruction. This definition enables researchers to incorporate into a robot the ability to reason about obstructions in the social navigation context, which in turn enhances current navigation solutions with the fundamental capability of handling obstructions in a "social" way (Section 3). (2) NIMBLE, a generic algorithm for detecting a conflict and classifying it as intentional or not. To provide a general solution for obstruction detection at the system level, the proposed algorithm does not depend on a particular problem domain (hospitals, warehouses, etc.) and does not rely on a specific type of sensor. An implementation of the algorithm is adjusted to a hospital case study for deducing the intentions behind different instances of obstruction (Section 4). (3) A performance demonstration of the proposed algorithm, using both a combinatorial analysis of accuracy and a case study with a real robot in a lab setting (Sections 5 and 6).
This work prepares the grounds for robots to address obstruction situations coherently, instead of consistently avoiding all pedestrians around them. This new ability to distinguish between various forms of obstruction has the potential to bring about significant advancements in the system-level functionality of social robots in various human-centric environments.

RELATED WORK
Socially-aware robots are required to have social skills to operate in human-inhabited environments. Such robots can follow a person, guide a person, fetch resources, and more. The field that investigates these tasks is called Social Navigation. Social navigation is an active research field combining perception, human-robot interaction, and motion planning with a common goal [20]: to enable a robot to navigate safely and efficiently among pedestrians without making them socially uncomfortable. Many studies investigate robots in public environments, where the robot navigates between pedestrians while treating them as obstacles. Such work has been prevalent since the beginning of social robots, with interactive museum tour-guide robots such as Rhino [5] and Minerva [47], whose navigation modules prevented collisions with humans and stopped the robots when a collision seemed close. Since then, many different approaches have been proposed to model a collision interaction [8, 14, 28, 34-36] and to provide solutions to it [6, 7, 42, 43, 49].
Common to all of these solutions is that they do not consider that a pedestrian may sometimes want to initiate a close interaction with the robot, in which case collision avoidance can lead to undesired, inefficient, or unsocial robot behavior. Mavrogiannis et al. highlighted intention recognition as a core challenge of contemporary social navigation [30], but they focus on trajectory prediction for the sake of collision avoidance [23, 32]. Our work bridges the gap between existing work on social navigation and intention recognition by introducing a novel perspective on conflicts and providing a comprehensive framework for obstruction recognition and classification. We address the need for robots to recognize and respond appropriately to the various human intentions behind obstructions, enhancing their social capabilities.
A different research branch focuses on goal and intention recognition from observations [45]. In such problems, an observer aims to detect not only the current activity of an actor (e.g., walking in a particular direction) but also the reason for this activity (e.g., asking for the robot's assistance). Several papers incorporate physical embodiment as part of their recognition model [1, 9, 18, 44], but to the best of our knowledge, this is the first work that leverages such goal recognition in physical environments to understand the intention of a person who blocks the robot.

OBSTRUCTION IN SOCIAL NAVIGATION
We start by formalizing the meaning of obstruction (a type of intentional conflict) and how it differs from the more common terms collision and conflict. First, we refer to a collision as an objective social navigation state in which a robot and a human reach zero distance from one another. While the word collision implies an undesired outcome, notice that it is sometimes a wanted state (e.g., for a hugging social robot [2, 3] or a robotic dressing assistant [13, 19]). Conflict, on the other hand, is a subjective state. In their survey on collision avoidance, Mirsky et al. [33] defined a conflict between a robot and other mobile robots or pedestrians as "a situation in which if there is no change of direction or a change in speed by at least one of the parties, they will collide." Notice that under this definition, not all conflicts are expected to end in a collision, and not every collision is preceded by a conflict (if both parties do not see each other before colliding). Moreover, unlike a collision, the presence of a conflict is a subjective matter that depends on the interpretation of the interacting parties. For simplicity, we refer to the party that perceives the coming collision as the observer and to the second party as the actor. Consider, for example, the difference in navigational norms between countries. Such cultural differences can affect the perception of a conflict, such that an observer will sense that a collision is imminent while the actor will not [15, 31]. Our research focuses on a different type of interaction: obstruction.

Definition 3.1. An obstruction is a conflict in which the actor intentionally wishes the observer to decrease its speed to zero.
Notice that this definition does not restrict the actor to be a human, and it can go both ways: the robot might block the human or vice versa. However, in this work, we focus on obstructions in which the robot is the party that is forced to come to a halt. Under this definition, for an interaction to be considered an obstruction, it must first be intentional, which is often the opposite of the situation in the collision avoidance case. This requirement enables us to correctly classify obstructions once we identify an intention to collide. We can then further perform intention recognition when such an obstruction is detected, to better understand how the robot should react to it. In the context of the existing literature on social navigation [11, 30], we found that encounters with a robot can be divided into the following categories: (0) Unintentional collisions: the most commonly investigated type of collision, namely the target of collision avoidance. The premise of this work is to disrupt the idea that all collisions are undesired and unintentional, so this category stands in contrast to the other blocking categories, which are obstructions. (1) Malicious behavior: has been investigated in the past in various contexts of pedestrians who abuse robots [37], including specific work in hospitals [25], in workplaces [27], and on violence initiated by children [4]. (2) Authorized interaction: refers to interactions that are part of the expected role of the robot, often investigated in the context of service robots [10, 21]. (3) Benign interaction: is represented by incidental encounters in which the robot and the person each have their own objectives, which may not interfere with one another [40, 46]. Moreover, the three obstruction categories align with the general types of collaborative team behavior known in the Multi-Agent Systems literature: adversarial, assistive, and independent [12].

NIMBLE: NAVIGATIONAL INTENTIONS MODEL FOR BLOCKING ESTIMATION
Our obstruction deliberation algorithm (Algorithm 1), Navigational Intentions Model for BLocking Estimation (NIMBLE), consists of three components for the three stages of obstruction detection: (1) Conflict detection predicts, for a pedestrian in the surrounding area of the robot, whether they are going to collide with the robot (Algorithm 2). (2) Conflict classification determines, once a pedestrian is in conflict with the robot, whether the conflict is intentional or not (e.g., whether the person is facing the robot). (3) Intention recognition identifies the intention behind each detected obstruction using goal recognition techniques. The system is illustrated in Figure 2, showcasing a scenario involving malicious obstruction. To the left is the pipeline of the system's decision points. It starts at the top with continuous data extraction (such as camera inputs), followed by active evidence acquisition (e.g., pose estimation, speed), as demonstrated in Section 5.2. Once a conflict is detected (Section 4.1) and classified as intentional (Section 4.2), the intention of the obstruction is inferred based on the collected evidence (Section 4.3). To the right of Figure 2 is a visualization of the system's state and collected evidence, as follows: a blue bounding box surrounds the detected person, and their skeleton if detected. When the person enters the robot's "personal space", the bounding box changes to red, accompanied by the person's distance from the robot shown above the bounding box. Upon conflict detection, a white panel appears at the bottom of the image, presenting all relevant information for inferring the intention type. On the top left, the gathered evidence is presented. On the bottom, the intention type is displayed, with each type denoted in a distinct color (benign in blue, authorized in green, malicious in red). The table shows the predicted probabilities for each intention type; the highest probability is represented by the darkest shade of color, while the lowest receives the brightest (white). Upon detecting a person, we actively collect relevant observations concerning that individual. In this case, the system identifies an offensive arm position by analyzing skeleton data. Furthermore, by comparing the person's position across previous frames, the system deduces a high velocity. These observations are then used to produce probability estimates for each potential intention type. In this particular case, the highest probability is assigned to a malicious intention, leading to an accurate recognition of the obstruction as malicious.
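The three-stage loop described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1 verbatim: the `Frame` fields, the function names, and the toy stand-ins for each stage are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Frame:
    """Hypothetical per-frame observations (assumed field names)."""
    distance_m: Optional[float]  # distance to the detected person, if any
    facing_robot: bool           # coarse body-orientation cue
    evidence: dict               # extra observations (speed, markers, pose)

def nimble_step(frame: Frame,
                detect_conflict: Callable[[Frame], bool],
                is_intentional: Callable[[Frame], bool],
                recognize: Callable[[dict], str]) -> Optional[str]:
    """One deliberation step: returns the inferred intention label,
    or None when there is no intentional obstruction."""
    if not detect_conflict(frame):
        return None                    # stage 1: no conflict
    if not is_intentional(frame):
        return None                    # stage 2: unintentional conflict
    return recognize(frame.evidence)   # stage 3: intention recognition

# Toy stand-ins for the three components:
frame = Frame(distance_m=0.4, facing_robot=True,
              evidence={"staff": True, "velocity": "slow"})
label = nimble_step(
    frame,
    detect_conflict=lambda f: f.distance_m is not None and f.distance_m < 0.5,
    is_intentional=lambda f: f.facing_robot,
    recognize=lambda e: "authorized" if e.get("staff") else "benign",
)
# label is "authorized" for this frame
```

Each stage is passed in as a callable, mirroring the paper's point that the pipeline is generic: a deployment swaps in its own detectors without changing the deliberation loop.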

Conflict Detection
Social navigation algorithms typically produce velocity commands that help robots navigate without colliding with obstacles or other agents [6, 7]. Although this approach is common and efficient, it overlooks a crucial aspect of social robots: the ability to proactively detect potential conflicts before they occur, in order to determine an appropriate response to a given situation. In existing navigation solutions, conflict avoidance is often embedded within the navigation policy rather than explicitly calculated as a predicted collision. In such algorithms, it is not straightforward to properly examine a conflict and answer questions like "If no party changes trajectory, when and where will the expected collision occur?". Therefore, we propose a separate paradigm for conflict detection that can be integrated into existing systems, given each system's unique input.
Let us define D as all gathered data continuously processed from the raw output of the robot's sensors, and E as any additional evidence that can be collected upon request, such that E ⊆ D. Both D and E can contain information regarding the robot's state (e.g., distance from walls and objects, list of previous positions, angles, velocity), the person's state (e.g., velocity, pose, head yaw), and aggregated information regarding both parties (e.g., whether the person is looking at the robot). A conflict detection function (Algorithm 1, Line 1) may take and leverage any part of D that is available to it. A conflict between the robot and a pedestrian is detected when this function returns an indication of obstruction. This indication could be a future intersection point, a binary value, a time to collide, or a combination of these outputs. In our case, as seen in Algorithm 2, the function returns a Boolean variable. Algorithm 2 implements this module using the distance between the person and the robot as the main conflict detector. In the hospital case study, we established a distance threshold of 0.5 meters. This fixed value symbolizes the minimal personal space that individuals generally prefer to maintain between themselves and a robot, as discussed in previous research on proxemics [31, 36, 39]. Consequently, when a person enters this defined personal space of the robot, it is a reliable indicator of a potential conflict situation. Calculating the distance involves dividing a depth map into bins and selecting the smallest valid value that meets the required number of data points. In our case study, this logic is adopted from Boston Dynamics' Spot SDK, yet it can easily be substituted with alternative algorithms and sensors capable of measuring distance, depending on the available hardware and software. This algorithm can be implemented on most robotic platforms that can detect a person and provide means to estimate when the robot will collide with them (e.g., using lidar or depth cameras to measure distance). Many mobile platforms already come with suitable sensors, as they are essential for robot navigation.
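The depth-binning distance estimate and the 0.5 m personal-space threshold described above can be sketched as follows. The bin width, the minimum point count, and the histogram scheme are assumptions; the exact binning used in Algorithm 2 comes from the Spot SDK and is not specified here.

```python
from typing import Optional
import numpy as np

PERSONAL_SPACE_M = 0.5  # personal-space threshold from the case study

def estimate_distance(depth_map: np.ndarray,
                      bin_width_m: float = 0.25,
                      min_points: int = 50) -> Optional[float]:
    """Return the nearest depth (meters) backed by enough readings,
    or None when no bin has sufficient valid data."""
    valid = depth_map[np.isfinite(depth_map) & (depth_map > 0)]
    if valid.size == 0:
        return None
    edges = np.arange(0.0, valid.max() + bin_width_m, bin_width_m)
    counts, edges = np.histogram(valid, bins=edges)
    for count, lo in zip(counts, edges[:-1]):
        if count >= min_points:           # nearest sufficiently populated bin
            return lo + bin_width_m / 2   # bin center as the estimate
    return None

def detect_conflict(depth_map: np.ndarray) -> bool:
    """Boolean conflict indicator: person inside the robot's personal space."""
    d = estimate_distance(depth_map)
    return d is not None and d < PERSONAL_SPACE_M
```

Selecting the nearest bin that meets `min_points` (rather than the single smallest pixel) filters out spurious depth readings, which is the apparent purpose of the binning step described in the text.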

Conflict Classification
Once a conflict is detected, the second part of NIMBLE determines whether the conflict is intentional. The robot can determine whether a person is intentionally initiating a conflict with it by observing the pedestrian's behavior and assessing their awareness of the situation and the robot. Most predominantly, gaze is known to be a significant factor in recognizing a person's intention, especially in navigation contexts [1, 17, 44]. The literature on this topic provides strong evidence that people often look in the direction they are walking; as a consequence, their gaze will also be oriented in the direction of their short-term navigational goal. However, additional signals can be leveraged when the person is distracted (e.g., by a nearby sign or by looking at a phone). To make an informed decision concerning a person's intended trajectory, additional signals proven to correlate with the person's walking direction are the velocity of the body's center of mass, head orientation, and foot orientation [48].
We highlight that gaze detection on a mobile robot can be challenging due to the limitations of the sensors currently in use. Moreover, detecting an intentional conflict is most relevant when a person is in close proximity to the robot, which can further complicate matters, as the sensors and algorithms are not optimized for such close ranges. Therefore, in our case study, we assume that an individual's body direction, specifically when turning away from the robot, serves as an indicator of their awareness of its presence and replaces the need to track the person's gaze. To make an informed decision regarding the intentionality of an obstruction, NIMBLE utilizes the function isIntentionalConflict, which consists of a set of predefined rules. It receives D (defined in Section 4) as input and outputs a Boolean variable that indicates whether the conflict is intentional or not. This function is used to detect instances where individuals turn away from the robot. The implementation of this logic depends on the available data about the detected person. This process enabled us to classify whether a person is intentionally on a collision course with the robot using minimal hardware. Of course, this process can be improved given more computational resources or richer inputs.
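A minimal rule-based sketch of isIntentionalConflict under the body-orientation assumption above. The field name `body_yaw_deg` and the 90-degree turned-away threshold are illustrative choices, not values from the paper.

```python
TURNED_AWAY_DEG = 90.0  # assumed: beyond this angle, the person faces away

def is_intentional_conflict(data: dict) -> bool:
    """Rule-based intentionality check on the processed data D.
    'body_yaw_deg' (assumed field) is the angle between the person's
    facing direction and the bearing toward the robot (0 = head-on)."""
    yaw = data.get("body_yaw_deg")
    if yaw is None:
        # No orientation cue available: conservatively treat the conflict
        # as intentional so the robot deliberates rather than silently avoids.
        return True
    return abs(yaw) < TURNED_AWAY_DEG
```

The fallback for missing data is a design choice: erring toward deliberation keeps the robot from ignoring a person who genuinely wants to interact.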

Intention Recognition
NIMBLE's third and last part is the inference of the blocking individual's intention. This inference is achieved by collecting any additional relevant observations available from the robot's sensors (e.g., using uniform detection or RFID tags as an indication that a pedestrian is part of the staff). In our proposed framework, we utilize a Bayesian Network (BN) to weigh these pieces of evidence. BNs are known to be a useful probabilistic graphical model for such cases due to the dynamic nature of the network and the ability to infer probabilities with missing information [22]. Additionally, BNs can be trained with few samples thanks to the incorporation of prior knowledge, or, as in our case, the probabilities can be acquired directly from a domain expert with no data collection and no training.
A BN is a graph that represents dependencies (edges) between random variables (nodes). For each variable, a conditional probability distribution (CPD) table is assigned to represent the conditional probabilities for each of the variable's values, given the values of the variable's direct parent nodes. This tabular model is a compact way to store all the required information for calculating any full joint probability distribution [41]. Formally, we define a BN as follows:

Definition 4.1. Let $X = (X_1, \ldots, X_n)$ be random variables. A Bayesian Network (BN) is a directed acyclic graph (DAG) that specifies a joint distribution over $X$ as a product of local conditional distributions, one for each node:

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid \mathrm{Parents}(X_i))$$

where $x_1, \ldots, x_n$ are the values of the variables and $\mathrm{Parents}(X_i)$ are the parent nodes of $X_i$. Each node's probability distribution depends solely on the values of its parent nodes. In a Conditional Probability Distribution (CPD) representation, we assume that the parent nodes are independent of each other, so we can express the joint probability as the product of the individual conditional probabilities $P(X_i = x_i \mid \mathrm{Parents}(X_i))$, where $\mathrm{Parents}(X_i)$ takes the values of all $X_j$ that are the parent nodes of $X_i$. For example, if $X_2$ and $X_3$ are parents of $X_1$, then we need to calculate $P(X_1 = x_1 \mid X_2 = x_2, X_3 = x_3)$, and this probability can be extracted directly from the CPD of $X_1$.
Consider our case study of a hospital delivery robot. We constructed our BN from five variables: four of them (Person Velocity, Staff, Mood, and Arm Position) are possible observable properties of the blocking pedestrian. The last one is a hidden variable (Intention), whose value is inferred based on the observations. In our case study, all the variables are represented as discrete variables. A process of discretization was applied to two key variables with continuous values: Velocity and Arm Position. We refrained from employing any algorithm for extracting mood-related information from images due to the inherent difficulty of recognizing facial expressions. In the case of Velocity, the raw data comprised float values denoting the person's speed. To facilitate discrete analysis, we set a threshold (0.05) that specifies the transition from slow to fast movement. The threshold value was established after conducting a series of experiments in real-world settings that tested the velocity values produced when a person moved slowly and quickly. For the Arm Position variable, we obtained a list of 33 points representing key positions in the 3D skeleton structure. The discretization of this data and the criteria for identifying an offensive arm position followed a general guideline: an arm position is considered offensive when the key point corresponding to the wrist is situated above the elbow within a defined confidence range. For each variable, a CPD was assigned. The CPDs were initialized using domain expert knowledge, representing an estimation of the various evidence occurrences within a hospital environment. These initial probabilities were derived from predefined encoded knowledge tailored to the specific problem domain. For instance, we encoded the knowledge that individuals displaying fast, purposeful movement toward the robot while raising their hands are more likely to exhibit violent intentions. Staff members were expected to approach the robot primarily for assistance, while casual pedestrians did not have authorized intentions, as they lack authorization privileges. Importantly, as with the previous components, these characterizations are domain-specific and can be enriched with additional complexity.
Figure 3 illustrates the BN utilized in our case study, using the CPDs crafted by a domain expert. The power of BNs, as mentioned in Definition 4.1, is the ability to calculate the probability of any combination of values. For example, the probability that a blocker's intention is authorized, given that they wear a robe (identified as staff), walk fast, and have a neutral mood and arm position, is denoted as $P(\mathrm{Intention} = \mathrm{authorized} \mid \mathrm{Staff} = \mathrm{true}, \mathrm{Velocity} = \mathrm{fast}, \mathrm{Mood} = \mathrm{neutral}, \mathrm{ArmPosition} = \mathrm{neutral})$. Generally, to infer the intention of the blocker out of a set of possible intentions ($i \in I$), given a set of evidence ($e_1, \ldots, e_m$), where each piece of evidence is an observed value of one variable, the probability of each intention type is computed using Bayes' rule:

$$P(i \mid e_1, \ldots, e_m) = \frac{P(e_1, \ldots, e_m \mid i) \, P(i)}{P(e_1, \ldots, e_m)}$$

Each probability in this equation can be computed directly from the defined CPDs by summing the probabilities for each variable using the law of total probability. There are several known techniques for computing these probabilities optimally, but in larger CPDs, they can also be estimated [22]. In NIMBLE, we use the variable elimination algorithm, an exact inference algorithm that efficiently calculates the probabilities mentioned above [50].
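To make the inference concrete, here is a pure-Python sketch of computing P(Intention | evidence) by enumeration over a toy two-parent network. The structure (observed variables as parents of Intention), the priors, and the CPD numbers are illustrative, not the expert-crafted CPDs of the case study, and exhaustive enumeration stands in for the variable elimination used in NIMBLE.

```python
from itertools import product

PRIORS = {                     # P(parent = value); numbers are assumed
    "staff":    {"yes": 0.3, "no": 0.7},
    "velocity": {"slow": 0.6, "fast": 0.4},
}
INTENTIONS = ("benign", "authorized", "malicious")
CPD = {                        # P(intention | staff, velocity); assumed
    ("yes", "slow"): (0.15, 0.80, 0.05),
    ("yes", "fast"): (0.10, 0.60, 0.30),
    ("no",  "slow"): (0.85, 0.05, 0.10),
    ("no",  "fast"): (0.40, 0.05, 0.55),
}

def infer_intention(evidence: dict) -> dict:
    """Return P(Intention | evidence); missing parents are marginalized
    out using their priors (BNs tolerate partial observations)."""
    scores = dict.fromkeys(INTENTIONS, 0.0)
    names = list(PRIORS)
    for combo in product(*(PRIORS[n] for n in names)):
        assignment = dict(zip(names, combo))
        # Skip assignments inconsistent with the observed evidence.
        if any(evidence.get(n, v) != v for n, v in assignment.items()):
            continue
        weight = 1.0
        for n, v in assignment.items():
            if n not in evidence:          # marginalize missing parents
                weight *= PRIORS[n][v]
        probs = CPD[tuple(assignment[n] for n in names)]
        for intent, p in zip(INTENTIONS, probs):
            scores[intent] += weight * p
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}
```

With full evidence the posterior is read straight off the CPD row; with, say, velocity unobserved, the two matching rows are averaged under the velocity prior, which is exactly the "inference with missing information" property the text attributes to BNs.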

Domain Adaptation
Most of our suggested pipeline is domain-agnostic and should work on any mobile robot system with any given sensors, as long as it provides some minimal person detection capabilities, as mentioned at the end of Section 4.1. To adapt our solution to new domains, the only needed changes are the inputs given to Algorithm 1 and the CPD formulation in Section 4.3 for intention recognition. The former adaptation is the easier to consider, as it depends solely on the available hardware and software: one should use the resources at hand, and using better resources is likely to result in a more accurate prediction of intentional obstruction. To address the latter part of the required adaptation, the construction of the CPDs, it is important to first identify the potential variables in the specific domain and the relationships between them. After that, the CPDs should be initialized to reflect the probabilities in the real world. This initialization can be achieved by manually setting the CPDs (as performed in this article) or by training the BN using a limited number of samples. Moreover, new intentions can be added to support broader interaction situations by simply adding relevant variables that affect the probability of each such intention and updating the Intention variable's CPD accordingly.

EXPERIMENTAL SETUP
Our experimental analysis is divided into two distinct parts, each designed to assess specific aspects of our proposed pipeline. In the first part, we undertake a quantitative evaluation aimed at measuring the accuracy of our BN in inferring the intention type. This evaluation involves the utilization of simulated evidence to assess the network's performance and to demonstrate its accuracy given reliable evidence. In the second part, we perform a case study using a real robot, in an exploratory trial of the obstruction deliberation algorithm. This case study serves as a means to identify and understand critical challenges that may emerge when implementing the proposed capabilities within a physical robotic system, and how they affect the overall accuracy.

Intention Recognition Analysis
Our first objective is to assess the performance of the BN in terms of accuracy. In light of a lack of data, particularly in the context of a novel domain (obstruction), we adopted an approach involving the generation of evaluation data guided by a set of predefined rules. We systematically generated evidence configurations for every possible combination of the variables' values within the defined BN. The configurations are JSON files containing all possible combinations of observations. The motivation for creating these configuration files is to encompass all potential evidence scenarios that the robot may encounter in real-world situations.
Each configuration was assigned a corresponding artificial intention based on predefined rules: by default, all intentions were considered neutral (benign), unless a staff member was identified, in which case the configuration was labeled as an authorized intention. Moreover, configurations exhibiting at least two suspicious behaviors, such as high velocity, a negative mood, or a threatening hand gesture, were categorized as having malicious intentions. In total, we created 81 distinct configurations, with 20 of them designated as authorized, 40 as benign, and 21 as malicious.
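The generation and labeling procedure can be sketched as follows. The exact variable domains are an assumption (four ternary variables, giving the 81 combinations); under this choice, the predefined rules reproduce the reported class split of 20 authorized, 40 benign, and 21 malicious.

```python
from itertools import product

# Assumed domains: four ternary variables -> 3**4 = 81 configurations.
DOMAINS = {
    "staff":    ["yes", "no", "unknown"],
    "velocity": ["slow", "medium", "fast"],
    "mood":     ["positive", "neutral", "negative"],
    "arm":      ["relaxed", "neutral", "offensive"],
}

# The three suspicious cues named in the labeling rules.
SUSPICIOUS = {("velocity", "fast"), ("mood", "negative"), ("arm", "offensive")}

def label(config: dict) -> str:
    """Labeling rules: >= 2 suspicious cues -> malicious;
    otherwise staff -> authorized; otherwise benign."""
    n_suspicious = sum((k, v) in SUSPICIOUS for k, v in config.items())
    if n_suspicious >= 2:
        return "malicious"
    if config["staff"] == "yes":
        return "authorized"
    return "benign"

configs = [dict(zip(DOMAINS, combo)) for combo in product(*DOMAINS.values())]
labels = [label(c) for c in configs]
```

Note that the malicious rule takes precedence over staff identification, which is what makes the counts come out as reported: 7 of the 27 cue triples carry at least two suspicious values, for each of the 3 staff values.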

Case-Study on a Real Robot
This analysis aims to explore how NIMBLE performs in a real-world environment, particularly when interacting with various individuals. Five scenarios were examined, three of which involved the study subjects: no conflict (a pedestrian passing at a distance), unintentional conflict (a person turning away from the robot), authorized obstruction (a staff member marked with a fiducial marker), benign obstruction (a legitimate passerby), and malicious obstruction (a passerby who puts the robot in danger).
A video demonstrating the five scenarios and the output of the recognition algorithm is attached in the supplementary materials to showcase the ability of NIMBLE to distinguish between the different use cases.
In the context of our case study, we performed all experiments using the Boston Dynamics Spot platform in motion. We adapted the underlying code for sensor data extraction from the Spot SDK, provided by Boston Dynamics. Two of the robot's built-in sensors were used: a grayscale camera and a depth camera. To simplify the data extraction pipeline, we conducted our experiments exclusively with the right front camera, as the Spot platform includes two cameras on its front. It is important to mention that there was a two-second delay between real-world events and the resulting visualization produced by our pipeline, primarily due to a delay in transmitting images from the robot.
To assign evidence to a specific person, the pedestrian's location in image coordinates is first extracted using a computer vision model: a TensorFlow-based Faster R-CNN object detection model [38] pre-trained on the COCO object detection dataset [26]. This model was chosen because it is used in Boston Dynamics' Spot SDK. Additionally, our approach assumes that only one pedestrian is detected in each frame. We also assume that the same pedestrian is being detected over time until there are at least 10 consecutive frames in which no pedestrian is detected. These assumptions were made to avoid issues related to tracking multiple individuals, which is beyond the scope of this research and requires monitoring people over time.
We employed various techniques to provide concrete evidence regarding the detected pedestrian's actions. First, we calculated the pedestrian's velocity by measuring the differences in distance between images captured at two consecutive time points, denoted as t and t+1. Additionally, we leveraged pose estimation, utilizing a MediaPipe model to infer the arm position of pedestrians [29]. Our assumption was that when a pedestrian's wrist position is above their elbow, it indicates an offensive arm posture. It is important to note that this model's effectiveness is influenced by the presence of significant portions of the pedestrian's head in the captured images, which naturally limits its applicability to specific distances, typically within a range of up to two meters from the robot. We note that some of the observations were collected by means of fiducial markers instead of detection algorithms, to avoid confounders that might interfere with our study results. For example, to recognize whether the blocker is a staff member, we used a fiducial marker; a full deployment could instead use computer vision to detect relevant features, such as wearing a white coat, or read the RFID on the staff member's employee tag. These methods, in addition to NIMBLE and the suggested obstruction deliberation pipeline, can be generalized and extended to any robotic platform capable of providing visual and distance data. None of the models employed were specifically trained on the sensor data of the particular robot in use, leading us to expect that they can be easily adapted to a wide range of sensor configurations.
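The two evidence extractors described above can be sketched as follows. The landmark indices follow the standard MediaPipe Pose layout (13/15 for the left elbow/wrist, 14/16 for the right, y growing downward in image coordinates); the visibility threshold and the time-step handling in the velocity discretization are assumptions.

```python
# MediaPipe Pose landmark indices for the arms.
L_ELBOW, R_ELBOW = 13, 14
L_WRIST, R_WRIST = 15, 16

def offensive_arm(landmarks, min_visibility: float = 0.5) -> bool:
    """True when either wrist is above its elbow (smaller image y).
    `landmarks` maps index -> {"y": float, "visibility": float}."""
    def above(wrist: int, elbow: int) -> bool:
        w, e = landmarks[wrist], landmarks[elbow]
        visible = (w["visibility"] >= min_visibility
                   and e["visibility"] >= min_visibility)
        return visible and w["y"] < e["y"]
    return above(R_WRIST, R_ELBOW) or above(L_WRIST, L_ELBOW)

def discretize_velocity(dist_t: float, dist_t1: float, dt: float,
                        threshold: float = 0.05) -> str:
    """Speed from the change in person-robot distance between frames
    t and t+1, discretized with the case study's 0.05 threshold."""
    speed = abs(dist_t1 - dist_t) / dt
    return "fast" if speed > threshold else "slow"
```

Using only the wrist-elbow ordering keeps the cue robust to the person's distance and image scale, matching the paper's confidence-gated guideline.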
In the context of our complete pipeline experiments, we provide qualitative assessments of the "off-the-shelf" observation-collection methods used with the Spot camera, in a user study authorized by our institute's IRB. It is important to note that our primary research focus was on developing an intention recognition system rather than establishing state-of-the-art observation-gathering solutions. Consequently, our emphasis in this evaluation is not on extensive quantitative results for those solutions, but rather on exploring the key issues in deploying NIMBLE on a real robot.
In the experiment, participants were tasked with walking toward a Boston Dynamics Spot robot and instructed to block its path in three distinct scenarios: Benign, Authorized, and Malicious. Example instructions given to the participants include: "You are walking down a corridor, and you encounter a robot approaching you. Intrigued, you decide to block its path intentionally"; "As a hospital staff member, intentionally block the path of the service robot to seek assistance"; and "Approach the robot to cause harm or attack, expressing your dislike for robots." For authorized obstruction, a fiducial marker was attached to the participant's shirt with tape, symbolizing staff status. For malicious obstruction, participants were advised to use an object as an assistive tool.

Intention Recognition Analysis
As described in Section 5, the model's predictions were evaluated against the generated tagged intentions for each configuration. We report evaluation metrics by calculating individual metrics for each class (Authorized, Benign, and Malicious), alongside an overall metric that averages results across all classes. The results are presented in Table 1. In addition, the confusion matrix in Figure 4 (left) shows that most of the confusion in unsuccessful classifications arises between the labels "Malicious" and "Authorized". This means that the intention recognition part of NIMBLE is biased toward the authorized intention. However, the malicious class has zero false positives: the network predicts malicious intent only when the evidence is tagged as such. In other words, when the model predicts malicious intent, we can be confident that the prediction accurately reflects the true situation.
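As a concrete illustration of how such per-class and averaged metrics are derived from a confusion matrix, consider the sketch below. The matrix values are illustrative placeholders, not the numbers from Table 1.

```python
# Per-class precision/recall/F1 and a macro average, computed from a
# confusion matrix (rows = true label, columns = predicted label).

def per_class_metrics(cm, labels):
    out = {}
    n = len(labels)
    for i, lab in enumerate(labels):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted lab, true other
        fn = sum(cm[i]) - tp                        # true lab, predicted other
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[lab] = {"precision": prec, "recall": rec, "f1": f1}
    out["macro_f1"] = sum(out[l]["f1"] for l in labels) / n
    return out
```

Note that an empty "Malicious" column off the diagonal yields a precision of 1.0 for that class, which is exactly the zero-false-positive property discussed above.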
We further evaluated the degree of agreement between the predicted and true intentions by calculating Cohen's kappa [24]. Our analysis yielded a substantial agreement rate of 0.842, indicating a high level of agreement between the predicted set of intentions and the labeled one.
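Cohen's kappa compares the observed agreement between the two label sets against the agreement expected by chance from their marginal distributions. A minimal sketch of the computation:

```python
# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
from collections import Counter

def cohens_kappa(true, pred):
    n = len(true)
    po = sum(t == p for t, p in zip(true, pred)) / n       # observed agreement
    ct, cp = Counter(true), Counter(pred)
    pe = sum(ct[c] * cp[c] for c in set(true) | set(pred)) / (n * n)  # chance
    return (po - pe) / (1 - pe)
```

Values above roughly 0.8, such as the 0.842 reported here, are conventionally read as almost-perfect agreement on the Landis-Koch scale.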

Case Study with a Real Robot
The full evaluation pipeline was executed on the robot in a lab environment, using ten subjects (one female, nine male) as the pedestrians heading toward the robot. The average age of the participants was 29 ± 6.6, and all had little to no prior experience working with robots. In total, 47 trials were conducted, though not evenly distributed across scenarios. Of these, 7 trials were discarded due to poor lighting or failure to follow the experimental protocol (e.g., the participant did not block the robot at all). To maintain coherence with the assessment of our Bayesian model, we computed the confusion matrix from the experimental outcomes, as depicted in Figure 4 (right). The analysis of the real-world experiment yielded a moderate agreement rate of kappa = 0.48, indicating a medium level of consensus between the predicted and labeled sets of intentions. Table 2 reports the same metrics, together with two additional weighted accuracy measures over all classes; these reflect, respectively, the estimated class occurrences in the hospital setting (real world: 0.5 benign, 0.4 authorized, 0.1 malicious) and the sample-based distribution used in the Bayesian evaluation (Section 5.1).
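The prior-weighted accuracy measures can be sketched as per-class recalls weighted by the expected deployment frequency of each class. The hospital priors come from the text; the recall values in the test are illustrative placeholders, not the paper's results.

```python
# Weighted accuracy: per-class recall weighted by expected class frequency.

def weighted_accuracy(recalls, priors):
    """recalls and priors are dicts keyed by class name; priors sum to 1."""
    assert abs(sum(priors.values()) - 1.0) < 1e-9
    return sum(priors[c] * recalls[c] for c in priors)
```

Weighting by deployment priors emphasizes that rare but high-stakes classes (such as malicious obstruction at 0.1) contribute less to the headline number than the common benign case.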
While the object detector has demonstrated effectiveness in recognizing individuals from diverse angles and positions, it occasionally encounters outlier cases in which it fails to detect a person or misinterprets other objects in the scene as humans. Similarly, the skeleton detector has its own limitations, as detailed in Section 5.2, occasionally yielding inaccurate estimations or insufficient recognition of the skeleton, which can impact the accuracy of the detected observations (for instance, erroneously detecting a raised hand). To mitigate these issues, we applied the stabilization techniques outlined in Section 5.2. Despite these challenges, the majority of participants exhibited the expected behaviors, such as increased speed and a raised hand, when preparing for a malicious engagement with the robot. Another significant finding is the consistent absence of false positives in the malicious intent class across the experiments. This behavior aligns with the Bayesian model evaluation in Figure 4 (left), providing reassurance regarding the absence of false accusations when identifying malicious encounters.

DISCUSSION
In our experimental evaluation, agreement was lower in the physical setup than with the Bayesian model. This difference is mainly attributed to the additional layer of potential error associated with the precision of evidence collection. In both experiments, NIMBLE consistently refrained from false positives in malicious detection, a crucial property for real-world deployment, where we aim for minimal false alarms. Moreover, the high correlation between the behavior characterization and participants' actions indicates the model's success. We report some technical challenges in deploying NIMBLE, including hardware limitations such as a restricted camera field of view and a two-second delay. While the model performs well in an ideal setting, the physical trials emphasize the need to refine the characterization for real-world deployment. As each robot and each context is different, NIMBLE is designed in a modular way to facilitate varying implementations per use case.

CONCLUSION
In this paper, we provided a new perspective on intentional obstruction in a social navigation context. We introduced a formal definition of the concept of obstruction, separating it from other types of collisions. We then presented NIMBLE, our intention recognition pipeline, and evaluated it on a hospital case study. NIMBLE's performance was evaluated both systematically over possible evidence and in an empirical trial using a Boston Dynamics Spot robot. The results showcase the system's efficacy in identifying various intention types.
Overall, as our study has yielded promising results, we hope that NIMBLE will pave the way to a new perspective on robot system design, in which navigation around people blocking the robot's way encompasses more than collision avoidance algorithms. We encourage further investigation of several research avenues, including addressing ethical considerations, handling crowded scenarios, refining hardware capabilities, and enhancing the system's adaptability to diverse real-world scenarios.

Algorithm 2 :
Conflict Detection - Distance-Based
Input: d, the distance between a pedestrian and the robot; TH, a distance threshold (set to 0.5 meters)
Output: an indicator of whether a conflict is detected
1: if d <= TH then
2:     return true
3: else
4:     return false

Figure 3 :
Figure 3: The BN used for intention recognition. In the bottom conditional probability distribution (CPD), each column represents the event ⟨s, v, m, a⟩, where s ∈ {y, n} stands for the value of staff (yes, no), v ∈ {f, s} for velocity (fast, slow), m ∈ {n, u} for mood (neutral, upset), and a ∈ {n, o} for arm position (neutral, offensive).
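The posterior computation the BN performs can be illustrated with Bayes' rule over the intention prior and the evidence CPD. All probability values below are made up for illustration; they are not the CPD from Figure 3. Only the class priors (0.5 benign, 0.4 authorized, 0.1 malicious) come from the text.

```python
# Illustrative posterior over intentions given an observed evidence tuple
# <staff, velocity, mood, arm>. Bayes' rule: P(I | e) ∝ P(I) * P(e | I).

PRIOR = {"Benign": 0.5, "Authorized": 0.4, "Malicious": 0.1}

# P(evidence | intention), keyed by the observed <s, v, m, a> tuple.
# These numbers are invented placeholders, not the figure's CPD.
CPD = {
    ("y", "s", "n", "n"): {"Benign": 0.05, "Authorized": 0.80, "Malicious": 0.02},
    ("n", "f", "u", "o"): {"Benign": 0.01, "Authorized": 0.02, "Malicious": 0.70},
}

def posterior(evidence):
    lik = CPD[evidence]
    joint = {i: PRIOR[i] * lik[i] for i in PRIOR}  # unnormalized posterior
    z = sum(joint.values())
    return {i: p / z for i, p in joint.items()}
```

For example, a fast, upset non-staff pedestrian with an offensive arm posture should push the posterior mass toward the malicious intention, while a slow staff member shifts it toward authorized.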

Figure 4 :
Figure 4: The confusion matrices between the model predictions and the labeled configurations (left) and the labeled physical experiments (right). Normalized values are used, given the number of true labels. The corresponding count of true samples is shown beneath each normalized value.

Table 1 :
Bayesian Model Evaluation Results

Table 2 :
Physical Experiment Evaluation Results (Overall Experiments).