Conflict Avoidance in Social Navigation—a Survey

A major goal in robotics is to enable intelligent mobile robots to operate smoothly in shared human-robot environments. One of the most fundamental capabilities in service of this goal is competent navigation in this “social” context. As a result, there has been a recent surge of research on social navigation, especially as it relates to the handling of conflicts between agents during social navigation. These developments introduce a variety of models and algorithms; however, because this research area is inherently interdisciplinary, many of the relevant papers are not comparable and there is no shared standard vocabulary. This survey aims to bridge this gap by introducing such a common language, using it to survey existing work, and highlighting open problems. It starts by limiting the scope of this survey to a restricted, yet highly common, type of social navigation: conflict avoidance. Within this proposed scope, this survey introduces a detailed taxonomy of the conflict avoidance components. This survey then maps existing work into this taxonomy, while discussing papers using its framing. Finally, this article proposes some future research directions and open problems that are currently on the frontier of social navigation to aid ongoing and future research.


INTRODUCTION
Enabling autonomous robots to navigate in the presence of people and/or other robots has been studied for the past 70 years. One of the first examples of social navigation is the work of Grey Walter, who built robotic "turtles" that could navigate on their own [161]. These robots, named Elmer and Elsie, were an exercise in minimalism and demonstrated that a small number of brain cells could give rise to complex behaviors. They each consisted of "two miniature radio tubes, two sense organs, one for light and the other for touch, and two effectors or motors, one for crawling and the other for steering". Their power supply was a hearing-aid battery. Nevertheless, these robots could navigate freely in an enclosed space and change their trajectory in response to light and touch.
Modern mobile robots are much more sophisticated and complex. Most feature a variety of sensors, intricate steering systems, and several layers of hardware and software to control their movement. Despite these improvements, mobile robots are still not prevalent in our homes and offices. One of the main reasons for this deficit is that comprehensive autonomy is still achievable only in controlled environments and is usually induced by hard-coded rules or learned from a relatively clean dataset [14,61,135]. The problem of navigation in the presence of other robots and humans is complex and cross-disciplinary in nature. Solutions draw from robotics, artificial intelligence, engineering, psychology, biology, and other areas of study. As such, each of these communities has defined social navigation differently. In the multi-robot community [159] 1, social navigation usually refers to robot navigation in the presence of additional robots. In human-robot interaction (HRI), social navigation refers strictly to the task of navigating in a shared space with people. Rios-Martinez et al. [126] gave a compact description of socially-aware navigation:
Socially-aware navigation is the strategy exhibited by a social robot which identifies and follows social conventions (in terms of management of space) in order to preserve a comfortable interaction with humans. The resulting behavior is predictable, adaptable, and easily understood by humans. This definition implies, from the robot's point of view, that humans are no longer perceived only as dynamic obstacles but also as social entities.
In the general social navigation setting, a social agent is an agent (either human or robot) that is aware of the objectives of others (human or robot) and considers them in its behavior, either by adjusting its policy or by indicating why it chose a potentially "anti-social" behavior. This general definition is quite broad, encompassing a wide variety of multi-agent navigation scenarios, including those that involve only robots. In practice, the term "social navigation" usually refers to a more human-centric perspective. Thus, this survey focuses on three requirements that separate human-centric social navigation from more general social navigation. These requirements are:
(1) There exists an autonomously navigating agent. The agent has a specific, reachable navigational goal.
(2) There exists one (or more) humans or animals in the environment.
(3) The interaction takes place in the real world (either a controlled or natural environment), not in simulation.
Many papers have discussed challenges that occur when only one or two of these requirements are met. Teleoperation of robots is widely investigated within HRI, but it is not consistent with (1). The multi-agent systems (MAS) and distributed planning communities focus on constructing algorithms for multi-robot navigation, which do not meet requirement (2). Even within the HRI community, many works describe progress in social navigation in simulations rather than in real-world environments, so that requirement (3) does not hold. Significant work has been done in the graphics community to model crowds and swarms, but these works also do not meet requirement (3). Our main focus is on papers that meet all three requirements. This survey also cites some papers for which not all of the above requirements hold, due to their contributions to our understanding of social navigation. In cases where the underlying scope of a paper is not fully aligned with this survey, we indicate the requirements that do hold on the first occasion that the paper is referenced. For example: Walter [161] 1, 3 is a work in which there is an autonomous agent (R1), a mechanical "turtle" that navigates in the wild (R3), but no human pedestrians are present (R2).
Even within the context of the three requirements discussed above, there are many behaviors that could be considered "social": following, giving navigational instructions, waiting in line, and others, as discussed later in this section.
To limit the scope of this survey, we focus on one specific type of social interaction with people which requires the robot to reason about an encounter, specifically conflicts. A conflict is a short-term interaction between a robot and a human in which there is a chance that the robot and the human will collide. Note that this potential event can be objective, meaning that if no party changes its course they will collide; or it could be that the passing of the robot is perceived as being on a collision course by the human. Additionally, not all interactions in social navigation are conflict avoidance. For example, when a robot is designed to carry a person's luggage and follow them, the task is a social navigation task in which the robot needs to detect the person, reason about the proper distance from them, and drive at a safe and comfortable speed. These challenges, however, are orthogonal to the challenge of avoiding conflicts with other pedestrians. Understanding conflicts in social navigation requires a definition of what a conflict is in this context: A conflict between a robot and other mobile robots or pedestrians is a situation in which, if there is no change of direction or speed by at least one of the parties, they will collide.
By this definition, not all conflicts end in a physical collision, but every collision is preceded by a conflict. Moreover, as the interacting parties can falsely predict an upcoming collision (e.g. a human feels that the robot will come too close and is risking a collision), the presence of a conflict is a subjective matter that depends on the interpretation of the interacting parties. This survey is not the first to identify navigational conflicts as being separate from collisions.
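As an illustration only, the objective part of this definition can be sketched as a closest-point-of-approach test under a constant-velocity assumption. The function names, the combined radius of 0.6 m, and the 5 s look-ahead horizon are assumptions made for this sketch and are not taken from any surveyed paper. The subjective, perceived side of a conflict is not captured.

```python
import math


def time_to_closest_approach(p_r, v_r, p_h, v_h):
    """Time (>= 0) at which robot and human are closest, assuming constant velocities."""
    dx, dy = p_h[0] - p_r[0], p_h[1] - p_r[1]      # relative position
    dvx, dvy = v_h[0] - v_r[0], v_h[1] - v_r[1]    # relative velocity
    rel_speed_sq = dvx * dvx + dvy * dvy
    if rel_speed_sq == 0.0:
        return 0.0                                  # same velocity: distance never changes
    t = -(dx * dvx + dy * dvy) / rel_speed_sq
    return max(t, 0.0)                              # closest approach in the past counts as "now"


def is_objective_conflict(p_r, v_r, p_h, v_h, combined_radius=0.6, horizon=5.0):
    """True if, with no change of direction or speed, the two parties come within
    combined_radius of each other inside the look-ahead horizon (both values are
    hypothetical parameters chosen for this sketch)."""
    t = min(time_to_closest_approach(p_r, v_r, p_h, v_h), horizon)
    dx = (p_h[0] + v_h[0] * t) - (p_r[0] + v_r[0] * t)
    dy = (p_h[1] + v_h[1] * t) - (p_r[1] + v_r[1] * t)
    return math.hypot(dx, dy) < combined_radius
```

A head-on encounter, such as `is_objective_conflict((0, 0), (1, 0), (4, 0), (-1, 0))`, is flagged as a conflict, whereas two parties passing on parallel tracks two meters apart are not.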
A footnote from Van Den Berg et al. [157] implies a difference between reasoning about conflicts in motion planning and avoiding collisions: Note that the problem of (local) collision-avoidance differs from motion planning, where the global environment of the robot is considered to be known and a complete path towards a goal configuration is planned at once, and collision detection, which simply determines if two geometric objects intersect or not.
However, Van Den Berg et al. [157] do not elaborate on this idea. Based on this scope, the contributions of this survey are as follows.
(1) It surveys work in which the authors include conflict avoidance in their models.
(2) It introduces a taxonomy of the attributes that vary between models and algorithms for conflict avoidance.
(3) Based on this taxonomy, it identifies the attributes of existing works and categorizes these works into tables.
(4) It summarizes the current state of the art in conflict avoidance in social navigation, including a practical checklist to follow when introducing a new contribution to the body of literature.
Previous works have presented ideas that overlap those in this survey, but from different perspectives. There are surveys on topics relating to social robotics [37] 1, 2; and to numerous related navigation topics such as: path planning [11] 1, vision for navigation [8,27] 1, perception and semantics [41], and localization and mapping [24,39,132] 1, 3.
There are also many surveys on social navigation that focus on elements such as: joint or group navigation [67,99,121,165], giving navigational instructions [149,166] 1, detecting dynamic objects [31,71] 1, social contexts such as waiting in line [104] 1, 2 or distributing flyers [131] 1, 2, and other factors which are not discussed in this survey [119]. None of these surveys, however, focus specifically on assisting in detecting or avoiding conflicts. Here we provide details on the major related surveys, both to provide a reference for readers who are interested in those different points of view and to define the scope of this survey.

Kruse et al. [78] 1, 2 highlight a rising interest in the topic of social navigation since 2000, and identify specific tasks and challenges that social navigation encompasses. Interest is still on the rise, and the many new works on this topic require this survey to narrow its focus somewhat as we update their coverage. Our focus is on the narrower topic of conflicts that arise between robots and pedestrians. Hoogendoorn and Bovy [56] 2, 3 introduced a three-tiered model of navigation utility, decomposing it into strategic (high-level decision making), tactical (global navigation), and operational (local navigation and event handling) levels. This survey focuses mostly on the operational level: setting local goals and re-planning as needed. Recently, Gao and Huang [40] provided a review of scenarios, datasets, and methods used in social navigation. They described the main use-cases as: passing, crossing, overtaking, approaching, following, leading, accompanying, and combinations thereof. Our survey's perspective is different in that it does not categorize papers according to the aim of the navigating parties, but rather according to situations in which these parties are (or will be) in conflict. In this sense, Gao and Huang [40] review a wider set of social navigation tasks, though they do not propose a taxonomy of conflicts as introduced in this survey.
Charalampous et al. [18] present a survey in which they aim "to systemize the recent literature by describing the required levels of robot perception, focusing on methods related to a robot's social awareness, the availability of datasets these methods can be compared with, as well as issues that remain open and need to be confronted when robots operate in close proximity with humans." This survey extends their initial discussion on robot design for operation in close proximity to humans; or as we refer to it, robots in conflict situations. Specifically, we aim to provide basic definitions to be used to standardize future works on the problem of robots that navigate in close proximity to people.
López et al. [83] 1, 3 provide a survey on turn prediction and how upper body kinematics can signal upcoming turns. In their survey, they identified that Gaze Yaw is the earliest predictor of walking turns, but that existing data do not support quantifying how much, or how reliably, timing and distance can be anticipated. They found, however, that Head Yaw was the most reliable kinematic variable for predicting walking turns about 200 ms from commencing to turn.
Their survey can inform the design of conflict resolution by enabling the robot to predict upcoming turns using these signals. Another recent survey focuses on algorithmic requirements and methodologies for robot navigation [97]. That survey revolves mostly around perception and trajectory modeling rather than actuation. While the authors mention collision avoidance as an important robot navigation task, they do not focus their survey on collision avoidance, as is done here.
The survey by Xiao et al. [163] reviews methods that use machine learning techniques for the general problem of mobile robot navigation. Their survey focuses on the comparison between machine learning and classical approaches in terms of their scope and performance on real-world navigation problems. In contrast, this survey is on social navigation, focusing specifically on conflict avoidance, and the papers may use any (learning or non-learning) method in approaching the problem. For a more general perspective on the current state of social navigation, Mavrogiannis et al. [91] identified three broad themes that are being investigated: planning, behavior design, and evaluation. These themes impact all social navigation tasks rather than being specific to conflict avoidance, and thus their discussion does not focus on this aspect. This survey is more specific to the context of collision avoidance in social navigation, and it drills down to provide an elaborate taxonomy of models and algorithms for such scenarios.
The remainder of this survey is organized as follows: Section 2 proposes a taxonomy for social navigation, identifying important factors of the social navigation problem. Sections 3 and 4 present a selection of relevant works that have contributed models and algorithms, respectively. Section 5 focuses on the evaluation metrics used in social navigation and refers to some existing benchmarks. Finally, Section 6 highlights open problems in social navigation with respect to the proposed taxonomy and provides a checklist for researchers to consult when investigating a new social navigation problem.

TAXONOMY
This section systematically describes a taxonomy used in this survey to categorize social navigation models (Section 3) and algorithms (Section 4). Here we describe the process used to collect the papers used in this survey. We started with existing surveys on social navigation [17,78] and we collected all of their references, as well as papers that cite these works using Google Scholar. In selecting which papers to include, we used the criteria specified in Section 1 to guide the process. Overall, this survey contains 54 (out of 166) citations that do not meet all three criteria outlined in Section 1, but which nonetheless provide fundamental contributions to our understanding of the social navigation problem, or which are surveys on topics relevant to social navigation. We iterated through the process of collecting papers that cite, and are cited by, our current bibliography, until doing so yielded no new papers meeting all of the outlined requirements. The only exception to this process is when several papers have been published by the same group. Research groups often publish multiple papers on the same project. In these cases we include more than one paper if they are categorized differently by our taxonomy. Otherwise, we include only the most recent paper. Figure 1 summarizes the paper selection process for this survey.
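The iterative collection process described above can be sketched as a breadth-first traversal of the citation graph that stops once a pass yields no new qualifying papers. This is a minimal sketch and not the tooling actually used: `neighbors` and `meets_requirements` are hypothetical callbacks, and expanding only from the seed surveys and from papers that meet the requirements is an assumption. The same-group deduplication rule is omitted.

```python
from collections import deque


def snowball(seeds, neighbors, meets_requirements):
    """Collect papers reachable from the seeds through citation links.

    seeds: iterable of starting papers (e.g. existing surveys).
    neighbors(paper): hypothetical callback returning the papers that cite,
        or are cited by, the given paper.
    meets_requirements(paper): hypothetical callback applying the inclusion criteria.
    """
    seeds = list(seeds)
    visited = set(seeds)      # every paper examined so far
    included = set()          # papers that satisfy the inclusion criteria
    queue = deque(seeds)
    while queue:              # an empty queue means no new qualifying papers were found
        paper = queue.popleft()
        for other in neighbors(paper):
            if other in visited:
                continue
            visited.add(other)
            if meets_requirements(other):
                included.add(other)
                queue.append(other)   # assumption: only qualifying papers are expanded
    return included
```

Under this assumption, a qualifying paper that is reachable only through a non-qualifying one is never examined; a different expansion policy would change that behavior.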
For each of the resulting 112 papers on conflict avoidance in social navigation, we identify seven attributes, listed in Table 1. Below we discuss this list of attributes (in bold) and the values (in italics) they can take. (Abbreviations for many values are used in tables in Sections 3 and 4. These abbreviations appear in parentheses next to their corresponding value.) We acknowledge that not all papers can be situated precisely within this taxonomy. In these cases, or if the value is not stated in the relevant paper, we label the corresponding attribute with the value "None" or "Neither" (e.g. some of the papers do not provide any empirical analysis, and thus the experiment type attribute is "None"). This taxonomy is constructed with the goal of encompassing as much work as possible, such that any new contribution can be easily placed in a clear context.

Taxonomy Attributes and Values
Some of the attributes and the values presented here are not intuitive. Here, we explain their rationale.

Number of Agents
Absolute Number (Abs) / Density (D). Some papers deal with a one-on-one interaction whereas others deal with multiple agents in a shared space. We mention, when known, how crowded the environment is. Most works report either an Absolute Number of participants or a Density (measured as #/m²). When presenting an absolute number of pedestrians, we include the navigating robot in the count. This allows comparison with multi-robot research where the number of agents includes multiple robots that are running the same algorithm.
[93] or LM-SARL [19]). We mention the specific motion control that is used when possible.

Observability
Communication None (N) / Indirect (I) / Direct (D). This attribute refers to communication that is conveyed by the robot, and not to communication that is conveyed by the other agents. None means that the robot is not doing anything specifically to convey its navigational goal. Indirect communication refers to situations where the robot uses whatever mechanisms it already possesses to signal its intentions, such as legibility [30] 1 and stigmergy [7] 2. Direct communication means that there is some mechanism that is added to the robot to allow communication. See Figure 2 for examples.
Experiment Type Simulation (Sim) / Laboratory (Lab) / In the Wild (ItW) / Survey (Sur). Many researchers run experiments in Simulation as part of their evaluation, either as the only type of evaluation or in addition to real-world experiments. Laboratory experiments are defined as experiments in the real world in a controlled environment such as a laboratory or using a scripted scenario. In the Wild experiments are real-world experiments in an unstructured environment or with no predefined script for the pedestrians. All of these types of experiments can be accompanied by post-interaction Surveys. When a paper reports on more than one type of experiment, we include the details of one experiment, chosen in this order of priority: In the Wild, Laboratory, Simulation, Survey. There are two exceptions to this policy. The first exception regards surveys, which are often used as an additional metric for an experiment in the wild or in the laboratory. Thus, if an experiment is accompanied by a survey, the survey is also mentioned. The second exception regards papers that report two or more experiment types, where one of them is a small-scale in-the-wild experiment that does not report significant results. In such cases, we report the paper according to the experiment with reported results, but add a superscript + symbol next to it to indicate that the paper also includes an in-the-wild experiment (e.g. Lab+).
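The labeling policy above, including its two exceptions, can be written as a short rule. The sketch below is one possible reading: the function name, the `small_scale_itw` flag, and the output format (such as "Lab+, Sur") are assumptions introduced for illustration.

```python
# Priority order stated in the text: In the Wild, Laboratory, Simulation, Survey.
PRIORITY = ["ItW", "Lab", "Sim", "Sur"]


def experiment_label(types, small_scale_itw=False):
    """Return the Experiment Type label for a paper.

    types: set of abbreviations reported by the paper, drawn from PRIORITY.
    small_scale_itw: hypothetical flag marking an in-the-wild experiment that is
        small-scale and reports no significant results.
    """
    reported = set(types)
    if not reported:
        return "None"                       # no empirical analysis
    plus = False
    # Second exception: fall back to the experiment with reported results
    # and mark the label with "+".
    if small_scale_itw and "ItW" in reported and reported & {"Lab", "Sim"}:
        reported.discard("ItW")
        plus = True
    main = next(t for t in PRIORITY if t in reported)
    label = main + ("+" if plus else "")
    # First exception: a survey that accompanies an experiment is also mentioned.
    if main != "Sur" and "Sur" in reported:
        label += ", Sur"
    return label
```

For instance, a paper reporting a simulation and a laboratory study is labeled "Lab", while a laboratory study accompanied by a small-scale in-the-wild trial is labeled "Lab+".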
Agent Type Human-Robot (H-R), Human-Agent (H-A), Human-Human (H-H), Robot-Robot (R-R), Homogeneous Agents (Hom), Heterogeneous Agents (Het). This survey focuses on social navigation involving a person and a robot (Human-Robot). Due to the difficulty of evaluating such interactions, many models and algorithms are evaluated on a different set of agents. The most common approaches are running a simulation in which the human counterparts are controlled by a real human (Human-Agent) or by some other set of predefined or learned behaviors (either Homogeneous Agents or Heterogeneous Agents). Several included papers provide a fundamental understanding of human navigation and present evaluations that do not involve robots at all (Human-Human) or that do not involve humans (Robot-Robot). These papers are cited using the notation presented earlier (e.g. citation 1), highlighting that they do not satisfy one or more of the inclusion criteria.
[Figure caption fragment: (2) virtual gaze [50], (3) sensor rotation [35], (4) arrow signaling [133].]
Observability is important to consider, especially when discussing simulations. Simulations explicitly model the observations that can be made by agents acting in a scene. Many simulations assume that a robot (or pedestrians around it) has full (ground truth) observability. Other simulations restrict observability in artificial ways, attempting to emulate realistic sensing capabilities (partial observability). In the discussion of these papers, it is important to note that some observation modalities may be unrealistic to implement on real robots. The conclusions of such papers may not translate to the context of real-world embodied social navigation.
With respect to the Communication attribute, we make a distinction between communication that is indirect or direct and communication that is implicit or explicit. Implicit communication is often used to describe any non-verbal communication that is conveyed by people (e.g. the interpretation of eye gaze is implicit), and explicit communication is performed specifically with the intention of communicating with others (e.g. speech is explicit) [25] 2, 3. Robots do not generally naturally communicate implicitly (for example, not all robots have "eyes" and those that do do not necessarily need to turn them to "look" at something, or reflexively turn them to where they are about to navigate). As such, we make the distinction between direct and indirect communication as defined above, and keep the implicit/explicit distinction as one reflective of mimicking human behavior. Using these definitions, the possible combinations for robot
We also highlight that some of the taxonomy attributes are very concrete and define low-level components used in the interaction (e.g. the motion control used), while other attributes are more abstract (e.g. robot role). Usually, the abstract attributes and their values depend on the concrete attributes. Figure 3 presents the hierarchical structure of these attributes, in work that is consistent with the three requirements outlined in Section 1. The bottom part represents the attributes that are independent of other attributes. The values assigned to the attribute at the end of an edge affect the values that can be assigned at its origin. For example, the values of the Communication attribute will be directly affected by the Number of Agents in the environment and the robot's Observability. In turn, the choice of value for the Communication attribute directly affects the Agent Types that can perceive the chosen communication channel.

Additional Concepts
There are some additional concepts that are worth mentioning, but which we decided to exclude from our taxonomy.
Here we list these concepts and explain why they are not included in the taxonomy. As research and discussion on social navigation progresses, this taxonomy could be extended to include these attributes.
One seemingly important factor to consider in the taxonomy is collision type. When referring to collisions, most papers describe head-on collisions or side-on collisions, with rear-end collisions as the least commonly investigated type. Among the papers in this survey, none explicitly discuss only one type of collision. There are several papers that propose ways to categorize collisions according to the required response from pedestrians and/or the robot. Reynolds [124] defines two types of collisions: unaligned collision avoidance and separation. Unaligned collision avoidance is a behavior that "tries to keep characters which are moving in arbitrary directions from running into each other" [124]. Separation is similar to a rear-end collision and refers to a simpler form of movement: "Separation steering behavior gives a character the ability to maintain a certain separation distance from others nearby" [124]. Mavrogiannis et al. [93] discuss the point in space and time where agents collide, calling this point "entanglement." This concept raises an additional question about the concrete implementation of this collision point: what is considered close enough to be an entanglement in a social context? For example, Mavrogiannis et al. [92] utilized a minimum distance of ≤ 1 meter between the robot and the human. While it is simple to classify the direction of a collision, it is more challenging to define properly the minimal requirements for an encounter to be considered a collision. Is entering a person's personal space a collision? Is brushing against their leg? Overall, the definition of collision varies between researchers and may be a parameter that can be adjusted.
Another common discussion point is context awareness and semantic mapping. Many papers discuss the need for mobile social robots to be aware of their context [18]. A leading approach to enable this is semantic mapping, where the robot constructs maps that represent not only a metric occupancy grid but also other properties of the environment [74] 1, 3. This survey does not focus on the mental model of the navigating robot (or of the other agents) in the environment, so this is not included in the taxonomy. It is, however, an important factor to consider when designing a robot for social navigation, as context awareness could greatly influence a robot's behavior.
Another thing to consider when designing interactions between mobile robots and pedestrians is how people react to human vs. robotic counterparts. Do people respond to robots in the same way that they respond to other people? The assumption that people will behave in the same way when encountering a robot as they would another human is common in HRI and other research communities, though it is not unanimously agreed upon. In their survey on proxemics for social navigation, Rios-Martinez et al. [126] stated that "This article starts from the idea that people will keep the same conventions of social space management when they interact with robots than when they interact with humans. Researchers in social robotics that believe in that hypothesis can rely on the rich sociological literature to propose innovative models of social robots." As a counter opinion, Butler and Agah [10] indicate that people are most comfortable when a robot moves at speeds between 0.254 m/s and 0.381 m/s, while the normal walking speed for a young human is about 1 m/s. This difference suggests that people prefer a robot that moves more slowly than people do. Until there is a clear theory regarding the reactions of people to other people vs. robots in social navigation, and until that theory is tested, it is reasonable to exclude from this taxonomy any assumptions about whether people react to robots similarly to or differently from how they react to other people.
The distinction between social cues and social signals [160] 2, 3 is used in this survey, but cues and signals are not included as attributes in the taxonomy. Cues are the low-level inputs that the robot can receive or send, such as gaze, position, language, etc. Signals, on the other hand, are emotions, personality, and other traits that are more high-level. Signals discussed in the context of social navigation usually serve a purpose in conflict avoidance, and the way to implement them in a robot (or detect them in a human) is through social cues. How a robot can best communicate with humans is a rich and versatile research area, and it is taken into consideration through the attributes observability and communication in the taxonomy.
One attribute that is relevant in a broader context than social navigation is focused vs. unfocused interaction.
Goffman [45] 2, 3 defines these terms to categorize scenarios in which the robot and the human share their focus (shared attention) vs. scenarios in which the robot and the human share an environment, but not attention. Rios-Martinez et al. [126] use this attribute to identify different types of navigational behaviors in robots: minimizing probability of encounter, avoiding collisions, passing people, staying in line, approaching humans, following people, and walking side-by-side. Because the papers in this survey revolve around conflicts, the robot and the human do not share focus, and hence all included papers involve strictly unfocused interactions. Focused vs. unfocused interaction is therefore not considered as part of the taxonomy.
Additionally, the topic of differences in navigation between independent pedestrians, groups, and crowds has enjoyed recent popularity [47,101,165]. Most social navigation papers consider interactions either with a single individual or with a crowd of individuals (captured by the Number of Agents attribute in our taxonomy). An early sociological study showed that people tend to move in small groups rather than alone, but that group size distribution depends greatly on context (a casual Saturday afternoon stroll vs. a workday morning commute) [23] 2, 3. Recent research has demonstrated that in many contexts, more than 50% of pedestrians are traveling in groups [99] 2, 3. Thus, the context in which navigation takes place determines whether it is necessary to consider the surrounding crowd.
Lastly, we address a distinction that is relatively straightforward to understand intuitively but is challenging to formalize: Conflict Prevention vs. Resolution. Consider a person walking in a crowded environment who is looking at their phone. If they do not watch the surrounding crowd, they might collide with another person, which means the two have reached a conflict. If the person looks up early enough, they might side-step abruptly without a change of speed, which means that a conflict was resolved. If the person decides to step away to a less crowded area, this behavior is prevention. On one hand, it is clear that prevention and resolution are different tasks that can direct the robot's behavior: prevention is the task of designing the robot's motion to steer away from potential conflicts, while resolution is the task of altering the robot's motion and behavior when a conflict is already imminent. On the other hand, formalizing this distinction is challenging, as it is non-trivial to define what an "imminent" conflict is. Whether a robot is designed to prevent or resolve a conflict, the premise of all of the work covered in this paper is that the robot is always attempting to avoid conflicts in social navigation. This requirement provides a crisp way to identify relevant papers that fit into this survey, without the need to explicitly cluster the interaction into prevention or resolution.

MODELS
This section details various models used for social navigation. The discussion is grouped according to three main underlying models: multiagent systems, human-inspired models, and physics-based models (specifically, the social force model and other force models). Each of these categories represents a different set of assumptions, as well as a different research community, from which each model stems. Navigation in multiagent systems is usually designed with the premise that agents navigating in an environment are homogeneous. These papers include multi-robot navigation models and crowd modeling. A multiagent social navigation model generally reasons about agents with different, sometimes unknown, behaviors. Other models are inspired by insights about human navigation. These papers provide measurements and rules that explain how people navigate among each other. Such a social navigation model translates these rules into robot motion and perception. We place papers that use the social force model, which is inspired by physical force modeling, in their own category. Many models have been proposed which build upon the seminal work by Helbing and Molnar [53] 2, 3, using additional forces. Finally, some papers sit at an intersection between two categories. In such cases, the paper is grouped with work that uses similar motion control.

Multiagent Systems
Two communities that have contributed significantly to the study of social navigation are the multi-robot navigation and graphics communities. Both of these communities have proposed different approaches to model the behavior of a crowd. The multi-robot community focuses more on safety and feasibility in the real world, while the graphics community focuses on robustness. Multi-robot work is usually based on a few interactions under realistic constraints. On the other hand, the challenge of crowd modeling taken on by the graphics community is to model interactions between hundreds and thousands of agents simultaneously. However, because the graphics community does not need to implement these systems on real robots, the perception and movement restrictions on those agents tend not to be grounded in the physical constraints that both robots and real people must contend with.
Many researchers have approached the challenge of multi-robot navigation [164]. This is a fertile and active research area that deserves its own survey. We discuss only a few selected publications that have had significant influence on social navigation. Van Den Berg et al. [157] present the principle of optimal reciprocal collision avoidance (ORCA), which provides a sufficient condition for multiple robots to avoid collisions among one another, and guarantees collision-free navigation. Chen et al. [19] model human-robot and human-human interactions, then infer the relative importance of learned features through a pooling module via a self-attention mechanism, and finally plan motions.
Another branch of multi-robot research focuses on planning under uncertainty, and leverages Markov Decision Processes (MDPs). Foka and Trahanias [36] model a probabilistic prediction of people's destinations. They use a Partially Observable MDP (POMDP) solved online at each time step to determine which actions the robot actually performs. Gupta et al. [47] recently presented an additional POMDP model for intention-aware navigation in crowds, where the model can address decisions related both to the robot's speed and its heading. Bandyopadhyay et al. [3] model human intention with a Mixed Observability MDP (MOMDP), then plan the motion of a robot leveraging this model.
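To make the idea of a probabilistic prediction over destinations more concrete, the following is a minimal Python sketch of a single Bayesian belief update over candidate goals. It is not the POMDP formulation of any of the cited papers: the function name `update_goal_belief`, the cosine-alignment likelihood, and the `sharpness` parameter are all assumptions chosen for illustration.

```python
import math

def update_goal_belief(belief, position, velocity, goals, sharpness=2.0):
    """Update a belief over candidate destinations from one observation.

    belief:   dict mapping goal name -> prior probability
    goals:    dict mapping goal name -> (x, y) location
    The likelihood model (hypothetical) favors goals that lie in the
    direction the pedestrian is currently moving.
    """
    speed = math.hypot(velocity[0], velocity[1])
    unnormalized = {}
    for name, prior in belief.items():
        gx = goals[name][0] - position[0]
        gy = goals[name][1] - position[1]
        dist = math.hypot(gx, gy)
        if speed == 0.0 or dist == 0.0:
            alignment = 0.0  # no directional evidence available
        else:
            # Cosine of the angle between the velocity and the goal direction.
            alignment = (velocity[0] * gx + velocity[1] * gy) / (speed * dist)
        unnormalized[name] = prior * math.exp(sharpness * alignment)
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}

goals = {"door": (10.0, 0.0), "elevator": (0.0, 10.0)}
belief = {"door": 0.5, "elevator": 0.5}
belief = update_goal_belief(belief, (0.0, 0.0), (1.0, 0.0), goals)
```

In a POMDP-style planner, a belief of this kind would be maintained at every time step and passed to the action-selection stage; the sketch covers only the belief update.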
The graphics community has contributed several important models to social navigation, as well as simulation environments that can be utilized to evaluate other models and algorithms (see more about these simulation environments in Section 5). Musse and Thalmann [103] propose a model of crowd behavior, where agent behavior is determined using a predefined set of rules. Strassner and Langer [142] use behavioral rules for modeling each person's behavior in a crowd. Such behaviors include perceiving, storing, and forgetting knowledge. Bonneaud and Warren [9] model pedestrian behavior using an empirically-grounded emergent approach, where the local control laws for locomotor behavior are derived experimentally, and the global crowd behavior is emergent. Okal and Arras [111] present a model for crowd behavior in which groups are formed. Their representation gives each individual an internal state, where under a set of predefined conditions pedestrians can choose to walk together.

Psychology and Human-Inspired Models
The contributions discussed so far have focused on multiagent or multi-robot navigation systems that have been adapted to accommodate human pedestrians. A different approach starts with modeling human behavior and then leverages these models to improve robot navigation. Cutting et al. [26] empirically evaluate human behavior in situations of obstacle avoidance. Their work investigates the relationship between object avoidance and finding one's aimpoint in a series of human studies. Their results are summarized as a decision tree to facilitate reasoning about collision detection with other objects (static or moving) and the Gaze-Movement Angle (GMA), the angle between one's gaze and one's direction of movement. Their model can be used to estimate where a collision might occur. As a different way to estimate an expected collision point, Carel [12] defined the time-to-contact as the time to bypass a dynamic obstacle (human or not). Moussaïd et al. [98] use it to heuristically plan how to navigate in a way that avoids collisions. Park et al. [116] claim that GMA-based collision prediction has several advantages over the time-to-contact approach. It is more robust to variations in the speed and the path of the other pedestrian. It also does not assume either constant speed or a linear path, so the accuracy of the prediction is not affected by these variations. Kitazawa and Fujiyama [73] investigate the Information Process Space (IPS) of a navigating person when walking in a hallway in the presence of static objects and other pedestrians. In this work, they identify the area in which the observing pedestrian considers that a collision with another pedestrian could occur within a short time (see Figure 4). In an extension of this work, Park et al. [116] propose a collision avoidance behavior model that is based on their empirical results about IPS to generate more human-like collision avoidance behaviors.
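As a simple illustration of the constant-speed, linear-path assumption that time-based collision predictors rely on, the following Python sketch computes the time and distance of closest approach between two agents. It is a generic kinematic calculation rather than the definition used in any of the cited papers, and the function name `time_to_closest_approach` is an assumption made for this example.

```python
import math

def time_to_closest_approach(p_a, v_a, p_b, v_b):
    """Return (t, d): the time t >= 0 at which two agents are closest,
    and their distance d at that time.

    Assumes both agents keep a constant velocity (straight path,
    constant speed) from the current instant onward.
    """
    # Position and velocity of agent b relative to agent a.
    rx, ry = p_b[0] - p_a[0], p_b[1] - p_a[1]
    vx, vy = v_b[0] - v_a[0], v_b[1] - v_a[1]
    rel_speed_sq = vx * vx + vy * vy
    if rel_speed_sq == 0.0:
        t = 0.0  # no relative motion: the distance never changes
    else:
        # Minimize |r + v t|^2 over t, clamped to the future.
        t = max(0.0, -(rx * vx + ry * vy) / rel_speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)

# Two pedestrians walking head-on along a corridor, 10 m apart, 1 m/s each.
t, d = time_to_closest_approach((0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (-1.0, 0.0))
```

If the other pedestrian changes speed or direction, the estimate changes as well, which is the sensitivity that the GMA-based prediction is claimed to avoid.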
Another concept from psychology that has had a significant impact on social navigation is that of personal space [42,49,57]. While the original formulation of personal space is depicted by Hall [49] as a concentric circle, later work extends it to an egg shape [51], an ellipse [53], or an asymmetrical shape (smaller on the dominant side) [42].
Closely related to personal space is the concept of density in crowds. The average density of people in a non-crowded environment has been evaluated to be 0.03 pedestrians per m², whereas in a moderately crowded environment, there are 0.25 pedestrians per m² [99]. Rios-Martinez et al. [125] incorporate both personal space and IPS-based constraints into an adaptive optimization algorithm to enable more human-like navigation. Truong and Ngo [154] propose a comprehensive framework that reasons about pedestrians' extended personal space and social interaction space to identify a Dynamic Social Zone (DSZ), a concept that is incorporated into their motion planner.
Table 3 summarizes the taxonomy values for models inspired by human behavior, physiology, and psychology research.

Physics-based Models
Researchers have also used models inspired by physics to represent dynamics and interactions between different moving agents. Helbing and Molnar [53] were the first to propose the Social Force Model (SFM), a model inspired by fluid dynamics that describes an agent's motion using a set of repelling and attracting forces. They evaluate this model in a simulation of homogeneous SFM-based agents. Many contributions extend SFM models to handle additional forces: Karamouzas et al. [64] add an evasive force that uses collision prediction and avoidance, which makes agents more proactive and anticipatory than the classical SFM. Moussaïd et al. [99] propose several group-related forces.

Fig. 4. Information Process Space: the visual processing coverage of pedestrians, as measured by [73] and depicted by Rios-Martinez et al. [126].
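As a rough illustration of the force-based formulation of the Social Force Model described above, the following Python sketch combines an attracting force toward a goal with exponentially decaying repelling forces from nearby agents. It is a simplified sketch and not the exact model of Helbing and Molnar [53]: the function names and the values of `relax_time`, `strength`, and `falloff` are assumptions chosen for illustration, and the desired speed defaults to a typical average pedestrian speed of 1.3 m/s.

```python
import math

def social_force(pos, vel, goal, others, desired_speed=1.3,
                 relax_time=0.5, strength=2.0, falloff=0.3):
    """Return the (fx, fy) acceleration acting on one agent.

    Attracting term: relaxes the velocity toward the desired speed
    in the direction of the goal.
    Repelling terms: push the agent away from each other agent, with a
    magnitude that decays exponentially with distance (hypothetical values).
    """
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = math.hypot(gx, gy)
    ex, ey = (gx / dist_goal, gy / dist_goal) if dist_goal > 0.0 else (0.0, 0.0)
    fx = (desired_speed * ex - vel[0]) / relax_time
    fy = (desired_speed * ey - vel[1]) / relax_time
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue  # direction undefined for coincident agents
        magnitude = strength * math.exp(-d / falloff)
        fx += magnitude * dx / d
        fy += magnitude * dy / d
    return fx, fy

def simulate_step(pos, vel, goal, others, dt=0.1):
    """Advance one agent by one explicit Euler step of length dt seconds."""
    fx, fy = social_force(pos, vel, goal, others)
    new_vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel

pos, vel = simulate_step((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [(1.0, 0.2)])
```

Extensions such as evasive or group-related forces would appear as further terms added to `fx` and `fy`.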

ALGORITHMS
This section discusses contributions in the form of algorithms and hardware augmentations that enhance social navigation. Most of the work presented here fits our basic definition of social navigation; however, several papers are included that have not been evaluated in the context of navigating around people. These papers are included if their contribution can be applied in the context of social navigation. Broadly speaking, this section is divided into three main approaches: approaches that infer the human's trajectory and adapt to it; approaches that convey the goal or trajectory of the robot to the person it is interacting with before reaching a conflict; and mixed approaches that mediate between the inferred trajectory of the human and the desired goal of the robot.

Inferring Human Trajectories
Many social navigation contributions have been inspired by the way humans navigate in social contexts. The majority of these papers can be split into two categories: online and offline inference. Online inference means that a robot observes the behavior of a person during deployment and incorporates its inference about the person's planned trajectory into its execution. Offline inference happens prior to the execution stage, usually on more than a single trajectory. The robot learns to predict human trajectories or imitate them from a set of observed trajectories.
4.1.1 Online Inference. Cutting et al. [26] offer an early attempt to evaluate the trajectory of a passerby by calculating their GMA and reacting to it. The robot designed by Tamura et al. [148] detects pedestrians using a laser range finder and tracks them using a Kalman filter. They apply a social force model to the observed trajectory to determine whether the pedestrian intends to avoid a collision with the robot or not, and select an appropriate behavior based on the estimation result. Gockley et al. [43] discuss how to avoid rear-end collisions in the context of person following. They propose a laser-based person-tracking method and evaluate two different approaches to person-following: direction-following, where the robot follows the current location of the person; and path-following, where the robot tries to follow the exact path that the person took. They show that while no significant difference was found between the two approaches in terms of the distance or time between tracking errors, participants rated the robot's behavior as significantly more natural and human-like in the direction-following condition. In addition, participants felt that the direction-following robot's behavior was more similar to their expectations.
Others have leveraged human gaze to infer the trajectory of pedestrians. Gaze is a very strong communicative cue used by humans, both in collaborative settings in general [15] [137] and as a cue for interacting with copilot systems in cars [58,59], where it also serves to infer the driver's intended trajectory. Gaze is also often fixated on objects being manipulated, which can be leveraged to improve algorithms that learn from human demonstrations.

Henry et al. [54] use Inverse Reinforcement Learning (IRL) to learn motion patterns of humans in simulation that can later be used for planning in social navigation.
An alternative approach to IRL with a similar objective is to model social navigation trajectories using a Maximum Entropy Probability Distribution, where cost is also implicitly defined by identifying an underlying model from demonstrated data. Maximum entropy probability distributions have been used by Pfeiffer et al. [118] to model agents' trajectories for planning and by Kretzschmar et al. [75] to infer the parameters of the navigation model that matches the observed behavior in expectation. Kuderer et al. [79] also use human demonstrations, but instead of using a Markov Decision Process, they elicit features from the human trajectories, and then use entropy maximization to determine the robot's behavior. Luber et al. [87] use unsupervised learning from surveillance data to learn motion patterns and augment a motion planner with this knowledge.
Sisbot et al. [138] create a human aware motion planner (HAMP) that is explicitly given a cost model for safety and for legibility, and the robot reasons about the joint cost of these two properties in its planning process. Costs were also implicitly defined by identifying an underlying model from demonstrated data. Kirby et al. [70] model human social conventions at the global planning stage. This enables the robot to mediate between different, sometimes conflicting objectives. For example, consider a goal that is down an intersecting hallway to the robot's left. While the social norm in many places is to pass a pedestrian on the right side, the robot may choose to cross the hallway in front of an oncoming person, effectively passing them on the left of the corridor. This behavior is the result of mediating between two objectives: complying with the right-alignment social norm, and minimizing the time to the goal.
Many algorithms use hand-crafted behaviors to avoid conflicts, i.e. to realize collision avoidance. As a continuation of previous Collision Avoidance Deep Reinforcement Learning (CADRL) work [21], Chen et al. [20] further propose a hand-crafted reward function that incorporates the social norm of left or right-handed passing in a DRL approach, enabling a physical robot to move at human walking speed in an environment with many pedestrians; this approach is called Socially Aware CADRL (SA-CADRL). Along the same line of research, but to relax the assumption about other agents' dynamics, Everett et al. [32] propose GA3C-CADRL, using an LSTM to allow reasoning about an arbitrary number of nearby agents and a GPU to maximize the number of training experiences. Similarly, the reward function by Jin et al. [60] accounts for ego-safety, to measure collision from the robot's perspective, and social-safety, to measure the impact of the robot's actions on surrounding pedestrians. Other options that utilize DRL include using a Hidden Markov Model (HMM) in a higher hierarchy to learn to choose between target pursuing and collision avoidance using RL [29]. Tai et al. [147] use Generative Adversarial Imitation Learning (GAIL) to learn continuous actions and desired force toward the target. This improved safety and efficiency over pure behavior cloning (BC). Li et al. [80] propose a new problem: socially concomitant navigation (SCN). In addition to collision avoidance in traditional social navigation, in SCN the robot also needs to consider the motion of its companion so as to maintain a sense of affinity when they are traveling together towards a certain goal. Taking features extracted from a LiDAR sensor along with the goal as input, a navigation policy is trained by Trust Region Policy Optimization (TRPO) to output continuous velocity commands for navigation. Bera et al. [6] create SocioSense, a social navigation algorithm that categorizes pedestrians according to psychological traits (e.g. shy, tense) and adjusts the robot's velocity according to the pedestrians around it. Lu et al. [86] incorporated a dynamic measure into their reward to reason about the density of the crowd when deciding on the distance from other pedestrians. They then extended the deep neural network architecture from SARL [19] to choose the optimal action with the shaped reward that reasons about the "uncomfortable distance" between the robot and a pedestrian.
To observe social rules when navigating in densely populated environments, Yao et al. [165] propose to utilize information about social groups to address the "naturalness" aspect from the perspective of collective formation behaviors in the complex real world. They used a deep neural network, called Group-Navi GAN, to track social groups and navigate the robot to join the flow of a social group by providing a local goal to the local planner. Other components of the existing navigation pipeline, e.g. state estimation, collision avoidance, etc., remained the same. The classical navigation pipeline, with the assistance of a learned local goal, was capable of navigating safely in a densely populated area, following crowd flows to reach the goal. Liang et al. [81] develop CrowdSteer, an RL-based collision-avoidance algorithm that navigates in dense and crowded environments. The algorithm is trained using PPO in simulation with simulated human agents, and was deployed in the real world. Martins et al. [90] propose ClusterNav, an algorithm that gets human demonstrations using teleoperation, then uses Expectation Maximization to learn how to navigate in an unsupervised manner. Their approach cannot reason about dynamic obstacles, hence it is unable to reason about interactions with people during navigation, so it does not appear in our tables.
Table 5 summarizes the taxonomy values for the inference algorithms for social navigation discussed in this subsection.

Conveying the Robot's Goal to the Human
Dragan et al. [30] formally define the concepts of legibility (motion that allows the observer to confidently infer the correct goal) and predictability (motion that conforms with the observer's expectations) in robot navigation. They show that human-robot collaboration is affected by the way the robot plans its motion, and that, to perform better, the robot design should switch from a focus on predictability to a focus on legibility. This section presents several approaches to show that cues associated with the robot's proxemic behavior were found to significantly affect participant perceptions of the robot's social presence while cues associated with the robot's gaze behavior were not found to be significant. However, Fernandez et al. [33] show that people are able to adapt to LED-based cues after watching a demonstration of their use, and May et al. [94] present a robot that was able to convey its intention using a mechanical signal but not using a gaze cue. Hart et al. [50] challenge these previous results by providing a different naturalistic gaze cue using a virtual agent head that is added to a mobile robot platform, and compare its performance against a similar robot with an LED turn signal. The results of this work suggest that people are able to perceive the naturalistic gaze cue and react to it. These conflicting results can be attributed to the vast differences in signal implementation between the different experiments.
Table 6 summarizes the taxonomy values for algorithms that focus on conveying the robot's intention to a human.

Mediating Conflicts in Navigational Intentions
Karamouzas et al. [65] identify a power-law interaction that is based not on the physical separation between pedestrians but on their projected time to a potential future collision, and is therefore fundamentally anticipatory in nature. This finding highlights that there is value in understanding and mediating between the human's navigational goal and the robot's.
Murakami et al. [102] propose to smooth a wheelchair's trajectory to avoid colliding with pedestrians. Kruse et al. [76,77] investigate classic navigation algorithms that create erratic trajectories near obstacles that make a robot look confused. To address this challenge, they use context-dependent cost functions and directional cost functions that help a robot to solve spatial conflicts. One result, for example, is adjusting the robot's velocity instead of its path.
Silva and Fraichard [136] tackle the mediation problem using the notion of motion effort and how it should be shared between the robot and the person in order to avoid collisions. To that end, their approach learns a robot behavior using Reinforcement Learning that enables it to mutually solve the collision avoidance problem during simulated trials. Svenstrup et al. [143] propose a modified RRT for navigation in human environments assuming access to full state information. The proposed RRT planner plans with a potential field representation of the world, with a potential model designed for moving humans. Alternatively, recent work by Truc et al. [153] focused on drone navigation around people. This work introduced a human-aware 3D reactive planner for drone navigation. This planner is based on stochastic optimization of two criteria: discomfort due to the proximity of the drone to pedestrians, and visibility of the drone.
A different line of research combines social navigation and person following. This combination can work in several directions: both Müller et al. [100] and Topp and Christensen [150] present collision avoidance algorithms that are utilized in the context of following one particular person through a populated environment. Alternatively, in Yao et al. [165], the robot leverages the planning of other pedestrians and follows them instead of searching for a solution on its own. Table 7 summarizes the taxonomy values for mediation algorithms for social navigation discussed in this subsection.

EVALUATING AN INTERACTION
The numerous different metrics and evaluation methods used in social navigation make apparent the need to standardize them. This section is meant to provide tools and metrics to evaluate new research in social navigation with respect to the existing literature and with our proposed taxonomy to provide context for evaluation. As we are surveying an interdisciplinary area, many of the metrics used so far for evaluation were adapted from other research areas (e.g. Human-Computer Interfaces, psychology, physics, mechanical engineering, and more). To pinpoint the most common and useful metrics, we discuss only the metrics that were used in the papers that were presented in the tables in Sections 3 and 4. For each metric we present, we mention the taxonomy attributes that are the most relevant and can directly affect the values of the metric. For example, measuring group formation directly depends on the Number of Agents in the environment, since a single pedestrian cannot form a group. Table 8 summarizes this evaluation according to the different aspects of the interaction: properties of the interaction itself, actions taken by the human or the robot, emergent behaviors, algorithmic properties, and others. This last aspect includes both qualitative evaluation and prediction accuracy, which is a very common metric to estimate the proficiency of obstacle detection, a preliminary step before the actual interaction.

Interaction Properties
This subsection discusses measurements that are related to the nature of the interaction itself, and are meant to evaluate how successful and efficient an interaction is. These metrics are objective, and external to the robot and the human.
Conflicts Count is one of the most common approaches to estimate the success of an interaction. This measurement is quantified in several ways: by counting desirable outcomes vs. undesirable outcomes, by counting accidents, or by counting interactions that ended without the robot reaching its goal. In this category, we also consider experiments that counted how many times the robot was required to replan [100] and how many targets it was able to reach in total [48]. This measure is affected by the Number of Agents, the Experiment Type, and the evaluated Agent Type.
Speed is another very common metric used to evaluate an interaction. In general, higher speeds imply that the robot was able to navigate confidently without slowing down. Many researchers used this metric to complement conflict count, to account for cases where a robot may reach its goal quickly but frequently collides with walls. As a reference point, the robot's speed is usually compared to the average pedestrian speed (1.3 ± 0.2 m/s), but this value depends on whether pedestrians walk alone or in a group, as group size affects speed more than density level [99]. Gérin-Lajoie et al. [42] measured similar results for natural walking around dynamic obstacles (1.44 ± 0.17 m/s). Accordingly, this measurement is greatly affected by the Robot's Role in the interaction, the Number of Agents, the Experiment Type, and the Agent Type.
Path Time is a way to measure the velocity of the robot throughout a full interaction. As the robot might accelerate or decelerate, recording the total time that it took the robot to reach its goal is a simple way to measure its performance. One unique metric that is also relevant to throughput is "social work", defined by Ferrer et al. [34]. This metric measures the total work done by the robot, and the summation of the work done by each person in the scene. Kanazawa et al. [62] examined the total waiting time that the robot had experienced during the interaction. This measure depends on the Robot's Role, the Number of Agents in the environment, and the Experiment Type.
Path Length provides another perspective about the interaction, and is correlated with speed and path time: by measuring any two of these three metrics (Speed, Path Time, and Path Length) one can get a reasonable estimation of the third. As such, this metric is also affected by the same attributes as the other two metrics: the Robot's Role, the Number of Agents, and the Experiment Type.
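The relationship between Speed, Path Time, and Path Length can be made concrete with a short Python sketch that derives all three from a timestamped trajectory. The `(t, x, y)` sample format and the function name `path_metrics` are assumptions made for this example; a real evaluation would depend on how the robot's pose is logged.

```python
import math

def path_metrics(samples):
    """Compute (path_length, path_time, mean_speed) from a trajectory.

    samples: list of (t, x, y) tuples sorted by time, with t in seconds
             and x, y in meters (hypothetical logging format).
    """
    path_length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:])
    )
    path_time = samples[-1][0] - samples[0][0]
    # Mean speed follows from the other two quantities.
    mean_speed = path_length / path_time if path_time > 0 else 0.0
    return path_length, path_time, mean_speed

# The robot moves 5 m in the first second, then waits for one second.
length, duration, speed = path_metrics([(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 3.0, 4.0)])
```

Note that the mean speed computed this way hides pauses and bursts of motion, which is one reason the three metrics are often reported together.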
Acceleration is a way to measure the changes in the robot's behavior throughout the interaction. A robot that accelerates or decelerates several times in an interaction most likely had to replan or adjust to avoid a conflict. This metric is highly affected by the Robot's Role and the Number of Agents.
Smoothness is a generalization for several metrics that measure the total energy that was put into the interaction by the robot or the human. Successful interactions are expected to require less energy than unsuccessful interactions, which force the robot to replan. Smoothness can be evaluated in several ways, including acceleration/deceleration over time, total kinetic energy used [116], path irregularity (how many unnecessary turns were taken) [48], cumulative heading change [112], and the integral of the square of the curvature to measure the smoothness of a pedestrian's path [64]. This measure is influenced by the Robot's Role, the Observability that can enable the robot to plan better ahead, and the Motion Control used.
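Of the smoothness measures listed above, cumulative heading change is the simplest to compute from a recorded path. The following Python sketch shows one possible way to do so; the exact definition used in [112] may differ, and the function name `cumulative_heading_change` is an assumption.

```python
import math

def cumulative_heading_change(points):
    """Sum of absolute heading changes (in radians) along a 2D path.

    points: list of (x, y) waypoints. Consecutive duplicate points are
    skipped because they have no well-defined heading.
    """
    headings = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if (x1, y1) != (x0, y0)
    ]
    total = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) so that a small turn across
        # the +/-pi boundary is not counted as a nearly full rotation.
        diff = (h1 - h0 + math.pi) % (2.0 * math.pi) - math.pi
        total += abs(diff)
    return total

straight = cumulative_heading_change([(0, 0), (1, 0), (2, 0)])
one_turn = cumulative_heading_change([(0, 0), (1, 0), (1, 1)])
```

A straight path yields zero, while each additional turn adds to the total, so lower values indicate a smoother path under this measure.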
Avoidance Distance is a way to measure how close the robot came to a conflict or a full collision with a human. Usually, a robot that is able to avoid pedestrians from afar is considered more successful than a robot that almost reaches collision [143]. However, this success sometimes creates a tradeoff with the total length of the path the robot needs to take and the smoothness of the path. This metric is affected by the Robot's Role, the Number of Agents, and the Motion Control used, which might have its own predefined distance-keeping restrictions.
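One simple way to quantify avoidance distance is the minimum separation between the robot and a pedestrian over an interaction. The Python sketch below assumes that both trajectories were sampled at the same time instants; that assumption and the function name `min_avoidance_distance` are illustrative and not taken from the surveyed papers.

```python
import math

def min_avoidance_distance(robot_path, human_path):
    """Smallest robot-human distance over time-aligned trajectories.

    robot_path, human_path: lists of (x, y) positions in meters, where
    index i in both lists refers to the same time instant (assumption).
    Returns the distance between the agents' center points.
    """
    return min(
        math.hypot(rx - hx, ry - hy)
        for (rx, ry), (hx, hy) in zip(robot_path, human_path)
    )

robot = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
human = [(2.0, 1.0), (1.0, 1.0), (0.0, 1.0)]
closest = min_avoidance_distance(robot, human)
```

To obtain the free space between the two bodies rather than the center-to-center distance, the radii of the robot and the person would need to be subtracted from the result.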

Robot/Human Actions
While the previous subsection considered measurements of the interaction as a whole, in this subsection we discuss measures that evaluate the actions taken by the robot or the human.
Degrees Turned. As part of an interaction, either the robot or the human (or both) turns to avoid a collision. Evaluation that relies on this measurement usually tracks the degrees of the lane change of either party. This measure is highly affected by the Robot's Role, which determines who will turn, the Number of Agents in the environment, and the Motion Control used.
Gaze is a general measurement, in which several different aspects can be evaluated, including fixation count and length [108], and the Gaze-Movement Angle (GMA) [26]. Kitazawa and Fujiyama [73] investigated gaze patterns in a collision avoidance scenario with multiple pedestrians moving in a wide, hallway-shaped area. They show that pedestrians pay much more attention to the ground surface, to detect potential immediate environmental hazards, than to fixating on obstacles. Therefore, most of their fixations fall within a cone-shaped area rather than a semicircle, and the attention paid to approaching pedestrians is not as high as that paid to static obstacles. Metrics that involve gaze are affected by the Robot's Role, Observability, Communication protocols that the human should be aware of, the Experiment Type, and Agent Type, which can all have great effects on gaze patterns.
Head Orientation and Body Positions are ways to capture some intermediate value between the degrees turned in practice, and the changes in GMA. Recently, Kitagawa et al. [72] leveraged people's reliance on such cues and incorporated similar body rotations into an omni-directional robot to improve the way pedestrians perceive its performance. These metrics are highly affected by the Robot's Role in the interaction, the Communication channel used, and the Agent Type.

Emergent Behaviors
Several experiments have been designed to identify specific movement patterns and flow patterns that emerge during execution of social navigation algorithms, or to mimic human movement patterns that emerge in these contexts [5,84]. In many cases, these patterns are in the form of lanes [53] or group clusters.
Lane Emergence is a phenomenon that exists in human crowds: whenever an environment becomes crowded enough, it is likely that people will follow the path of others who are going in the same direction [43,165]. Group Formation is another phenomenon whose appearance implies the success of the interaction. However, unlike lane emergence, group formation is usually an explicit objective of work that discusses these types of interactions: such work focuses on understanding how groups of pedestrians move together [99], and investigates whether a robot can seamlessly join such a group [103], bypass it [144], or disperse it [22]. This measure is affected by the Number of Agents and the Agent Type.
Maximal Density is a metric frequently used in simulations to stress-test an agent's ability to navigate in an environment with multiple other agents. When shifting to the real world, Fruin [38] proposed six levels of crowdedness, referred to as Levels of Service, as depicted in Figure 5. When comparing to human-only navigation, the average density of people in a non-crowded environment was evaluated to be 0.03 pedestrians per m², and in a moderately crowded environment, there are 0.25 pedestrians per m² [99]. Notice that density, or the Number of Agents, is an attribute in this survey's taxonomy. In this specific section, we only refer to evaluation that uses density as a metric, rather than as a controlled variable.
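As a small worked example of density as a metric, the Python sketch below computes pedestrians per square meter and reports which of the two reference values from [99] a measured density is closer to. The nearest-reference comparison is an assumption made for illustration; it is not a classification scheme from the surveyed literature, and it is unrelated to Fruin's Level of Service thresholds.

```python
# Reference densities (pedestrians per square meter) reported in [99].
REFERENCE_DENSITIES = {"non-crowded": 0.03, "moderately crowded": 0.25}

def crowd_density(num_pedestrians, area_m2):
    """Average density in pedestrians per square meter."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return num_pedestrians / area_m2

def nearest_reference(density):
    """Name of the reference density closest to the measured value
    (hypothetical helper, for illustration only)."""
    return min(REFERENCE_DENSITIES,
               key=lambda name: abs(REFERENCE_DENSITIES[name] - density))

# 20 simulated agents in a 10 m x 10 m arena.
density = crowd_density(20, 10.0 * 10.0)
label = nearest_reference(density)
```

In a stress test, the number of agents would be increased until the navigation algorithm starts to fail, and the density at that point would be reported as the maximal density.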

Algorithmic Properties
The previous subsections focused on measuring physical quantities, either about the interaction as a whole or about one of the parties. In this subsection, we focus on more algorithmic aspects of the interaction. The metrics presented here can often be measured internally by the robot.
Computation Time in social navigation refers to the robot's processing time. As the robot should perform in real-time, there is a need to evaluate whether the robot can process the required information, plan, and execute its plan on time. Two different components that are measured by computation time are: interaction processing, which is usually measured in milliseconds [157], and learning (in data-driven approaches), which is usually measured in the number of learning episodes needed to achieve a desired behavior [29]. Computation time is influenced by the Number of Agents, Experiment Type, and Agent Type.
Model Prediction is a crucial part of every social navigation interaction: in order to properly act, the robot should first be able to accurately predict the behavior of other agents in the environment. Some contributions focus solely on improving the part of the interaction that involves understanding the environment given sensor information, and accurately predicting trajectories [79, 96], while others evaluate the prediction of pedestrian trajectories interleaved with robot execution [6,106]. This metric is influenced by the Robot's Role in the interaction, Observability, and Agent Type.

Other Evaluations
So far, all evaluation metrics were objective and could usually be quantitatively evaluated. Some contributions focus on analyzing an interaction and identifying theoretical concepts, and thus have no empirical evaluation, while others test subjective quantities (e.g. comfort level) or provide a qualitative evaluation of an interaction.
Survey Questions are the most common approach to elicit information from users about how they perceive an interaction with an agent or a robot. These metrics consist of comfort levels during the interaction [57,94,158], social presence [68,113], expectation matching [43,75], and more. With respect to comfort, Torta et al. [151] identify specific values for this comfort zone (182 cm from a sitting person and 173 cm from a standing person). Syrdal et al. [145] present an empirical evaluation of the role of video prototyping and evocation as a good way to evaluate non-functional aspects of HRI. Another type of subjective evaluation is of proxemics [112,143], which is related to the avoidance distance that was discussed earlier, but can encompass additional information about the interaction. For example, Hall [49] identifies different interaction ranges: intimate space (up to 0.45 m), personal space (1.2 m), social space (3.6 m), and public space (7.6 m). When mapping these distances to human-robot interactions, the comfortable distance from a robot is 0.2 m, and the arrival tolerance is 0.5 m [19,78]. A survey is also referred to in this paper as an Experiment Type, hence this is the most related attribute.
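The interaction ranges attributed to Hall [49] above can be turned into a simple lookup that labels a measured human-robot distance. The Python sketch below treats each listed value as the upper bound of its zone, which is an interpretation made for this example; the names `HALL_ZONES` and `proxemic_zone` are likewise hypothetical.

```python
# (upper bound in meters, zone name), using the ranges listed above.
HALL_ZONES = [
    (0.45, "intimate"),
    (1.2, "personal"),
    (3.6, "social"),
    (7.6, "public"),
]

def proxemic_zone(distance_m):
    """Return the name of the zone that contains the given distance.

    Assumes each listed value is the outer boundary of its zone.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for upper_bound, name in HALL_ZONES:
        if distance_m <= upper_bound:
            return name
    return "beyond public"

zones = [proxemic_zone(d) for d in (0.3, 1.0, 2.5, 5.0, 9.0)]
```

Such a labeling can complement survey answers by relating reported comfort levels to the zone the robot occupied during the interaction.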
No Evaluation is a category designated for papers that make only a theoretical contribution, such as classifying different abstract types of interactions [124], or ones that provide only a qualitative analysis of an interaction [150]. Accordingly, research with no empirical evaluation might be affected by all attributes of the taxonomy, depending on the subject of the analysis.
to promote this goal is by using existing simulations or resources that can have a similar baseline. In this subsection, we identify some of the recent efforts to create social navigation benchmarks and evaluation frameworks.
Carton et al. [13] propose a framework for the analysis of human trajectories, and show that humans plan their navigation trajectories in a similar fashion when walking past a robot or a human.
Simulations are commonly used to evaluate a social navigation algorithm or model (39 of the 75 surveyed papers used simulations), either as a preliminary step to physical navigation or as a completely independent task. Next, we point out several available simulation tools that can be used to evaluate new contributions. For various reasons, there are not many contributions that can generalize to real-world interactions. First, robots can only be tested under similar conditions, meaning that an evaluation platform for large mobile robots will be different from one for smaller robots. Explicitly identifying how accurate a robotic design is (e.g., 2D vs. 3D representation, joint movement, 3rd person vs. 1st person evaluation, etc.) is a key component in the design of any real-world robot experiment [145]. In addition, real human-robot interactions require human presence, which introduces a lot of variability and cannot simply be compiled into an algorithm that can be used repeatedly.
Mavrogiannis et al. [92] recently published a case study where people and robots navigated in a shared space.The robots used three distinct navigation strategies, executed by a telepresence robot (two autonomous, one teleoperated).
The first is Optimal Reciprocal Collision Avoidance (ORCA), a local collision-free motion planner for a large number of robots, as proposed by Van Den Berg et al. [157]. The second is the social momentum (SM) planning framework, which estimates the most likely intended avoidance protocols of others based on their past behaviors, superimposes them, and generates an expressive and socially compliant robot action that reinforces the expectations of others regarding these avoidance protocols [93]. These two navigation strategies are agnostic to the fact that the other agent is a human. This assumption leaves an opportunity for further investigation.

DISCUSSION
In this survey, we identified specific components that comprise a social navigation interaction, and introduced a detailed taxonomy to provide researchers with a framework and a language for comparing and contrasting research in social navigation (Section 2). We then compiled a comprehensive list of papers that contribute to social navigation and discussed them according to their values given our taxonomy (Sections 3 and 4). Next, we surveyed the different measurements used to evaluate an interaction in this context, and highlighted the relations between these measurements and the taxonomy attributes (Section 5).
Social navigation is a growing research area. While we expect that the attributes we chose for the taxonomy will remain relevant in the years to come, additional attributes will be added and the focus of specific work might shift to deal with new settings. However, any progress in the field must be rooted in the fundamental components of social navigation as they are presented in this survey. In addition, the proposed taxonomy can serve as a framework that
To conclude, we expect the field of social navigation to gain increased popularity and lead to more real-world applications during the next decade. This survey aims to help lay the groundwork for these exciting developments by mapping existing approaches onto a novel taxonomy, and providing a context for new contributions to social navigation.
http://arxiv.org/ps/2106.12113v3

Fig. 1. Study flow diagram showing the inclusion methodology used in this survey.

Fig. 3. The hierarchical structure of the taxonomy's attributes.

Though the use of instrumentation such as head-mounted gaze trackers or static gaze tracking cameras is limiting for mobile robots, recent work in the development of gaze trackers which work without such equipment [127] may soon allow us to perform the inverse of the robot experiments presented here, with the robot reacting to human gaze. Ratsamee et al. [122] propose to avoid collisions with humans by considering a social model that takes into consideration body pose and face orientation.

4.1.2 Offline Inference and Learning. While the previous subsection focused on the recognition of human trajectories during execution, some work leverages these trajectories to learn and infer how a human would react in a social navigation interaction. Pacchierotti et al. [113] design a rule-based strategy for people passing that was inspired by spatial behavior studies. This strategy intends to mimic the way people avoid collisions once inside a person's personal space. One such successful approach uses Inverse Reinforcement Learning (IRL) to elicit an explicit cost representation that imitates human social navigation behavior. Instead of hand-crafted functions, these papers use IRL to leverage data-driven approaches. IRL has been used extensively to infer reward (cost) functions from human demonstrations. The most straightforward application of IRL is by Kim and Pineau [69], who learn a cost function that respects social variables over features extracted from an RGB-D sensor. In this work, navigational features were first extracted from an RGB-D sensor, then represented as a local cost function learned with IRL from a set of demonstration trajectories by an expert. The system still operated under the classical navigation pipeline, with a global path planned using a shortest-path algorithm and a local path planned using the learned cost function to respect social variables. Obstacle avoidance was still handled by a low-level controller. Okal and Arras [112] tackle cost function representation at a global level in a social context: they developed a graph structure and used Bayesian IRL to learn the cost for this representation. With the learned global representation, a traditional global planner (A*) planned a global path over this graph, and the POSQ steer function for differential-drive mobile robots served as a local planner.
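To make the idea of learning a cost function from demonstrations more concrete, the sketch below learns the weights of a linear cost over path features so that a planner choosing among a few candidate paths ends up preferring the demonstrated one. This is a deliberately simplified feature-matching update, closer to a structured perceptron than to the IRL algorithms used in the cited work, and it is not the method of Kim and Pineau [69] or Okal and Arras [112]. The two features (path length and time spent inside a person's personal space), the candidate paths, and all names are assumptions made for illustration.

```python
from typing import List, Sequence


def path_cost(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear cost: weighted sum of a path's feature values."""
    return sum(w * f for w, f in zip(weights, features))


def learn_cost_weights(
    expert: Sequence[float],
    candidates: List[Sequence[float]],
    learning_rate: float = 0.1,
    max_iters: int = 100,
) -> List[float]:
    """Adjust weights until the cheapest candidate is the expert demonstration."""
    weights = [0.0] * len(expert)
    for _ in range(max_iters):
        # "Planner": pick the candidate path with the lowest current cost.
        planned = min(candidates, key=lambda f: path_cost(weights, f))
        if list(planned) == list(expert):
            break
        # Raise the cost of features the planner overuses relative to the expert,
        # and lower the cost of features the expert uses more.
        weights = [
            w + learning_rate * (p - e)
            for w, p, e in zip(weights, planned, expert)
        ]
    return weights


# Hypothetical features per path: [path length (m), time inside personal space (s)].
CANDIDATES = [
    [4.0, 3.0],  # short path that cuts through a person's personal space
    [6.0, 0.0],  # modest detour that keeps its distance (the demonstration)
    [9.0, 0.0],  # unnecessarily long detour
]
EXPERT = CANDIDATES[1]
WEIGHTS = learn_cost_weights(EXPERT, CANDIDATES)
```

On this toy data the learned weights end up positive for both features, with the larger weight on time spent in personal space, so the planner accepts a modest detour but not an arbitrarily long one. In a full pipeline, such a learned cost would be queried by the local planner while a shortest-path planner provides the global route, as described above.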

Fig. 5. Levels of Service from A to F: how crowded the environment is (taken from Fruin [38]).
For several algorithms deployed in crowded environments, the researchers were able to detect the emergence of lanes in a robotic navigation context, and considered this behavior a sign of success, since lanes are usually an efficient way to navigate in crowds. This measure is affected by the Number of Agents, the Experiment Type, and the evaluated Agent Type.

Table 1. The Social Navigation Taxonomy.
Full / Partial / Depth / RGB. If the work is set up in simulation, the robot can have either full or partial observability. Work that involves experiments or evaluations with real robots usually reports the specific type(s) of sensors that were used, such as depth sensors (e.g., LIDAR) or cameras (e.g., RGB or RGBD). If more than one type of sensor is used, we mention all of them.

Table 2. An overview of the different multiagent-based models used in social navigation. Role refers to the robot role, Obs. is observability, Com. refers to communication, and Exp. type is the experiment type.

Table 3. An overview of the different human-inspired and psychology-based models used in social navigation. Role refers to the robot role, Obs. is observability, Com. refers to communication, and Exp. type is the experiment type.

Table 2 summarizes the taxonomy values for models inspired by multiagent systems research.
have analyzed how gait and posture are affected by sudden trajectory changes, as one might expect to see in conflict avoidance. Patla et al. [117] analyzed head yaw, trunk yaw, and foot position when turning due to an expected obstacle vs. turning abruptly due to an unexpected obstacle. To analyze the relationship between head pose and predicted walking trajectory, Unhelkar et al. [156] discretized walking trajectories as a decision problem regarding which target a person would walk toward. They incorporated this information into an anytime path planner [105] and evaluated this enhanced planner in simulation. Holman et al. [55] extend this predictive model to incorporate gaze. Senft et al. [130] identify and implement a navigational pattern for making space in a hallway. Their model involves controlling the robot's rotation and sliding motion, and consists of three steps: step, slide, and rotate.
All of the contributions above leverage insights from empirical studies on humans and robots to manually construct models for social navigation. However, as machine learning capabilities have improved, different learning techniques have been used to learn models of navigation in social contexts. Lu et al. [85] propose a planning model that can be tuned to match different social navigation contexts. Bennewitz et al. [5] learn motion patterns of people that can be used for trajectory prediction in social robots. Henry et al. [54] extend this approach by modeling partial trajectories. More recently, Vasquez et al. [158] used Inverse Reinforcement Learning (IRL) to infer a reward function for social navigation. They introduce a new software framework to systematically investigate the effect of features and learning algorithms used in the literature. They investigate the task of socially compliant robot navigation in crowds, evaluating two different IRL approaches and several feature sets in large-scale simulations. Karnan et al. [66] collected a large-scale human demonstration dataset containing socially compliant navigation behaviors in natural indoor and outdoor spaces on a university campus. They used behavior cloning to learn a global and local planner to mimic human navigation behaviors.

Table 4. An overview of the different physics-inspired models used in social navigation. Role refers to the robot role, Obs. is observability, Com. refers to communication, and Exp. type is the experiment type.
pedestrians that walk in a group. Swofford et al. [144] use a Deep Affinity Network (DANTE) to predict the likelihood that two individuals in a scene are part of the same conversational group. They take into consideration the social context in which these interactions take place. A different type of force-inspired work uses potential fields attached to moving pedestrians [143]. This model has been leveraged in a modified Rapidly-exploring Random Tree (RRT) for navigation in human environments, though it assumes access to full state information. Table 4 summarizes the taxonomy values of models inspired by physics and mechanical engineering research.
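A potential field attached to pedestrians can be sketched as a cost that is highest at each pedestrian's position and decays with distance. The example below is a minimal illustration under stated assumptions: the Gaussian shape, the `amplitude` and `sigma` parameters, and the function name are choices made for this example and are not taken from [143]. Like the model described above, it assumes that all pedestrian positions are fully known.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def pedestrian_potential(
    point: Point,
    pedestrians: Iterable[Point],
    amplitude: float = 1.0,
    sigma: float = 0.5,
) -> float:
    """Sum of Gaussian bumps centered on each pedestrian's current position.

    Re-evaluating this function with updated positions makes the field
    move along with the pedestrians.
    """
    total = 0.0
    for px, py in pedestrians:
        squared_distance = (point[0] - px) ** 2 + (point[1] - py) ** 2
        total += amplitude * math.exp(-squared_distance / (2.0 * sigma ** 2))
    return total
```

A sampling-based planner could use such a value to reject or penalize samples that fall too close to people, which is one plausible way to bias tree growth away from pedestrians.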

Table 5. An overview of the different inference algorithms used in social navigation. Role refers to the robot role, Obs. is observability, Com. refers to communication, and Exp. type is the experiment type.
and for navigation in particular [2]. It has been shown that humans are not the only species that can partially understand gaze cues from a very young age; chimpanzees and dogs can as well [120, 140]. Gaze and head pose have both been shown to be significant indicators of a person's attention, which can be used to infer navigational goals. Stiefelhagen et al. [141] show that the visual focus of a person's attention can be deduced from head pose when the visual resolution is insufficient to determine eye gaze. Smith et al. [139] extend their work to a varying number of moving pedestrians. Of course, this gaze behavior extends beyond walking and bicycling. Recent work has studied the use of gaze as a modality for plan recognition in games

Table 6. An overview of the different intention-conveying algorithms used in social navigation. Role refers to the robot role, Obs. is observability, Com. refers to communication, and Exp. type is the experiment type.

Table 7. An overview of the different mediation algorithms used in social navigation. Role refers to the robot role, Obs. is observability, Com. refers to communication, and Exp. type is the experiment type.

Table 8. An overview of the different metrics used in the surveyed papers to evaluate a social interaction.
Murakami et al. [102], Pacchierotti et al. [113], Müller et al. [100],
[155]s et al. [84] created a rule-based simulation that can model up to 10,000 pedestrians in an urban environment. Treuille et al. [152] offered a real-time crowd model based on continuum dynamics, which can facilitate large-scale simulations for navigation. Heïgeas et al. [52] presented a simulation platform where pedestrians act according to a physics-based particle force interaction model. Recently, Khambhaita et al. [68] created a simulated benchmark for social navigation tasks instead of physical experiments. This simulation is implemented with OpenAI Gym. Tsoi et al. [155] present a testing platform that combines ROS and Unity into a social navigation testbed. In its current version, this platform can measure whether or not the robot reaches its goal, time to goal, collisions with static objects, final distance to goal, collisions with pedestrians, and closest distance to pedestrians.
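Most of the measurements listed for this testbed can be computed from logged trajectories. The sketch below is a minimal illustration of that computation and is not the API of the platform by Tsoi et al. [155]: all names, the goal tolerance, and the collision distance are assumptions, collisions are counted as timesteps in which the robot is closer to a pedestrian than the collision distance, and collisions with static objects are omitted because they would require a map.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class EpisodeMetrics:
    reached_goal: bool
    time_to_goal: Optional[float]  # None if the goal was never reached
    final_distance_to_goal: float
    pedestrian_collisions: int
    closest_pedestrian_distance: float


def evaluate_episode(
    times: List[float],
    robot_path: List[Point],
    pedestrian_paths: List[List[Point]],
    goal: Point,
    goal_tolerance: float = 0.5,      # assumed arrival tolerance in meters
    collision_distance: float = 0.3,  # assumed robot-pedestrian contact distance
) -> EpisodeMetrics:
    """Compute navigation metrics from trajectories sampled at the same timesteps."""
    time_to_goal: Optional[float] = None
    collisions = 0
    closest = math.inf
    for i, (t, position) in enumerate(zip(times, robot_path)):
        # Record the first time the robot is within tolerance of the goal.
        if time_to_goal is None and math.dist(position, goal) <= goal_tolerance:
            time_to_goal = t
        for path in pedestrian_paths:
            distance = math.dist(position, path[i])
            closest = min(closest, distance)
            if distance < collision_distance:
                collisions += 1
    return EpisodeMetrics(
        reached_goal=time_to_goal is not None,
        time_to_goal=time_to_goal,
        final_distance_to_goal=math.dist(robot_path[-1], goal),
        pedestrian_collisions=collisions,
        closest_pedestrian_distance=closest,
    )


# Toy episode (hypothetical): one pedestrian passes very close at t = 1.
METRICS = evaluate_episode(
    times=[0.0, 1.0, 2.0],
    robot_path=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    pedestrian_paths=[[(0.0, 1.0), (1.0, 0.2), (2.0, 2.0)]],
    goal=(2.0, 0.0),
)
```

Reporting these values together is useful because they trade off against each other: a robot can reach its goal quickly while still passing uncomfortably close to pedestrians.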