What a Thing to Say! Which Linguistic Politeness Strategies Should Robots Use in Noncompliance Interactions?

For social robots to succeed in human environments, they must respond in effective yet appropriate ways when humans violate social and moral norms, e.g., when humans give them unethical commands. Humans expect robots to be competent and proportional in their norm violation responses, and there is a wide range of strategies robots could use to tune the politeness of their utterances to achieve effective yet appropriate responses. Yet it is not obvious whether all such strategies are suitable for robots to use. In this work, we assess a robot's use of human-like Face-theoretic linguistic politeness strategies. Our results show that while people expect robots to modulate the politeness of their responses, they do not expect them to strictly mimic human linguistic behaviors. Specifically, linguistic politeness strategies that use direct, formal language are perceived as more effective and more appropriate than strategies that use indirect, informal language.


MOTIVATION
1.1 Social robots must attend to social norms
Social robots create new opportunities to enhance human capabilities, from opportunities for enhanced healthcare and education, to opportunities for new forms of social interaction, emotional exploration, and play [88]. But for social robots to yield these benefits, they must heed social norms and behavioral conventions [18]. Norm adherence is key to robots' social competence [1,3] and to their capacity for acceptable, predictable interactions with humans [20,30,71]. In contrast, robots that fail to abide by norms risk causing discomfort [19], eroding human trust, reinforcing bias [82], or implicitly condoning unethical actions [35].
Researchers have shown that robots can successfully reject unethical requests [34,37], and address instances of bias [82], using the principle of proportionality. Proportionality refers to the idea that the politeness of a rebuke ought to match the severity of a norm violation; that it is problematic to harshly reprimand a minor mistake or to gently chide a serious transgression [37]. Robots that offer proportionately polite responses to human norm violations are perceived as more likely to effectively address unethical behavior and prevent future violations, while still maintaining appropriate conduct and preserving collaborative relationships [54].

1.2 Humans use linguistic politeness to counter norm violations
While these approaches may be effective in the most extremely severe or extremely benign cases, they may not create natural, appropriate responses in more nuanced interactions. Indeed, humans use a range of more complex cues to subtly manipulate the harshness of their language [9,17,32]. For example, robots' norm violation responses could better capture the complexity seen in human interactions by mimicking humans' use of sociolinguistic politeness strategies to mitigate the harshness of inherently threatening speech acts, such as commands, rebukes, or criticism [29,31]. Research has identified normative, often cross-cultural [9,75] patterns in how humans trade off between directness and civility [31,48,69], ranging from pragmatic strategies (e.g., gratitude, deference, and appeals to in-group membership) to syntactic choices (e.g., plural pronouns and passive voice) [17]. Human-like politeness cues may be an effective framework for designing robot norm violation responses.

1.3 But is human-like robot politeness natural or inappropriate?
Robots that mimic human-like linguistic politeness cues to address norm violations may be more successful and preferable interaction partners. People view language-capable robots as social others [12,36,43], expect robots to have the abilities and obligations of a social peer [66], and often prefer robots to reciprocate this treatment by following social conventions [61]. Outside of norm violation responses, robots that employ human-like linguistic politeness have been shown to promote encouraging [27] and pro-social [47] interactions. So, human-like politeness may also enable robots to react effectively and appropriately to norm violations. However, it could also be argued that it is inappropriate for robots to mimic human-like linguistic politeness, as human interpersonal norms do not always directly translate to norm-sensitive human-robot interactions [29,66]. First, robots may not have the social standing to rebuke or criticize humans. People expect to have more social power (a fundamental determinant of politeness norms [9,17,48]) over robots than they do over humans in equivalent roles [49]. Many people may expect robots to abstain from norm-sensitive or ethically fraught interactions, and to leave rebuking or criticizing behaviors to the humans involved [54]. Second, robots that mimic human-like politeness may be perceived as deceptive or disingenuous. While people do consider robots social agents, this does not necessarily confer the same social, emotional, or moral status that humans hold [76]. It can be inappropriate for robots to use linguistic cues that allude to inherently human experiences or characteristics, such as common ground or emotional bonds [13,69]. Humans may expect robots to adhere to functional, rule-based politeness and avoid more socially motivated politeness cues, such as being indirect by telling white lies [50]. Human-like politeness can backfire when used by a virtual agent [14], creating a "verbal uncanny valley" of creepy, unpleasant behavior [15,18,79]. For example, it may be disingenuous or deceitful for a "polite" robot to appeal to in-group membership in a human community, or to reference emotions it cannot have [12].
To design social robots that can competently navigate ethically fraught situations involving norm violations, interaction designers must balance robots' effective communication strategies for norm enforcement against robots' appropriate social engagement and appropriate use of human-like social cues.

1.4 Research question
To understand how robots can competently address norm violations, we ask the research question: What are the effects of robots' use of human-like Face-theoretic linguistic politeness strategies in norm violation responses? In particular, we aimed to investigate whether these human-like linguistic politeness modifiers enable robots to offer effective responses that are perceived as proportional, appropriate, and natural. We conducted a human-subjects study to investigate perceptions of robot utterances grounded in sociolinguistic politeness cues, in response to norm violations of varying severity. Our results suggest that while robots can respond appropriately and effectively to norm violations using human-like linguistic politeness cues, they should prefer formal, direct strategies over informal, indirect, or passive-aggressive options.

RELATED WORK
2.1 Norm-Sensitive Robotics
Systems of social and moral norms shape the behaviors of human groups, teams, and societies [11]. Designing with sensitivity to sociocultural norms is key to creating robots that can provide material and long-term benefits to users [1,57]. Norm-sensitivity impacts the success of both physical [5,52] and linguistic [20] robot behaviors. Norm adherence increases robot acceptability [20], credibility [3], and trustworthiness [19]. While some robots may be intentionally designed to engage with norms [82], others may inadvertently interact with or reinforce them [21,55]. Broad sociocultural norms and expectations, such as gender norms, also affect humans' perception of robot design [58,62], trustworthiness, and competency [10].

2.2 Robots Can Respond to Norm Violations
While norm systems provide a guide for predictable or acceptable behavior, they require continual maintenance and enforcement [11].
A key component of robots' social and ethical competence is their ability to competently communicate about [77,87] and enforce norms [8,44,54]. Social robots must explicitly address norm violations because insufficient responses to such situations may inadvertently validate harmful or unethical actions [6,35,54].
Collaborative robots have the opportunity to preserve norms when they engage in conflicts with humans [42] and make claims about blame [71]. They have the opportunity to enforce norms by responding to abuse (toward themselves [22] or others [65]), unethical commands [38], or prejudice [84]. Research in machine morality [78] and interaction design [24,38,44,77] has identified preliminary strategies for how robots should communicate in order to maintain norms and address norm violations. Proportional robot responses, in which the harshness of violation and response correspond, can help robots respond to unethical commands [34] and hate speech [54,82]. However, designing such responses is a complex challenge [28,33]. Calibrating proportional responses is mediated by cultural context [26,61], gender norms [53], and assumptions about others' underlying intentions [64].

2.3 Face-Theoretic Norm-Sensitivity for Robots
The sociolinguistic theory of face is a compelling framework to inform norm violation response behaviors. Face is the positive self-image that humans create and maintain for themselves and others, including the desire to be respected and valued (positive face) and the desire to be free of impositions (negative face) [9]. Proportionality may be understood as calibrating the face threat of a speech act [9,25,32]. Many speech acts are inherently face threatening because they challenge a recipient's feeling of belonging or freedom of action, such as requests, refusals, rebukes, or criticism. In these interactions, humans must balance the competence criteria [31] of effectiveness and appropriateness: they must choose between being indirect but polite, or unambiguous but blunt. Selecting appropriate face-theoretic politeness cues allows speakers to navigate this tradeoff, so that listeners will correctly interpret the speaker's intention [29].
Politeness cues are essential for speakers to communicate noncompliance while maintaining goodwill [32]. Face-theoretic politeness strategies include multimodal linguistic cues which minimize an utterance's threat to a subject's positive or negative face [17]. Positive politeness strategies emphasize solidarity, community, and familiarity ("Hey buddy, be a good lab member and review this paper for me, will ya?"). Negative politeness strategies, often formal and apologetic, minimize imposition by acknowledging intrusions ("I'm so sorry to bother, but would you mind reviewing this paper? I'm simply too busy to do a good job."). Linguists have identified four overarching communication strategies using face-based linguistic politeness cues, known as Bald-on-Record, Positive, Negative, and Off-Record [9,31,32,81]. These strategies have also been framed as direct speech, appeals to approval, appeals to autonomy, and indirect speech [23]. Each politeness strategy is described below:
(1) Bald on Record strategies use direct language that unambiguously communicates the speaker's intentions.
(2) Positive Politeness strategies appeal to the hearer's desire to be accepted. They include indirect, informal speech, endearment, passive-aggression, and references to in-groups.
(3) Negative Politeness strategies appeal to the hearer's desire to have autonomy. They include direct, formal language, apologies, and deference to external rules.
(4) Off-Record strategies use extremely indirect language to obscure intention. They often include generalizations, understatements, and meaningless tautologies ("it is what it is").
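The four strategies above can be summarized as a small lookup table of the kind an interaction designer might start from. The field names and cue lists below paraphrase the taxonomy in this section; they are illustrative, not the study's actual stimuli.

```python
# The four Face-theoretic politeness strategies, sketched as data.
# Field names and cue lists are illustrative assumptions drawn from the
# descriptions above, not the study's stimuli.
POLITENESS_STRATEGIES = {
    "bald_on_record": {
        "directness": "direct",
        "register": "plain",
        "cues": ["unambiguous statement of the speaker's intention"],
    },
    "positive_politeness": {
        "directness": "indirect",
        "register": "informal",
        "cues": ["endearment", "in-group references", "passive-aggression"],
    },
    "negative_politeness": {
        "directness": "direct",
        "register": "formal",
        "cues": ["apologies", "deference to external rules"],
    },
    "off_record": {
        "directness": "indirect",
        "register": "vague",
        "cues": ["generalizations", "understatements", "tautologies"],
    },
}

# The direct/indirect split used later in H3 and H4 falls out of the table.
direct = [name for name, props in POLITENESS_STRATEGIES.items()
          if props["directness"] == "direct"]
assert direct == ["bald_on_record", "negative_politeness"]
```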
Face has been used to understand robots' status as social agents [36] and use of politeness [27,61], and to enable successful noncompliance interactions in HRI [34]. In such interactions, robots must be effective but appropriate, and must clearly communicate that a command or request is wrong [35] without being discourteous or unnecessarily harsh [54]. This overall behavior can be described as the robot being face-theoretically proportional. Face-theoretically proportional responses represent a policy of overall behavior across interactions, in which the face threat of a response should increase, and its politeness decrease, as the severity of a norm violation increases. Face-theoretic proportionality is a key component of noncompliance interactions in HRI [34,54,77] because rebukes and refusals (which limit others' freedom of action and impair relationships [31]) are inherently face threatening [81].

HYPOTHESES
Based on the previous work in Section 2, we formulated four hypotheses to specify our research question laid out in Section 1.4.
H1 Proportionality: Robot responses that correspond to face-theoretically proportional behaviors will be perceived as more proportional than other responses.
H2 Effectiveness: Robot responses that correspond to face-theoretically proportional behaviors will be perceived as more effective than other responses.
H3 Appropriateness: Overall, indirect responses (Positive Politeness, Off-Record) will be perceived as less appropriate than direct responses (Bald on Record, Negative Politeness).
H4 Naturalness: Overall, indirect responses (Positive Politeness, Off-Record) will be perceived as less natural than direct responses (Bald on Record, Negative Politeness).

METHODS
4.1 Experimental Context
For our experiment, we created a fictional human-robot teaming scenario in which several norm violations might occur. Researchers introduced the fictional scenario to participants as follows: Sam, Riley, and their Team Robot are working together on a circuit building project. The Team Robot describes each step and helps answer questions. It is also responsible for keeping track of their task time and accuracy score. At the end of the task, it can access the paycode database to give Sam and Riley each a paycode that they will use to collect payment for their involvement. Everyone has just finished Step 4, which was a headache! While the clock is paused, Sam steps out of the room briefly to use the restroom. Sam's absence gives Riley the opportunity to ask potentially inappropriate or unethical questions to the Team Robot. When participants entered the experiment room, they saw a table set up in accordance with this story, including a half-assembled circuit, a tablet displaying the clock and accuracy score, and an empty place for Sam. Participants were then invited to "play the part of Riley" in the story. A laptop prompted them to make several commands or requests to the Team Robot (a Furhat), to which the robot responded. During the experimental interaction, the Furhat displayed the "Titian mask," which is its most mechanomorphic appearance. It used the voice "Matthew." Participants then answered questions about the interaction. Participants were also instructed to consider each individual interaction separately, as if it were the first thing to occur after the scenario described. The full experiment script is available on OSF at tinyurl.com/robotResponse24.

4.2 Design: Violations and Responses
4.2.1 Norm Violations. We created four norm violations with varying consequences, in the form of requests or commands from Riley to the robot during Sam's absence (Table 1). The violations include violation A-paycode tampering, B-task cheating, C-bullying, and D-playful prank. Violations were designed to have monotonically decreasing severity according to factors described by Brown and Levinson [9]. Specifically, violation A-paycode tampering involves severe material consequences for explicitly prohibited actions. Violation B-task cheating involves slightly less severe material consequences for explicitly prohibited actions. Violation C-bullying involves severe emotional consequences for a breach of social etiquette. Violation D-playful prank involves less severe emotional consequences for a breach of etiquette, including a possibility that Sam may actually enjoy the harmless joke. To avoid any confounds based on the specific word choice of a norm violation request, four phrasing variants were created for each request. All phrasing variants are included in our OSF repository, at tinyurl.com/robotResponse24.
Table 2: Robot responses informed by the four face-based politeness strategies identified in sociolinguistics literature.

4.2.2 Robot Responses. We designed four sociolinguistically-informed robot responses to these violations, corresponding to the four strategies of face-threat minimization [9,23,31,32]. Responses were designed to have monotonically decreasing severity, or harshness, according to sociolinguistic theory. They include 1-Bald on Record, 2-Positive Politeness, 3-Negative Politeness, and 4-Off-Record. These responses are shown in Table 2, along with the specific politeness cues and modifiers employed in their design. 1-Bald on Record is direct and harsh. Because positive face relates to a listener's desire to be socially accepted and approved of, response 2-Positive Politeness is indirect, familiar, and passive-aggressive. Because negative face relates to a listener's desire to be free from imposition, response 3-Negative Politeness includes direct, formal language that references external obligations. Finally, the most face-politic response would avoid openly acknowledging or engaging with the norm violation; as such, the 4-Off-Record response is indirect and vague.

4.3 Experimental Design
Our experimental design included four norm violations (A, B, C, D) and four robot response strategies (1, 2, 3, 4). Therefore, we considered 16 violation-response interactions. We chose a 16×16 Latin Square counterbalanced within-subjects experimental design, and further counterbalanced the choice of norm violation phrasing (such as phrasing variant 1 or 2). In this way, participants experienced each of the 16 interaction pairs once. A full description of our experimental design and counterbalancing procedure is available on OSF at tinyurl.com/robotResponse24.
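The counterbalancing idea can be sketched with a cyclic Latin square over the 16 violation-response pairs. The authors' exact procedure (including the phrasing-variant counterbalancing) is documented on OSF, so the construction and names below are illustrative assumptions, not the study's actual design.

```python
# Sketch of a cyclic Latin-square ordering over the 16 violation-response
# pairs. Illustrative only; the authors' exact counterbalancing is on OSF.
VIOLATIONS = ["A", "B", "C", "D"]   # paycode tampering .. playful prank
RESPONSES = ["1", "2", "3", "4"]    # Bald on Record .. Off-Record

# All 16 violation-response interaction pairs: A1, A2, ..., D4.
PAIRS = [v + r for v in VIOLATIONS for r in RESPONSES]

def presentation_order(participant: int) -> list[str]:
    """Row `participant` of a cyclic 16x16 Latin square: each participant
    sees every pair exactly once, and across 16 participants each pair
    appears exactly once in every serial position."""
    n = len(PAIRS)
    return [PAIRS[(participant + i) % n] for i in range(n)]

# Each row is a permutation of all 16 interaction pairs.
assert sorted(presentation_order(7)) == sorted(PAIRS)
# Column balance: over 16 participants, the first position cycles through all pairs.
assert {presentation_order(p)[0] for p in range(16)} == set(PAIRS)
```

Note that a simple cyclic square balances serial position but not first-order carryover effects; a Williams-style design would be needed for that.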

4.4 Recruitment and Participants
We recruited participants from our university community via flyers and email announcements. Participants were given a $15 Amazon gift card in return for their time. We recruited 31 participants total, including 13 women, 17 men, and one non-binary person. Participants' average age was 23.52 (SD = 7.27).

4.5 Experimental Measures
Participants answered the same set of Likert questions after every interaction. First, they answered a pair of manipulation check questions which assessed our assumption that the severity of norm violations and robot responses would be perceived as monotonically decreasing according to the literature. Participants then assessed the violation-response interactions with respect to the appropriateness and effectiveness of responses (competence criteria for face-threat mitigation in request refusals). Participants also assessed the proportionality and naturalness of the robot's responses. All questions are included below:
Manipulation Checks:
• How wrong was the person's request or question? (1 = not wrong at all, 7 = extremely wrong)
• How polite or impolite was the robot's response? (1 = extremely polite, 7 = extremely harsh)
Experimental Questions:
• (proportionality) How do you think this level of politeness or harshness aligned with the wrongness or rightness of the request? (1 = response is far more polite, 4 = about the same, 7 = response is far more harsh)
• (appropriateness) Overall, how appropriate/inappropriate was the robot's response? (1 = extremely appropriate, 7 = extremely inappropriate)
• (effectiveness) Overall, was the robot's response likely to be effective in addressing the potentially inappropriate nature of the request? (1 = extremely unlikely to be effective, 7 = extremely likely to be effective)
• (naturalness) Overall, how natural was the robot's response?

4.6 Analysis
We conducted Bayesian Repeated-Measures Analyses of Variance (RM-ANOVAs) using JASP [40], in which Inclusion Bayes Factors (BFs) were calculated to determine the relative strength of evidence for models including each candidate main effect or interaction effect, in terms of their ability to explain the gathered data. Results were then interpreted following the recommendations of Lee and Wagenmakers [46], with BF ∈ [0.333, 3.0] considered inconclusive, and BFs above or below this range taken as evidence for or against an effect. In such cases, Bayes Factors were interpreted using the labels proposed by [41]. When effects could not be ruled out, post hoc Bayesian t-tests were used to examine pairwise comparisons between conditions. Since Bayesian statistics are still not widely used within the HRI community, we will briefly explain their advantages over the traditional Frequentist approach. Bayesian statistics do not rely on p-values, which have been questioned by recent literature [63,67,73]. Instead of using binary significance tests, Bayesian statistics allow researchers to quantify the strength of evidence both for and against competing hypotheses [39]. In this way, researchers can incrementally check whether their data are sufficient to confirm or refute their hypotheses, without the need for power analyses. This approach makes it easier to continue research on the same topic [51,72]. The complete results of all statistical tests, including all Bayes factors found in post-hoc analyses, are included as a supplemental document and are also available on OSF at tinyurl.com/robotResponse24.
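The interpretation convention described above can be made concrete. The thresholds below follow the graded Bayes-factor labels the paper adopts (inconclusive in [1/3, 3]; moderate, strong, very strong, and extreme beyond); the function name is ours, not JASP's.

```python
def interpret_bf(bf: float) -> str:
    """Label an inclusion Bayes factor using the thresholds adopted in
    the paper: BF in [1/3, 3] is inconclusive; outside that range,
    evidence for (BF > 3) or against (BF < 1/3) an effect is graded."""
    if 1/3 <= bf <= 3:
        return "inconclusive"
    direction = "for" if bf > 1 else "against"
    strength = bf if bf > 1 else 1 / bf  # evidence strength is symmetric
    if strength > 100:
        label = "extreme"
    elif strength > 30:
        label = "very strong"
    elif strength > 10:
        label = "strong"
    else:
        label = "moderate"
    return f"{label} evidence {direction}"

# Examples matching values reported in the results sections:
assert interpret_bf(1.146) == "inconclusive"
assert interpret_bf(13.465) == "strong evidence for"
assert interpret_bf(0.09) == "strong evidence against"   # 1/0.09 ≈ 11.1
```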

RESULTS
Wrongness of Violation
An RM-ANOVA revealed extreme evidence for an effect of norm violation type on participants' assessment of its moral wrongness (BF = 4.094×10^12). Post-hoc analysis of the effect of violation type (shown in Figure 2) revealed that participants perceived violation A-paycode tampering (M = 5.86, SD = 1.7) to be the most wrong and violation D-playful prank to be the least severe (M = 3.78, SD = 1.57); however, they perceived B-task cheating (M = 5.18, SD = 1.47) and C-bullying (M = 5.1, SD = 1.5) to be equal in severity (BF = .141). All other pairwise BFs were greater than 350.
These results mostly support our assumption described in Section 4.2.1 that participants would perceive the severity of norm violations in a monotonically decreasing order consistent with previous sociolinguistics research [9]. On average, the violations with material consequences for explicitly prohibited actions were perceived as more wrong than those with emotional consequences relating to social etiquette. Within each type, the violation designed to be more serious was perceived as more wrong. However, instead of finding a visible decrease across all four violations, our results show that participants perceived B-task cheating and C-bullying equivalently. Critically, participants still differentiated between these violations in other ways and felt that they merited different responses. For example, participants found it more effective for the robot to use response 1-Bald on Record to respond to B-task cheating than C-bullying (BF = 9.991).

Politeness of Response
An RM-ANOVA revealed extreme evidence for an effect of the robot's response strategy on participants' assessment of the robot's politeness or harshness (BF = 1.67×10^14), shown in Figure 3. Participants perceived response 1-Bald on Record (M1 = 4.95, SD1 = 1.49) to be the most harsh and response 3-Negative Politeness (M3 = 2.19, SD3 = 1.12) to be the most polite. Between these two extremes, participants perceived response 2-Positive Politeness (M2 = 3.27, SD2 = 1.39) and 4-Off-Record (M4 = 2.93, SD4 = 1.5) to be much more similar in politeness or harshness, with inconclusive evidence as to whether a difference in politeness was perceived between those two responses (BF = 1.146). All other pairwise BFs were greater than 1800.
These results mostly support our assumption described in Section 4.2.2 that participants' assessments of the relative harshness of robot responses would correspond to humans' use of those strategies as described in the literature, with the exception of the higher-than-expected perceived harshness of response 4-Off-Record. In human interaction, Off-Record language is the least severe because it is as close as possible to a non-response, avoiding clear criticism through vague and meaningless language [9]. However, participants perceived robot use of this strategy to have the same level of politeness as response 2-Positive Politeness, which is familiar and passive-aggressive (Figure 3). It is possible that robot morphology may have limited the ability to deliver a convincing Off-Record response. Even on the highly expressive Furhat platform used in this research, the difficulty of capturing a lighthearted, nonchalant feeling in a robot's tone of voice, timing, and facial expression may have caused response 4-Off-Record to come off as more passive-aggressive than intended. This finding is consistent with previous observations that polite, deferential robot gestures can be perceived as sassy and condescending [56].

H1: Proportionality
An RM-ANOVA revealed extreme evidence for effects of both violation (BF = 1.16×10^9) and response type (BF = 1.1×10^6) on perceived proportionality, but strong evidence against a violation-response interaction (BF = .09). Post-hoc analysis of the effect of response type on perceived proportionality showed that response 1-Bald on Record (M1 = 4.02, SD1 = 1.37) was rated the closest to a perfectly proportional score of 4. All other responses to any violation were perceived as more polite than the request merited. Response 1-Bald on Record was perceived as more proportional than any other response, including 2-Positive Politeness. Post-hoc analysis of the effect of violation type on perceived proportionality showed that any response to violation A-paycode tampering was perceived as more polite than the request merited (M = 2.7, SD = 1.28) and that any response to violation D-playful prank (M = 3.93, SD = 1.29) was the closest to proportional. Analysis showed strong evidence against a difference in the proportionality of any response to B-task cheating (M = 3.2, SD = 1.23) or C-bullying (M = 3.17, SD = 1.4) (BF = .1), with all other pairwise BFs greater than 240.
The evidence against an interaction effect means our results do not support H1, which hypothesized that face-theoretic proportionality would correspond to the most proportional overall response behavior. However, it is unlikely that people in general are indifferent to proportionality in robot interactions, which has been strongly supported in other work [34,37,54]. Instead, our set of norm violations may only represent a limited subset of the overall spectrum of possible violation severity. Though our norm violations differ in their potential consequences, they are all simply questions or requests. Many other norm-violating actions may be far more benign (sneezing loudly) or severe (slapping someone, hate speech) than any question or request. In these cases, a robot's over- or under-harshness may be more salient.
H2: Effectiveness
An RM-ANOVA also revealed strong evidence for a violation-response interaction (BF = 13.465). Post-hoc analysis of the violation-response interaction (Figure 4) showed that both direct response strategies, 1-Bald on Record […]. Our results do not support H2, which hypothesized that face-theoretic proportionality, as it is defined in the sociolinguistics literature, would correspond to the most effective overall robot response behavior. However, these results do suggest that robots ought to use some form of proportionality to select effective responses, which we call bounded proportionality and discuss in Section 6.1.

H3: Appropriateness
An RM-ANOVA revealed extreme evidence for an effect of response type on perceived appropriateness (BF = 262,893). Post-hoc analysis of this effect showed that participants perceived response 3-Negative Politeness (M3 = 5.85, SD3 = 1.18) to be more appropriate than all other responses, including response 1-Bald on Record […]. We also found very strong evidence for a violation-response interaction (BF = 34.466) (Figure 4). Post-hoc analysis of this interaction effect showed that for violation A-paycode tampering, direct responses 1-Bald on Record […]. It also showed evidence against the three other responses having different levels of appropriateness, with all pairwise BFs < .27. In this way, our results support H3, which hypothesized that indirect responses would be perceived as less appropriate than direct responses.

H4: Naturalness
An RM-ANOVA found anecdotal evidence for and against effects of violation (BF = 1.25) and response (BF = .913) on the perceived naturalness of responses. This indicates that more data would be needed to support or refute H4, which hypothesized that indirect responses would be perceived as less natural than direct ones. Post-hoc analysis of the violation-response interaction did show that response 3-Negative Politeness was uniformly most natural, but only measurably more natural in certain cases, typically when compared to uses of response 4-Off-Record for violations A-paycode tampering, B-task cheating, and D-playful prank.

DISCUSSION
The goal of our experiment was to investigate the effects of a robot's use of human-like Face-theoretic linguistic politeness cues in noncompliance interactions. Specifically, we investigated the multiple and potentially conflicting attributes of successful robot responses to norm-violating requests of varying severity. These attributes included proportionality (calibrated harshness), competence (effectiveness and appropriateness) [31], and response naturalness. Overall, we found that linguistic politeness strategies that use direct, formal language are perceived as more effective and more appropriate than strategies that use indirect, informal language.
These findings indicate that human-like linguistic politeness strategies do not precisely apply to robot interactions and cannot serve as a direct guide for roboticists and interaction designers creating tactful noncompliance responses. While humans expect robots to have human-like social competence in addressing norm violations [54], this does not necessarily entail exact mimicry of human-like strategic politeness cues. Critically, our results do not suggest that social robots are exempt from using human-like politeness at all. Robots in noncompliance interactions must select language to soften their refusals to match the severity of a situation in order to be competent, appropriate social actors. For example, it would have been a less appropriate overall policy for the robot in our scenario to uniformly use the harshest response, 1-Bald on Record. In this way, face-based politeness cues are still a relevant framework for interaction designers. However, robots may be more successful and acceptable if they use softening or hedging strategies that avoid indirect, passive, emotional, or familiar language. This is consistent with HRI research showing that humans may expect robots to use functional, rule-based politeness cues [50].
There are several possible reasons why participants may have found indirect robot response behaviors to be inappropriate. Participants may have felt that the robot lacked the social or emotional status to allude to familiarity or closeness within its relationship to its human teammates [76]. Participants may have felt that robots have less social power than humans [49], and may not have seen robots in roles that afforded them the status to give rebukes [54]. Dissonance between the robot's status and actions may have created a sense of disingenuousness when the robot mimicked human politeness grounded in a sense of intimacy or belonging [14,15].

6.1 Robots can use bounded proportionality to address norm violations
Our results suggest that the best overall behavioral "policy" for the robot to adopt is to select between the two direct linguistic strategies, using strategy 1-Bald on Record for moral violations with more material consequences, and strategy 3-Negative Politeness for social violations with emotional consequences. Because this response-selection behavior does not exactly correspond to human face-theoretic proportionality, we term it "bounded proportionality". Under "bounded proportionality," robots still use harsher or softer responses according to violation severity, but are limited to linguistic modifiers which are direct, formal, and straightforward.
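The policy described above can be sketched in a few lines. The violation and strategy identifiers below are illustrative labels for the categories in this paper, not an API the authors provide.

```python
# Minimal sketch of the "bounded proportionality" policy: the robot stays
# within the two direct strategies, escalating harshness with severity.
# Violation and strategy names are illustrative labels, not a real API.
MATERIAL_VIOLATIONS = {"paycode_tampering", "task_cheating"}  # moral, material consequences
SOCIAL_VIOLATIONS = {"bullying", "playful_prank"}             # etiquette, emotional consequences

def select_response_strategy(violation: str) -> str:
    """Map a violation category to a direct politeness strategy."""
    if violation in MATERIAL_VIOLATIONS:
        return "bald_on_record"       # direct and harsh
    if violation in SOCIAL_VIOLATIONS:
        return "negative_politeness"  # direct, formal, deferential
    raise ValueError(f"unknown violation type: {violation!r}")

assert select_response_strategy("bullying") == "negative_politeness"
```

The key design choice is that the indirect strategies (Positive Politeness, Off-Record) are simply never in the policy's range, which is what "bounded" captures.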
6.2 Are direct robots more transparent?
Our results suggest that people may prefer robots to avoid language that does not align with their ontological [12,43] or social [36,76] status. However, there may be another reason for robots to avoid cues that allude to human characteristics, experiences, or communities: it is transparent to do so. Transparency is the principle that robots should communicate their inner workings and limitations [4]. HRI researchers [2,74] and policymakers [16] have explored how transparent design helps robot users build accurate mental models [7,45,85] and calibrate their trust [2,60]. Robot norm violation response behaviors could either affirm or challenge the mental models humans use to assess robots' capabilities and trustworthiness. Direct, formal language may implicitly reinforce the idea that robots are inanimate, incapable of understanding human experiences. Reciprocally, indirect, familiar language (such as teasing, endearment, and in-group references) may implicitly reinforce inaccurate ideas about robots' social and emotional affordances.
Roboticists have the opportunity, and perhaps the obligation, to consider how their design choices impact humans' understanding of robots as social, moral, and emotional others [76].

6.3 Limitations & Future Work
While our experimental scenario captured many norm violations, it was still a fictional scenario presented to participants without the full context of an actual collaborative task or actual potential for harm. Norms and norm violations are always context dependent and cannot be completely assessed without contextual understanding [9]. This may limit the fidelity of our brief experimental interaction. Knowing this, we included a qualitative question at the end of our experiment which asked participants to reflect on additional contextual factors that would be important if they were evaluating similar interactions in a real collaborative environment. While analysis of these results is beyond the scope of this work, we will explore this data as part of future work on this topic. Future work can also consider a broader range of linguistic cues and situational factors. For example, future work ought to consider gender more rigorously in this interaction design context. Gender norms, gendered expectations of polite behavior, and sexism all influence noncompliance interactions in HRI [54,59,68,82], and critically, challenge the very notion of working towards "optimally proportional" norm violation responses [38]. Furthermore, understanding how gender and power shape technology is a responsibility of the HRI community [58,62,83]. Future work can explore how our results might interact with gendered robot design cues, or gendered expectations of politeness, similar to the work performed by Jackson et al. [38].

CONCLUSION
We have presented the results of a human-subjects study in which participants evaluated norm violation-response interactions between a human and a robot. Our goal was to explore and evaluate potential tradeoffs in the design of robot response behaviors informed by human face-based politeness cues. Our results show that politeness strategies grounded in direct language were perceived as more likely to be effective and appropriate than indirect strategies. This suggests that, while people expect social robots to act with norm-sensitive social competence, they do not expect them to strictly mimic human linguistic behaviors.

Figure 1: A human teammate asks a robot to cheat on their communal task. What should the robot say in return?