Attitudes towards Social Robots (ASOR): Revisiting the Scale with Four Types of Robots

Our work is theoretically grounded in the notion of sociomorphing, which contends that not all dimensions of experienced sociality with robots pertain to the projection of human-like mental states, i.e., anthropomorphism. To investigate dimensions of attributed sociality, we deployed the Attitudes towards Social Robots scale (ASOR) in a video-based online study (n=202) with four different robots (Starship Delivery Robot, Telenoid, Blossom, Vector). The four robots were rated slightly differently, which aligned with our expectations given the differences in their appearance and in how they were contextualised in the videos. However, further evaluation of the statistical properties of the scale and the qualitative feedback solicited on the items pointed to limitations of the scale.


INTRODUCTION
Anthropomorphizing robots is commonly discussed in Human-Robot Interaction (HRI) and social robotics (SR) in relation to developing robots with "social presence" and "social behaviors", and to efforts at understanding why people treat robots as social actors (cf. [3], [2]). While acknowledging diverse definitions and assessments of anthropomorphism and sociality (cf. [7], [1]), our work is rooted in the theoretical contention that not all experiences of sociality with robots amount to the attribution of human-like mental states to robots. This theoretical anchoring is informed by our ongoing work concerning experienced sociality with autonomous delivery robots.
The findings from observations and interviews with passersby that are relevant to this report were: (i) people engage in pro-social behaviors towards functional robots and experience them as social actors of a kind; (ii) people rely on anthropomorphic language to describe these interactions, while simultaneously recognizing how such language fails to represent their experience. The lack of linguistically precise tools for describing social experiences with robots has been pointed out by Seibt and colleagues in reference to the so-called descriptive problem [5, 6]. To assert that not all experiences of sociality converge on imagined or projected human-like capacities, even though both lay people and researchers are prone to describe them as such (cf. [8]), Seibt et al. proposed sociomorphing as a better fit for understanding the actual asymmetric social capacities people attribute to and experience with robots [6]. Sociomorphing can manifest in types of experienced sociality. To descriptively anchor these, they developed the Ontology of Asymmetric Interactions (OASIS) theoretical framework based on interdisciplinary evidence from empirical, conceptual, and phenomenological research (ibid.).
Motivated to explore how people perceive robot sociality on these theoretical grounds, we deployed the Attitudes towards Social Robots (ASOR) scale, developed by Damholdt et al. [4], in a between-subjects online study with four robot conditions, including three social robots and one functional robot. We chose ASOR because of its conceptual bridge to the OASIS framework through the notion of sociomorphing [4, p.29]. Our research goals were to: (i) investigate whether ASOR yields different outcomes for four different types of robots; (ii) assess the statistical properties of the scale and compare the results with those of Damholdt et al.; (iii) probe limitations of the scale via qualitative feedback to the items.

ATTITUDINAL STANCE TOWARDS ROBOTS SCALE
The ASOR scale was developed to gauge how social robots can influence five dimensions of social relatedness. These include: (1) Socio-practical relatedness, which obtains when "the agent is perceived as an interaction partner capable of training or acting in "accordance with a norm"" [4, p.28]; (2) Intimate-personal relatedness, which obtains "when a person is attached to an item"; (3) Moral relatedness, which obtains "when the agent is perceived as a moral agent or as a moral patient."; (4) Psychological relatedness, which refers to situations when "the agent is perceived as having feelings and emotions and the perceiver is engaged in the processes of social cognition that are described by accounts of emotional contagion, sympathy, or compassion."; (5) Mental relatedness, which obtains when "the agent is perceived as having intentions and beliefs, and the perceiver is (consciously or unconsciously) engaged in the processes of social cognition that are described by theories of 'mind-reading'." (ibid.).
Along these five dimensions, 37 items were generated and statistically assessed with a sample of 339 participants. A factor analysis yielded a three-factor solution consisting of 25 questionnaire items. The resulting three factors were defined as follows: (1) ascription of mental capacities (AMC), assessing whether the respondent perceives the "existence of an emotional and mental life, hereunder the robot's self-understanding and the social obligations towards robot."; (2) ascription of socio-practical capacities (APC), encompassing items that gauge whether the respondent believes "the robot will act in the respondent's best interest and will act consistently over time" [4, p.48]; (3) ascription of socio-moral status (AMS), including the items that "reflect the status of the robot as a social agent but also the specific expected role and status it would have to the respondent" (ibid.).
The limitations of the scale, as outlined by the authors, included the need to further establish the cross-cultural, construct, and predictive validity of the scale, and the only moderate internal consistency of the AMS subscale, which suggests a need for further elaboration.

REVISITING THE ASOR SCALE

Study set-up and participants
Our study consisted of the original 25-item ASOR scale and four experimental conditions with four robots: the humanoid robot Telenoid, the small creature-like robot Vector, the DIY social robot Blossom, and the commercial autonomous delivery robot by Starship. A sample of n = 202 participants (aiming at 50 per robot type) was recruited via the online research platform Prolific © (2023). The balanced sample (50% male and 50% female participants) and "All countries" settings were used. Participants were paid a reward of £15.38/hr for their participation in the study (the study took about 8 minutes on average to complete). One participant was excluded because of an unrealistically short survey completion time.
After reviewing the informed consent information, participants were asked a demographic question (age). The resulting mean age of participants was 27.88 years (SD=7.77). Participants were then taken to a video of one of the four robots, which depicted the given robot performing task(s). They were then directed to the pages displaying the ASOR scale items. The items were presented in the same order as in the original study by Damholdt et al. For consistency, we also preserved the 4-point Likert scale from 1 (not at all) to 4 (to a high degree) used in the original study. As we were interested in participants' feedback to the individual items, we included an option to respond "not applicable/other" (NA) for each of the 25 items of the ASOR scale. When the NA option was selected, participants were asked to elaborate on their choice in an open-response text field.
Towards research aim (iii), we also recruited six additional participants among colleagues and friends. These participants had various disciplinary backgrounds. They were instructed to provide more extended feedback to the items they found challenging to interpret or respond to. They were asked to fill in the survey for all four robots, with a break of several days up to one week between surveys. Only qualitative data (feedback to the items) from these six participants were included in the analysis. These participants were not compensated monetarily.

Stimulus videos
The videos were chosen to showcase (social) affordances of the robots and depicted tasks aligning with realistic use case scenarios. For the Telenoid robot, the same video as in the original ASOR study by Damholdt et al. [4] was used. The video, lasting 77 seconds, depicted a conversation between a woman and a Telenoid. The original video was in Danish; for our study we added English subtitles. In the conversation, the woman expressed her concerns about her daughter's slow progress at school. In response, the robot offered to assist with tutoring.
The Starship robot video (80 seconds long) depicted a Starship robot delivering a package to its destination. The video started with the robot introducing itself in English, followed by it driving away. The video included a scene in which the robot asked a pedestrian to press the traffic light button at a street crossing and thanked the person for granting the request.
The video of the Vector robot (72 seconds long) consisted of interactions between a woman and the robot. These included: the robot "waking up" following the woman's command, the robot responding to a question about the identity of the woman, the robot "purring" in response to being "petted" on its back, and the robot reporting the weather.
The video of the Blossom robot (73 seconds long) depicted a woman asking the robot: "Hey Blossom, how are you?". In response, the robot moved its body, and the woman commented jokingly "You seem happy today" and asked whether it would like to watch a movie. The rest of the video depicted the woman and the robot in front of a laptop screen, with the robot moving in reaction to the animated film.

ANALYSIS AND FINDINGS

Insights on the four robots
To analyze how people assessed the four robots on the original ASOR scale, we computed subscale scores by averaging the items, which turns the subscales into continuous variables. Table 1 shows the means and standard deviations for each robot and subscale. All four robots were rated rather low on the ascription of mental capacities, but higher on the ascription of socio-practical capacities and socio-moral status. The results for AMC and APC are in line with our expectations. For the AMS subscale, we were surprised by the high ratings and by the fact that Telenoid was rated the lowest. We assume that the results on the AMS subscale are potentially inverted. For this subscale calculation, we reversed the items, as suggested by Damholdt et al. However, when looking at the semantic core of the original AMS items (see Table 2), reversing seems counter-intuitive.
A one-way ANOVA with a subsequent Tukey HSD post hoc test partly supported our assumptions. There was a significant effect of robot type on AMS: F(3, 197) = 6.54, p < .001 (η² = .09). The post hoc tests revealed significant differences between the Telenoid group and the Blossom and Vector groups, but not the Starship group. This was slightly unexpected to us, but we believe it is also related to the fact that the items of this subscale did not translate well to the Starship use case, because they are tailored to robots that are social by design. Another significant difference was found for the APC subscale: F(3, 197) = 3.8, p < .01 (η² = .06). Here the post hoc tests revealed significant differences between the Blossom group and the Telenoid and Vector groups, but again not the Starship group. The η² values indicated medium effects; however, the actual differences in the mean ratings were small.
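The scoring and effect-size arithmetic in this subsection can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names are hypothetical, reverse-coding is assumed to mirror a response on the 1 to 4 scale, and NA responses (represented as None) are assumed to be skipped when averaging. For a one-way ANOVA, η² can be recovered from the F statistic and its degrees of freedom as F·df1 / (F·df1 + df2).

```python
from typing import Optional, Sequence

SCALE_MIN, SCALE_MAX = 1, 4  # 4-point response format reported in the study


def reverse_code(score: int) -> int:
    """Mirror a response on the 1..4 scale (1 <-> 4, 2 <-> 3)."""
    return SCALE_MIN + SCALE_MAX - score


def subscale_mean(responses: Sequence[Optional[int]],
                  reverse: bool = False) -> Optional[float]:
    """Average the answered items of one participant.

    None marks an NA response; skipping it is an assumption of this sketch.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    if reverse:
        answered = [reverse_code(r) for r in answered]
    return sum(answered) / len(answered)


def eta_squared(f_value: float, df_between: int, df_within: int) -> float:
    """Eta squared of a one-way ANOVA from F and its degrees of freedom."""
    return (f_value * df_between) / (f_value * df_between + df_within)


# AMS effect as reported in the text: F(3, 197) = 6.54
print(round(eta_squared(6.54, 3, 197), 2))  # 0.09
```

Because the reported F values are rounded, values recomputed this way can differ from the reported η² in the last digit.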

Evaluating the statistical properties of ASOR
Towards research goal (ii) of exploring the statistical properties of the ASOR scale, we performed a reliability analysis with listwise deletion on all data (n = 201). Both the AMC (n = 196, α = .88) and APC (n = 187, α = .80) subscales had satisfactory reliability. The AMS subscale had low reliability (n = 197, α = .56). Given that no AMS item stood out as improving reliability upon deletion, this pointed to uncertainty regarding the dimensionality of the ASOR scale and confirmed the need for further elaboration of the AMS construct, as suggested by Damholdt et al.
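For readers less familiar with the reliability coefficient, the following sketch shows how Cronbach's α with listwise deletion can be computed from raw item responses. It is an illustration under stated assumptions, not the procedure run in the study: the function names are hypothetical, NA responses are represented as None, and sample variances (n − 1 denominator) are used.

```python
from statistics import variance
from typing import List, Optional, Sequence


def listwise_complete(rows: Sequence[Sequence[Optional[int]]]) -> List[List[int]]:
    """Keep only respondents who answered every item (listwise deletion)."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def cronbach_alpha(rows: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    Needs at least two items, two respondents, and non-constant total scores.
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up toy responses (rows = respondents, columns = items); not study data.
toy = [[1, 2], [2, 1], [3, 3], [4, None]]
complete = listwise_complete(toy)  # the last respondent is dropped
print(len(complete), round(cronbach_alpha(complete), 2))
```

Listwise deletion explains why the n differs per subscale in the text: a participant who chose NA on any item of a subscale is dropped from that subscale's reliability estimate.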
We then conducted an exploratory principal factor analysis with varimax rotation on the 25 ASOR items to test whether the factor structure described in [4] would be replicated in our data. Pairwise deletion was applied to retain the maximum number of data points given the "not applicable" option. At most, 5 participants were excluded for an item (see Table 2). We computed the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO = .845) and Bartlett's test of sphericity (χ²(300) = 1860.635, p < .001). The results indicated that our data were appropriate for factor analysis.
An initial inspection of the eigenvalues showed 6 factors with eigenvalues above 1. The first four factors explained a combined 51.7% of the total variance (27.3%, 10.6%, 8.2%, 5.6%) and had eigenvalues between 6.8 and 1.4, while the fifth and sixth factors explained 4.4% and 4.2% of the total variance and had eigenvalues slightly above 1 (1.09 and 1.04). The scree plot indicated that retaining a 4- or 6-factor solution would be justified. Thus, the factor analysis did not replicate the expected 3-factor solution. We investigated the rotated component matrix to identify which items constituted the factors in our solution, and to establish which items caused ambiguity compared to the original ASOR scale.
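The link between the eigenvalues and the variance percentages above can be sketched as follows. With 25 standardized items the total variance equals 25, so each factor's share is its eigenvalue divided by 25, and the "eigenvalue above 1" rule is the Kaiser criterion. The helper names are hypothetical, and only the eigenvalues actually reported in the text are used.

```python
from typing import List

N_ITEMS = 25  # number of ASOR items entered into the factor analysis


def explained_share(eigenvalue: float, n_items: int = N_ITEMS) -> float:
    """Share of total variance for one factor of a correlation matrix."""
    return eigenvalue / n_items


def kaiser_retained(eigenvalues: List[float]) -> List[float]:
    """Kaiser criterion: keep factors whose eigenvalue exceeds 1."""
    return [ev for ev in eigenvalues if ev > 1.0]


# Eigenvalues reported in the text; those of factors 2 and 3 are not reported.
reported = {"factor 1": 6.8, "factor 4": 1.4, "factor 5": 1.09, "factor 6": 1.04}
for name, ev in reported.items():
    print(name, round(100 * explained_share(ev), 1), "% of total variance")

# The four leading shares reported in the text add up to the combined figure.
print(round(27.3 + 10.6 + 8.2 + 5.6, 1))  # 51.7
```

The recomputed shares (about 27.2%, 5.6%, 4.4%, and 4.2%) agree with the reported percentages up to rounding of the eigenvalues.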
We specified three criteria for factor interpretation: (a) An item needed a loading above 0.6 to be allocated to a factor. (b) An item was allocated to a factor if the difference between its highest loading and any other loading was at least 0.2. (c) A factor was suitable for interpretation if at least four items adhered to the first two criteria.
The overall structure of Factor 1 for the most part aligned with the AMC as in [4] and included six items that primarily gauged people's expectations about the robot having mental and emotional states (AMC B07, AMC B08, AMC B09, AMC C01, AMC C02, AMC C03). Other original AMC items (AMC B05, AMC B06, AMC C04, AMC D01) loaded elsewhere than the core group. The clustering of the items under Factor 2 (APC A04, APC A06, APC A07) suggested a semantic core that aligned with the ascription of socio-practical capacities as predicted by Damholdt et al. However, several items of the original APC subscale (APC A01, APC A02, APC B02, APC B03, APC D03, APC D05) loaded elsewhere, rendering Factor 2 overall unsuitable for interpretation per our criteria. The clustering under Factor 3 (AMC B06, AMC D01), based on the semantic core of the two items, suggested that the underlying construct tapped into the perception of the robot as a moral patient, though again the factor included only two items. Factor 4 consisted of only two items (APC A02, AMC C04). Factor 5 consisted of three items gauging the ascription of socio-moral status (AMS A03, AMS A05, AMS D04). Semantically, these items were associated with a feeling of discomfort experienced in relation to the robot. On Factor 6, only one item loaded sufficiently (APC D05).
In sum, these results suggest that, though there is some alignment between the factor structure resulting from our study and that of [4], only one factor (AMC) performs as predicted.
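The three interpretation criteria can be expressed as a small allocation routine. This is a hedged sketch rather than the procedure used in the study: the function names, item labels, and loadings below are made up for illustration, and comparing absolute loadings is an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional

MIN_LOADING = 0.6  # criterion (a): loading above 0.6
MIN_GAP = 0.2      # criterion (b): at least 0.2 above any other loading
MIN_ITEMS = 4      # criterion (c): at least four allocated items per factor


def allocate_item(loadings: Dict[str, float]) -> Optional[str]:
    """Return the factor an item is allocated to, or None if (a) or (b) fails."""
    ranked = sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True)
    best_factor, best = ranked[0]
    if abs(best) <= MIN_LOADING:
        return None
    if len(ranked) > 1:
        gap = abs(best) - abs(ranked[1][1])
        if round(gap, 6) < MIN_GAP:  # rounding guards against float noise
            return None
    return best_factor


def interpretable_factors(items: Dict[str, Dict[str, float]]) -> List[str]:
    """Factors with at least MIN_ITEMS allocated items, per criterion (c)."""
    counts = Counter(f for f in map(allocate_item, items.values()) if f)
    return sorted(f for f, n in counts.items() if n >= MIN_ITEMS)


# Hypothetical loadings, not taken from Table 2.
toy = {
    "item_1": {"F1": 0.81, "F2": 0.12},
    "item_2": {"F1": 0.74, "F2": 0.20},
    "item_3": {"F1": 0.69, "F2": 0.05},
    "item_4": {"F1": 0.66, "F2": 0.31},
    "item_5": {"F1": 0.30, "F2": 0.72},
    "item_6": {"F1": 0.62, "F2": 0.55},  # cross-loading: fails criterion (b)
}
print(interpretable_factors(toy))
```

In this toy example only F1 reaches four allocated items, which mirrors how a factor with two or three qualifying items is treated as unsuitable for interpretation in the text.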

Qualitative feedback to the items
For reporting, we clustered the qualitative feedback to the items based on the kind of issues raised by the participants (both the online participants and the additional participants).
Domain specificity: Qualitative feedback to items APC A01 and APC A02 suggested that it mattered to participants whether a specific robot was developed to perform the task in question. For instance, in the case of the Starship robot, participants indicated that they struggled to see how it would make sense to take medical or financial advice from it: "This is too far away from its intended use to answer for me."
Lack of transparency about how the robot arrives at a decision: In response to items gauging whether participants trust the information provided by the robot (e.g., APC A07), participants indicated that it was a challenge for them to respond because they lacked information about the process leading to a specific output: "I consider the robot as just 'front end' in this case and would probably try to disregard how good/bad it is."
Lack of affordances to perform an action: Selected items assumed specific capabilities related to the robot design, e.g., dialogue-based interactions (e.g., APC A02, APC A07). For Blossom, for example, participants pointed out that they did not see how the dialogue-assuming items were relevant.
Item containing two questions: In the feedback to item APC A07, participants pointed out that the item in fact contained two questions rather than one.
Lack of interpretation anchoring with regard to mode of enactment: One of the common problems identified in the qualitative feedback was that participants did not know whether the items gauged if the robot could have a genuine phenomenological experience, or merely mimic or simulate, e.g., an emotion or a mental state (e.g., AMC B08, AMC B09).
Recognition of dependency on external actors such as programmers: Similarly, for the items that were implicitly entangled with the notion of agency (e.g., AMC B07, AMC C02), participants indicated that it was not clear to them whether the item probed the robot independently carrying out a decision, or whether it would be a result of someone else programming it to behave in a specific way: "If it was programmed to do so."
Lack of context to interpret the question: One of the items that yielded a lot of feedback was AMS A03. Participants indicated that the term intelligence is just too broad, and that it was a challenge for them to respond to this question without any additional context provided: "What do you mean here? Being in one room with it or in any other kind of relationship? It depends. Intelligence is also subjective until some point, I guess". Participants experienced a similar issue with item APC A06, indicating that they did not know what "best possible" meant, and what kind of advice that would be: "Advice on what? advice on where to cross the road or advice on whether I should take out a loan and invest it in bitcoins?".

CONCLUSION
We conclude that ASOR is a promising, though imperfect, tool to assess dimensions of attributed sociality. Based on our experience, there are several weaknesses of the scale beyond those identified by Damholdt et al. First, though the descriptive problem is emphasized, and a distinction between modes of simulation (functional replication, imitation, mimicking, displaying, approximating) is made in the context of OASIS [6], many ASOR items rely on anthropomorphic language, conflating these differences and leading to conflicting interpretations. Second, the weak(er) performance of the APC and AMS subscales may have to do, in line with the descriptive problem, with the ambiguity of certain terms (e.g., intelligence, recognition), but also with a mixing of perspectives. Some items probe expectations with respect to the robot's capabilities, while others, assuming specific robot capabilities, probe anticipated human reactions in underdetermined situations. The challenge with the APC scale may be further aggravated by the fact that no established unified theory exists with respect to socio-practical relatedness. Simply put, while we believe socio-practical relatedness will play an increasingly important role when it comes to embedding robots in everyday lives, we are yet to formulate a sound conceptual grounding of what socio-practical relatedness actually is and what it involves. Moreover, ASOR faces challenges common to other scales in HRI, such as probing post-hoc rationalizations and, as is the case with our study, doing so based on a one-off exposure to a video stimulus as opposed to the actual phenomenological experience of interacting with a robot. Recognizing the challenges laid out above, our aim in a follow-up study is to introduce and test concrete incremental steps towards the improvement of the ASOR scale at the level of individual items and the semantic cores of the subscales.

Table 1: Ratings for each original ASOR scale for each robot

Table 2: Item wording, codes, and primary factor with loading. NF stands for not assignable to factor.
Do you think that you would like doing things with the robot (e.g. play Ludo, chit-chat, learn a new language, or cook)?
*The first three letters of the item codes indicate the original ASOR subscales