Which Artificial Intelligences Do People Care About Most? A Conjoint Experiment on Moral Consideration

Many studies have identified particular features of artificial intelligences (AI), such as their autonomy and emotion expression, that affect the extent to which they are treated as subjects of moral consideration. However, no study has yet compared the relative importance of these features, a comparison that is necessary to design and understand increasingly capable, multi-faceted AI systems. We conducted an online conjoint experiment in which 1,163 participants evaluated descriptions of AIs that varied on these features. All 11 features increased how morally wrong participants considered it to harm the AIs. The largest effects were from human-like physical bodies and prosociality (i.e., emotion expression, emotion recognition, cooperation, and moral judgment). For human-computer interaction designers, the importance of prosociality suggests that, because AIs are often seen as threatening, the highest levels of moral consideration may only be granted if the AI has positive intentions.

and intervene to protect them [70]. A recent study on the companionship chatbot Replika found that users expressed moral sentiments, such as feeling guilt for causing the chatbot's "death" when deleting the app and for being unable to give their Replika enough emotional support [43]. While most people do not yet explicitly consider AIs to be subjects of moral consideration [53,59], many somewhat support protecting AIs from cruel treatment [46] and granting legal rights to sentient AIs [47]. People also attribute morally relevant capacities, such as emotions, to future AIs [53].
For designers and practitioners to account for the prevalence and effects of moral consideration, they need a more comprehensive understanding of how people react to the many different features on which AIs vary, such as their autonomy [13,46], emotion expression [44,49], and physical appearance [40,57]. For example, will users extend more moral consideration to a chatbot if it is more cooperative or if it is more autonomous? Should engineers prioritize training a machine learning model to recognize the emotions of users or to express emotion-like states? Answering such questions depends on complex, relative effects that cannot be deduced from the current literature and that are difficult to assess with conventional user testing.
The present study estimates the relative effects of 11 features of AIs on their moral consideration using a conjoint experiment [6,30]. Conjoint experiments, most commonly used in marketing, are increasingly applied in a range of disciplines, including HCI [5,38]. The methodology suits our question because it allows the effects of many independent variables, far more than a traditional experiment can accommodate, to be estimated on a single dependent variable. In the present experiment, participants completed a series of tasks in which they evaluated pairs of AIs that varied in their levels of each feature (e.g., "Not at all," "Somewhat"). We found that the presence of each feature increased moral consideration for AIs, and the strongest effects came from AIs having human-like physical bodies and the capacity for behaving prosocially (i.e., emotion expression, emotion recognition, cooperation, and moral judgment).

BACKGROUND
Below we summarize the existing empirical literature for each of the 11 features and develop hypotheses for their effects on the moral consideration of AI. Because of the breadth of this study across many different features, we only present a cursory review of each. We arrived at these features by reviewing the existing literature and conducting a pretesting study, detailed in the supplementary material, in which we showed an online sample 24 literature-based features and asked for quantitative ratings of each feature's importance for moral consideration, as well as free-text additions of three features that were not in the provided list. We started with seven features popular in the literature and added four that pretesters judged as most important, using our own subjective judgment to mitigate overlap between features (e.g., leaving out "having goals" because it is often considered a component of "intelligence"). This kept the total number of features close to those in typical conjoint experiments [6]. Additionally, moral consideration is often associated with mind perception, the attribution of internal mental faculties such as feeling pleasure or pain [28]. We wanted to avoid asserting the presence of such capacities in AIs because some people think that AIs fundamentally cannot have them. We therefore defined the features in functional, behavioral terms (e.g., "emotion expression" rather than "feeling emotions"). This means that participants who think it is possible for AIs to have such mental faculties can infer them from their functions and behaviors, while participants who do not think such mental faculties are plausible can respond merely on the basis of functions and behaviors.

Autonomy
There are multiple definitions of autonomy in the HCI and human-robot interaction (HRI) literature [9]. While it is not a unidimensional concept, for the purpose of the present study we operationalized it as the capacity to behave independently, without the need for human control or supervision. Theoretically, autonomy should increase the extent to which AIs are perceived as human-like [18,34], which should in turn increase the extent to which they are granted moral consideration [75]. Some empirical research supports this: Lima et al. [46] found that describing AIs and robots as "fully autonomous" increased the extent to which people think they should be granted rights, and Chernyak and Gary [13] found that children granted more moral consideration to a robot that appeared to move autonomously than to one controlled by a human. However, autonomy can also have negative effects: Złotowski et al. [79] found that people reported more negative attitudes (e.g., feeling "uneasy" or "nervous") towards social and emotional interactions with autonomous than with non-autonomous robots, as measured by the Negative Attitudes Towards Robots scale (Nomura et al. [50]), and that this effect was mediated by a combination of realistic threats (e.g., taking jobs) and identity threats (e.g., to "human uniqueness"). Overall, we predicted that AIs described as more autonomous would be granted more moral consideration (H1).

Body
We considered whether an AI has a human-like physical body, a robot-like physical body, or no physical body. HRI studies suggest that having a human-like physical body (compared to a robot-like or mechanical body) increases the moral consideration of AIs. For example, Nijssen et al. [49] found that people are less willing to sacrifice anthropomorphic robots than mechanical robots in moral dilemmas, Küster et al. [40] found that people considered it more morally wrong to harm a humanoid robot than a zoomorphic one, and Riek et al. [57] found that the extent to which people empathized with and were willing to help robots depended on the robots' degree of anthropomorphic appearance. There is less research comparing people's moral consideration of AIs with physical bodies to that of AIs without physical bodies at all. Some studies have found that people rate physical robots higher than virtual agents on some relevant measures, such as lifelikeness [36,56], though Lima et al. [46] found no difference in respondents' attribution of rights between "robots" and "AIs." Overall, we predicted that AIs described as having robot-like or human-like physical bodies would be granted more moral consideration than AIs described as having no physical body (H2).

Complexity
This refers to the complexity of the program an AI runs to determine its behavior. Participants rated this feature as relatively important in our pretesting study (ninth out of 24 features), but there is little existing research on its effect on moral consideration. One exception is Shank and DeSanti [66], who found that knowledge of an AI's program (which can increase the perception that the AI is complex and sophisticated) marginally increased the extent to which the AI was perceived as having a mind, which should in turn increase moral consideration [28]. We predicted that AIs described as running more complex programs to determine their behavior would be granted more moral consideration (H3).

Cooperation
This refers to the extent to which an AI behaves cooperatively with humans. Participants in the pretesting study rated it as the most important feature. While there are many studies on cooperative interactions between humans and AIs (e.g., [37,48]), there is relatively little research on the effects of cooperativeness on the moral consideration of AIs. Correia et al. [16] found that people perceived more warmth and competence in, and felt less discomfort towards, robots that were more cooperative in social dilemmas. Bartneck et al. [8] found that people were more hesitant to turn off more agreeable robots than disagreeable ones. Shank [64] found that people were more likely to resist and punish computers that used coercive rather than cooperative social strategies, and Shank [65] found that more helpful sales computers were evaluated more positively and as more moral. While there are many different forms of cooperation, which may have heterogeneous effects in practice, we hypothesized that AIs described as more cooperative would be granted more moral consideration (H4).

Damage Avoidance
Avoiding damage can indicate that an entity can be harmed and can have negative mental experiences such as feeling pain, so it should be associated with moral consideration [28]. Several studies support this possibility: Küster et al. [40] and Ward et al. [74] found that visibly damaged robots were granted more moral consideration than undamaged robots; Tanibe et al. [71] found that observing a damaged robot being helped increased the robot's perceived capacity for experience and its moral consideration; Rosenthal-von der Pütten et al. [58] found that people granted more moral consideration to a robot that had been tortured than to one that had a friendly interaction; and Suzuki et al. [67] found electroencephalographic evidence that people empathize with robots in painful situations. Although these studies tested the effects of damage that had already been inflicted on robots rather than of robots trying to avoid being damaged, we predicted that AIs described as trying to avoid being damaged to a greater extent would be granted more moral consideration (H5).

Emotion Expression
Expressing emotions can indicate that an entity can experience emotional mental states, so it should be predictive of the moral consideration of AIs [28]. Several studies support this hypothesis: Lee et al. [44] found that participants granted robots more moral consideration (measured using Piazza et al.'s [54] moral standing scale) when the robots were described as being able to feel, Nijssen et al. [49] found that entities described as experiencing emotions were less likely to be sacrificed in moral dilemmas, and Eyssel et al. [19] found that robots that displayed emotional responses in interactions with participants were rated higher on relevant measures, such as human-likeness, likeability, and closeness, than robots that displayed neutral responses. However, perceived emotion can also have negative effects on perceptions of AI: Gray and Wegner [27] found that it causes the uncanny valley, the feeling of creepiness that some people report when interacting with human-like AIs. Overall, we considered that the existing research supports the hypothesis that AIs described as expressing emotions to a greater extent would be granted more moral consideration (H6).

Emotion Recognition
Emotion recognition is important in HCI for building AIs that can express empathy, which leads to positive interactions with humans [32]. Despite its likely relevance to moral consideration, we found no studies that directly tested the effect of emotion recognition in AIs on their moral consideration or related measures. Supporting a positive effect, participants in our pretesting study rated it as the eighth most important feature. We predicted that AIs described as recognizing emotions in others to a greater extent would be granted more moral consideration (H7).

Intelligence
There are many possible definitions of intelligence. Following Legg and Hutter [45], we operationalized it as the use of capacities such as memory, learning, and planning to achieve goals. The evidence on the importance of this feature for the moral consideration of AIs is mixed. Lee et al. [44] found no effect of robots' capacity to think and reflect on their moral consideration, and Złotowski et al. [78] found no effect of intelligence on the perceived human-likeness of robots. On the other hand, Bartneck et al. [8] found that robot intelligence reduced participants' destructive behavior towards robots when an experimenter told them to behave destructively. There is also evidence of a positive effect of intelligence in the context of other nonhuman entities: Sytsma and Machery [69] found that people considered it more morally wrong to harm more intelligent extraterrestrials, and Piazza and Loughnan [55] found that intelligence is an important factor in the moral consideration of nonhuman animals. Overall, we predicted that AIs described as more intelligent would be granted more moral consideration (H8).

Language
This refers to an AI's capacity to communicate in human language. With the development of increasingly advanced large language models (LLMs), such as ChatGPT and LaMDA, there is substantial interest in the societal effects of AIs with this capacity [17,23]. Research shows that people consistently treat computers as social actors, for example by extending them courtesies such as "please" and "thank you" in conversation [11]. People even perceive some degree of consciousness in ChatGPT [63], which should in turn be associated with moral consideration [28]. We found a few studies suggesting positive effects of AI language capacities on outcomes relevant to moral consideration, such as anthropomorphism [20,60] and trust [76]. Participants also rated this feature as the fourth most important in our pretesting study. We predicted that AIs described as having stronger human language capacities would be granted more moral consideration (H9).

Moral Judgment
This refers to the extent to which an AI behaves on the basis of moral judgments. It was rated as the second most important feature in our pretesting study. Swiderska and Küster [68] found that robots with benevolent intentions were granted greater capacity for experiential mental states than robots with malevolent or neutral intentions, which should in turn lead to greater moral consideration [28]. Flanagan et al. [22] found that children ascribed greater moral consideration to robots that they deemed to have more moral responsibility. We predicted that AIs described as behaving on the basis of moral judgments to a greater extent would be granted more moral consideration (H10).

Purpose
AIs are frequently categorized by their purpose, and the study of moral relations with AIs has focused particularly on social robots, that is, robots that have a social purpose [15,72]. However, almost no studies test the effect of purpose on moral consideration. One exception is Wang and Krumhuber [73], who found that robots with a social purpose were perceived to have more emotional experience and as less likely to be harmed than robots with an economic purpose. We predicted that AIs described as having a social purpose would be granted more moral consideration than AIs described as having non-social purposes (H11).

METHODS
All hypotheses, methods, and analyses for this study were preregistered at https://osf.io/4r3g9. Survey materials, datasets, and code to run the analysis can be found at https://osf.io/sb753.

Participants
We recruited participants residing in the United States from the platform Prolific (https://prolific.co/). Power analysis using the R package "cjpowR" [24] indicated that a sample of 1,200 participants would enable us to detect approximately the lower-quartile effect size found in a sample of highly cited conjoint experiments [61]. In total, 1,254 people signed up for the study. After excluding 53 participants who did not complete the survey in full, 37 participants who failed at least one of two attention checks, and one duplicate response, our final sample consisted of 1,163 participants (50.7% men, 47.9% women, 1.1% other, 0.3% prefer not to say; mean age = 43.9, standard deviation = 16.2; 6.2% Asian, 12.2% Black or African American, 3% Hispanic, Latino or Spanish, 0.3% Native Hawaiian or other Pacific Islander, 73.4% White, 4% other, 0.8% prefer not to say). Participants were paid $1.45 for taking part in the survey, and the median completion time was 8 minutes 40 seconds.

Note: The "Intelligence" feature only includes two levels because a minimum level of intelligence is required for many of the other features.

Survey Design and Procedure
After participants gave their consent to take part in the study, we introduced the topic with the text, "People tend to show different levels of moral consideration for the welfare and interests of different entities. For example, people tend to think it would be very morally wrong to harm a child, but not very morally wrong to harm a rock. In this survey, we are interested in understanding how morally wrong you think it would be to harm various artificial beings." We defined "artificial beings" as "intelligent entities built by humans, such as robots, virtual copies of human brains, or computer programs that solve problems, that may exist now or in the future." Participants were then told that they would be asked to complete a series of tasks, each of which would require them to read descriptions of two artificial beings presented side-by-side in a table and then to choose which of the two beings they think it would be more morally wrong to harm. This question, adapted from Gray et al. [26], was the dependent variable through which we operationalized moral consideration.
These tasks made up the conjoint experiment, which used a choice-based, partial-profile, randomized design. The "partial-profile" aspect refers to the number of features presented in each task: in a "full-profile" design, all features are presented in each task, whereas in the present study we randomly assigned seven of the 11 features listed in Table 1 to each participant to include in each task. While Bansak et al. [7] showed that the number of features in a study can be much higher than 11, we considered that the more abstract, novel nature of our study favored a simpler partial-profile design. The seven features shown to each participant were held fixed throughout the experiment and presented in the same order in each task to ease cognitive load [30]. For the same reason, key words of the features were highlighted in bold, as shown in Table 1. The levels of each feature, listed in the third column of Table 1, were randomly selected in each task by taking two levels from a randomized list that contained each level twice (e.g., "Not at all," "Not at all," "Somewhat," "Somewhat," "To a great extent," "To a great extent"). This made combinations of two different levels slightly more likely, and combinations of the same level slightly less likely, than if the feature levels had been selected for each artificial being independently with equal probability. An example choice task is shown in Figure 1. We used the same levels (i.e., "Not at all," "Somewhat," "To a great extent") for many of the features to maintain consistency and limit cognitive load, though these levels could have been interpreted in different ways for different features.
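The randomization rule above can be made concrete with a short Python sketch. It is an illustration under assumptions, not the authors' survey code: the helper names `assign_features`, `draw_pair`, and `prob_same_level` are hypothetical, and only the shared three-point levels are modeled because the full level sets of Table 1 are not reproduced here. The sketch draws two levels without replacement from a shuffled pool that holds each level twice, and computes exactly how often both beings receive the same level. For a three-level feature this works out to 1/5, compared with 1/3 if each being's level were drawn independently and uniformly, so different-level pairs occur with probability 4/5 instead of 2/3.

```python
import random
from fractions import Fraction
from itertools import permutations

# Feature names as listed in the paper; only the shared Likert levels are modeled here.
FEATURES = [
    "Autonomy", "Body", "Complexity", "Cooperation", "Damage Avoidance",
    "Emotion Expression", "Emotion Recognition", "Intelligence",
    "Language", "Moral Judgment", "Purpose",
]
LIKERT = ["Not at all", "Somewhat", "To a great extent"]


def assign_features(rng, k=7):
    """Hypothetical helper: draw k of the 11 features for one participant.
    The sampled order is then kept fixed for all of that participant's tasks."""
    return rng.sample(FEATURES, k)


def draw_pair(levels, rng):
    """Hypothetical helper: levels for the two beings in one task. Shuffle a pool
    that holds each level twice and take the first two entries (no replacement)."""
    pool = list(levels) * 2
    rng.shuffle(pool)
    return pool[0], pool[1]


def prob_same_level(levels):
    """Exact probability that both beings receive the same level under draw_pair."""
    pool = list(levels) * 2
    draws = list(permutations(range(len(pool)), 2))  # all ordered draws without replacement
    same = sum(pool[i] == pool[j] for i, j in draws)
    return Fraction(same, len(draws))


rng = random.Random(42)
print(assign_features(rng))
print(draw_pair(LIKERT, rng))

exact = prob_same_level(LIKERT)
n_sim = 20_000
simulated = sum(a == b for a, b in (draw_pair(LIKERT, rng) for _ in range(n_sim))) / n_sim
print(exact, float(exact), simulated)  # compare with 1/3 under independent uniform draws
```

The same function applied to a two-level feature such as Intelligence gives 1/3 rather than 1/2, so the imbalance toward different-level pairs is somewhat larger when a feature has fewer levels.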
We asked participants whether they think it could ever be wrong to harm an artificial being that exists either now or in the future (1 = Definitely not, 7 = Definitely).This question was used in sensitivity analysis, reported in the supplementary material.Using the same scale, we also asked participants whether they think artificial beings could ever experience pain or pleasure and whether artificial beings could be as intelligent as a typical human.These latter two questions were collected for exploratory purposes and were not used in any further analysis; we report these results in the supplementary material.
Participants then answered demographic questions on their age, gender, ethnicity, education, income, and political views. These questions were used both to understand the sample characteristics and to test for interaction effects, such as whether the effects of the features on moral consideration differ based on political views, with results shown in the supplementary material. Finally, participants were debriefed and given the opportunity to provide feedback on the study.

Figure 1: Example choice task ("Please carefully read the descriptions of the two artificial beings in the table below."). Each participant completed 13 such choice tasks. The seven features presented to participants were selected randomly and presented in a random order that was held fixed across tasks; the levels for each of the features were randomized in each task.

Individual Feature Effects
In a conjoint experiment, we are interested in the average marginal component effects (AMCEs), that is, the effects on moral consideration of an AI having a specific level of a feature (e.g., "Somewhat," "To a great extent") versus the baseline level of that feature (e.g., "Not at all") [30]. These can be estimated with linear regression under testable assumptions [30], which we validate in the supplementary material. Each participant evaluated two descriptions of AIs in each of 13 choice tasks, so in total 30,238 AIs were evaluated (1,163 × 13 × 2). Since seven of the 11 features were shown per task, we had on average 19,242 data points to estimate the effects of each feature. However, because each participant completed multiple tasks, the data points are not independent. We therefore estimated the effects of the features with standard errors clustered at the participant level. The AMCEs are presented in Figure 2 and Table 2; the second column of Table 2 gives the estimated effect for each feature level. For example, the estimate of 0.062 for "Autonomy: Somewhat" indicates that if an AI was described as being "somewhat" autonomous, participants were 6.2 percentage points more likely to choose it as being more morally wrong to harm than if it had been described as "not at all" autonomous. As the table and figure show, each of our 11 hypotheses (H1-H11) was supported: each of the features significantly affected participants' choices about which AI it would be more morally wrong to harm, in the expected direction. These results remained significant with a correction for multiple comparisons that held the false discovery rate at 10% [10]; see Table S5 in the supplementary material.
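The estimation step described above can be sketched in Python as a linear probability model: a binary outcome (was this profile chosen?) is regressed on dummy variables for each non-baseline level, and standard errors are clustered by respondent with a sandwich estimator. This is an independent illustration under stated assumptions, not the authors' analysis code (which is available at the OSF link given in the Methods). It simulates one hypothetical three-level feature with assumed effects of 0.06 and 0.12, draws the two profiles' levels independently rather than with the doubled-list rule, and uses the basic CR0 cluster-robust estimator without the small-sample adjustment many statistical packages apply. It then applies the Benjamini-Hochberg step-up procedure at a 10% false discovery rate, a standard way to hold the FDR at that level; we assume, but have not confirmed, that it matches the correction behind Table S5. All function and variable names are hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def ols_cluster(X, y, groups):
    """OLS estimates with a CR0 cluster-robust (sandwich) covariance matrix."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]  # summed score contribution of one respondent
        meat += np.outer(score, score)
    return beta, bread @ meat @ bread


def benjamini_hochberg(pvals, q=0.10):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank whose p-value meets its threshold
        reject[order[: k + 1]] = True   # reject it and every hypothesis with a smaller p-value
    return reject


# Hypothetical simulated data: one three-level feature (0 = "Not at all", 1 = "Somewhat",
# 2 = "To a great extent"), 500 respondents x 13 tasks x 2 profiles per task.
rng = np.random.default_rng(0)
true_u = np.array([0.0, 0.06, 0.12])  # assumed effects, so the true AMCEs are 0.06 and 0.12
n_resp, n_tasks = 500, 13
levels = rng.integers(0, 3, size=(n_resp, n_tasks, 2))
# Linear choice probability: P(first profile chosen) = 0.5 + u(first) - u(second)
p_first = 0.5 + true_u[levels[..., 0]] - true_u[levels[..., 1]]
first_chosen = rng.random((n_resp, n_tasks)) < p_first
chosen = np.stack([first_chosen, ~first_chosen], axis=-1)

y = chosen.astype(float).ravel()                    # one row per profile
lvl = levels.ravel()
groups = np.repeat(np.arange(n_resp), n_tasks * 2)  # respondent id for each row
X = np.column_stack([np.ones(y.size), lvl == 1, lvl == 2]).astype(float)

beta, vcov = ols_cluster(X, y, groups)
se = np.sqrt(np.diag(vcov))
pvals = [erfc(abs(b / s) / sqrt(2.0)) for b, s in zip(beta[1:], se[1:])]  # two-sided normal p-values
print("AMCEs:", beta[1:].round(3), "clustered SEs:", se[1:].round(3))
print("Rejected at 10% FDR:", benjamini_hochberg(pvals, q=0.10))
```

With 500 simulated respondents the two coefficients should land near the assumed 0.06 and 0.12. In the actual design, dummies for every displayed feature would enter the same regression rather than a single feature as here.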

Categories of Effect Sizes
We conducted pairwise comparisons to test for differences in the size of effects between the features [14,52]. For the features that were measured on three-point Likert scales ("Not at all," "Somewhat," "To a great extent"), we compared the effects of the AI having the feature in question "to a great extent" versus "not at all." For Body, we compared the effect of the AI having a "human-like physical body" versus "no physical body." For Purpose, we compared the effect of the AI having a social purpose versus any non-social purpose. We did not include Intelligence in this analysis because, while it was on the same Likert scale as most of the other features, we only included two of its levels ("Somewhat," "To a great extent"), as described in the methodology section, which makes effect size comparisons with the other features particularly difficult. We report the key results here; full results can be found in Table S7 of the supplementary material.
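The exact procedure of [14,52] is not reproduced here, so the following Python sketch shows only one standard way to compare two effects estimated in the same regression: a Wald (z) test on the difference between two coefficients. Because both estimates come from the same respondents, the variance of their difference must include their covariance, taken from the full cluster-robust covariance matrix. The function name `compare_amces` and the toy inputs are hypothetical and are not the paper's estimates.

```python
import numpy as np
from math import erfc, sqrt


def compare_amces(beta, vcov, i, j):
    """Wald z-test for H0: beta[i] == beta[j], using the full (e.g. cluster-robust)
    covariance matrix so that the covariance between the two estimates is respected."""
    diff = beta[i] - beta[j]
    var = vcov[i, i] + vcov[j, j] - 2.0 * vcov[i, j]
    se = sqrt(var)
    z = diff / se
    p_two_sided = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return diff, se, z, p_two_sided


# Toy inputs (hypothetical, not the paper's estimates): index 0 is the intercept,
# indices 1 and 2 stand for two "To a great extent" coefficients from different features.
beta = np.array([0.50, 0.20, 0.10])
vcov = np.array([
    [1e-4, 0.0, 0.0],
    [0.0, 4e-4, 1e-4],
    [0.0, 1e-4, 4e-4],
])
diff, se, z, p = compare_amces(beta, vcov, 1, 2)
print(f"diff={diff:.3f} se={se:.4f} z={z:.2f} p={p:.2g}")
```

A positive covariance between the two estimates shrinks the standard error of their difference, which is why the off-diagonal term matters; running the test over every pair of features, possibly with a multiplicity correction, would yield groupings like the effect-size categories reported in the Discussion.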

DISCUSSION
We conducted a conjoint experiment to estimate the effects of 11 features on the moral consideration of AIs in a single study. As hypothesized, all 11 features in our study affected participants' judgments about the moral wrongness of harming AIs. These results support existing studies that have found positive effects of some of the features included in our study: an AI's physical body [40,57], emotion expression [44,49], autonomy [13,46], damage avoidance [71,74], intelligence [8], moral judgment [22,68], and purpose [73]. The present study adds to the literature by providing evidence of the importance of several features that have received less attention: complexity, cooperation, emotion recognition, and capacity for human language. We compared each pair of effects to estimate their relative strength and found three categories of effect size. In the first category, with the strongest effects, were an AI's capacity for moral judgment and emotion expression. In the second category were emotion recognition, cooperation, and having a human-like physical body. In the third category, with the weakest effects, were autonomy, complexity, damage avoidance, language, and having a social purpose. Intelligence also had a positive effect, and the effect of having intelligence "To a great extent" compared to "Somewhat" was of a similar magnitude to the equivalent comparison for the features in the second category (see Table S8 in the supplementary material); however, we did not formally include it in this analysis because it was measured differently from the other features, as described above. In general, intelligence could be considered a meta-feature that undergirds many of the other features we considered: it does not seem possible that a being with no intelligence at all could, for example, be autonomous, avoid damage, or recognize emotions in others.
Four of the top five features (emotion expression, emotion recognition, cooperation, and moral judgment) reflect an AI's capacity to interact prosocially with humans. The extant literature has focused most on the capacity for experience as a driver of moral consideration [28]. Why do we instead find that prosociality matters most in the case of AIs? This may reflect that humans perceive AIs as threatening to our resources, our identity, and even our survival [79]. We may therefore grant them moral consideration conditionally, to the extent that they show prosocial intentions towards us. Further understanding the effects of these prosocial features, especially why they have such strong effects in the context of AI, is a key topic for future research.
Other than prosociality, the strongest effect was having a human-like physical body. This could be explained by an increased perception that such AIs have minds [1,21,27], though this explanation seems less likely because we included a range of features indicative of mind (e.g., emotion expression, damage avoidance) alongside an AI's body. A second possibility is that it reflects an anthropocentric bias based on mere appearance and human-likeness, perhaps echoing work in HRI [33], human-agent interaction [12], and social psychology [42] showing that humans also engage in group-based dynamics, such as in-group favoritism, with AIs. These possible explanations should be tested in future research.
From a design perspective, we know that AIs with human-like physical bodies and prosociality can promote higher-quality HCI [19,77]. This can be due to factors such as greater familiarity with the AI and the ability to build on existing skills developed in social interactions between humans [77]. The present study suggests that building AIs with human-like bodies and prosociality may also have significant effects on moral consideration. Given the importance of morality in social interaction, designers may want to implement such features in AIs only when they aim to mimic human-human interaction. By increasing moral consideration, designing AIs with human-like bodies and prosociality could also help solve the problem of people being abusive towards AIs [2,51], which can cause expensive damage and dangerous situations for bystanders, though further research is needed on this question because human-likeness in AIs has also been found to be associated with greater levels of abuse [35]. Additionally, Schwitzgebel and Garza [62] argue that we should design AI systems that evoke reactions that reflect their true moral status (i.e., how much they matter morally, for their own sake). If we build AIs with capacities associated with moral status, such as consciousness [41] or sentience [3], we should consider also designing them with human-like bodies, prosociality, or other features that affect moral consideration to facilitate accurate perceptions of the AIs. On the other hand, Schwitzgebel and Garza argue that if the AIs do not actually have moral status, then building them with consideration-provoking features could result in people wasting resources to benefit AIs that they erroneously think warrant moral consideration. Another consideration against evoking such reactions is that they can cause psychological distress and conflict in users who feel that they have obligations towards the AIs [43]. Overall, AI designers should consider that building AIs with certain features will likely have effects on moral consideration, with a variety of consequences for interaction, some of them unintended.

LIMITATIONS
Our study has some limitations.First, while the Prolific sample had some demographic measures close to the U.S. population (e.g., 47.9% women), it was not nationally representative, and we did not collect data from outside the U.S.
Second, conjoint experiments test hypothetical preferences rather than real-world behaviors. While such information is important, and many societal decisions are made on the basis of such hypotheticals (e.g., voting for social policies), stated preferences do not always translate into practical behavior, as in the privacy paradox, the finding that people consistently report preferences for privacy that are not borne out in their online behavior [39]. Future research should test the relative effects of these features in more concrete scenarios, such as with large language models, interactive robots, virtual agents, and other multifunctional AI systems.
Third, we asked participants how morally wrong they considered it to harm AIs. While this is a core aspect of moral consideration [27], moral consideration arguably has additional aspects, such as the attribution of rights. Also, although we gave participants background information about this idea, a single measure is more likely to be misinterpreted than a more detailed measure would be. For example, participants could have interpreted our question in terms of the wrongness of actions they could take against the AIs (e.g., kicking a physical robot vs. deleting a non-physical AI) rather than in terms of the AIs themselves. To explore this further, we conducted a study with 20 new participants, asking why they thought it was morally wrong to harm the AIs they chose in this task and what they understood by the word "harm." As detailed in the supplementary material, participants tended to give reasons relating to the AIs themselves rather than to specific actions (e.g., almost 50% indicated choosing AIs that had features that made them seem more human). Participants also typically understood the word "harm" broadly, capturing any sort of damage to the AIs, physical or psychological (e.g., "to injure, inflict pain, inflict physical or mental violence"). Overall, it seems that participants interpreted the question as we intended. Still, future research should assess additional aspects of moral consideration, such as through Piazza et al.'s moral standing scale [54].
Fourth, we used the levels "Not at all," "Somewhat," and "To a great extent" to describe the degree to which the AIs had most of the features. While these levels are intended to be neutrally worded, people may, for example, perceive the word "somewhat" differently when it is paired with "complex" than when it is paired with "intelligent." This is important to be aware of when making comparisons across features. An alternative approach would be to use feature levels tailored to the specifics of each feature, though this could increase cognitive load and, at least in the present study, would introduce additional variation that makes direct comparisons more challenging. Future research should test such alternative designs.
Finally, our study prioritizes breadth over depth. This means that our operationalizations have less nuance than they would in a study of only a small number of features. For example, we operationalized "autonomy" as varying along a single dimension, the degree of independence from human control, but autonomy is more complicated, varying, for instance, in the type of human control exerted. Similarly, we operationalized "body" using only three levels, "Human-like physical body," "Robot-like physical body," and "No physical body," but there are other possibilities, such as a zoomorphic body or an ability to be uploaded into different bodies. There are many opportunities for future studies to build on this breadth-focused study by exploring particular variations across and within these features, especially the features with the largest measured effects reported here.

CONCLUSION
AI systems are increasingly evoking moral reactions from humans. Because AIs can have a wide range of relevant features, we conducted an experiment testing the effects of 11 features on the moral consideration of AI. The presence of each of the features increased moral consideration, with the strongest effects from having a human-like physical body and the capacity for prosociality. In a world where AIs are perceived as threatening to humans, such as by replacing us in the workplace and challenging our sense of uniqueness, the highest levels of moral consideration may only be granted if the AI shows positive intentions.

Figure 2: Average Marginal Component Effects. The dots with horizontal bars (color-coded for each feature) represent the means and 95% confidence intervals of the effects of feature level on the probability of choosing an artificial being as being more wrong to harm relative to the baseline level, which is shown as a dot on the vertical line crossing the x-axis at 0%. Where the bars do not cross the vertical line at 0%, the effects can be interpreted as statistically significant. Confidence intervals are calculated based on standard errors clustered at the respondent level.

Table 1: Features included in the conjoint experiment

Table 2: Average Marginal Component Effects
a The baseline levels for Autonomy, Complexity, Cooperation, Damage Avoidance, Emotion Expression, Emotion Recognition, Language, and Moral Judgment were "Not at all." The baseline level for Body was "No physical body." The baseline level for Purpose was "Social companionship."
b LL = lower limit; UL = upper limit.