
Prosocial Norm Emergence in Multi-agent Systems

Published: 07 September 2022


Abstract

Multi-agent systems provide a basis for developing systems of autonomous entities and thus find application in a variety of domains. We consider a setting where not only the member agents are adaptive but also the multi-agent system viewed as an entity in its own right is adaptive. Specifically, the social structure of a multi-agent system can be reflected in the social norms among its members. It is well recognized that the norms that arise in society are not always beneficial to its members. We focus on prosocial norms, which help achieve positive outcomes for society and often provide guidance to agents to act in a manner that takes into account the welfare of others.

Specifically, we propose Cha, a framework for the emergence of prosocial norms. Unlike previous norm emergence approaches, Cha supports continual change to a system (agents may enter and leave) and dynamism (norms may change when the environment changes). Importantly, Cha agents incorporate prosocial decision-making based on inequity aversion theory, reflecting an intuition of guilt arising from being antisocial. In this manner, Cha brings together two important themes in prosociality: decision-making by individuals and fairness of system-level outcomes. We demonstrate via simulation that Cha can improve aggregate societal gains and fairness of outcomes.


1 INTRODUCTION

Major practical applications of information technology can be understood as multi-agent systems viewed from a socio-technical perspective in terms of a social tier (comprising social entities such as people and organizations) and a technical tier (comprising computational and other resources) [39]. The agents represent the social entities computationally and reflect the interests of the entities they represent. As Singh [69] points out, the governance of such systems is naturally characterized by the social norms that apply at the social tier. Indeed, norms help regulate the interactions of autonomous agents [77]. Accordingly, it is not surprising that norms apply in resource-sharing settings broadly [70], such as social media [63] and for road sharing by autonomous vehicles [21]. In general, discussions of norms can be framed in terms of agent behavior; here, we focus on interactions between agents, to clarify that we are interested in agent behaviors that are of concern to others and are visible to others.

The recent burgeoning interest in AI ethics and safety has drawn increasing attention to regulating how intelligent agents act and interact [79]. In this regard, it is helpful to distinguish macro ethics (focused on a system) from micro ethics (focused on an individual agent) [18]. Micro ethics is concerned with how agents interact in light of the norms [16, 34], and macro ethics is concerned with what norms regulate a system, including considerations of fairness [52, 80].

At its heart, a social norm defines sound or “normal” interactions among members of a social group, reflecting their mutual expectations. A norm can be violated by a member though at the cost of suffering either a moral or another sanction [53]. A norm can be descriptive, merely describing a social group, or prescriptive, describing how the social group should function. Moreover, some norms can be constitutive, meaning they provide definitions of what counts as what—in essence, the rules of the game. Or, they can be regulative, meaning they constrain the interactions of the participants [15]. A lot of the work on multi-agent systems approaches them from the standpoint of explicit engineering, where stakeholder requirements are mapped to, among other elements, a set of norms [41].

This article adopts an equally well-established alternative perspective in which the norms are not designed into a system but emerge through the interactions of the agents [37]. Specifically, this article proposes Cha, a framework for norm emergence. Our motivations behind Cha are as follows. First, we would like to enable agents who incorporate prosocial decision-making to achieve norms that avoid conflict and improve fairness. Second, we would like to tackle the challenges of flexibility in that it should be possible for the emerged norms in a multi-agent system to change dynamically. Third, the process of emergence should not have to rely upon a central authority that somehow forces the agents to adopt or respect norms; instead, the norms should arise—and, when appropriate, change—in a decentralized manner solely through the interactions among the member agents.

Next, we abstract out some key desiderata from the above motivations and discuss how Cha addresses these desiderata.

Prosocial.

A prosocial attitude is a disposition to benefit another. Prosocial behavior has been well studied in sociology and social psychology [23, 66]. An agent behaves prosocially when it performs an action that benefits others even if the action is suboptimal for the agent itself [55, 57, 64]. Whereas existing approaches for norm emergence focus solely on decision-making by agents, we relate social norms both to prosocial decision-making and to societal outcomes such as fairness. We show how to incorporate prosociality into norm emergence to achieve norms that promote fair outcomes and improve social welfare for the members of a multi-agent system.

Flexible and Enduring.

The flexibility and endurance (in other words, the longevity) of an interaction are important aspects of openness [76]. The membership of a multi-agent system can change yet there is continuity in learning norms. An example is Wikipedia, where the users are autonomous and changing but can and do build on each other’s work. In contrast, existing studies of norm emergence [3, 61, 73, 81] apply social learning and assume that agents repeatedly interact in a closed system. Specifically, these approaches assume that a fixed graph is given and the agents interact with their neighbors in that graph.

Dynamic.

Dynamism refers to the idea that norms may change with changes to the environment in which the system operates [38, 58, 76]. Existing works [45, 50] address dynamism inadequately: for example, once norms have emerged, they remain fixed. Existing approaches, such as Dell’Anna et al. [21, 22], support norm change at runtime but via centralized sanction revision, which is prone to a single point of failure.

Decentralized.

Decentralization refers to the absence of central authority [51]. In a decentralized system, norms arise solely through agent interactions [45, 46]. Most existing norm emergence approaches, such as Airiau et al. [3], Morales et al. [47], involve a central authority that determines the norms; some existing approaches, such as Mashayekhi et al. [44], use a hybrid scheme. A criticism of hybrid, and especially of centralized, systems is that they assume that an all-knowing authority is present in the system [58]. Whereas a central authority may recommend norms to achieve a societal criterion, such an approach cannot guarantee the autonomy of the participating agents. In settings with unreliable or delayed communication or where components can fail, a centralized system is fragile because it has a single point of failure.

1.1 Contributions

Cha is a general dynamic and flexible framework for norm emergence that promotes prosociality while supporting decentralization. We frame our contributions as an investigation of the following themes.

Efficient resolution of conflicts

captures the idea that norms emerge to avoid conflicts that arise between the members of a multi-agent system. In addition, efficiency indicates that the norms that emerge avoid conflict without damaging performance. In other words, the norms should address the liveness-safety tradeoff [40] without, for example, achieving high safety at the cost of liveness.

Dynamic adaptation

captures the idea that the norms in a multi-agent system must be sensitive to its operating environment. That is, both the conflicts and the performance opportunities that agents face depend upon the environment in which they interact with each other. In other words, in addition to the individual member agents potentially adapting to the environment, the multi-agent system as a whole would adapt through new norms.

Fairness

refers to the idea that disparities in the outcomes across the various member agents are low. In general, there is a well-known tradeoff between the aggregate outcomes for the members of a society and the fairness of the individual outcomes received by its members [1]. An aspect of prosociality as we apply it here is that the agents perform actions that benefit others even if suboptimal to themselves. Therefore, in our approach, the agents who are worse off are likelier to benefit from the prosociality of others than are the agents who are better off; hence, fewer agents are poorly off, and the disparity in outcomes across the agents is reduced.

Social welfare

captures the idea of how much a society (as a whole) gains. Here, we specifically identify how a particular approach to coordinating resource usage may lead to aggregate outcomes. This point can be framed as a question about the feasibility of a decentralized approach in producing social welfare relative to an approach that is fully centralized or that incorporates elements of centralization through a distinguished entity that helps coordinate.

1.2 Significance and Novelty

This article synthesizes two important perspectives on prosociality. First, the individual decision-making perspective incorporates a notion of guilt based on inequity aversion [27], which posits that people may be self-interested, but their decisions are affected by how relatively poorly others fare. In a multi-agent system, a prosocial agent in Cha would maintain some awareness of the outcomes received by other agents. When those outcomes are especially low for another agent, the first agent would notice the inequity and may feel guilt or exhibit aversion to that inequity. The agent would take decisions that lead to a reduction in that observed inequity.

Second, the societal perspective on prosociality is based on Rawls’ [56] landmark theory of justice that focuses on designing a just society. This perspective is broadly supported by more recent works as well, for example, by Adler [1]. Specifically, Cha supports Rawls’ doctrine of improving the outcome for whoever is the worst off. In other words, this doctrine advocates that we do not maximize throughput if doing so leads to some agents starving. We adopt the Maximin formulation of Rawls’ doctrine as a basis to measure fairness. By bringing in guilt, we can support norms that emerge in a bottom-up and adaptive manner without needing a central society enforcer.

1.3 Organization

Section 2 details the Cha framework. Section 3 describes how we model prosociality. Section 4 describes a simulated traffic intersection for evaluation. Section 5 discusses our results. Section 6 provides a summary of our contributions and an outlook for future work.


2 The Cha Normative Framework

Figure 1 depicts a schematic representation of an agent in Cha and different phases of its normative life cycle.

Fig. 1.

Fig. 1. Components of a Cha agent. Each Cha agent has sensors via which it perceives the environment, a knowledge base consisting of a set of norms (and norm structures), and components for generating, reasoning, updating, and sharing norms.

Algorithm 1 outlines the decision loop for a Cha agent in four phases enacted via the agent’s corresponding components: Norm Generation, Norm Reasoning, Norm Updating, and Norm Sharing (shown with labels in Algorithm 1). In this section, we describe norm representation in Cha and explain each phase of its normative life cycle along with the associated pseudocode that demonstrates the dynamics of Cha.

2.1 Norm Representation

As explained above, a norm in Cha is regulative: it characterizes how the agents ought to interact in specific situations. A “norm structure” is either a norm or a precursor to a norm that is based on a deontic operator proposed by García-Camino et al. [30]. We adopt a continuous notion of deontics [29], ranging from prh to obl with may in the middle. Initially, a norm structure is neutral. We express its neutrality using the operator may. The idea is that, as interactions take place, a norm structure may strengthen in one or the other direction or weaken back toward neutral. In cases of interest, a norm structure would strengthen enough in one direction or the other. Specifically, we say a norm emerges when the operator strengthens to obl or prh. Table 1 gives Cha’s syntax.

Table 1.
Norm::=\( \langle \)Antecedent, Consequent\( \rangle \)
Antecedent::=Condition
Consequent::=Operator(Action)
Operator::=may\( \mid \)obl\( \mid \)prh
  • Antecedent is a condition on the system state; Consequent is a deontic operator applied on an action.

Table 1. Syntax of a Norm Structure


In Cha, given the state it perceives, an agent applies any norm structure whose antecedent is true in that state (Norm Generation). It performs the action in the norm if its deontic operator is obl, does not perform it if the operator is prh, and chooses either way for may (Norm Reasoning). As the agent gains experience, obl or prh may begin to dominate, indicating a norm being learned (Norm Updating). Each agent passes on its experience to incoming members of the same type (Norm Sharing). Sections 2.2–2.5 describe these phases of the normative life cycle in detail.
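The norm-structure syntax of Table 1 can be rendered as a small data structure. The following is a minimal sketch; the class, field, and constant names are our own illustration and not part of Cha:

```python
from dataclasses import dataclass

# Deontic operators from Table 1; may is the neutral starting point.
MAY, OBL, PRH = "may", "obl", "prh"

@dataclass
class NormStructure:
    antecedent: str       # condition on the system state (a view predicate)
    action: str           # the action the deontic operator applies to
    operator: str = MAY   # neutral at first; may strengthen to obl or prh

    def emerged(self) -> bool:
        # A norm has emerged once the operator carries normative force.
        return self.operator in (OBL, PRH)
```

A newly generated norm structure starts with the may operator and counts as an emerged norm only after its operator strengthens to obl or prh.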

2.2 Norm Generation Phase

An agent perceives its environment through its sensors and receives a local view (Line 2). A local view is a snapshot of the system from the agent’s perspective. An agent’s view is in general partial. The sensors are potentially specialized to each domain: a sensor is a domain-dependent function to map the state of the environment to a description of the state in the agent’s representation. We express an agent’s views using predicates over the state of the system. For example, in a traffic scenario, a possible view is the traffic in one part of the road that identifies positions and directions of movement of the other vehicles in reference to the agent’s vehicle.

For each view, the agent checks whether a conflict would happen using the function \( f_{\text{conflict}} \), based on the possible next states, relying on domain information to determine which outcomes are undesirable. The agent obtains \( v_{t} \), its view at time \( t \) that would lead to conflict, and conflictAgent, the other agent involved in the conflicting view \( v_{t} \) (Line 3).

Here normSet is the set of norms represented by the agent (Line 4). A norm applies to an agent if its antecedent matches the agent’s current view. If no norm in normSet is applicable (Line 5), the agent generates a norm structure based on the current view (\( v_{t} \)) as its antecedent, initially with a may operator applied to the action that would lead to the conflict (Line 6). The generated norm structure is added to normSet (Line 7).

2.3 Reasoning Phase

The agent retrieves an applicable norm, selects an action according to its learning algorithm and that norm, and performs the action (Lines 8–10). Next, the agent senses its conflicting agent’s action, \( a_c \) (Line 11).

Cha agents apply reinforcement learning with the \( \epsilon \)-greedy strategy for exploration and exploitation of the environment [74]. The \( \epsilon \)-greedy approach offers two choices for each agent—select a random action (Exploration, with probability \( \epsilon \)), or follow the applicable norm (Exploitation, with probability \( 1-\epsilon \)). An agent estimates \( \epsilon \) via an exponential function (\( e^{-Em} \)), where \( E \) is a constant, and \( m \) is the number of times that the same situation has arisen before. Consequently, \( \epsilon \) is high early (more exploration) and low later (more exploitation). Decaying exploration has been considered widely in the Reinforcement Learning literature [71]. The decay method is traditionally a linear or exponential decay. Exponential decay has been shown to lead to faster convergence [11].
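The decayed \( \epsilon \)-greedy choice described above can be sketched as follows. This is a minimal illustration with hypothetical names; \( E = 0.05 \) follows the experimental settings in Section 5.1:

```python
import math
import random

def epsilon(m: float, E: float = 0.05) -> float:
    """Exploration probability e^(-E*m), where m counts how often the
    same situation has arisen before (E = 0.05 follows Section 5.1)."""
    return math.exp(-E * m)

def choose_action(norm_action: str, actions: list, m: int,
                  rng: random.Random) -> str:
    # Explore (random action) with probability epsilon(m);
    # otherwise exploit by following the applicable norm.
    if rng.random() < epsilon(m):
        return rng.choice(actions)
    return norm_action
```

Early on (\( m \) small), \( \epsilon \) is close to 1 and the agent mostly explores; as the same situation recurs, \( \epsilon \) decays exponentially and the agent mostly follows the applicable norm.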

2.4 Updating Phase

The agent updates its utility for the applied norm. After reasoning and selecting an action, based on the joint action of its conflicting agent, it receives a reward according to a payoff matrix (Line 12) and updates its utility via Equation (1) (Line 13). (1) \( \begin{equation} U(n,t,a)= (1-\alpha)\times U(n,t-1,a)+\alpha \times r(n,t,a). \end{equation} \)

Here, \( U(n,t,a) \) and \( U(n,t-1,a) \) stand for the utility of following norm structure \( n \) by performing action \( a \) at times \( t \) and \( t-1 \), respectively; \( 0\le \alpha \le 1 \) is a learning-rate parameter that weights the new reward against the accumulated utility, and \( r(n,t,a) \) indicates the reward based on a payoff matrix (examples in Section 4).

Each agent assigns two utility values to each norm structure: one for when the norm structure is fulfilled (followed) and one for when it is violated. Repeated fulfillment or violation eventually leads to one of the operators with normative force, obl or prh, respectively, depending on which has the higher utility.
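Equation (1) is an exponential moving average over rewards and can be sketched directly; the function and dictionary names are ours, and \( \alpha = 0.2 \) follows Section 5.1:

```python
def update_utility(u_prev: float, reward: float, alpha: float = 0.2) -> float:
    """Equation (1): U(n,t,a) = (1 - alpha) * U(n,t-1,a) + alpha * r(n,t,a)."""
    return (1 - alpha) * u_prev + alpha * reward

# Each norm structure keeps two utilities: one for fulfilling it,
# one for violating it (both start at 0, per Section 5.1).
utilities = {"fulfil": 0.0, "violate": 0.0}
utilities["fulfil"] = update_utility(utilities["fulfil"], reward=2.0)
```

With \( \alpha = 0.2 \), a single reward of 2 moves the utility from 0 to 0.4; older experience decays geometrically as new rewards arrive.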

2.5 Sharing Phase

In Cha, each agent passes on its experience to incoming members of the same type (Line 14). Here, the experience is given by the utilities associated with different states and action outcomes, and agents of the same type are those with the same goals. Thus, an incoming member obtains experience from members of the same type who have some experience to share. This approach fits in with technologies such as Vehicle-to-vehicle (V2V) communication. We assume that agents pass on their experience truthfully; false information [72] is out of our scope.


3 ACTING PROSOCIALLY IN CHA

The above approach may lead to unfair outcomes. For example, imagine that in a road traffic scenario, agent \( i \) has the right of way, and agent \( j \) (conflicting with agent \( i \)) must stop. In other words, the norm for agent \( i \) is initially obl(Go) and the norm for agent \( j \) is prh(Go). If there is heavy traffic in agent \( i \)’s direction, agent \( j \) may have to wait for an arbitrarily long time. Long delays for some agents but not for others indicate unfairness.

In Cha, the agents act prosocially through inequity aversion [27]. An agent incorporates another’s cumulative costs in its utility to help the latter. That is, instead of following an applicable norm that would benefit it, the agent performs an action that benefits the other party. In the traffic domain, such an action might be to wait and let another vehicle cross even when you have the right of way. A benefit of using the cumulative cost from when an agent enters the system until it gets the opportunity to benefit from the resource is that it can help prevent starvation. We assume that agents know about each other’s costs so they can accommodate them in their decision-making. This assumption is consistent with real-world scenarios, in which people must first be aware of others’ condition in order to act prosocially. In human societies, people can figure out each other’s valuations well enough to develop a sense of prosociality [65]. For artificial agents, this information must be either hardwired or explicitly communicated by the agents [32].

We incorporate guilt [43] as a guilt disutility to realize concessions. Please note that “guilt” here is not meant to capture psychosocial realism but is a convenient metaphor for a feeling that leads to prosocial behavior. Here, \( \delta _{i, j} \), as computed in Equation (2), expresses the guilt perceived by agent \( i \) with respect to \( j \) in state \( s \). (2) \( \begin{equation} \delta _{i, j}(s)= -\beta _{i} \left(\max \left(f_{j}(s)-f_{i}(s)- {c}, 0\right)\right)\!. \end{equation} \)

Here, \( f_x(s) \) computes the total cost paid by agent \( x \) until the present. An example of the calculation of \( f_x(s) \) is discussed in Section 4 (Equation (3)). Agent \( i \)’s propensity toward guilt is captured in \( \beta _{i} \): \( \beta _i=0 \) means no guilt, and \( \beta _i=1 \) means maximal guilt. Here, \( c \) is the threshold of inequity at which guilt kicks in.
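Equation (2) can be written directly as a small function. The names below are ours; the defaults \( \beta = 0.5 \) and \( c = 4 \) mirror the experimental settings in Section 5.1:

```python
def guilt(f_i: float, f_j: float, beta_i: float = 0.5, c: float = 4.0) -> float:
    """Equation (2): delta_{i,j}(s) = -beta_i * max(f_j(s) - f_i(s) - c, 0).
    f_i and f_j are the cumulative costs paid by agents i and j so far;
    beta_i is i's propensity toward guilt; c is the inequity threshold."""
    return -beta_i * max(f_j - f_i - c, 0.0)
```

The disutility is zero while the inequity stays below the threshold \( c \) and grows linearly (scaled by \( \beta_i \)) once agent \( j \)'s cumulative cost exceeds agent \( i \)'s by more than \( c \).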

Algorithm 2 adds prosociality to Algorithm 1 at Line 9 (action selection). For simplicity, we enable prosocial learning once the system has converged; otherwise, the system may never stabilize, especially with high values of \( \beta _{i} \). Convergence is defined in Section 5.1. \( U_i^P \), the utility incorporating prosociality, is initialized to the converged utility (Lines 2–4). Below, \( \lnot n \) is the complement of norm \( n \): it has the same antecedent but obl instead of prh, or vice versa. That is, \( \lnot \langle p, obl(q)\rangle = \langle p, prh(q)\rangle \). Here, \( n \) is preferred by agent \( i \) and \( \lnot n \) by its conflicting agent \( j \).

Agent \( i \) receives agent \( j \)’s cost (\( f_j(s) \), Line 5). If the difference in costs is below the constant \( c \), \( i \) follows norm \( n \) (Line 16). Otherwise, if agent \( i \) has not yet learned to concede (\( U_i^P(n,t,a) \gt U_i^P(\lnot n,t,a) \), Line 6), \( i \) follows norm \( n \) (Line 10) and adds its guilt disutility (a negative value) to its prosocial utility (Lines 7–8). Eventually, \( U_i^P(n,t,a) \) falls and agent \( i \) concedes to agent \( j \) (Line 13).
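The concession logic just described can be sketched as follows. This is our simplification of Algorithm 2, not the algorithm itself: the function and variable names are ours, the utility update is done in place, and \( \beta = 0.5 \), \( c = 4 \) follow Section 5.1:

```python
def prosocial_select(U_p: dict, n: str, not_n: str,
                     f_i: float, f_j: float,
                     beta: float = 0.5, c: float = 4.0) -> str:
    """Sketch of Algorithm 2's action selection for agent i.
    U_p maps each norm to i's prosocial utility for following it."""
    if f_j - f_i < c:                 # inequity below threshold: no concession
        return n
    if U_p[n] > U_p[not_n]:           # i has not yet learned to concede
        U_p[n] += -beta * max(f_j - f_i - c, 0.0)  # add guilt disutility
        return n
    return not_n                      # concede to the conflicting agent j
```

Each time agent \( i \) follows its preferred norm despite large inequity, the guilt disutility lowers \( U_p[n] \); once it drops below \( U_p[\lnot n] \), the agent concedes.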


4 TRAFFIC SETTING AS ILLUSTRATION

We evaluate Cha in a simulated intersection understood as a multi-agent system, where vehicle agents continually arrive and depart. We select this setting because it is powerful enough to illustrate Cha but not so powerful that details of the setting overwhelm our main points. Related works, for example, Mashayekhi et al. [44] and Morales et al. [47], adopt similar settings as in this article. Please note that our focus is not to model the complexities of real-world traffic but to demonstrate the working of Cha. Cha could be potentially applied in more complex settings, for example, with a network of intersections.

As Figure 2 illustrates, we map an intersection and its vicinity to a grid. Traffic flows in four directions (north, south, west, and east). The intersection zone (i-zone) in the middle is composed of eight cells, highlighted in Figure 2. The light grey cells, including the center, are to be ignored. Cars travel along the grid at the speed of one cell per time tick. A vehicle may continue moving on a straight path or may randomly turn left or right in the i-zone. Agents of the same type are those traveling in the same direction, for example, all vehicles that are traveling east.

Fig. 2.

Fig. 2. The modeled intersection is mapped to a grid of cells. Traffic flows in four directions (northbound, southbound, westbound, and eastbound). We consider the possibility of conflict within the i-zone. The i-zone is composed of eight cells in the middle (excluding the middle grey cell). From the perspective of the car specified as the Agent (shown in the bottom right of the inset picture), its view is composed of three cells that it must observe to detect and avoid conflicts. L refers to the cell to its left, F to the cell directly to its front, and R to the cell to its right. The car specified as Conflicting Agent is in conflict with the Agent.

A conflict arises when two conflicting agents are about to occupy the same i-zone cell together. Conflicting agents observe each other’s actions without access to each other’s internal policies (Algorithm 1 Line 11). If there is no norm in a conflicting situation, each agent creates a norm structure using the neutral operator, may. As in Section 2, a norm emerges as the agents gain experience and the deontic operator becomes stronger, namely, obl or prh.

The antecedent of a norm refers to the values of three cells (Left \( L \), Front \( F \), Right \( R \)) identified in Figure 2 with respect to the vehicle (agent) entering from the bottom of the intersection. \( L \), \( F \), and \( R \) constitute the view of that vehicle. Our grammar can specify cells with one of these six values: \( \rightharpoondown \) (vehicle heading east), \( \leftharpoonup \) (vehicle heading west), \( \downharpoonleft \) (vehicle heading south), \( \upharpoonright \) (vehicle heading north), \( \oslash \) (empty), and \( \star \) (any of four directions). An example norm, \( L(\rightharpoondown) \) = prh(Go), means that an agent is prohibited from proceeding if it observes a vehicle in cell \( L \) heading east (\( \rightharpoondown \)). Figure 2 shows a similar case.

Table 2 shows our payoff matrix, which represents a social dilemma game, with the added twist of dynamism. Its constants are typical [3, 73]; the unselfishness term (that is, \( u \)) is novel to Cha. In a single round, the best positive payoff refers to the situation where one agent chooses a selfish action (Go) and the conflicting agent chooses the unselfish action (Stop). However, in a multi-round setting where the game is repeated, if the value of \( u \) is low, it is more rewarding to be cooperative than to be selfish, and if the value of \( u \) is high, it is more rewarding to be selfish than to be cooperative. Our flexible payoff matrix allows for the representation of various societies of agents.

Table 2.

Table 2. Payoff Matrix ( \( u \) is the unselfishnessCost)

The worst negative payoff relates to the situation when both agents are selfish, meaning that both decide to Go, thereby causing a collision. Equation (3) defines the payoff of an unselfish action for agent \( x \) in state \( s \), that is, \( f_x(s) \) in Section 3. The \( \max \) ensures that the payoff of stopping is never worse than of a collision, and as a result, the value of unselfish action ranges from 0 to \( -6 \). (3) \( \begin{equation} {\it unselfishnessCost}_x(s) = \max (-d_x(s)^p,-6). \end{equation} \)

We use a power function of the delay to calculate the cost. Here, \( d_x \), the delay experienced by agent \( x \), equals the number of ticks that \( x \) must stop before entering the i-zone. The cost grows superlinearly with delay: specifically, we require the exponent of the delay term to be greater than 1, that is, \( p\gt 1 \). Intuitively, when the delay (for a specific agent) is high, the penalty that the agent assigns to stopping should be higher. Equation (3) has this property.
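Equation (3) can be sketched directly; the function name is ours, and \( p = 1.1 \) follows the experimental settings in Section 5.1:

```python
def unselfishness_cost(delay: float, p: float = 1.1) -> float:
    """Equation (3): the cost of stopping grows with delay d_x but is
    floored at -6 so stopping is never worse than a collision."""
    return max(-(delay ** p), -6.0)
```

With no delay the unselfish action costs nothing, and for long delays the cost saturates at the collision payoff of \( -6 \), so stopping never becomes strictly worse than colliding.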


5 RESULTS: TESTING THE HYPOTHESES

To evaluate Cha in the traffic setting detailed in Section 4, we propose the following hypotheses based on the four themes—efficient resolution of conflicts, dynamic adaptation, fairness, and social welfare—that we motivate in Section 1. For brevity, we omit the corresponding null hypotheses indicating there are no gains under Cha.

Hypothesis Hefficient: Cha leads to the emergence of norms that resolve conflicts while improving system-level performance outcomes.

Hypothesis Hdynamic: Cha leads to the emergence of norms that can adapt based on changes to the environment. That is, an emerged norm can fade out once the environment changes and be replaced by a new norm.

Hypothesis Hfairness: Cha leads to the emergence of norms under which fairness is achieved. Specifically, in Cha the agents observe the outcomes of others and act to benefit those who are poorly off, which leads to improved fairness.

Hypothesis Hsocial: Cha leads to the emergence of norms that yield higher societal gains than both a representative central and a representative hybrid approach to sharing resources.

We now describe our simulation setup to evaluate Cha and discuss each of the hypotheses, that is, Hefficient, Hdynamic, Hfairness, and Hsocial, in greater detail.

5.1 Simulation Setup

We model a traffic intersection (19 cells per lane: 72 cells in all with eight cells in the i-zone) as an environment in Repast [54].

Our results are averaged over 1,000 trials. Convergence is considered to happen if the utilities associated with the norms converge to within \( 10^{-3} \) (our convergence parameter).

We set the initial utility values to 0 at \( t=0 \), \( \alpha =0.2 \) in Equation (1) [47]. We set \( E=0.05 \) in the exponential function (\( e^{-Em} \)) used to set the exploration probability in the \( \epsilon \)-greedy exploration approach in the reasoning phase [25].

We set \( p \), the exponent of the delay term in Equation (3), to 1.1 for all agents. We choose this value somewhat arbitrarily because adjusting it would affect only the rate of convergence in our experiments, not whether convergence takes place.

For the fairness experiment to test Hypothesis Hfairness, we need to set the following parameters.

The propensity or weight of guilt, that is, \( \beta \) in Equation (2). Note that \( \beta =0 \) means no guilt and leads to no fairness and \( \beta =1 \) means maximal guilt, which leads to repeated yielding by agents in favor of others. For simplicity of evaluation, we set the same value of \( \beta \), specifically, \( \beta = 0.5 \), for all agents.

The threshold of inequity tolerated by the agents, that is, \( c \) in Equation (2). For simplicity, we set the same value of \( c \), specifically, \( c = 4 \), for all agents.

Note that the fairness parameters’ values (i.e., \( c \) and \( \beta \)) do not affect convergence since, as mentioned in Section 3, we enable prosocial learning only once the system has converged; otherwise, the system may never stabilize, especially with high values of \( \beta \). The fairness parameters’ values affect the rate at which the agents become prosocial.

5.2 Testing the Efficient Resolution Hypothesis

Hypothesis Hefficient states that emergent norms improve system-level outcomes, that is, yield lower total average delay across an intersection. The corresponding null hypothesis indicates that there are no improvements in the system-level outcomes under Cha.

We consider a static setting in which we fix the traffic flows for the north-south and east-west directions. We first set the traffic flow distributions as the same for north-south and east-west directions. We observe that approximately half of the time (508 out of 1,000 simulation runs), north-south vehicles learn to Go, and east-west vehicles learn to Stop in conflict situations. In the remaining 492 times, the reverse norm arises. The population converges to one or the other norm depending on whether Go or Stop is more common for east-west than for north-south vehicles and thus minimizes collisions.

Our next setting is also static but with unequal traffic: the north-south orientation has a (30%) higher traffic volume than the east-west orientation. Figure 3 shows the total number of collisions per 1,000 ticks. Since there are four cells in the simulated intersection that have the potential of conflict, the maximum number of collisions is four per tick. After about 20,000 ticks, the number of collisions decreases dramatically. After 25,000 ticks (not shown here for brevity), the changes in the average utility converge to within \( 10^{-3} \), our convergence parameter.

Fig. 3.

Fig. 3. Total number of collisions per 1,000 ticks.

Based on the asymptotic reduction in collisions and the convergence in utilities, we conclude that vehicle agents have learned new norms to avoid a collision. Table 3 shows the emergent norms: east-west vehicles learn to Stop in conflicting situations with north-south vehicles. Since the north-south orientation has higher traffic volume, the converged norms are efficient—average delay is lower when vehicles in the direction with the lower volume Stop and in the direction with the higher volume Go, than the other way around. Norms that emerged in this experiment provide evidence to support the Hypothesis Hefficient.

Table 3.
Direction                    Precondition                                                Modality
Eastbound and Westbound      \( L(\star)\wedge F(\star)\wedge R(\leftharpoonup) \)       \( \mathit {prh}(\text{Go}) \)
                             \( L(\rightharpoondown)\wedge F(\star)\wedge R(\star) \)    \( \mathit {prh}(\text{Go}) \)
Southbound and Northbound    \( L(\star)\wedge F(\star)\wedge R(\leftharpoonup) \)       \( \mathit {obl}(\text{Go}) \)
                             \( L(\rightharpoondown)\wedge F(\star)\wedge R(\star) \)    \( \mathit {obl}(\text{Go}) \)

Table 3. Norms That Emerged in the Static Setting (in About Half the Runs with Equal Traffic and in All Runs with Unequal Traffic)

5.3 Testing Dynamic Adaptation Hypothesis

The Hypothesis Hdynamic states that Cha adapts to environmental changes—changes in traffic flow through an intersection. The corresponding null hypothesis indicates that Cha does not adapt to environmental changes. In this experiment, we start with a fixed traffic flow distribution where the north-south orientation has 30% higher traffic than the east-west orientation, let it converge, and then reverse the pattern (that is, the east-west orientation ends up with 30% more traffic). Doing so helps us determine whether norms learned in one setting persist when the traffic changes.

To test Hypothesis Hdynamic, we (1) measure the root mean square deviation (RMSD) of average utility in a sliding window of 1,000 ticks, computed as \( \mathrm{RMSD}_t = {\sqrt {\frac{\sum _{i=1}^{n} (x_i - \bar{x})^2}{n-1}}} \), where \( t \) is the current tick; \( x_i \) is the utility at tick \( i \) of the current sliding window; \( \bar{x} \) is the average utility in the current sliding window; and \( n \) is the window size; and (2) perform a two-sample Kolmogorov–Smirnov (KS) test on successive sliding windows.
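As a concrete illustration, the sliding-window RMSD can be computed as follows. This is a minimal Python sketch, not the authors' implementation; the utility streams are made up for illustration.

```python
from math import sqrt

def window_rmsd(utilities, window=1000):
    """RMSD of the utilities in the most recent window of ticks,
    using the sample (n - 1) denominator as in the formula above."""
    xs = utilities[-window:]
    n = len(xs)
    if n < 2:
        return 0.0
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

# An RMSD near zero signals that the average utility has converged.
print(window_rmsd([5.0] * 1000))            # 0.0: utilities have stabilized
print(window_rmsd([5.0, 4.0] * 500) > 0.0)  # True: still fluctuating
```

For the KS test on successive windows, SciPy's `scipy.stats.ks_2samp` provides a standard two-sample implementation.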

Figure 4 shows the change in average utility (for a sliding window of 1,000 ticks) for westbound vehicles. By \( t\approx \text{25,000} \) (RMSD25,000 = 0 for east-west vehicles Stop; \( p\lt 0.01 \)), the norm learned by east-west vehicles is to Stop in case of conflict, just as in Table 3 and Figure 3. We highlight this norm in Figure 4 with the shaded box in the middle.


Fig. 4. Utilities of westbound vehicles for the dynamic setting. The reported figures are averaged over a window size of 1,000 ticks.

After \( t= \text{44,000} \), the traffic pattern is reversed, and by \( t\approx \text{50,000} \) (RMSD50,000 = 0 for east-west vehicles Go; \( p\lt 0.01 \)), the new norm is for east-west vehicles to Go in case of conflict. We highlight this norm in Figures 4 and 5 with boxes in the right parts. Eastbound and northbound vehicles have the same outcomes as westbound and southbound vehicles, respectively.


Fig. 5. Utilities of southbound vehicles for the dynamic setting. The reported figures are averaged over a window size of 1,000 ticks.

Table 4 shows how norms change when we reverse the traffic pattern, thus supporting Hypothesis Hdynamic.

Table 4.
Direction                    Precondition                                                Modality
Eastbound and Westbound      \( L(\star)\wedge F(\star)\wedge R(\leftharpoonup) \)       \( \mathit {obl}(\text{Go}) \)
                             \( L(\rightharpoondown)\wedge F(\star)\wedge R(\star) \)    \( \mathit {obl}(\text{Go}) \)
Southbound and Northbound    \( L(\star)\wedge F(\star)\wedge R(\leftharpoonup) \)       \( \mathit {prh}(\text{Go}) \)
                             \( L(\rightharpoondown)\wedge F(\star)\wedge R(\star) \)    \( \mathit {prh}(\text{Go}) \)

Table 4. Norms That Emerged in the Dynamic Setting after the Traffic Flow Was Reversed

Note that, because the agents have already developed a notion of norms, they adapt faster to environmental changes: it takes \( t\approx \text{25,000} \) ticks to learn the initial norms but only \( t\approx \text{6,000} \) ticks to adapt to the environmental change and develop new norms. If the flows of cars invert intermittently (e.g., every x ticks), we would still expect Cha to be adaptive as long as the resulting traffic difference is significant (e.g., 30%); however, it could take longer to converge.

5.4 Testing the Fairness Hypothesis

Hypothesis Hfairness concerns disparities in resource allocation, that is, excessive delays for some vehicles entering the intersection while others proceed quickly. We define a fair society as one that satisfies the Maximin criterion [56, p. 153]. In the present setting, a fair society is one in which no agent is deprived of resources for long periods of time. We understand fairness (or its absence) as an outcome of different norms.

Consider the prosocial learning strategy given in Algorithm 2. As in Section 5.2 (unequal traffic setting), we set north-south flows to have 30% more traffic than east-west flows. We saw that east-west vehicles learned to Stop in conflicting situations with north-south vehicles. Now, we verify whether a north-south vehicle can act prosocially in a conflicting situation by yielding to an east-west vehicle that has experienced a delay above a certain threshold.
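To make the idea concrete, the following Python sketch shows one plausible shape for such a prosocial decision rule. The linear guilt term, the `guilt_weight` parameter, and the threshold value are our illustrative assumptions inspired by inequity aversion, not the authors' exact Algorithm 2.

```python
def prosocial_utility(own_utility, own_delay, other_delay, guilt_weight=0.5):
    """Discount an agent's utility by a guilt disutility that grows with
    how much worse off the conflicting agent is (inequity aversion).
    The linear form and guilt_weight are illustrative assumptions."""
    guilt = guilt_weight * max(0, other_delay - own_delay)
    return own_utility - guilt

def should_yield(other_delay, threshold=5):
    """A vehicle on the busier orientation yields prosocially once the
    conflicting vehicle's accumulated delay exceeds a threshold."""
    return other_delay > threshold

print(should_yield(6))                                     # True: the other has waited too long
print(prosocial_utility(1.0, own_delay=0, other_delay=4))  # 1.0 - 0.5*4 = -1.0
```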

Figure 6 shows the change in the average prosocial utility (\( U_i^P \)) of southbound vehicles. Prosocial learning can be activated after convergence; we activated it at \( t= \text{40,000} \). By \( t\approx \text{51,000} \), the agents have learned to be prosocial. Northbound vehicles (figure omitted for space) show the same trend as southbound vehicles.


Fig. 6. Prosocial utilities of southbound vehicles.

Below, Cha Neutral refers to the variant of Cha without prosociality, i.e., without the guilt disutility. We evaluate the performance of Cha Neutral and Cha in terms of delays. Table 5 shows percentiles of delay over the population of vehicles. For example, 99.5% of all vehicles are delayed by four or fewer ticks.

Table 5.
             Percentiles                         \( \gamma \)   \( \kappa \)
             99    99.5   99.7   99.9   100
Cha Neutral  3     4      5      6      6        2.07           4.99
Cha          3     4      4      5      6        2.02           4.55

  • \( \gamma \) = Skewness; \( \kappa \) = Kurtosis.

  • Bold indicates better (or lower) delay.

Table 5. Delays with Cha Neutral and Cha


We adopt percentile values as a metric for fairness because the delay distribution has a long tail. Specifically, agents who suffer excessively are concentrated in the high percentiles even though the mean delay may not vary much between fair and unfair outcomes. We compute skewness (\( \gamma \)) and kurtosis (\( \kappa \)) for the delay distribution. Skewness is a measure of symmetry, or the lack thereof [20]: a distribution is symmetric if it looks the same to the left and right of the center point, and a distribution with skewness closer to zero is fairer. Kurtosis measures whether the data are heavy-tailed or light-tailed relative to a normal distribution [20]. A heavy-tailed delay distribution indicates that a small number of agents suffer excessive delay; hence, a distribution with low kurtosis (light-tailed) is fairer than one with high kurtosis (heavy-tailed).
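These moment statistics can be computed directly, as in the Python sketch below. The paper does not state whether plain or excess kurtosis is reported, so the use of plain Pearson kurtosis (a normal distribution scores 3) is our assumption.

```python
def central_moments(xs):
    """Second, third, and fourth central moments of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m2, m3, m4

def skewness(xs):
    m2, m3, _ = central_moments(xs)
    return m3 / m2 ** 1.5

def kurtosis(xs):
    # Plain (Pearson) kurtosis; a normal distribution scores 3.
    m2, _, m4 = central_moments(xs)
    return m4 / m2 ** 2

print(skewness([1, 2, 2, 3]))       # 0.0: symmetric delays
print(skewness([1, 1, 1, 10]) > 0)  # True: long right tail
print(kurtosis([1, 1, 1, 10]) > kurtosis([1, 2, 3, 4]))  # True: heavier tail
```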

Figure 7 shows the delays of four or more ticks with and without prosociality. We observe that Cha, which incorporates prosociality, improves the outcome for 35% of the vehicles that experienced the worst-case delay of six ticks. Specifically, 225 (that is, 35%) of the 650 vehicles that experienced the highest delay (six ticks) under Cha Neutral (without prosociality) experienced a delay of five ticks or fewer under Cha with prosociality. Each of these vehicles saw a delay reduction of at least \( \frac{6-5}{6} \) = 16.67%. Cha thus promotes Rawls’ [56] Maximin doctrine. Also, note that, considering the values in Figure 7, Cha with prosociality incurs 425 fewer ticks of delay (i.e., \( [2,\!117\times 4 + 973\times 5 + 425\times 6] - [1,\!917\times 4 + 948\times 5 + 650\times 6] =-425 \)). Thus, Cha with prosociality not only improves the total mean delay but also lightens the heavy tail.
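The tail arithmetic above can be checked directly from the counts reported for Figure 7:

```python
# Number of vehicles at each delay (>= 4 ticks), as reported in the text.
cha = {4: 2117, 5: 973, 6: 425}      # Cha (with prosociality)
neutral = {4: 1917, 5: 948, 6: 650}  # Cha Neutral (without prosociality)

total_cha = sum(delay * count for delay, count in cha.items())
total_neutral = sum(delay * count for delay, count in neutral.items())
print(total_cha - total_neutral)     # -425: Cha saves 425 delay ticks

improved = neutral[6] - cha[6]       # worst-case vehicles that improved
print(improved, round(improved / neutral[6] * 100))  # 225 vehicles, 35%
print(round((6 - 5) / 6 * 100, 2))   # 16.67: minimum % reduction each saw
```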


Fig. 7. Benefit from prosociality. Delays ( \( \ge \) 4) with Cha Neutral and Cha. 35% of vehicles that experience the highest delay (six ticks) with Cha Neutral experience an improvement of \( \ge \) 16.67% with Cha.

We also compute the Gini coefficient (Gini) to measure the degree of disparity in a distribution. A lower Gini indicates a lower degree of disparity. Table 6 lists Gini for the vehicles in the 99, 99.5, 99.7, and 99.9 percentile populations. We observe that Gini for 99, 99.5, and 99.7 percentile populations is lower for Cha compared to Cha Neutral, indicating lower disparity. However, Gini for 99.9 percentile population of vehicles is higher in Cha compared to Cha Neutral. Under Cha Neutral, all vehicles face the worst-case delay (six ticks), and thus the disparity is lower, whereas under Cha, the delay is either five ticks or six ticks.
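The Gini coefficient itself can be computed with the standard mean-absolute-difference formula; the following short Python sketch is illustrative, not the authors' code:

```python
def gini(values):
    """Gini coefficient G = sum_ij |x_i - x_j| / (2 * n^2 * mean):
    0 means perfect equality; values near 1 mean extreme disparity."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    mad = sum(abs(a - b) for a in values for b in values)
    return mad / (2 * n * n * mean)

print(gini([5, 5, 5, 5]))   # 0.0: all vehicles equally delayed
print(gini([0, 0, 0, 10]))  # 0.75: one vehicle bears all the delay
```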

Table 6.
             Gini
Percentiles  99        99.5      99.7      99.9
Cha Neutral  0.1896    0.2016    0.1571    \( \approx 0 \) (\( 5.87\times 10^{-17} \))
Cha          0.1665    0.1710    0.1482    0.1091

Table 6. Disparity with Cha Neutral and Cha. Lower is Better

5.5 Testing the Social Welfare Hypothesis

Hypothesis. Hypothesis Hsocial states that Cha outperforms representative central and hybrid approaches in aggregate societal gains. The corresponding null hypothesis states that Cha yields no such gains.

Metric. Our metric is the mean travel time for the whole intersection. It is calculated by adding the best-case travel time (that is, 19 ticks, one for each cell in each lane) to the mean delay (the mean number of stops for all vehicles in each tick).
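This metric is straightforward to reproduce; using the mean delays reported in Figure 8 recovers the mean travel times listed in Table 7 (a Python sketch):

```python
BEST_CASE = 19  # ticks: one for each cell in a lane (best-case travel time)

def mean_travel_time(mean_delay):
    """Mean travel time = best-case travel time + mean delay (in ticks)."""
    return BEST_CASE + mean_delay

# Mean delays from Figure 8 recover the mean travel times in Table 7.
for name, delay in [("Cha", 0.98), ("Silk", 3.65), ("Fully actuated", 5.53)]:
    print(name, round(mean_travel_time(delay), 2))
```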

Baselines. There are two main traffic control paradigms, pretimed (in which the signals change in a fixed pattern) and actuated (in which signals change in response to vehicles arriving at the intersection from different directions). The actuated paradigm is known to deliver higher performance than pretimed. Therefore, we adopt a variant of the actuated paradigm, called the fully actuated control strategy [75] as one of our baseline approaches. To compare to previous multiagent systems research, we also adopt the Silk [44] approach as another baseline because Silk is a hybrid approach and studies a similar social dilemma situation in a similar traffic setting. We highlight the difference between Cha and Silk in Section 6.1.

Fully actuated control is a traditional approach to regulating traffic. In fully actuated control, each movement, for example, the northbound traffic flow at an intersection, has a detector. A phase is a combination of nonconflicting movements. We pair northbound and southbound flows to create one phase and eastbound and westbound flows to create the other. We assume fully actuated control alternates between these phases. Each phase has a minimum green (set to one in our experiment); thereafter, the green can be extended indefinitely provided a vehicle keeps on being detected at the intersection.
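The alternation logic just described can be sketched as follows. This is an illustrative reconstruction of the baseline's behavior under our stated simplifications, not the implementation from [75].

```python
def actuated_schedule(detections, min_green=1):
    """Fully actuated control over two phases: 0 = north-south,
    1 = east-west. A phase keeps its green while its detector sees a
    vehicle; after the minimum green, an empty detector switches phases.
    `detections` is a per-tick list of (ns_detected, ew_detected) pairs."""
    phase, green, schedule = 0, 0, []
    for tick in detections:
        green += 1
        schedule.append(phase)
        if green >= min_green and not tick[phase]:
            phase, green = 1 - phase, 0  # no demand on this phase: switch
    return schedule

# North-south demand for three ticks, then east-west demand only.
ticks = [(True, True)] * 3 + [(False, True)] * 2
print(actuated_schedule(ticks))  # [0, 0, 0, 0, 1]
```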

For our comparison with Silk’s framework, we used the payoffs of Table 2 in both Silk and Cha. Since Silk supports only static settings with fixed payoffs, we set the payoff of Stop in Silk to zero.

Elaborating on Hypothesis Hefficient of Section 5.2, we consider a specific hypothesis that Cha yields a lower mean travel time than both Silk and fully actuated control. These experiments are amenable to statistical hypothesis testing: the null hypothesis is that there is no significant difference in mean travel times yielded by Cha, Silk, and fully actuated control. We run these experiments for the case where the north-south orientation has 30% higher traffic than the east-west orientation.

Figure 8 compares the average delay in Cha, Silk, and fully actuated control. Table 7 summarizes the travel time results. It lists the best-case travel time, mean travel time (\( \mu _{travel} \)), and standard deviation (\( \sigma _{travel} \)) for Cha, Silk, and fully actuated control. Table 7 also lists the \( p \)-value from a two-tailed \( t \)-test assuming unequal variances, the effect size via Glass’ \( \Delta \) [31] (measured as the difference in means divided by the standard deviation of the control group), and the % improvement obtained by Cha over Silk and fully actuated control. We choose Glass’ \( \Delta \) to measure effect size since the standard deviations (\( \sigma \)) of the two groups differ.
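Both statistics are easy to recompute from the rounded values in Table 7; in the Python check below, the recomputed \( \Delta \) values differ slightly from the reported 2.06 and 1.76 because the table entries are rounded.

```python
def glass_delta(mean_treatment, mean_control, sd_control):
    """Glass' Delta: difference in means over the control group's SD."""
    return (mean_control - mean_treatment) / sd_control

def pct_improvement(mean_treatment, mean_control):
    return (mean_control - mean_treatment) / mean_control * 100

cha, silk, actuated = 19.98, 22.65, 24.53  # mean travel times (Table 7)
print(round(pct_improvement(cha, silk), 2))        # 11.79% over Silk
print(round(pct_improvement(cha, actuated), 2))    # 18.55% over actuated
print(round(glass_delta(cha, silk, 1.28), 2))      # ~2.09 (reported: 2.06)
print(round(glass_delta(cha, actuated, 2.56), 2))  # ~1.78 (reported: 1.76)
```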


Fig. 8. Comparing average delay in Cha ( \( \mu =0.98 \) ), Silk ( \( \mu =3.65 \) ), and fully actuated control ( \( \mu =5.53 \) ). The benefits of the prosocial approach are clear.

Table 7.
                              Cha      Silk              Fully actuated control
Best case                     19.00    19.00             19.00
Mean \( \mu _{travel} \)      19.98    22.65             24.53
Stdev \( \sigma _{travel} \)  0.55     1.28              2.56
% Improvement                          11.79             18.55
\( p \)-value                          \( \lt \!0.01 \)  \( \lt \!0.01 \)
Glass’ \( \Delta \)                    2.06              1.76

Table 7. Travel Time: Cha Versus Silk and Fully Actuated Control

We find that Cha reduces the delay and yields significantly less travel time than both Silk (\( p\lt 0.01 \); \( \Delta =2.06 \)) and fully actuated control (\( p\lt 0.01 \); \( \Delta =1.76 \)), reducing it by 11.79% over Silk and 18.55% over fully actuated control. Following Cohen’s [19] guidelines, an effect of over 0.80 is large; the above effect sizes are substantially above Cohen’s guideline.


6 DISCUSSION

Cha is a flexible and dynamic framework for norm emergence in multi-agent systems that supports prosocial outcomes. In Cha, each agent reasons individually about which norms to develop. Our results support our hypotheses.

One, norms that emerge in Cha resolve conflicts efficiently and improve system-level outcomes. Two, Cha is responsive to changes in the environment. Three, and most importantly, in light of the growing awareness of the need to promote and foster prosociality in artificial intelligence generally and in multi-agent systems specifically, Cha supports prosocial outcomes, in particular, fairness in resource allocation [52]. Cha incorporates prosociality based on inequity aversion and naturally respects Rawls’ Maximin doctrine by improving the worst-case outcome for members of the society. To our knowledge, no other approach tackles the concerns of prosociality in the study of norm emergence in multiagent systems. Four, Cha yields higher societal gains than both a representative central approach and a hybrid approach (the baselines in our evaluation).

Note that Cha applies reinforcement learning to norms. Unlike work on social learning, Cha shows how incorporating guilt disutility can lead to prosocial decision-making and thus to fair societal outcomes. Also, Cha does not require repeated interactions in a fixed graph. Departing agents can convey their experience to those who come after them, which facilitates norm emergence by giving endurance to the system even though individual agents have short lifetimes. Note that Cha automatically garbage collects useless norms. When the system changes, new norms emerge, and some old norms are unlearned (they lose utility and are no longer selected).

It is worth mentioning that Cha, being a dynamic and decentralized framework, not only supports beginning from an initially empty set of norms but can also start from an initial set of existing norms, on top of which the agents can then build new norms.

A potential critique of this work is that even though the agents are decentralized, there is a system-wide notion of conflict that leads to their behaviors being coordinated. Arguably, this setup accords with Elsenbroich and Verhagen’s [2016] idea of how functionality and complexity may shift between an agent and its context. However, the conflicting situations in our study are motivated by the domain of traffic and earn their legitimacy from that domain. In another domain, the actions and conflicts would be different. But the conflicts do not determine the outcomes and other approaches with a similar conflict structure can produce different results.

6.1 Related Work

It is worth highlighting how Cha differs from the recent literature on norm emergence. Note that we do not intend this article to be a survey of norms or normative multi-agent systems in general: see, for instance, Hollander and Wu [36], Nardin et al. [53], and Andrighetto et al. [9].

Static and Closed. Many of the existing game-theoretic approaches to social norm emergence consider agents in a closed system, such as a graph structure, with fixed payoffs for the agents’ behaviors [3, 13, 62, 73, 78]. In these studies, agents are nodes of a graph, and the interactions between them form its edges. Agents learn their policies by playing repeated games with random agents, through the course of which new norms emerge in the system. In contrast, Cha considers a system that is open (agents continually leave and enter) and dynamic, and it addresses prosociality.

Static and Centralized. Other works consider norm emergence in a setting where a central agent with complete knowledge of the systems generates (or dictates) the norms. Morales et al. [47] proposed a mechanism called IRON along with two other variations [48, 49] for the on-line synthesis of norms. IRON, however, could result in unstable normative systems. To mitigate the instability problem, Morales et al. [50] presented an offline method called “System for Evolutionary Norm SynthEsis” (SENSE), which builds on top of IRON. These mechanisms employ a central approach that relies on global knowledge. The central agent observes the interactions of members to synthesize conflict-free norms in a top-down manner. Even though these frameworks share some commonalities with Cha in terms of norm representation and norm generation (albeit static), they are fundamentally different from it. Whereas these mechanisms use a central norm generator and learner, Cha uses a dynamic and decentralized mechanism for its four phases, as described in Section 2. Additionally, Cha agents incorporate prosocial decision-making to achieve fairness. In this manner, Cha brings together two important themes in prosociality: decision-making by individuals and fairness of system-level outcomes.

Static and Decentralized. Hao et al. propose two strategies, based on local exploration and global exploration, to facilitate norm emergence in a bottom-up manner [33]. Whereas Hao et al.’s strategies focus only on maximizing the average payoff of all agents, i.e., social welfare, Cha focuses on maximizing both social welfare and fairness. Mihaylov et al. [45] propose a decentralized approach for convention emergence in multi-agent systems for a pure coordination game with fixed payoffs. Once the agents have learned a convention, they do not re-learn it (i.e., the environment is assumed static).

Static and Hybrid. Silk [44] is the closest framework to Cha in the literature. Silk is a hybrid framework for the regulation of open normative systems. Briefly, this framework is composed of a central generator, which imposes hard integrity constraints that agents must follow, and recommends norms as soft constraints. Thus, a norm can emerge based on the decision of various agents to accept or deny the recommendations. In addition, Silk’s framework supports only static settings with fixed payoffs. Moreover, Silk tackles the problem of fairness through its central generator which continually monitors the performance and intervenes if necessary to improve fairness, whereas Cha agents learn to be prosocial.

Dynamic. Knobbout et al. [42] propose an update semantics for norm addition to characterize the dynamics of norm addition in a formal way; updates are parameterized over actions. However, this work does not indicate whether its proposed model can be applied to regulate agents’ actions and satisfy system-level goals. Verhagen [76] puts forth the need for flexible and decentralized systems. The simulation in Verhagen [76] focuses on the spreading and internalization of norms in such systems. However, Verhagen assumes that a top-level entity (say, a normative advisor) is aware of the correct norm and that this group norm (g-norm) does not change. This g-norm is spread to the agents through normative advice provided in a top-down manner by a centralized authority (i.e., there is a leader in the society). In contrast, Cha does not require a central authority, and the norms in Cha are not fixed. Savarimuthu et al. [59, 60] propose an architecture in which agents can identify the norms of a society in a bottom-up fashion. They address how an agent might identify whether a norm is changing in society and how it might react to this situation. Cha differs from this work in that Cha is an on-line framework; that is, in Cha, norms are learned rather than inferred through data mining.

Prosocial. Recently, there has been increasing interest in designing multi-agent systems that lead to prosocial behaviors in agents [5, 55, 57, 64]. Prosocial behavior occurs when an agent performs an action that is unfavorable to itself but benefits others. We consider the problem of agents acting prosocially while sharing resources in a multi-agent system. The connection between norms and prosociality, though conceptually important, has not received adequate attention: existing AI approaches to prosociality focus on individual decision-making by agents. In contrast, we relate social norms both to prosocial decision-making and to societal outcomes such as fairness. Specifically, the norms through which agents coordinate their interactions may or may not be fair.

Table 8 summarizes the difference between Cha and some leading frameworks addressing a similar setting.

Table 8.
Model                        Creation  Identification        Architecture   Openness  Dynamism  Prosocial
Cha                          on-line   RL                    Decentralized  Yes       Yes       Yes
Morales et al. [47, 48, 49]  on-line   Case-based Reasoning  Centralized    No        No        No
Mashayekhi et al. [44]       on-line   RL                    Hybrid         Yes       No        Yes\( ^{\#} \)
Sen and Airiau [62]          on-line   RL                    Decentralized  No        No        No
Airiau et al. [3]            on-line   RL                    Decentralized  No        No        No
Villatoro et al. [78]        on-line   RL                    Decentralized  No        No        No
Savarimuthu et al. [59, 60]  on-line   Data Mining           Decentralized  Yes       Yes       No
Beheshti et al. [13]         on-line   Game Theory and RL    Decentralized  No        No        No
Frantz et al. [29]           on-line   RL                    Decentralized  No        Yes       No
Hao et al. [33]              on-line   RL                    Decentralized  No        No        No

  • \( ^{\#} \)Enforced by Central Generator.

  • (Yes – Considered; No – Not Considered; NA – Not Applicable; NS – Not Specified).

Table 8. Comparison of Characteristics of Simulation Works on Norms


6.2 Prosociality in General

Applying our approach in practical domains would be important for validating it in broader settings.

Provided the following conditions are met, Cha can be successfully applied to settings other than the traffic one described here:

Agents have asymmetric interests. Cha assumes its population of users has asymmetric interests that may lead to conflicts.

Conflicts must be detectable. Norm generation in Cha is based on detection of the conflicts (e.g., car collision), so Cha assumes conflicts are detectable.

Some agents’ preferences can be assumed at design time. Cha assumes that some domain information, such as payoffs, is provided by the system designer (e.g., the payoff of a car collision).

Agents can observe and communicate. Cha assumes its members are able to share their experience with agents of the same type. Additionally, agents of different types are able to send and receive accumulated costs in order to learn and act prosocially.

A natural family of application scenarios for Cha is the development of what Berners-Lee [14] calls social machines. We can think of social machines as socio-technical systems [17] geared to support interactions between people, or between people and businesses, in reference to some shared resources or coordinated decision-making. Social machines today are largely not formulated to satisfy our desiderata, but they could and should be. Today, there is little computational support for prosociality, flexibility, or dynamism, and the underlying server-based architecture is decidedly centralized. Thus, these desiderata are left to the human participants to achieve informally. But we can envision a situation where decentralized agents assist the humans and provide computational support for prosociality, flexibility, and dynamism.

Another domain of application for Cha is on-line communities, in which members continually interact by exchanging information (e.g., comments and pictures) in different forums. An on-line community is an open, dynamic system. It is open because the users can enter the system at any time. It is dynamic because users may also change their preferences and behaviors. The goal is to achieve “healthy” communities [35]. Specifically, the goal is to avoid circumstances in which a user expresses concern about the content involving that user that is shared by other users. Such content may include pictures that violate one user’s privacy but may be funny in the eyes of the posting user [28]. Prosociality would be demonstrated by the posting user taking into account the preferences of the people shown in a picture. Flexibility is indicated by the users not having rigid stances for their preferences, which can shift depending on the context. Endurance is indicated by the users interacting multiple times, sometimes over the course of years. Dynamism is indicated by the fact that norms for what is acceptable (e.g., in regards to sensitive information) may change depending on the cultural or political situation. Decentralization is indicated by the users deciding individually on norms without any computational limitations imposed by any sharing platform they may be using. The users could concede to each other and positively or negatively sanction each other.

The domain of Open-Source Software Development (OSSD) communities [6, 10] is particularly compelling because it brings together considerations of ethics along with inertia (projects can last years) and group norms. An OSSD community is open and dynamic. With a defined goal, one or more developers start the development of an open-source software project. Other developers who are interested in the project, contribute to the project’s goal by writing software code. Prosociality in OSSD would be demonstrated by the community of developers (and users outside of the developer community) identifying issues with the ongoing project and voluntarily contributing fixes to resolve the identified issues. Similar to in on-line communities, endurance in OSSD communities is indicated by the users interacting multiple times over the course of years. Flexibility is indicated by the developer community allowing users to submit feature requests and subsequently redefining the project’s goals. Dynamism is indicated by the fact that individual and group norms as well as the project’s goals may evolve over time as the community grows. Decentralization is indicated by the OSSD community developers individually deciding on the norms on how they interact with other developers and users, and how they sanction them.

Another example scenario where the ideas of Cha could apply is in shared transportation. Bardaka et al. [12] describe the setting of public microtransit (i.e., last-mile ride sharing) in which a public van service is used to help members of the public who cannot use a private car (e.g., because of its expense or their poor health). This setting demonstrates the features necessary to achieve the desiderata identified for Cha. Prosociality would be demonstrated by riders adjusting their requested ride’s origin, destination, or timing to accommodate the needs of other riders. Flexibility is indicated by the riders changing how they interact, e.g., in response to environmental events (is it raining) or their health (does one of them need to get to a dialysis appointment?). Endurance is indicated by the riders interacting with each other again and again (e.g., every weekday), sometimes over the course of years, because the people who need such assistance in a neighborhood are likely to be sharing rides more than once. Dynamism is indicated by the fact that norms for what is acceptable may change depending on the public health (e.g., in regards to an epidemic flaring up or dying down) or economic (e.g., during a recession, money is tight and schedules may be more rigid for workers) situation. Decentralization is indicated by the riders deciding individually on norms on how they interact. The users could help each other, thank each other for accommodating each other’s requests, or negatively sanction each other, e.g., through shunning each other’s company.

6.3 Outlook and Directions

An important future direction is to understand the context as a basis for overriding norms. For instance, can we allow vehicles with a sick passenger to Go when the norm suggests Stop? Such overrides may be facilitated by sharing context and explanations to justify the ethicality of deviating from a norm [2, 4].

Another direction is to include more complex norm actions, for example, (1) communicating an obligation to Stop to let an ambulance pass; and (2) if in a rush, delegate (to the vehicle behind) a norm to help a stranded vehicle. One way to model more complex norm actions is through sanctioning, which also corresponds to an additional (state, action) pair in the payoff matrix.

Understanding societal inertia in the sense of how new norms can supersede existing norms is important; Cha is dynamic but how to evaluate and how to reduce the societal friction and inertia in arriving at new norms remains to be studied.

Cha has a flavor of group norms since agents of the same type pass on their experiences with other agents of the same type. But each agent individually reasons whether to follow an applicable norm or not. Another natural line of future research is to explore the generation of group norms as opposed to individual norms [7, 8] and bring together notions of group intentions [24, 68] and group ability [67]. Group norms apply to groups of agents together. For example, in the world of autonomous cars, car agents sharing the same space can share information with each other and can decide actions and move together as a platoon of cars (i.e., creating a group norm). Group norms raise important challenges of how a group can allocate decision-making authority and accountability and whether the actions taken on behalf of a group satisfy ethical criteria both with respect to group members and with respect to outsiders.

REFERENCES

  [1] Adler Matthew D. 2019. Measuring Social Welfare: An Introduction. Oxford University Press, New York.
  [2] Agrawal Rishabh, Ajmeri Nirav, and Singh Munindar P. 2022. Socially intelligent genetic agents for the emergence of explicit norms. In Proceedings of the 31st International Joint Conference on Artificial Intelligence. IJCAI, Vienna, 1–7.
  [3] Airiau Stéphane, Sen Sandip, and Villatoro Daniel. 2014. Emergence of conventions through social learning. Journal of Autonomous Agents and Multi-Agent Systems 28, 5 (2014), 779–804.
  [4] Ajmeri Nirav, Guo Hui, Murukannaiah Pradeep K., and Singh Munindar P. 2018. Robust norm emergence by revealing and reasoning about context: Socially intelligent agents for enhancing privacy. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. IJCAI, Stockholm, 28–34.
  [5] Ajmeri Nirav, Guo Hui, Murukannaiah Pradeep K., and Singh Munindar P. 2020. Elessar: Ethics in norm-aware agents. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. IFAAMAS, Auckland, 16–24.
  [6] Al-Amin Shams, Ajmeri Nirav, Du Hongying, Berglund Emily Z., and Singh Munindar P. 2018. Toward effective adoption of secure software development practices. Simulation Modelling Practice and Theory 85 (2018), 33–46.
  [7] Aldewereld Huib, Dignum Virginia, and Vasconcelos Wamberto W. 2016. Group norms for multi-agent organisations. ACM Transactions on Autonomous and Adaptive Systems 11, 2 (2016), 15:1–15:31.
  [8] Alechina Natasha, Hoek Wiebe van der, and Logan Brian. 2017. Fair decomposition of group obligations. Journal of Logic and Computation 27, 7 (2017), 2043–2062.
  [9] Andrighetto Giulia, Castelfranchi Cristiano, Mayor Eunate, McBreen John, López-Sánchez Maite, and Parsons Simon. 2013. Social norm dynamics. In Normative Multi-Agent Systems. Schloss Dagstuhl, Dagstuhl, Germany, 135–170.
  [10] Avery Daniel, Dam Hoa Khanh, Savarimuthu Bastin Tony Roy, and Ghose Aditya K. 2016. Externalization of software behavior by the mining of norms. In Proceedings of the 13th International Conference on Mining Software Repositories. ACM, Austin, Texas, 223–234.
  [11] Bab Avraham and Brafman Ronen I. 2008. Multi-agent reinforcement learning in common interest and fixed sum stochastic games: An experimental study. Journal of Machine Learning Research 9, 88 (2008), 2635–2675. Retrieved from http://jmlr.org/papers/v9/bab08a.html.
  [12] Bardaka Eleni, Hajibabai Leila, and Singh Munindar P. 2020. Reimagining ride sharing: Efficient, equitable, sustainable public microtransit. IEEE Internet Computing 24, 5 (2020), 38–44.
  [13] Beheshti Rahmatollah, Ali Awrad Mohammed, and Sukthankar Gita Reese. 2015. Cognitive social learners: An architecture for modeling normative behavior. In Proceedings of the 29th AAAI Conference on Artificial Intelligence. AAAI Press, Austin, 2017–2023.
  [14] Berners-Lee Tim. 1999. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web. Harper Business, New York.
  [15] Boella Guido and Torre Leendert W. N. van der. 2004. Regulative and constitutive norms in normative multiagent systems. In Proceedings of the 9th International Conference on Principles of Knowledge Representation and Reasoning. AAAI Press, Whistler, Canada, 255–266.
  [16] Chaput Rémy, Duval Jérémy, Boissier Olivier, Guillermin Mathieu, and Hassas Salima. 2021. A multi-agent approach to combine reasoning and learning for an ethical behavior. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. ACM, Virtual, 13–23.
  [17] Chopra Amit K. and Singh Munindar P. 2016. From social machines to social protocols: Software engineering foundations for sociotechnical systems. In Proceedings of the 25th International World Wide Web Conference. ACM, Montréal, 903–914.
  [18] Chopra Amit K. and Singh Munindar P. 2018. Sociotechnical systems and ethics in the large. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society. ACM, New Orleans, 48–53.
  [19] Cohen Jacob. 1988. Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates, Hillsdale, New Jersey.
  [20] Croarkin Carroll and Tobias Paul. 2012. NIST/SEMATECH e-Handbook of Statistical Methods. NIST/SEMATECH, Gaithersburg, Maryland.
  [21] Dell'Anna Davide, Dastani Mehdi, and Dalpiaz Fabiano. 2019. Runtime revision of norms and sanctions based on agent preferences. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. IFAAMAS, Montreal, 1609–1617.
  [22] Dell'Anna Davide, Dastani Mehdi, and Dalpiaz Fabiano. 2020. Runtime revision of sanctions in normative multi-agent systems. Autonomous Agents and Multi-Agent Systems 34, 2 (2020), 1–54.
  [23] Dovidio John F., Piliavin Jane Allyn, Schroeder David A., and Penner Louis A. 2017. The Social Psychology of Prosocial Behavior. Psychology Press, New York.
  [24] Dunin-Kęplicz Barbara and Verbrugge Rineke. 2010. Teamwork in Multi-Agent Systems: A Formal Approach. Wiley, Chichester, United Kingdom.
  [25] El-Tantawy Samah, Abdulhai Baher, and Abdelgawad Hossam. 2013. Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto. IEEE Transactions on Intelligent Transportation Systems 14, 3 (2013), 1140–1150.
  [26] Elsenbroich Corinna and Verhagen Harko. 2016. The simplicity of complex agents: A contextual action framework for computational agents. Mind and Society 15, 1 (2016), 131–143.
  [27] Fehr Ernst and Schmidt Klaus M. 1999. A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics 114, 3 (1999), 817–868.
  [28] Fogués Ricard López, Murukannaiah Pradeep K., Such Jose M., and Singh Munindar P. 2017. Sharing policies in multiuser privacy scenarios: Incorporating context, preferences, and arguments in decision making. ACM Transactions on Computer-Human Interaction 24, 1 (2017), 29 pages.
  [29] Frantz Christopher, Purvis Martin K., Nowostawski Mariusz, and Savarimuthu Bastin Tony. 2013. Modelling institutions using dynamic deontics. In Proceedings of the International Workshop on Coordination, Organizations, Institutions, and Norms in Agent Systems. Springer, St. Paul, 211–233.
  [30] García-Camino Andrés, Rodríguez-Aguilar Juan A., Sierra Carles, and Vasconcelos Wamberto. 2009. Constraint rule-based programming of norms for electronic institutions. Autonomous Agents and Multi-Agent Systems 18, 1 (2009), 186–217.
  [31] Grissom Robert J. and Kim John J. 2012. Effect Sizes for Research: Univariate and Multivariate Applications. Routledge, Abingdon-on-Thames.
  [32] Hao Jianye and Leung Ho-fung. 2016. Interactions in Multiagent Systems: Fairness, Social Optimality and Individual Rationality. Springer, Berlin.
  [33] Hao Jianye, Sun Jun, Chen Guangyong, Wang Zan, Yu Chao, and Ming Zhong. 2018. Efficient and robust emergence of norms through heuristic collective learning. ACM Transactions on Autonomous and Adaptive Systems 12, 4 (2018), 23:1–23:20.
  [34] Hilpinen Risto (Ed.). 1981. New Studies in Deontic Logic: Norms, Actions, and the Foundations of Ethics, Vol. 152. Synthese Library. Reidel, Dordrecht, The Netherlands.
  [35] Hinds David and Lee Ronald M. 2011. Assessing the social network health of virtual communities. In Virtual Communities: Concepts, Methodologies, Tools and Applications, Khosrow-Pour Mehdi (Ed.), Vol. 2. IGI Global, Hershey, Pennsylvania, Chapter 49, 715–730.
  [36] Hollander Christopher D. and Wu Annie S. 2011. The current state of normative agent-based systems. Journal of Artificial Societies and Social Simulation 14, 2, Article 6 (2011), 24.
  [37] Hollander Christopher D. and Wu Annie S. 2011. Using the process of norm emergence to model consensus formation. In Proceedings of the 5th IEEE International Conference on Self-Adaptive and Self-Organizing Systems. IEEE, Ann Arbor, Michigan, 148–157.
  [38] Huang Xiaowei, Ruan Ji, Chen Qingliang, and Su Kaile. 2016. Normative multiagent systems: A dynamic generalization. In Proceedings of the 25th International Joint Conference on Artificial Intelligence. IJCAI, New York, 1123–1129.
  [39] Kafalı Özgür, Ajmeri Nirav, and Singh Munindar P. 2016. Revani: Revising and verifying normative specifications for privacy. IEEE Intelligent Systems 31, 5 (2016), 8–15.
  [40] Kafalı Özgür, Ajmeri Nirav, and Singh Munindar P. 2017. Kont: Computing tradeoffs in normative multiagent systems. In Proceedings of the 31st Conference on Artificial Intelligence. AAAI, San Francisco, 3006–3012.
  [41] Kafalı Özgür, Ajmeri Nirav, and Singh Munindar P. 2020. Desen: Specification of sociotechnical systems via patterns of regulation and control. ACM Transactions on Software Engineering and Methodology 29, 1 (2020), 7:1–7:50.
  [42] Knobbout Max, Dastani Mehdi, and Meyer John-Jules Ch. 2014. Reasoning about dynamic normative systems. In Proceedings of the 14th European Conference on Logics in Artificial Intelligence. Springer, Funchal, Madeira, Portugal, 628–636.
  [43] Lorini Emiliano and Mühlenbernd Roland. 2015. The long-term benefits of following fairness norms: A game-theoretic analysis. In Proceedings of the International Conference on Principles and Practice of Multi-Agent Systems. Springer, Bertinoro, 301–318.
  [44] Mashayekhi Mehdi, Du Hongying, List George F., and Singh Munindar P. 2016. Silk: A simulation study of regulating open normative multiagent systems. In Proceedings of the 25th International Joint Conference on Artificial Intelligence. IJCAI, New York, 373–379. Retrieved from https://www.ijcai.org/Abstract/16/060.
  [45] Mihaylov Mihail, Tuyls Karl, and Nowé Ann. 2014. A decentralized approach for convention emergence in multi-agent systems. Journal of Autonomous Agents and Multi-Agent Systems 28, 5 (2014), 749–778.
  [46] Miralles Jordi Campos, López-Sánchez Maite, Salamó Maria, Avila Pedro, and Rodríguez-Aguilar Juan A. 2013. Robust regulation adaptation in multi-agent systems. ACM Transactions on Autonomous and Adaptive Systems 8, 3 (2013), 13:1–13:27.
  [47] Morales Javier, López-Sánchez Maite, Rodríguez-Aguilar Juan A., Vasconcelos Wamberto, and Wooldridge Michael. 2015. Online automated synthesis of compact normative systems. ACM Transactions on Autonomous and Adaptive Systems 10, 1 (2015), 2:1–2:33.
  [48] Morales Javier, López-Sánchez Maite, Rodríguez-Aguilar Juan A., Wooldridge Michael, and Vasconcelos Wamberto Weber. 2014. Minimality and simplicity in the on-line automated synthesis of normative systems. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, Paris, 109–116.
  [49] Morales Javier, López-Sánchez Maite, Rodríguez-Aguilar Juan A., Wooldridge Michael, and Vasconcelos Wamberto Weber. 2015. Synthesising liberal normative systems. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, Istanbul, 433–441.
  [50] Morales Javier, Wooldridge Michael, Rodríguez-Aguilar Juan A., and López-Sánchez Maite. 2018. Off-line synthesis of evolutionarily stable normative systems. Journal of Autonomous Agents and Multi-Agent Systems 32, 5 (2018), 635–671.
  [51] Morris-Martin Andreasa, Vos Marina De, and Padget Julian. 2019. Norm emergence in multiagent systems: A viewpoint paper. Autonomous Agents and Multi-Agent Systems 33, 6 (2019), 706–749.
  [52] Murukannaiah Pradeep K., Ajmeri Nirav, Jonker Catholijn M., and Singh Munindar P. 2020. New foundations of ethical multiagent systems. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. IFAAMAS, Auckland, 1706–1710.
  [53] Nardin Luis G., Balke-Visser Tina, Ajmeri Nirav, Kalia Anup K., Sichman Jaime S., and Singh Munindar P. 2016. Classifying sanctions and designing a conceptual sanctioning process model for socio-technical systems. The Knowledge Engineering Review 31, 2 (2016), 142–166.
  [54] North Michael J., Collier Nicholson T., Ozik Jonathan, Tatara Eric R., Macal Charles M., Bragen Mark, and Sydelko Pam. 2013. Complex adaptive systems modeling with Repast Simphony. Complex Adaptive Systems Modeling 1, 1 (2013), 1–26.
  [55] Paiva Ana, Santos Fernando P., and Santos Francisco C. 2018. Engineering pro-sociality with autonomous agents. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. AAAI, New Orleans, 7994–7999.
  [56] Rawls John. 1999. A Theory of Justice (2nd ed.). Harvard University Press, Cambridge, Massachusetts.
  [57] Santos Fernando P., Pacheco Jorge M., Paiva Ana, and Santos Francisco C. 2019. Evolution of collective fairness in hybrid populations of humans and agents. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. AAAI, Honolulu, 6146–6153.
  [58] Savarimuthu Bastin Tony Roy and Cranefield Stephen. 2011. Norm creation, spreading and emergence: A survey of simulation models of norms in multiagent systems. Multiagent and Grid Systems 7, 1 (2011), 21–54.
  [59] Savarimuthu Bastin Tony Roy, Cranefield Stephen, and Purvis M. 2009. Norm emergence in agent societies formed by dynamically changing networks. Web Intelligence and Agent Systems: An International Journal 7, 3 (2009), 223–232.
  [60] Savarimuthu Bastin Tony Roy, Cranefield Stephen, Purvis Maryam, and Purvis Martin K. 2010. Obligation norm identification in agent societies. Journal of Artificial Societies and Social Simulation 13, 4 (2010), 19 pages.
  [61] Sen Onkur and Sen Sandip. 2010. Effects of social network topology and options on norm emergence. In Coordination, Organizations, Institutions and Norms in Agent Systems V. Springer, Berlin, 211–222.
  [62] Sen Sandip and Airiau Stéphane. 2007. Emergence of norms through social learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., Hyderabad, 1507–1512.
  [63] Sen Sandip, Rahaman Zenefa, Crawford Chad, and Yücel Osman. 2018. Agents for social (media) change. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. IFAAMAS, Stockholm, 1198–1202.
  [64] Serramia Marc, López-Sánchez Maite, Rodríguez-Aguilar Juan A., Rodriguez Manel, Wooldridge Michael, Morales Javier, and Ansotegui Carlos. 2018. Moral values in norm decision making. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. IFAAMAS, Stockholm, 1294–1302.
  [65] Shaw Alex and Olson Kristina. 2014. Fairness as partiality aversion: The development of procedural justice. Journal of Experimental Child Psychology 119 (2014), 40–53.
  [66] Simpson Brent T. and Willer Robb. 2015. Beyond altruism: Sociological foundations of cooperation and prosocial behavior. Annual Review of Sociology 41 (2015), 43–63.
  [67] Singh Munindar P. 1991. Group ability and structure. In Decentralized Artificial Intelligence, Volume 2: Proceedings of the 2nd European Workshop on Modeling Autonomous Agents in a Multi-Agent World, Demazeau Yves and Müller Jean-Pierre (Eds.). Elsevier/North-Holland, Amsterdam, 127–145.
  [68] Singh Munindar P. 1998. The intentions of teams: Team structure, endodeixis, and exodeixis. In Proceedings of the 13th European Conference on Artificial Intelligence. John Wiley, Brighton, United Kingdom, 303–307.
  [69] Singh Munindar P. 2013. Norms as a basis for governing sociotechnical systems. ACM Transactions on Intelligent Systems and Technology 5, 1 (2013), 23 pages.
  [70] Singh Munindar P., Arrott Matthew, Balke Tina, Chopra Amit K., Christiaanse Rob, Cranefield Stephen, Dignum Frank, Eynard Davide, Farcas Emilia, Fornara Nicoletta, Gandon Fabien, Governatori Guido, Dam Hoa Khanh, Hulstijn Joris, Krüger Ingolf, Lam Ho-Pun, Meisinger Michael, Noriega Pablo, Savarimuthu Bastin Tony Roy, Tadanki Kartik, Verhagen Harko, and Villata Serena. 2013. The uses of norms. In Normative Multi-Agent Systems, Andrighetto Giulia, Governatori Guido, Noriega Pablo, and Torre Leendert W. N. van der (Eds.), Number 4 in Dagstuhl Follow-Ups. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Wadern, Germany, Chapter 7, 191–229.
  [71] Singh Satinder, Jaakkola Tommi, Littman Michael L., and Szepesvári Csaba. 2000. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning 38 (2000), 287–308.
  [72] Staab Eugen, Fusenig Volker, and Engel Thomas. 2008. Trust-aided acquisition of unverifiable information. In Proceedings of the 18th European Conference on Artificial Intelligence. IOS Press, Patras, 869–870.
  [73] Sugawara Toshiharu. 2014. Emergence of conventions in conflict situations in complex agent network environments. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, Paris, 1459–1460.
  [74] Sutton Richard S. and Barto Andrew G. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press, Cambridge, Massachusetts.
  [75] Urbanik Tom, Tanaka Alison, Lozner Bailey, Lindstrom Eric, Lee Kevin, Quayle Shaun, Beaird Scott, Tsoi Shing, Ryus Paul, Gettman Doug, Sunkari Srinivasa, Balke Kevin, and Bullock Darcy. 2015. Signal Timing Manual – 2nd Edition. NCHRP Report 812. National Cooperative Highway Research Program, Transportation Research Board, Washington, DC.
  [76] Verhagen Harko. 2001. Simulation of the learning of norms. Social Science Computer Review 19, 3 (2001), 296–306.
  [77] Verhagen Harko, Neumann Martin, and Singh Munindar P. 2018. Normative multiagent systems: Foundations and history. In Handbook of Normative Multiagent Systems, Chopra Amit, Torre Leendert van der, Verhagen Harko, and Villata Serena (Eds.). College Publications, London, Chapter 1, 3–25. Retrieved from http://www.collegepublications.co.uk/downloads/handbooks00004.pdf.
  [78] Villatoro Daniel, Sabater-Mir Jordi, and Sen Sandip. 2011. Social instruments for robust convention emergence. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence. AAAI Press, Barcelona, Catalonia, Spain, 420–425.
  [79] Winikoff Michael and Sardelić Julija. 2021. Artificial intelligence and the right to explanation as a human right. IEEE Internet Computing 25, 2 (2021), 108–112.
  [80] Woodgate Jessica and Ajmeri Nirav. 2022. Macro ethics for governing equitable sociotechnical systems. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, Online, 1824–1828.
  [81] Yu Chao, Zhang Minjie, Ren Fenghui, and Luo Xudong. 2013. Emergence of social norms through collective learning in networked agent societies. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems. IFAAMAS, Saint Paul, 475–482.

Published in: ACM Transactions on Autonomous and Adaptive Systems, Volume 17, Issue 1-2, June 2022, 128 pages. ISSN: 1556-4665. EISSN: 1556-4703. DOI: 10.1145/3543994.


Publisher: Association for Computing Machinery, New York, NY, United States.

Publication History

• Published: 7 September 2022
• Online AM: 6 June 2022
• Accepted: 1 May 2022
• Revised: 1 March 2022
• Received: 1 July 2020

Published in TAAS Volume 17, Issue 1-2.
