Abstract
The concept of conventions has attracted much attention in multi-agent systems research. In this article, we study the emergence of conventions from repeated n-player coordination games. Distributed agents learn their policies independently and are capable of observing their neighbours in a network topology. We distinguish two types of information representation for these observations: the gist trace and the verbatim trace. We conjecture that learning based on the gist trace, which overlooks details and focuses only on the general choice of action within a neighbourhood, should achieve efficient convention emergence. To this end, we propose a novel learning method that makes use of the gist trace. The experimental results confirm that the proposed method establishes conventions much faster than state-of-the-art learning methods across diverse settings of multi-agent systems. In particular, deriving the gist trace at a moderately high level of abstraction further improves the efficiency of convention emergence.
1 INTRODUCTION
Conventions (or social norms) have shaped every aspect of our daily life. They are solutions to coordination problems that, over time, turn normative [30, 31]. For instance, by economic conventions, people reach consensus on the values of goods and are thus able to exchange them in bartering systems [31]. As another example, by traffic conventions, drivers in the same country drive on the same side of the road to avoid collisions [68]. In human society, conforming to conventions simplifies people’s decision-making processes, reduces social conflicts, and coordinates people’s behaviours efficiently [9, 30, 37].
A multi-agent system concerns a set of interacting agents, each of which is rational and autonomous [66]. The study of coordination and cooperation among agents has been at the heart of multi-agent systems research [28] since its inception [12, 29]. Conventions, which may be a potential solution to coordination problems in multi-agent systems, have thus attracted much attention [1, 19, 26, 33, 42, 48, 49, 56]. On the one hand, the prescriptive view usually assumes that conventions a priori exist in multi-agent systems [49] and focuses on the identification, enforcement, and synthesis of conventions [5, 11, 40, 52, 55, 58]. On the other hand, the emergent view addresses how conventions naturally emerge from and are sustained through the self-organisation of distributed agents [16, 21, 24, 34, 36, 51, 71].
In the seminal works [54, 68], convention emergence is defined in a game-theoretic framework: a convention is a social law that restricts a system of agents to choose a particular action in specific games. We note that prior works primarily focus on two-player (or bilateral) coordination games, in which agents are paired up to coordinate in the process of convention emergence [13, 25, 41, 54, 57, 59, 62, 68]. However, \(n\)-player coordination games have not been adequately investigated, despite being common in the real world. Consider a group of wireless devices coordinating to use the same channel for ad hoc communication [38], or a team of agents deciding on a common vocabulary for a subject, which allows them to communicate without ambiguity. In these scenarios, multilateral coordination is needed, because the outcome depends on the joint coordination of all of the agents. Such multilateral coordination in general cannot be reduced to a sequence of bilateral coordination: after coordination between a pair of agents is achieved, that coordination might become useless, or even an obstacle, when one of the two agents tries to coordinate with a third agent. Hence, in scenarios requiring multilateral coordination, it is more efficient for conventions to emerge directly from agents playing \(n\)-player coordination games whenever possible.1
The fact that agents interact in the process of convention emergence leads to a highly dynamic and stochastic environment. In such an environment, it is non-trivial to define appropriate behaviours for each individual agent in advance, and thus learning becomes crucial [4]. Previous approaches [17, 51, 57, 62, 63, 71] address this issue by equipping individual agents with reinforcement learning techniques. As such, agents gradually converge to taking the same action, which becomes the convention of the system, based on their experience, since taking the same action as other agents results in the highest cumulative payoff for each agent. Recent advances [17, 41, 59, 62, 71] enhance the learning capability of agents so as to improve the efficiency of convention emergence by allowing agents to observe some other agents, typically each agent's neighbours in a network topology. For example, Yu, Zhang, Ren, and Luo [71] propose a two-stage learning process in which each agent first learns a best response toward each of its neighbours based on its observations of their actions and then combines the best responses into its final choice of action. As another example, Wang, Lu, Hao, Wei, and Leung [62] assume that an individual agent can observe how each of its neighbours evaluates different actions and that the agent utilises such information to refine its own evaluation. However, do the ways of utilising observations in these learning processes resemble how people memorise and reason about information (considering that human beings are arguably the most efficient learners)?
Fuzzy trace theory [7, 45, 46] is a well-established theory of cognition. According to this theory, people form, in parallel, two types of mental representation of information: the verbatim trace and the gist trace. The verbatim trace is the exact and detailed record of experience, which preserves much richer detail and better supports precise analysis. In contrast, the gist trace is a fuzzy representation of information that contains only the essence (e.g., its bottom-line meaning) and preserves no redundant detail. Nonetheless, research shows that people tend to think, reason, and remember using gist traces instead of verbatim traces [7], as people consider gist traces to be the more reliable basis for generating responses for these tasks [45, 46]. Moreover, the outcomes of such gist trace-based judgement, reasoning, and decision-making appear more natural and are more easily comprehensible to humans.
Inspired by this theory, we are interested in investigating the effects of using the gist trace on convention emergence from agents playing \(n\)-player coordination games in multi-agent systems. We distinguish the gist trace from the verbatim trace for agents in a networked multi-agent system, assuming that agents are capable of observing their neighbours. We define the gist trace of an individual agent to be the perceived proportion of neighbours taking each action. Intuitively, the gist trace reflects the general choice of action within a neighbourhood rather than the specific action choice or the precise evaluation of actions of each neighbour. As such, the gist trace captures the essence of an agent's experience in \(n\)-player coordination games: multilateral coordination is achieved if and only if the agent takes the exact same action as its entire neighbourhood, not just one specific neighbour. Hence, using gist traces in decision-making largely avoids the effects of the transient instabilities and irregularities that are present in the verbatim information of an agent's experience.
To this end, we propose a gist trace-based \(Q\)-learning method for individual agents. Over time, an agent maintains a \(Q\)-value for each action, which largely depends on its current gist trace. Modelling the concept of level of abstraction in fuzzy trace theory, the gist trace can be derived at various levels: a higher level means that fewer details of experience are captured and agents base their learning on a fuzzier representation of experience. In our experiments, we compare gist trace-based \(Q\)-learning with the baseline and other learning approaches that make use of verbatim information. It is shown that gist trace-based \(Q\)-learning always achieves much more efficient convention emergence from \(n\)-player coordination games under various experimental settings, while the other learning approaches may not. In particular, as the number of players involved in coordination games increases, the performance advantage of gist trace-based \(Q\)-learning becomes more pronounced. Moreover, we find that compared with a gist trace derived at a low level, a gist trace derived at a moderately higher level of abstraction can further improve the efficiency of gist trace-based \(Q\)-learning and accelerate convention emergence.
The remainder of the article is organised as follows. Section 2 reviews the related work. Section 3 presents the convention emergence framework that we focus on. Section 4 formally defines gist trace and presents the gist trace-based \(Q\)-learning. Section 5 presents our experiments. Finally, Section 6 concludes the article with some directions for future work.
2 RELATED WORK
There exist generally two classes of approaches to investigating convention emergence in multi-agent systems. In the typical spreading-based approaches [13, 14, 18, 27, 38, 44, 47, 50, 53], each agent propagates its own choice of action, rigorously complying with some well-designed spreading strategy, to prompt other agents to adopt the same choice as itself. Traditionally, many works focus on imitation [13, 27, 38, 53, 61]; agents imitate the actions of one another and, over time, converge to the same choice of action. More recently, to tackle scenarios that involve a large number of available actions, Salazar, Rodriguez-Aguilar, and Arcos [48] incorporate the principles of evolutionary algorithms; Franks, Griffiths, and Jhumka [16] insert a number of influencer agents with specific convention seeds; and Hasan, Raja, and Bazzan [18] make use of a network rewiring mechanism. We note that the aforementioned spreading-based approaches mostly exhibit superior performance under the specific scenarios that they are designed for. However, carefully designing the strategy for each individual agent beforehand is non-trivial and may be infeasible in an open and dynamic environment.
The learning-based approaches [17, 24, 25, 41, 51, 60, 62, 67, 70, 71], in contrast, emphasise that agents should be able to establish a convention from repeated local interactions. Shoham and Tennenholtz [54] in the multi-agent system literature and Young [68] in the economics literature independently proposed a game-theoretic framework for convention emergence at around the same time. The local interactions between agents are framed as two-player normal form games, and a convention is a social law that restricts agents in the system to choose a particular action in the repeated games. By a widely adopted convention emergence metric proposed by Kittock [27], a convention is said to have emerged in a system if the actions of at least 90% of the agents are compliant with the convention.
Sen and Airiau [51] propose the social learning interaction protocol for a well-mixed population of learning agents. At each timestep, individual agents are randomly paired up to play a two-player normal form game. Based on the immediate payoff it receives, each agent updates its policy for the next timestep independently. They report that under social learning, populations of \(Q\)-learning agents [64] are the quickest to establish a convention, followed by populations that use Win-or-Learn-Fast (WoLF) Policy Hill Climbing [6] and populations that use Fictitious Play [8]. Building on social learning, some works study the effects of the underlying network topology on convention emergence [24, 35, 41, 59, 60]. In a networked multi-agent system, agents can interact only with their neighbours in a network and have no interaction with faraway agents. It is found that a convention generally emerges in various kinds of networks, including those that exhibit small-world and scale-free properties, with a few notable exceptions [24, 59].
The speed (or efficiency) of convention emergence has been one of the major concerns since the start of this line of research [13, 53]. To accelerate convention emergence in learning-based approaches, many recent works [17, 62, 63, 67, 70, 71] have modified \(Q\)-learning [64] in various ways to utilise an agent's observations of its neighbours. Hao and Leung [17] extend the concept of the joint action learner [10] to convention emergence under social learning. Instead of learning the \(Q\)-values of its own actions, an individual agent learns the \(Q\)-values of the joint actions of itself and its opponents. It is shown that populations of joint action learners establish conventions faster than populations of \(Q\)-learners. Yu, Zhang, Ren, and Luo [71] present collective learning, with which an agent performs decision-making in two stages. First, the agent speculates a best response for each of its neighbours. Second, all of these best responses are aggregated into a final choice of action, which is actually used in playing the games. After receiving an immediate payoff, the agent updates the best responses accordingly. They show that collective learning establishes conventions faster than social learning in networked multi-agent systems even if different specific algorithms (e.g., \(Q\)-learning and WoLF Policy Hill Climbing) are used for learning the best responses. This renders collective learning particularly suitable for achieving convention emergence from two-player coordination games (with a small set of available actions) in networked multi-agent systems. Wang, Lu, Hao, Wei, and Leung [62] extend double \(Q\)-learning [20] to convention emergence and propose multiple-\(Q\) learning to improve the estimation of \(Q\)-values. Multiple-\(Q\) learning replaces the estimated maximum future value in the \(Q\)-function with the weighted average of the corresponding \(Q\)-values of the neighbours.
The efficiency of multiple-\(Q\) learning is evaluated under the setting of language coordination games, which feature a large set of available actions. It is shown that multiple-\(Q\) learning outperforms the state-of-the-art spreading-based approaches [16, 18, 48], suggesting its strength in establishing conventions in multi-agent systems with a large action space.
Yu, Chen, Lv, Ren, Ge, and Sun [69] implement each agent as a neural system, in which the joint actions of an agent and its neighbours are encoded in the input spike train of the neural network, such that learning takes place in the synapses through changes to firing rates. While conventions may emerge, the emergence speed is not comparable to that of other learning-based approaches.
In general, previous learning-based approaches propose different ways to leverage an agent's observations of its neighbours. However, the gist trace, a fuzzy representation of experience preferred by humans in decision-making, is a concept that has not been investigated in this context. Intuitively, the gist trace captures the essence of information and may largely circumvent the transient instabilities and irregularities present in the verbatim trace. This is particularly true in settings requiring agents to have multilateral interactions, such as playing \(n\)-player coordination games, in which coordination is fully achieved only when all \(n\) agents choose to play exactly the same action. We note that such a setting has not been considered explicitly in the approaches previously reported in the literature, whose interaction models are typically evaluated under the setting of bilateral coordination. We conjectured that the use of gist traces, instead of verbatim traces, in \(Q\)-learning could improve the emergence speed of conventions. This conjecture is confirmed by the research presented in this article. In fact, the use of gist traces is found to significantly speed up convention emergence in general, particularly in the setting of multilateral agent interaction modelled as \(n\)-player coordination games.
It is worth mentioning that convention emergence is not the only way to achieve coordination and reduce social conflicts in multi-agent systems. For example, negotiation is another solution [15, 22, 43]. Typically, a negotiation proceeds by agents making offers and ends when a proposal is accepted by all agents or the deadline is reached. During a negotiation, agents consider the received proposals, decide whether to accept them, and try to come up with counterproposals. Therefore, although negotiation helps to reduce social conflicts in multi-agent systems, the research on negotiation differs from the convention emergence problem considered here in terms of (i) the interaction protocols among agents and (ii) the decision-making processes of individual agents. For comprehensive reviews of the negotiation research, we refer readers to References [2, 32].
3 CONVENTION EMERGENCE FRAMEWORK
In this section, following the game-theoretic framework of convention emergence [54, 68], we present an interaction protocol for \(n\)-player coordination games in networked multi-agent systems in Section 3.1. In Section 3.2, we formally describe the \(n\)-player coordination games.
3.1 A Multilateral Interaction Protocol in Networked MASs
Consider a networked multi-agent system \(G\), which is represented by an undirected graph \(G=\langle N, E \rangle\), where \(N\) is the set of agents and \(E\subset N \times N\) is the set of edges, each edge \(e_{i,j} \in E\) connecting two agents \(i\) and \(j\). The network topology represents some constraints2 on the interactions within this system: an agent can interact only with its neighbours. For any agent \(i \in N\), we denote the set of its neighbours by \(N_i =\lbrace j \in N | e_{i,j} \in E\rbrace\). Agent \(i\)’s neighbourhood, which consists of all of its neighbours and itself, is thus denoted by \(N_i \cup \lbrace i\rbrace\).
In this article, we focus on a scenario in which agents pursue multilateral coordination with their neighbours. While multilateral coordination is a common scenario, we note that in general multilateral coordination of \(n\) agents cannot be achieved by a sequence of bilateral coordination between pairs of these agents, because the coordination an agent achieves with one neighbour may easily fail when it comes to another neighbour. Hence, it is more efficient for a convention to emerge if it is possible to directly learn over \(n\)-player coordination games, which involve all agents in an agent’s neighbourhood at the same time.
The pseudocode of the multilateral multi-agent coordination is presented in Algorithm 1. At every timestep, each agent first chooses an action based on its current policy. Then, each agent initiates an \(n_i\)-player coordination game within its neighbourhood, where \(n_i=\vert N_i \vert +1\) is the size of its neighbourhood. Note that at every timestep, an agent participates not only in the game it initiates but also in the games initiated by each of its neighbours. In every game it plays in a timestep, it sticks to the action it chose for that timestep. Each agent then receives an immediate payoff, which is averaged over all payoffs received from playing the games at that timestep. Finally, each agent performs learning and updates its policy.
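One timestep of this interaction protocol can be sketched as follows in Python. The `Agent` class, the `matching_fraction` payoff, and all method names are illustrative stand-ins of our own choosing, not the article's actual implementation; in the full system, the payoff function of Section 3.2 and the learner of Section 4 would be plugged in instead.

```python
import random

class Agent:
    """Minimal stand-in learner: it picks an action uniformly at random and
    merely records the payoffs it receives (a placeholder, not the gist
    trace-based learner of Section 4)."""
    def __init__(self, actions):
        self.actions = actions
        self.history = []

    def choose_action(self):
        return random.choice(self.actions)

    def update(self, action, avg_payoff):
        self.history.append((action, avg_payoff))

def matching_fraction(own, others):
    """Toy payoff: the fraction of co-players choosing the same action."""
    return sum(a == own for a in others) / len(others)

def run_timestep(graph, agents):
    """One timestep of the multilateral protocol sketched in Algorithm 1.
    `graph` maps each agent id to the list of its neighbours' ids."""
    # 1. Every agent commits to a single action for this timestep.
    actions = {i: agents[i].choose_action() for i in graph}
    # 2. Agent i initiates a game over its neighbourhood N_i ∪ {i}; it also
    #    plays in each game initiated by a neighbour, reusing the same
    #    committed action in every game.
    payoffs = {i: [] for i in graph}
    for i in graph:
        neighbourhood = set(graph[i]) | {i}
        for j in neighbourhood:
            others = [actions[k] for k in neighbourhood if k != j]
            payoffs[j].append(matching_fraction(actions[j], others))
    # 3. Each agent learns from its payoff averaged over all games played.
    for i in graph:
        agents[i].update(actions[i], sum(payoffs[i]) / len(payoffs[i]))
    return actions
```

With a single available action, every agent trivially coordinates and receives the maximal toy payoff in every game it plays.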

3.2 \(n\)-Player Coordination Games
A multilateral coordination game describes situations in which the only goal is to reach consensus within a set of three or more players. This can be formally modelled as an \(n\)-player coordination game, where \(n\ge 3\), in which players are rewarded if and only if all \(n\) players simultaneously take the same action. However, such an ideal case of full convergence to the same action is practically infeasible at the early stage if agents have no prior knowledge of the games: agents take actions largely at random and would receive no reward at all until full coordination is achieved, probably by chance. Therefore, so that full coordination can eventually emerge, we assume here that if a significant portion of the \(n\) players simultaneously take the same action, then these coordinating players are rewarded, though the reward is moderate. Formally, the \(n\)-player coordination game we consider is defined as follows.
(\(n\)-player Coordination Game)
An \(n\)-player coordination game is a tuple \(\langle P, A, (r_i)_{i\in P}\rangle\), where \(P\) is the set of \(n\) players; \(A\) is the set of available actions for each player; \(r_i\) is the payoff function of each player \(i\in P\), such that \(r_i(a_i,\mathbf {a}_{-i})\) is player \(i\)’s payoff of the joint action profile \(\mathbf {a}=(a_i,\mathbf {a}_{- i})\). The value of \(r_i(a_i,\mathbf {a}_{- i})\) is given by the following: \[\begin{equation*} r_i(a_i,\mathbf {a}_{- i})= {\left\lbrace \begin{array}{ll} 1, & { \frac{1}{n-1} \sum _{j\in P\setminus \lbrace i\rbrace }{ \mathbb {1}(a_i,a_j)} =1} \\ \gamma , & {\left. \frac{1}{n-1} \sum _{j\in P \setminus \lbrace i\rbrace }{\mathbb {1}(a_i,a_j)} \in \left[\frac{1+\frac{1}{|A|}}{2}, 1\right)\right.}\\ - 1, & {\left. \frac{1}{n-1} \sum _{j\in P\setminus \lbrace i\rbrace }{\mathbb {1}(a_i,a_j)} \in \left[0, \frac{1+\frac{1}{|A|}}{2}\right)\right.}\\ \end{array}\right.}, \end{equation*}\] where \(\gamma \in (0,1)\) is the moderate reward, \((1+\frac{1}{|A|})/2\) is the threshold for receiving the moderate reward, and \(\mathbb {1}\) is an indicator function such that \(\mathbb {1}(a_i,a_j)=1\) if \(a_i=a_j\) and \(\mathbb {1}(a_i,a_j)=0\) otherwise.
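As a concrete illustration, the payoff function \(r_i\) above can be transcribed directly into code. The sketch below follows the definition term by term; the default value of \(\gamma\) is an arbitrary illustrative choice, not one prescribed by the article.

```python
def payoff(own_action, other_actions, num_actions, gamma=0.5):
    """Payoff r_i of the n-player coordination game of Section 3.2.

    own_action    -- a_i, the action played by agent i
    other_actions -- the actions of the other n-1 players
    num_actions   -- |A|, the size of the action set
    gamma         -- the moderate reward in (0, 1); 0.5 is illustrative
    """
    # Fraction of the other n-1 players matching agent i's action,
    # i.e. (1/(n-1)) * sum of the indicator function over co-players.
    match = sum(a == own_action for a in other_actions) / len(other_actions)
    threshold = (1 + 1 / num_actions) / 2  # threshold for the moderate reward
    if match == 1.0:
        return 1.0      # full coordination with all n-1 co-players
    if match >= threshold:
        return gamma    # a significant portion coordinates: moderate reward
    return -1.0         # insufficient coordination
```

For example, with two available actions the threshold is \(0.75\), so a player matching three of its four co-players receives the moderate reward, while one matching only two of four is penalised.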
Note that there is no inherent superiority for a particular action. A player’s payoff depends only on how well it coordinates with the other \(n-1\) players on the choice of actions. Therefore, for any agent, it is rational to choose the same action as the majority choice, because this will yield the highest payoff.
As the number \(n\) of players increases, intuitively, it becomes more challenging for a player to coordinate with all of the others. Consequently, it becomes more difficult for a convention to emerge.
4 GIST TRACE-BASED LEARNERS
In this section, we formalise an agent’s gist trace about its observations in networked multi-agent systems and present a learning approach that makes use of gist trace.
4.1 Gist Trace of Observations in Networked MASs
We assume that agents are capable of observing their neighbours. Recall that an agent’s payoffs depend on the joint actions of itself and its neighbours in an \(n\)-player coordination game. Hence, the part of the observed information essential to the agent’s reasoning and decision-making is the general choice of action in the agent’s neighbourhood. Such a fuzzy representation of an observation is known as the gist trace in fuzzy trace theory [7, 45, 46].3
The idea of gist traces can be illustrated by an example. Given a box of seven red balls and three blue balls, people are asked to predict whether a random draw from the box will return a red or a blue ball. The verbatim trace people form about the box is the exact numbers of red and blue balls, namely seven red balls and three blue balls. In contrast, the gist trace used in reasoning may be that “there are more red balls and less blue balls.” At first glance, the gist trace seems to suffer a loss of exactness. However, it has preserved the essential information required for decision-making, namely that the number of red balls is larger than the number of blue balls, from which people can readily predict that the random draw should return a red ball.
According to fuzzy trace theory, people may form gist traces at various levels of abstraction. Compared with the gist trace that “there are more red balls and less blue balls,” a gist trace at a higher level of abstraction could be that “there are mostly red balls but almost no blue balls.” It is important to note that although a higher level of abstraction causes a greater loss of exactness, the essential information (the number of red balls is larger than the number of blue balls) is preserved and indeed stands out more. As a counter-example, “there are red balls and blue balls” is even more inexact and fuzzier; however, it is not a gist trace, because the essential information that discriminates between red balls and blue balls no longer exists.
Obviously, the verbatim trace preserves much richer details of information and better supports precise analysis. Nevertheless, Brainerd and Reyna [7] point out that for people “there is a pervasive inclination to think, reason, and remember by processing fuzzy rather than verbatim traces.” The most important reason for this preference is that, by preserving only the essential information, gist traces are more stable in long-term memory and are thus a more reliable basis for generating responses [45, 46].
Consider a particular agent. Let \(p^{t}_j\) denote the actual proportion of the agent’s neighbours using action \(a_j\) at time \(t\). Intuitively, a larger value of \(p^{t}_j\) indicates that action \(a_j\) is more widely used by its neighbours and is thus more popular among its neighbours. We denote the level of abstraction by \(\theta\), the value of which ranges from 0 to 1. Agents with a larger value of \(\theta\) perceive the widely used actions to be even more popular but the less widely used actions to be even less popular. Formally, we define the gist trace for an agent as a vector of its perceived degrees of popularity of various actions.
(Gist Trace).
Given a set \(A=\lbrace a_1,\dots ,a_m\rbrace\) of \(m\) actions and a level \(\theta\) of abstraction, the gist trace \(\mathbf {x}^{t}\) of an agent at time \(t\) is a vector of its perceived popularity of each action, i.e., \(\mathbf {x}^{t}=[x_1^{t} \ \ldots \ x_m^{t}]^\intercal\), such that \(x_j^{t}\ge 0, \forall j: a_j\in A\) and \(\sum _{j:a_j\in A}x_j^{t}=1\). The perceived popularity \(x_j^{t}\) of any action \(a_j\in A\) is given by \[ x_j^{t}=F(p_j^{t}, \mathbf {p}^{t}_{- j} , \theta), \] where \(p_j^{t}\) is the proportion of the agent’s neighbours using action \(a_j\) at time \(t\), and the function \(F\) maps the actual proportion of usage of each action in the neighbourhood and the level of abstraction to the perceived popularity of each action.
The function \(F\), which preserves and highlights the essential information, is assumed to satisfy the following condition.
For any agent with a level \(\theta\) of abstraction, given any two actions \(a_j\) and \(a_k\), if \(p^{t}_j\gt p^{t}_k\), then \(F(p^{t}_j, \mathbf {p}^{t}_{- j} , \theta) \gt F(p^{t}_k, \mathbf {p}^{t}_{- k} , \theta).\) If \(\theta \gt \tilde{\theta }\), then \(F(p^{t}_j, \mathbf {p}^{t}_{- j} , \theta) - F(p^{t}_k, \mathbf {p}^{t}_{- k} , \theta) \gt F(p^{t}_j, \mathbf {p}^{t}_{- j} , \tilde{\theta }) - F(p^{t}_k, \mathbf {p}^{t}_{- k} , \tilde{\theta }).\)
This assumption captures the idea that, with a higher level of abstraction, the more widely used action in the neighbourhood will be perceived to be more popular than the less widely used one, and the difference in perceived popularity between any two actions will be amplified. In general, any function that satisfies this assumption can be used as the function \(F\). One possible form, which we adopt in this article, is as follows: (1) \[\begin{equation} x_j^{t}= F(p^{t}_j, \mathbf {p}^{t}_{- j} , \theta) = {\frac{{(p^{t}_j)}^{\frac{1}{1-\theta }}}{\sum _{\forall k: a_k\in A} {{(p^{t}_k)}^{\frac{1}{1-\theta }}}}}. \end{equation}\] By this equation, the perceived popularity \(x_j^{t}\) of an action \(a_j\) is monotonically increasing in its actual proportion \(p_j^t\) of usage in the neighbourhood. When \(\theta\) equals 0 (i.e., without any abstraction), the perceived popularity of an action is exactly the actual proportion of usage among the neighbours, i.e., \(x_j^{t}=p_j^{t}, \forall j: a_j \in A\). However, when \(\theta\) is positive (i.e., with some level of abstraction), the perceived popularity of an action differs from its actual frequency of usage in the neighbourhood. For a relatively widely used action (with a large value of \(p_j^t\)), the perceived popularity \(x_j^t\) becomes even higher, i.e., \(x_j^t\gt p_j^t\). Conversely, for a less widely used action (with a small value of \(p_j^t\)), the perceived popularity \(x_j^t\) becomes even lower, i.e., \(x_j^t\lt p_j^t\). Moreover, as the value of \(\theta\) increases (i.e., with a higher level of abstraction), the difference between the perceived popularity \(x_j^t\) of an action and its actual frequency \(p_j^t\) of usage is further amplified.
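Equation (1) can be computed directly, as the following sketch shows. The explicit handling of the boundary case \(\theta = 1\) is our own addition (the exponent \(1/(1-\theta)\) is undefined there); we treat it as the limiting case in which all perceived popularity concentrates on the most widely used action(s).

```python
def perceived_popularity(proportions, theta):
    """Gist trace via Equation (1): raise each neighbourhood proportion
    p_j to the power 1/(1-theta) and renormalise so the result sums to 1.

    proportions -- actual proportions p_j of neighbours using each action
    theta       -- level of abstraction in [0, 1]
    """
    if theta >= 1.0:
        # Limiting case theta -> 1: all mass concentrates on the
        # most-used action(s). Handled separately (our own convention).
        top = max(proportions)
        mask = [1.0 if p == top else 0.0 for p in proportions]
        total = sum(mask)
        return [m / total for m in mask]
    exponent = 1.0 / (1.0 - theta)
    powered = [p ** exponent for p in proportions]
    total = sum(powered)
    return [v / total for v in powered]
```

With \(\theta = 0\) the gist trace equals the actual proportions; with \(\theta > 0\) the more popular action is perceived as even more popular, exactly as described above.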
4.2 Gist Trace-based Q-Learning: A Learning-to-Learn-Better Approach

In this section, we propose a novel learning approach, modified from \(Q\)-learning [64], that makes use of gist traces.
Consider an arbitrary agent in a multi-agent system. In each state, it maintains a \(Q\)-value for each action, which estimates the expected payoff of performing that action. We use the gist trace as the basis of reasoning and decision-making, so the \(Q\)-value of each action largely depends on the agent’s currently perceived gist trace \(\mathbf {x}^{t}\). Formally, at time \(t\), an agent’s \(Q\)-value \(Q_j^{t}\) for any action \(a_j \in A\) takes the following form: (2) \[\begin{equation} Q_j^{t} =b_j + \mathbf {c}_{- j}^\intercal \mathbf {x}^{t}_{- j}, \end{equation}\] where \(b_j\) reflects how inherently good playing action \(a_j\) is, and \(\mathbf {c}_{- j} =[c_{1,j}\ \ldots \ c_{j- 1,j} \ c_{j+1,j}\ \ldots \ c_{m,j}]^\intercal\) is a vector of \(m-1\) elements, such that the element \(c_{k,j}\) reflects the correlation between the perceived popularity \(x^{t}_k\) of action \(a_k\) and the \(Q\)-value of action \(a_j\). Note that for the \(Q\)-value \(Q_j^{t}\) of action \(a_j\), we exclude the perceived popularity \(x_{j}^{t}\) of action \(a_j\) in Equation (2) to avoid multicollinearity [39].
Update of the Parameters. A valid question at this point is what the values of \(b_j\), \(c_{1,j}, \ldots , c_{j- 1,j}, c_{j+1,j}, \ldots\), and \(c_{m,j}\) should be. In our learning model, instead of setting them to some arbitrary fixed values, agents learn appropriate values of these parameters during interactions. In other words, we employ a novel learning-to-learn-better approach using the gist trace. Algorithm 2 shows how an agent updates the \(Q\)-values of each action \(a_j\), refining the values of \(b_j\) and each element in \(\mathbf {c}_{-j}\) in the process. Suppose that at timestep \(t\), the agent performs action \(a_j\), receives immediate payoff \(u_j^{t}\), and perceives gist trace \(\mathbf {x}^t\). First, the agent calculates the \(Q\)-value (i.e., the estimated payoff) of action \(a_j\) based on the perceived gist trace using Equation (2) (line 1). We measure the discrepancy between the estimated payoff \(Q_j^{t}\) and the actual received payoff \(u_j^{t}\) by the mean-squared error \(\delta =\frac{1}{2}(u_j^{t}-Q_j^{t})^2\). To minimise the discrepancy, the agent updates the values of \(b_j\) and each element in \(\mathbf {c}_{-j}\) by gradient descent, namely \(b_j\) is incremented by \(\eta \delta ^{\prime }\) and \(\mathbf {c}_{-j}\) by \(\eta \delta ^{\prime } \mathbf {x}^t_{-j}\) (lines 2 and 3), where \(\delta ^{\prime }=u_j^{t}-Q_j^{t}\) is the prediction error (the negative of the gradient of \(\delta\) with respect to \(Q_j^{t}\)) and \(\eta\) is the learning rate for gradient descent. Then, based on the updated values of \(b_j\) and \(\mathbf {c}_{-j}\), the agent recalculates the \(Q\)-value of action \(a_j\) by Equation (2) (line 4). Since Algorithm 2 is performed every time an agent updates \(Q^t_j\), the values of \(b_j\) and \(\mathbf {c}_{-j}\) are repeatedly refined by gradient descent. Intuitively, the agent thus learns better and better as the values of \(b_j\) and \(\mathbf {c}_{-j}\) become more and more fine-tuned.
For the other actions that are not used at this timestep, the agent updates their \(Q\)-values based on the current gist trace by Equation (2) (lines 5 and 6). This ensures that for these actions, their \(Q\)-values will change as the gist trace changes and hence are kept up to date.
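The update steps of Algorithm 2 can be sketched as follows. This is our reading of the procedure, taking \(\delta^{\prime} = u_j^{t} - Q_j^{t}\) as the error signal that drives the gradient step; the data layout mirrors the earlier sketch of Equation (2):

```python
def update(j, u, x, b, c, eta=0.1):
    """One pass of the parameter update sketched in Algorithm 2.

    After performing a_j and receiving payoff u under gist trace x:
    minimise delta = 0.5 * (u - Q_j)^2 by one gradient-descent step on
    b_j and the off-diagonal column c[.][j], then refresh all Q-values.
    """
    m = len(x)

    def q(a):  # Equation (2), excluding x_a itself
        return b[a] + sum(c[k][a] * x[k] for k in range(m) if k != a)

    error = u - q(j)           # delta' (assumed to be u_j - Q_j)
    b[j] += eta * error        # line 2: b_j <- b_j + eta * delta'
    for k in range(m):         # line 3: c_{-j} <- c_{-j} + eta * delta' * x_{-j}
        if k != j:
            c[k][j] += eta * error * x[k]
    # lines 4-6: recompute the Q-value of a_j and of all unused actions
    return [q(a) for a in range(m)]
```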
Choice of the Level of Abstraction. For our approach, the only hyperparameter that needs to be determined in advance is the level of abstraction, denoted by \(\theta \in [0,1]\). The level of abstraction indicates to what degree agents amplify the difference between actions. When \(\theta = 0\), agents do not amplify the difference at all; nevertheless, they still make use of the gist trace (the perceived popularity of each action) rather than the verbatim trace (e.g., the specific action choice of each agent). As the value of \(\theta\) increases, gist traces are derived at more abstract levels, and agents amplify the difference between actions more. Such amplification may drive agents to flock to the same action and achieve coordination. However, if the amplification is overdone, then a slight fluctuation in the action choices of others may cause agents to change their own actions back and forth, which in turn negatively influences other agents (causing them to change their actions frequently) and hinders coordination. Therefore, in general, to achieve optimal performance, the level of abstraction should be moderately high. Moreover, we conjecture that the level of abstraction should be higher for difficult coordination scenarios. For example, as the number of available actions increases, it becomes more difficult for agents to coordinate on the same action choice. However, the amplification induced by a higher level of abstraction can make the slightly more popular action stand out more, which better signals agents to choose that action.
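To illustrate the amplification effect only (the article’s precise derivation of the gist trace is given earlier and is not reproduced here), one plausible \(\theta\)-controlled sharpening raises the raw popularities to a power and renormalises; both the functional form and the mapping from \(\theta\) to an exponent below are hypothetical:

```python
def amplify(popularity, theta):
    """Illustrative only: sharpen a popularity vector by a theta-controlled
    exponent. NOT the paper's exact derivation. theta = 0 leaves a
    normalised popularity vector unchanged; larger theta in [0, 1) makes
    the leading action stand out more."""
    # Hypothetical mapping from theta to an exponent >= 1.
    p = [v ** (1.0 / (1.0 - theta)) for v in popularity]
    s = sum(p)
    return [v / s for v in p]
```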
Exploration Mechanism. Finally, each agent adopts exploration in action selection. In our study, every agent uses \(\epsilon\)-greedy exploration: at every timestep, it selects the action with the highest \(Q\)-value with probability \(1-\epsilon\), and explores by randomly choosing another action with probability \(\epsilon\).
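A minimal sketch of this \(\epsilon\)-greedy rule, following the text’s description that exploration picks uniformly among the actions other than the current best:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """epsilon-greedy selection: exploit the argmax-Q action with
    probability 1 - epsilon, otherwise explore by picking uniformly
    among the other actions."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    if rng.random() < epsilon:
        others = [a for a in range(len(q_values)) if a != best]
        return rng.choice(others)
    return best
```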
5 PERFORMANCE COMPARISON WITH BASELINE AND STATE-OF-THE-ART APPROACHES
In this section, we compare the gist trace-based \(Q\)-learning (GQ) with three other reinforcement learning methods, namely, \(Q\)-learning (Q) [64], collective learning (CL) [71], and multiple-\(Q\) learning (MQ) [62]. \(Q\)-learning is the commonly adopted baseline and does not utilise any information about an agent’s neighbourhood. Over time, a \(Q\)-learning agent maintains a \(Q\)-value for each action based on its own received payoffs. In contrast, CL and MQ make use of the verbatim information about an agent’s neighbours. Note that CL and MQ were originally designed for convention emergence from two-player coordination games. Thus, the scenario of \(n\)-player coordination games may not give full play to the strength of these two methods. We chose CL and MQ for comparison, because none of the learning methods in the literature was specially designed for \(n\)-player coordination games; CL and MQ are the only approaches that can be applied to the convention emergence framework presented in Section 3.1.
Collective Learning. CL makes use of the specific choice of action of each neighbour. With CL, an agent maintains a \(Q\)-value for each neighbour-action pair, such that the \(Q\)-value is updated based on the payoff received for using an action when the agent interacts with a particular neighbour in an \(n\)-player coordination game. During decision-making, a CL agent first identifies the best response (i.e., the action with the highest \(Q\)-value) to each neighbour; then the CL agent aggregates the best responses into a final choice of action that is actually used in the \(n\)-player coordination game.4
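The CL decision step might be sketched as follows. The aggregation rule is not fully specified in this description, so the majority vote below is our assumption:

```python
from collections import Counter

def cl_choose(q_per_neighbour):
    """Sketch of CL decision-making. q_per_neighbour maps each neighbour
    to its dict of action -> Q-value. Identify the best response to every
    neighbour, then aggregate; the majority vote here is a hypothetical
    aggregation, not necessarily the one used in the original CL."""
    best_responses = [max(qs, key=qs.get) for qs in q_per_neighbour.values()]
    return Counter(best_responses).most_common(1)[0][0]
```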
Multiple-\(Q\) learning. MQ leverages the exact \(Q\)-values for actions maintained by each neighbour. With MQ, an agent maintains a \(Q\)-value for each action, which estimates the reward of using that action in an \(n\)-player coordination game. When an MQ agent updates the \(Q\)-value for an action, it observes the \(Q\)-values of its neighbours and averages the \(Q\)-values for this action over all the neighbours. Then the agent updates its own \(Q\)-value for this action based on the averaged \(Q\)-value of its neighbours, and its own received payoff in the \(n\)-player coordination game.
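An MQ-style update could be sketched as below. The exact way the neighbour average and the received payoff are combined is not given here, so the equal-weight blend is a hypothetical placeholder:

```python
def mq_update(own_q, neighbour_qs, action, payoff, alpha=0.1):
    """Sketch of an MQ-style update: average the neighbours' Q-values for
    the chosen action, then move the agent's own Q-value towards a blend
    of that average and the received payoff. The 50/50 blend is our
    assumption; the original MQ rule may differ."""
    avg = sum(q[action] for q in neighbour_qs) / len(neighbour_qs)
    target = 0.5 * (avg + payoff)                       # hypothetical blend
    own_q[action] += alpha * (target - own_q[action])   # standard Q-style step
    return own_q
```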
In Sections 5.1 to 5.3, we first set the abstraction level \(\theta\) of gist trace-based \(Q\)-learning to 0, that is, the gist trace is derived without any abstraction, and compare the effects of the number \(n\) of players, the network topology, the population size, and the number \(m\) of actions on the methods studied (Q, CL, MQ, and GQ). How the level of abstraction affects the performance of gist trace-based \(Q\)-learning is investigated in Section 5.4. For comparison, we set the learning rate to 0.1 in each learning method, and the exploration strategy to \(\epsilon\)-greedy with exploration probability \(\epsilon =0.1\).
We use the degree of convergence to evaluate the performance of each method. This metric measures how many agents have settled on the same choice of action. Specifically, at every timestep, the degree of convergence in a multi-agent system is the percentage of non-exploring agents that take the most widely used action at that moment. For convenience, hereafter, we refer to the degree of convergence equalling a certain percentage \(p\%\) as \(p\%\) convergence. Note that if a system reaches \(90\%\) convergence, then a convention is considered to have emerged in this system by the traditional metric proposed by Kittock [27].
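A sketch of this metric, under our reading that both the numerator and the denominator count only non-exploring agents:

```python
from collections import Counter

def degree_of_convergence(actions, exploring):
    """Percentage of non-exploring agents that take the most widely used
    action at this timestep. `actions` lists each agent's chosen action;
    `exploring` is a parallel list of booleans marking exploration moves."""
    settled = [a for a, e in zip(actions, exploring) if not e]
    if not settled:
        return 0.0
    top = Counter(settled).most_common(1)[0][1]
    return 100.0 * top / len(settled)
```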
5.1 Effects of the Value of \(n\) in \(n\)-player Coordination Games
Unless stated otherwise, we consider a system of 1,000 agents, each of which has five available actions. As mentioned before, the size of an agent’s neighbourhood determines the value of \(n\) of the \(n\)-player coordination game the agent plays in its neighbourhood. To evaluate how well each approach performs under the influence of \(n\), we place agents on random networks with average neighbourhood size \(\bar{n}\) varying from 20 to 100. The random networks generated by the Erdős–Rényi model have a Poisson degree distribution, such that the number of agents within each neighbourhood is generally around \(\bar{n}\). To smooth out randomness, we generate 100 instance random networks and perform a simulation consisting of 5,000 timesteps on each instance network.
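The network setup can be sketched with a standard Erdős–Rényi generator; the adjacency-set representation is our own illustrative choice:

```python
import random

def erdos_renyi(n, avg_degree, rng=random):
    """Generate an Erdos-Renyi random network: each of the n*(n-1)/2
    possible edges exists independently with p = avg_degree / (n - 1),
    yielding a Poisson degree distribution centred on avg_degree."""
    p = avg_degree / (n - 1)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours
```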
In Figure 1, we compare the dynamics of the degrees of convergence over timesteps using different learning approaches, under different settings of average neighbourhood size \(\bar{n}\). It is shown that as the average neighbourhood size varies from 20 to 100, the gist trace-based \(Q\)-learning method always achieves an extremely high degree of convergence (virtually \(100\%\) convergence) efficiently. By contrast, for the other methods, the degree of convergence is relatively low. In particular, it remains unchanged at approximately \(20\%\) from the beginning to the end of simulations when the average neighbourhood size is large (say, \(\bar{n}\) is greater than 60). The \(20\%\) convergence that the other methods achieve clearly indicates that agents generally have difficulties in coordinating, and there is no sign of convention emergence. This shows that the use of gist trace in reasoning and decision making greatly promotes convention emergence, as a convention (achievement of \(90\%\) convergence) always emerges within the first 50 timesteps in these experiments.
Fig. 1. Effects of average neighbourhood size on the dynamics of the degree of convergence. Presented results are averaged over 100 simulations.
We also note that, with a small average neighbourhood size (say, \(\bar{n}\) is not greater than 40), the other methods can eventually achieve a comparably high degree of convergence; however, the time they take is significantly longer. Specifically, under the setting of \(\bar{n}=40\), none of the other methods reaches \(90\%\) convergence until 4,000 timesteps. Compared with the fewer than 50 timesteps that gist trace-based \(Q\)-learning takes to achieve \(90\%\) convergence under each setting, the other methods are about 80 times slower. Moreover, we find that with the increase in the average neighbourhood size, gist trace-based \(Q\)-learning needs significantly less time to achieve convention emergence. This is clearer if we look at the mean and standard deviation of the number of timesteps required to achieve \(90\%\) convergence under each setting of average neighbourhood size, which are presented in Table 1.
Table 1. Under Different Settings of Average Neighbourhood Size, the Mean and the Standard Deviation of the Number of Timesteps Required to Achieve Convention Emergence Using Gist Trace-based \( Q \)-learning
The rationale behind the above experimental findings can be explained as follows. Recall that the average neighbourhood size of the random networks determines the value of \(n\). With a large value of \(n\), to achieve convention emergence, agents need to coordinate with more agents in the neighbourhood. This may greatly slow down the learning processes of agents utilising verbatim traces, because the amount of detailed experience that agents learn from is proportional to the number of neighbours. With a sufficiently large value of \(n\), the great amount of detailed experience may even bury the essential information: agents should coordinate with the majority of the population on the same action choice. Therefore, it can be observed that as the value of \(n\) increases, it becomes less likely for verbatim trace-based learning approaches to establish conventions efficiently. However, the large neighbourhood size does not hamper the performance of gist trace-based \(Q\)-learning. This is because when there are more neighbours for an agent to observe, it becomes more likely for the gist trace, i.e., the general action choice of a local neighbourhood, to accurately reflect the action choice of the majority in a population. Based on the more accurate gist trace, it becomes easier for agents to achieve coordination, which facilitates the emergence of conventions.
Overall, the results show that gist trace-based \(Q\)-learning empirically guarantees much more efficient convention emergence from \(n\)-player coordination games, particularly those involving a large number of agents, while the baseline \(Q\)-learning and the other methods utilising verbatim information may not. In particular, as the value of \(n\) increases, gist trace-based \(Q\)-learning presents an increasing trend in the speed of establishing conventions, while all of the other methods present a decreasing trend. Therefore, gist trace-based \(Q\)-learning is particularly suitable for efficient convention emergence from \(n\)-player coordination games. As the value of \(n\) increases, the strength of gist trace-based \(Q\)-learning becomes more prominent.
5.2 Effects of Complex Networks
We also investigate the performance of each learning approach in multi-agent systems with a complex network as the underlying network topology. In contrast to simple networks, complex networks usually exhibit either or both of these two properties: the scale-free property and the small-world property. By the scale-free property, the degree distribution in the network follows a power law, such that the probability that a node has exactly \(k\) neighbours is roughly proportional to \(k^{-\gamma }\). Therefore, while most nodes have only a few neighbours, there are a few nodes that have a large number of neighbours. By the small-world property, the networks feature a high clustering coefficient and a small average path length. Hence, agents tend to cluster together, and they can reach any other agents within a few hops.
Following the de facto standard procedures, we adopt the Barabási–Albert model [3] to generate scale-free networks, with the value of \(\gamma\) inherently around 3, and the Watts–Strogatz model [65] to generate typical small-world networks, with the rewiring probability set to 0.2. We set the average neighbourhood size to 20 in both types of networks for easy comparison with the results presented in Section 5.1, in which random networks with an average neighbourhood size of 20 are used. However, it should be noted that in scale-free networks, given the power-law degree distribution, most of the agents will have 10 to 15 neighbours, while a very small number of agents (the “hubs”) have more than 100 neighbours. For each network type, we generate 100 instance networks. A simulation consisting of 5,000 timesteps is run on each instance network.
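As a sketch of how such scale-free topologies arise, the Barabási–Albert preferential-attachment process can be implemented as below (the Watts–Strogatz rewiring procedure is omitted for brevity; the initial-clique choice here is one common variant, not necessarily the exact one used in the experiments):

```python
import random

def barabasi_albert(n, m, rng=random):
    """Barabasi-Albert scale-free network sketch: start from a clique of
    m + 1 nodes and attach each new node to m existing nodes chosen with
    probability proportional to their current degree."""
    neighbours = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Repeated-node list: each node appears once per incident edge, so
    # uniform sampling from it is degree-proportional sampling.
    stubs = [i for i, ns in neighbours.items() for _ in ns]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        neighbours[new] = targets
        for t in targets:
            neighbours[t].add(new)
            stubs.extend([new, t])
    return neighbours
```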
In Figure 2, we present the dynamics of the degree of convergence using each learning approach under these two types of complex networks. We can observe that for each learning approach, the degree of convergence eventually exceeds \(90\%\) in these networks. That is, conventions generally emerge. However, it is worth mentioning that gist trace-based \(Q\)-learning outperforms all of the other approaches, in terms of the speed of achieving convention emergence. Figure 2 clearly shows that it takes gist trace-based \(Q\)-learning significantly less time to achieve \(90\%\) convergence—it takes around 50 timesteps to achieve convention emergence, while the others need 100 to 150 timesteps in small-world networks and even 200 to 250 timesteps in scale-free networks.
Fig. 2. Effects of complex networks on the dynamics of the degree of convergence. Presented results are averaged over 100 simulations.
In fact, it is much faster for gist trace-based \(Q\)-learning to produce conventions in scale-free networks than small-world networks with the same average neighbourhood size. This is in sharp contrast with the baseline and the other methods making use of verbatim information, which take much longer time to achieve convention emergence in scale-free networks than small-world networks.
We conjecture that the above differences in the speed of convention emergence are caused by the following reasons. As we explain earlier, with a larger neighbourhood, the gist trace should more accurately capture the general action choice of the entire population. Thus, at the early stage, the hubs in scale-free networks may promptly learn to take the most popular action in the population. Since a hub is itself the neighbour of a great number of other agents, its action choice affects the gist traces of these agents; this may accelerate convention emergence by biasing these agents to take the same action as the hub for coordination. Consequently, in scale-free networks, conventions quickly emerge with gist trace-based \(Q\)-learning. However, in small-world networks, every agent has approximately \(n\) neighbours, meaning that there is no hub to influence the gist traces of many agents and accelerate convention emergence. Hence, with gist trace-based \(Q\)-learning, it takes conventions a longer time to emerge in small-world networks than in scale-free networks. For the other methods, as we show in the last section, they do not support efficient learning from \(n\)-player coordination games with a large neighbourhood (i.e., a large value of \(n\)). Hence, for these methods, the hubs are not able to quickly learn to take the appropriate action. Nor are the decisions of the hubs as persistent as those of their counterparts using the gist trace. This may degrade the learning of their many neighbours. As a result, for these methods, conventions emerge more quickly in small-world networks than in scale-free networks.
In summary, the experimental results indicate that when agents are placed on complex networks, gist trace-based \(Q\)-learning establishes conventions significantly faster than the baseline and the state-of-the-art learning methods. Moreover, given a population of gist trace-based \(Q\)-learning agents, a scale-free network is more efficient than a small-world network with the same average neighbourhood size for convention emergence from multilateral interactions.
5.3 Effects of Population Size and Number of Available Actions
We perform two sets of experiments to investigate how well each learning approach scales in response to the growth in population size and the number of available actions. First, we keep the setting of five available actions unchanged but change the population size from the default setting 1,000 to the following values: 2,000, 5,000, and 10,000. Then, fixing the population size at 1,000, we increase the number of available actions from the default setting 5 to the following values: 10, 15, and 20. For each setting, we place the agents on 100 instance random networks with average neighbourhood size of 20. A simulation consisting of 5,000 timesteps is performed on each instance network. The results are shown in Figures 3 and 4.
Fig. 3. Effects of population size on the dynamics of the degree of convergence. Presented results are averaged over 100 simulations.
Fig. 4. Effects of the number of available actions on the dynamics of the degree of convergence. Presented results are averaged over 100 simulations.
It can be observed from Figure 3 that under various settings of population size, gist trace-based \(Q\)-learning is always the fastest method among those achieving \(90\%\) convergence and hence establishing conventions. We also note that population size only marginally influences the speed of convention emergence when gist trace-based \(Q\)-learning is used. This point is clearly illustrated in Table 2 by the mean and standard deviation of the number of timesteps required to achieve \(90\%\) convergence under each setting of population size. Figure 3 also shows that as the population size multiplies, there is only a small change in the average number of timesteps required to establish conventions. It can thus be concluded that gist trace-based \(Q\)-learning scales much better with population size than other methods.
Figure 4 presents how the number of available actions affects the degree of convergence using each learning approach. We can observe that for all methods, increasing the number of available actions greatly slows down the convention emergence process. However, under each setting, gist trace-based \(Q\)-learning always takes the least time to achieve a high degree of convergence. In particular, when agents have more actions to choose from, the gap between gist trace-based \(Q\)-learning and the other approaches becomes even more noticeable.
The effects of population size and the number of available actions can be explained as follows. Increasing only the population size makes no change to the average neighbourhood size \(n\). Individual agents are still playing \(n\)-player coordination games in their local neighbourhoods, and thus the difficulty in achieving coordination with their \(n-1\) neighbours generally does not increase. This leads to the marginal change in the speed of convention emergence. However, as the number of available actions increases, it becomes much more difficult for agents to coordinate on the same choice among the larger number of available actions. Therefore, it is natural to expect that the increase in the number of available actions greatly slows down convention emergence for all kinds of methods.
In general, the aforementioned experimental results again confirm the superiority of gist trace-based \(Q\)-learning in comparison with the other approaches in convention emergence from \(n\)-player coordination games. In particular, gist trace-based \(Q\)-learning enjoys the property of scalability, especially in response to the growth in population size.
5.4 Effects of Level of Abstraction
To see how the level \(\theta\) of abstraction of gist traces affects convention emergence, we repeat all of the aforementioned experiments with the value of \(\theta\) varying from 0.1 to 0.9. The experimental results show that the change in the level of abstraction generally does not affect the final degree of convergence. Eventually, all of the simulations manage to approach \(100\%\) convergence and hence achieve convention emergence. However, the speed of convention emergence is significantly influenced by the level \(\theta\) of abstraction.
As shown in Figure 5, we notice a general pattern on how the level \(\theta\) of abstraction affects the number of timesteps required to achieve convention emergence. Under diverse experimental settings, as the value of \(\theta\) increases from 0 to 0.9, the number of timesteps required to establish a convention first experiences a noticeable drop and then gradually increases.
Fig. 5. Effects of the level \( \theta \) of abstraction on the speed of convention emergence. Presented results are averaged over 100 simulations.
There are two conclusions that can be drawn from this pattern. First, the gist traces derived at a low level of abstraction (i.e., with a small positive value of \(\theta\), or with a lower level of exaggeration) can significantly improve the efficiency of gist trace-based \(Q\)-learning. Nonetheless, for the gist traces derived at a higher level of abstraction (large value of \(\theta\)), such improvement may become less significant. Actually, when the level of abstraction is sufficiently high, the efficiency of gist trace-based \(Q\)-learning deteriorates.
The above results lead us to investigate what the optimal level of abstraction should be under different settings, in the sense that such an optimal level of abstraction leads to the fastest emergence of conventions. Experimental results show that the optimal value of \(\theta\) is insensitive to the population size or average neighbourhood size in random networks. However, as the number of available actions increases, there is an increasing trend in the optimal level of abstraction. Specifically, the optimal values of \(\theta\) are 0.1, 0.3, and 0.4 when the number of available actions equals 10, 15, and 20, respectively.
In all, for gist trace-based \(Q\)-learning, the use of gist trace derived with a low level \(\theta\) of abstraction can further accelerate the emergence of conventions. However, to ensure the benefits, the level \(\theta\) should not be too high. The optimal value of \(\theta\) increases if agents have more actions to choose from, while it is generally not related to the population size or the average neighbourhood size in random networks.
5.5 Discussion
As shown in the previous sections, gist trace-based \(Q\)-learning establishes conventions from multilateral interactions much faster than the baseline and the state-of-the-art approaches using verbatim information; however, the outperformance of gist trace-based \(Q\)-learning becomes less significant with a sufficiently high level of abstraction. We hypothesise that the rationale behind these interesting phenomena can be explained as follows.
Intuitively, a gist trace reflects the general choice of action within a neighbourhood rather than the verbatim choice (or evaluation) of actions of each neighbour. Thus, the gist trace captures the essence of an agent’s experience in \(n\)-player coordination games: multilateral coordination is achieved if and only if the agent takes the exact same action as the entire neighbourhood, not as one specific neighbour. Moreover, as one can imagine, over time, the general choice of action of a neighbourhood tends to be more stable and regular than the specific choice (or evaluation) of actions of each individual neighbour. Hence, the use of gist trace also largely avoids the effects of non-stationarity (i.e., transient instability and irregularity) that is present in the verbatim information of an agent’s experience.
As gist traces are derived with abstraction, agents exaggerate the difference in perceived popularity between any two actions. Consequently, the actions that are already more widely used will be perceived to be even more popular. With a small amount of exaggeration (a low level of abstraction), agents are more motivated to choose the most popular action in their neighbourhoods, even though that action may be just slightly more popular than the others. This facilitates the coordination of a neighbourhood of agents, which eventually leads to faster emergence of conventions. However, if the amplification is overdone, then an agent can easily be misled into believing that an action that is just slightly popular in its neighbourhood is very popular. Stubborn local conventions then tend to emerge, which hinders the emergence of a global convention. Worse, slight fluctuations in an agent’s choices will greatly destabilise the gist traces perceived by other agents in its neighbourhood. The resulting non-stationarity in agents’ learning processes slows down convention emergence.
6 CONCLUSIONS
In this article, we focus on convention emergence from \(n\)-player coordination games. Borrowing the concepts from the fuzzy trace theory, we formalise an agent’s gist trace to be a vector of perceived popularity of each action in its neighbourhood, under the influence of its level of abstraction. We put forth a novel learning approach—gist trace-based \(Q\)-learning—that makes use of gist trace in decision-making. Extensive experiments on various settings verify that gist trace-based \(Q\)-learning establishes conventions much faster than the baseline and the state-of-the-art learning approaches using verbatim information. As the number of players involved in coordination games increases, the outperformance of gist trace-based \(Q\)-learning becomes more significant. In particular, with a large number of players involved, while gist trace-based \(Q\)-learning establishes conventions from coordination games efficiently, other learning approaches fail to do so. We also observe that for gist trace-based \(Q\)-learning, scale-free networks produce faster convention emergence than small-world networks. Moreover, gist trace-based \(Q\)-learning scales well with respect to the number of available actions and the population size. With a moderately high level of abstraction, the efficiency of gist trace-based \(Q\)-learning can be further improved. As future work, it would be interesting to explore the use of gist trace in more practical real-world coordination situations and the effects of heterogeneous levels of abstraction on convention emergence.
Footnotes
1 We note that there exist other scenarios in which agents coordinate through taking different actions. Such scenarios are outside the scope of this article, because they cannot be modelled as the coordination games considered here.
2 The constraints may be caused by the disconnection or high communication costs between pairs of agents, or by privacy issues whereby an agent is not willing to reveal its behaviours to another agent.
3 The rich yet redundant details, such as the specific choices of action of particular neighbours, are collectively known as a verbatim trace.
4 As mentioned in Section 2, CL can be incorporated with different specific learning algorithms for agents to learn the best responses to neighbours. Here we consider \(Q\)-learning, since it is shown that the best performance of CL is achieved when agents use \(Q\)-learning [71].
Footnote
- [1] . 2009. Specifying norm-governed computational societies. ACM Trans. Comput. Logic 10, 1 (2009), 1–42. Google Scholar
Digital Library
- [2] . 2016. Learning about the opponent in automated bilateral negotiation: A comprehensive survey of opponent modeling techniques. Auton. Agents Multi-Agent Syst. 30, 5 (2016), 849–898. Google Scholar
Digital Library
- [3] . 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512.Google Scholar
Cross Ref
- [4] . 2015. Evolutionary dynamics of multi-agent learning: A survey. J. Artif. Intell. Res. 53, 1 (2015), 659–697. Google Scholar
Digital Library
- [5] . 2003. Norm governed multiagent systems: The delegation of control to autonomous agents. In Proceedings of the IEEE/WIC International Conference on Intelligent Agent Technology (IAT’03). IEEE, 329–335. Google Scholar
Digital Library
- [6] . 2002. Multiagent learning using a variable learning rate. Artif. Intell. 136, 2 (2002), 215–250. Google Scholar
Digital Library
- [7] . 1990. Gist is the grist: Fuzzy-trace theory and the new intuitionism. Dev. Rev. 10, 1 (1990), 3–47.Google Scholar
Cross Ref
- [8] . 1951. Iterative solution of games by fictitious play. Activ. Anal. Prod. Allocat. 13, 1 (1951), 374–376.Google Scholar
- [9] . 2015. The spontaneous emergence of conventions: An experimental study of cultural evolution. Proc. Natl Acad. Sci. U.S.A. 112, 7 (2015), 1989–1994.Google Scholar
Cross Ref
- [10] . 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proc. of the AAAI Annual Conference on Artificial Intelligence (AAAI’98). 746–752. Google Scholar
Digital Library
- [11] . 2011. Norm enforceability in electronic institutions? In Coordination, Organizations, Institutions, and Norms in Agent Systems VI. Springer, 250–267. Google Scholar
Digital Library
- [12] . 1983. Negotiation as a metaphor for distributed problem solving. Artif. Intell. 20, 1 (1983), 63–109.Google Scholar
Cross Ref
- [13] . 2002. Emergence of social conventions in complex networks. Artif. Intell. 141, 1 (2002), 171–185. Google Scholar
Digital Library
- [14] . 2001. Learning to be thoughtless: Social norms and individual computation. Comput. Econom. 18, 1 (2001), 9–24. Google Scholar
Digital Library
- [15] . 2002. Multi-issue negotiation under time constraints. In Proceedings of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1. 143–150. Google Scholar
Digital Library
- [16] 2013. Manipulating convention emergence using influencer agents. Auton. Agents Multi-Agent Syst. 26, 3 (2013), 315–353.
- [17] 2013. The dynamics of reinforcement social learning in cooperative multiagent systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Vol. 13. 184–190.
- [18] 2015. Fast convention formation in dynamic networks using topological knowledge. In Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2067–2073.
- [19] 2019. A context-aware convention formation framework for large-scale networks. Auton. Agents Multi-Agent Syst. 33, 1–2 (2019), 1–34.
- [20] 2010. Double Q-learning. In Advances in Neural Information Processing Systems. 2613–2621.
- [21] 2017. Engineering the emergence of norms: A review. Knowl. Eng. Rev. 32 (2017).
- [22] 2008. Opponent modelling in automated multi-issue negotiation using Bayesian learning. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 1. 331–338.
- [23] 2019. To be big picture thinker or detail-oriented? Utilizing perceived gist information to achieve efficient convention emergence with bilateralism and multilateralism. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2021–2023.
- [24] 2017. Achieving coordination in multi-agent systems by stable local conventions under community networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 4731–4737.
- [25] 2018. Compromise as a way to promote convention emergence and to reduce social unfairness in multi-agent systems. In Proceedings of the Australasian Joint Conference on Artificial Intelligence. Springer, 3–15.
- [26] 2018. Do social norms emerge? The evolution of agents’ decisions with the awareness of social values under iterated prisoner’s dilemma. In Proceedings of the IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO’18).
- [27] 1993. Emergent conventions and the structure of multi-agent systems. In Proceedings of the 1993 Santa Fe Institute Complex Systems Summer School, Vol. 6. 1–14.
- [28] 2014. Challenges for multi-agent coordination theory based on empirical observations. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS’14). 1157–1160.
- [29] 1988. Functionally accurate, cooperative distributed systems. In Readings in Distributed Artificial Intelligence. Elsevier, 295–310.
- [30] 1969. Convention: A Philosophical Study.
- [31] 2008. Convention: A Philosophical Study. John Wiley & Sons.
- [32] 2008. Negotiation among autonomous computational agents: Principles, analysis and challenges. Artif. Intell. Rev. 29, 1 (2008), 1–44.
- [33] 2004. An architecture for autonomous normative agents. In Proceedings of the 5th Mexican International Conference on Computer Science (ENC’04). IEEE, 96–103.
- [34] 2017. Establishing norms with metanorms over interaction topologies. Auton. Agents Multi-Agent Syst. 31, 6 (2017), 1344–1376.
- [35] 2015. Manipulating conventions in a particle-based topology. In Proceedings of the International Workshop on Coordination, Organizations, Institutions, and Norms in Agent Systems. Springer, 242–261.
- [36] 2017. Limited observations and local information in convention emergence. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS’17). 1628–1630.
- [37] 2014. Social Conventions: From Language to Law. Vol. 25. Princeton University Press.
- [38] 2014. A decentralized approach for convention emergence in multi-agent systems. Auton. Agents Multi-Agent Syst. 28, 5 (2014), 749–778.
- [39] 2012. Introduction to Linear Regression Analysis.
- [40] 2015. Online automated synthesis of compact normative systems. ACM Trans. Auton. Adapt. Syst. 10, 1 (2015), 1–33.
- [41] 2008. Norm emergence under constrained interactions in diverse societies. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 2. 779–786.
- [42] 2010. A classification of normative architectures. In Simulating Interacting Agents and Social Phenomena. 3–18.
- [43] 1998. Agents that reason and negotiate by arguing. J. Logic Comput. 8, 3 (1998), 261–292.
- [44] 2005. The role of clustering on the emergence of efficient social conventions. In Proceedings of the 19th International Joint Conference on Artificial Intelligence. 965–970.
- [45] 2012. A new intuitionism: Meaning, memory, and development in fuzzy-trace theory. Judgm. Decis. Mak. (2012).
- [46] 1995. Fuzzy-trace theory: An interim synthesis. Learn. Individ. Differ. 7, 1 (1995), 1–75.
- [47] 2008. An infection-based mechanism for self-adaptation in multi-agent complex networks. In Proceedings of the 2nd IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO’08). IEEE, 161–170.
- [48] 2010. Robust coordination in large convention spaces. AI Commun. 23, 4 (2010), 357–372.
- [49] 2011. Norm creation, spreading and emergence: A survey of simulation models of norms in multi-agent systems. Multiagent Grid Syst. 7, 1 (2011), 21–54.
- [50] 2008. Role model based mechanism for norm emergence in artificial agent societies. In Coordination, Organizations, Institutions, and Norms in Agent Systems III. Springer, 203–217.
- [51] 2007. Emergence of norms through social learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence. 1507–1512.
- [52] 2020. A qualitative approach to composing value-aligned norm systems. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. 1233–1241.
- [53] 1992. Emergent conventions in multi-agent systems: Initial experimental results and observations. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR’92). 225–231.
- [54] 1997. On the emergence of social conventions: Modeling, analysis, and simulations. Artif. Intell. 94, 1 (1997), 139–166.
- [55] 2016. An approach to verify conflicts among multiple norms in multi-agent systems. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI’16). IEEE, 367–374.
- [56] 2013. Norms as a basis for governing sociotechnical systems. ACM Trans. Intell. Syst. Technol. 5, 1 (2013), 21.
- [57] 2011. Emergence and stability of social conventions in conflict situations. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’11).
- [58] 2004. Implementing norms in multiagent systems. In Proceedings of the German Conference on Multiagent System Technologies. Springer, 313–327.
- [59] 2011. Social instruments for robust convention emergence. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Vol. 11. 420–425.
- [60] 2009. Topology and memory effect on convention emergence. In Proceedings of the IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Volume 2. 233–240.
- [61] 1995. Understanding the emergence of conventions in multi-agent systems. In Proceedings of the 1st International Conference on MultiAgent Systems, Vol. 95. 384–389.
- [62] 2018. Efficient convention emergence through decoupled reinforcement social learning with teacher-student mechanism. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. 795–803.
Digital Library
- [63] . 2018. A cyclical social learning strategy for robust convention emergence. In Proceedings of the International Conference on Agents. 120–125.Google Scholar
Cross Ref
- [64] . 1992. Q-learning. Mach. Learn. 8, 3–4 (1992).Google Scholar
Digital Library
- [65] . 1998. Collective dynamics of a small-world networks. Nature 393, 6684 (1998), 440–442.Google Scholar
Cross Ref
- [66] . 1995. Intelligent agents: Theory and practice. Knowl. Eng. Rev. 10, 02 (1995), 115–152.Google Scholar
Cross Ref
- [67] . 2016. Accelerating norm emergence through hierarchical heuristic learning. In Proceedings of the European Conference on Artificial Intelligence. 1344–1352. Google Scholar
Digital Library
- [68] . 1996. The economics of convention. J. Econ. Perspect. 10, 2 (1996), 105–122.Google Scholar
Cross Ref
- [69] . 2017. Neural learning for the emergence of social norms in multiagent systems. In Proceedings of the IEEE International Conference on Agents. IEEE, 40–45.Google Scholar
Cross Ref
- [70] . 2015. Hierarchical learning for emergence of social norms in networked multiagent systems. In Proceedings of the Annual Conference on Artificial Intelligence (AI’15). 630–643.Google Scholar
Cross Ref
- [71] . 2013. Emergence of social norms through collective learning in networked agent societies. In Proceedings of 12th International Conference on Autonomous Agents and Multiagent Systems. 475–482. Google Scholar
Digital Library