Information Diffusion Meets Invitation Mechanism

The dissemination of information is a complex process that plays a crucial role in real-world applications, especially when intertwined with friend invitations and their ensuing responses. Traditional diffusion models, however, often do not adequately capture this invitation-aware diffusion (IAD), rendering inferior results. These models typically focus on describing the social influence process, i.e., how a user is informed by friends, but tend to overlook the subsequent behavioral changes that invitations might precipitate. To this end, we present the Independent Cascade with Invitation (ICI) model, which incorporates both the social influence process and multi-stage behavior conversions in IAD. We validate our design through an empirical study on in-game IAD. Furthermore, we conduct extensive experiments to evaluate the effectiveness of our proposal against 6 state-of-the-art models on 6 real-world datasets. In particular, we demonstrate that our solution can outperform the best competitor by up to 5× in cascade estimation and 17.2% in diffusion prediction. We deploy our proposal in the seed selection and friend ranking scenarios of Tencent's online games, where it achieves improvements of up to 170% and 20.3%, respectively.


INTRODUCTION
The invitation-aware diffusion (IAD) describes the process by which information spreads from one user to another via an invitation mechanism, characterized by the behaviors of sending and accepting invitations. IAD is ubiquitous in various real-world social platforms, e.g., WeChat [38], Yahoo Messenger [15], LinkedIn [1], and Tencent's online gaming platforms [48]. In contrast to actions such as liking or commenting on a stranger's tweet, the invitation behavior typically occurs between existing friends, thereby spreading through established social relationships. For instance, online gaming platforms often organize events to strengthen friendships [19, 28-30, 34, 35, 48]. In these events, users are encouraged to invite their friends to play together and the accepting friends can further invite their friends, hence creating a cascade of invitations. Understanding the mechanism of IAD is an important problem and underpins a variety of applications, such as influence maximization [24], rumor detection [33], diffusion prediction [14], network robustness verification [32] and influencer pricing [21,51]. However, previous works about IAD [1,15,38] mainly focus on exploring the macroscopic properties. For example, [15,38] analyze the size and depth of the diffusion tree starting from selected users (called seeds), while [1] highlights that user homophily plays an important role in IAD.
In this work, we aim to design an IAD model that captures the dissemination of invitation behaviors via social connections. Despite the numerous diffusion models [2-4, 9, 10, 16-18, 27, 31] proposed in recent decades, adapting them to encompass the invitation mechanism still poses challenges. The first lies in the unclear nature of the social influence process of IAD, i.e., how users are activated (or informed) by others. To explain, most existing models derive their influence processes from two traditional ones: Independent Cascade (IC) [16], which assumes that each individual is independently influenced by their active friends, and Linear Threshold (LT) [18], which suggests that a user is influenced only after a sufficient number of friends have been activated. Thus, a critical question arises: does the social influence process of IAD conform to the patterns of IC, LT, or neither? Second, the transition of invitation and acceptance behaviors further complicates the dynamics of IAD, rendering existing models inadequate for capturing these processes.
To address these challenges, this work presents an IAD model called Independent Cascade with Invitation (ICI), which categorizes active users into three roles: invitee, acceptor, and inviter, with transitions between these roles occurring progressively. Specifically, an uninformed user has the chance to become an invitee based on the independent inviting process. Then, the user may transition from invitee to acceptor and subsequently to inviter with specific transition probabilities. This contrasts with IC, which assumes that users will accept and invite others unconditionally once influenced. To validate the design of ICI, we conduct an empirical study focused on the in-game IAD scenario. It not only confirms that the influence process of IAD aligns with IC, but also reveals that deconstructing IC's influence process into an independent inviting procedure coupled with multi-stage behavior transitions is statistically close to the dynamics observed in IAD. Furthermore, we integrate ICI into four key applications, namely cascade estimation [4,26], target recommendation [48], diffusion prediction [7,14], and influence maximization [24]. For each application, we offer detailed analyses regarding the correctness and computational complexity, highlighting ICI's applicability across a spectrum of scenarios.
We experimentally evaluate the proposed ICI model against 6 representative competitors across 6 real-world datasets. In particular, we demonstrate that the proposed model outperforms all competitors in terms of RMSE, AUC, and MAP scores while (i) estimating the macroscopic features of diffusion [26] and (ii) predicting the activation of each user [7] across all tested datasets. Furthermore, we deploy our solution to two real-world scenarios in online gaming platforms of Tencent. Here, we leverage the model's ability to estimate the number of acceptors for two critical tasks: seed selection and target friend recommendation, achieving improvements of up to 170% and 20.3% in their respective evaluation metrics.
To summarize, we make the following contributions in this work:
• We devise an IAD model ICI, which has been validated through an empirical study and applied in four applications.
• We conduct experiments to show the superiority of ICI over competitors on cascade estimation and diffusion prediction tasks.
• We deploy ICI to seed selection and friend recommendation in online games, achieving significant improvements.

PRELIMINARIES
This section introduces diffusion models and the in-game invitation-aware diffusion (IAD), and then highlights the goal of this work.

Diffusion Models
Let G = (V, E) be a social network, where V is a set of nodes representing users and E is a set of edges representing relationships. We assume that each edge (u, v) ∈ E is directed, indicating that v is a follower of and can be influenced by u. We call u (resp. v) the in-neighbor (resp. out-neighbor) of v (resp. u) and use N^in_v (resp. N^out_v) to denote the set of in-neighbors (resp. out-neighbors) of v. Given a G and a set S of chosen users (called seeds), a diffusion model assumes that each v ∈ V has two possible states, inactive or active, and captures the diffusion of a given item from S in a stochastic manner. Initially, seeds are in active states, and subsequently, active users attempt to influence their inactive out-neighbors through an influence process. Most existing models [2-4, 9, 10, 16-18, 27, 31, 42] extend the Independent Cascade (IC) [16] and Linear Threshold (LT) [18] models, whose influence processes are as follows.
IC model. Given a G and an S, IC introduces the influence probability p_{u,v} for each edge (u, v) ∈ E, which indicates the likelihood that v is successfully activated by the in-neighbor u. A diffusion instance of the IC model first tags each node in S to be active and leaves the rest inactive at step 0. At each following step t > 0, each user u activated at step t − 1 has one chance to independently activate each inactive out-neighbor v with probability p_{u,v}.
LT model. Unlike IC, the influence process of LT follows the intuition that an inactive user will switch to be active when a sufficient number of his/her in-neighbors have been activated. Formally, given a G and an S, LT assumes that each edge (u, v) ∈ E is associated with an edge weight w_{u,v} satisfying Σ_{u ∈ N^in_v} w_{u,v} ≤ 1. In LT, the threshold θ_v ∈ [0, 1] is uniformly sampled and assigned to each user v ∈ V. For any step t > 0, an inactive user v is activated if f(v, t) ≥ θ_v, where f(v, t) is v's threshold function and is the summation of w_{u,v} over v's in-neighbors u activated before step t.
During a diffusion instance of IC or LT, once a node is activated, it remains active in all subsequent steps. The instance terminates if no more users can be activated, and the influence spread σ_G(S) is defined as the expected number of active users from S.
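The two influence processes above can be illustrated with a minimal Python sketch. It is not taken from the paper's code: the function names, the dictionary-based graph encoding (`out_neighbors`, `in_neighbors`, and edge-keyed `prob` / `weight` maps), and the explicit `rng` argument are all assumptions made for illustration.

```python
import random

def simulate_ic(out_neighbors, prob, seeds, rng):
    """Run one IC diffusion instance and return the set of active nodes."""
    active = set(seeds)
    frontier = list(seeds)  # nodes activated at the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in out_neighbors.get(u, ()):
                # u has a single chance to activate each inactive out-neighbor v
                if v not in active and rng.random() < prob[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def simulate_lt(in_neighbors, weight, nodes, seeds, rng):
    """Run one LT diffusion instance and return the set of active nodes."""
    theta = {v: rng.random() for v in nodes}  # thresholds sampled uniformly
    active = set(seeds)
    while True:
        newly_active = set()
        for v in nodes:
            if v in active:
                continue
            # total weight of in-neighbors that were activated before this step
            total = sum(weight[(u, v)] for u in in_neighbors.get(v, ()) if u in active)
            if total >= theta[v]:
                newly_active.add(v)
        if not newly_active:  # no more users can be activated
            return active
        active |= newly_active
```

Averaging `len(simulate_ic(...))` or `len(simulate_lt(...))` over many runs gives a Monte-Carlo estimate of the influence spread defined above.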

In-Game Invitation-Aware Diffusion
Event description. On Tencent's online gaming platforms, the service provider regularly conducts friendship-enhancing events to foster interactions among friends. Before an event, the service provider selects a set V_s of source users and a set V_t of target users based on historical activeness and the event requirement. For each source user u ∈ V_s, a limited number of target friends are selected from V_t ∩ N^out_u according to specific recommendation strategies. As the event begins, each source user, upon their login, receives detailed information about the event and a list of recommended target friends they are encouraged to invite. Upon receiving an invitation from a source user u, a target user v is notified and decides whether to accept the invitation. If v accepts and interacts with u, both users are rewarded with the event's incentives. Due to the intersection between V_s and V_t, if the target is also designated as a source user, it can invite its own target friends, thereby further propagating the event within G.
To summarize, the propagation of this event encompasses two primary elements: (i) a social influence process through inviting relationships and (ii) two stand-alone user behaviors, namely source invitation and target acceptance.
Dataset cleaning. The logs of a friendship-enhancing event consist of two parts: (i) an invitation dataset with tuples (u, v, t_{u,v}), representing that the source user u ∈ V invited the friend v ∈ N^out_u at timestamp t_{u,v}, and (ii) an acceptance dataset with tuples (v, t_v), representing that the target v accepted the invitation from one of the source friends and engaged in this event at timestamp t_v. For a better understanding of event dynamics, we clean the logs by retaining only the earliest timestamp for each distinct invitation relationship and each accepting invitee in the datasets. Additionally, we find that the invitation behavior is cascading on the social network, and hence construct the diffusion trees from the invitation dataset. In particular, each tree is initialized to the invitation relationships, starting from the seed inviter who spontaneously engages in the event without receiving any friends' invitations. Subsequently, we add the directed edge (v, w) to the tree if (i) there exists an edge (u, v) on the tree satisfying t_{u,v} < t_{v,w} and (ii) the tree is still acyclic after insertion.
Figure 1: A running example of ICI (uninformed nodes: light grey, invitees: yellow, acceptors: orange, inviters: red).
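The cleaning and tree-construction steps can be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline: the function names are invented, a seed inviter is assumed to be any inviter that never appears as an invitee in the logs, and the acyclicity condition is approximated by attaching each user to a tree at most once.

```python
def earliest_invitations(invitations):
    """Keep the earliest timestamp for each distinct (inviter, invitee) pair."""
    earliest = {}
    for u, v, t in invitations:
        if (u, v) not in earliest or t < earliest[(u, v)]:
            earliest[(u, v)] = t
    return earliest

def earliest_acceptances(acceptances):
    """Keep the earliest acceptance timestamp for each accepting invitee."""
    earliest = {}
    for v, t in acceptances:
        if v not in earliest or t < earliest[v]:
            earliest[v] = t
    return earliest

def build_diffusion_forest(invitations):
    """Return a node -> parent map; seed inviters map to None."""
    earliest = earliest_invitations(invitations)
    inviters = {u for (u, _) in earliest}
    invitees = {v for (_, v) in earliest}
    seeds = inviters - invitees  # assumption: seeds never receive an invitation
    parent = {s: None for s in seeds}
    joined_at = {s: float("-inf") for s in seeds}  # time each node joined a tree
    # Scan invitations in time order so that an inviter joins before its children.
    for (u, v), t in sorted(earliest.items(), key=lambda item: item[1]):
        # (i) u is already on a tree via an earlier invitation, (ii) v is new
        if u in parent and v not in parent and joined_at[u] < t:
            parent[v] = u
            joined_at[v] = t
    return parent
```

For example, an invitation sent by a user before that user was invited by anyone on a tree is left out of the tree, because condition (i) fails for it.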

Problem Formulation
We aim to devise an IAD model that captures the invitation dissemination via social connections of the given G. Moreover, we assume that the IAD model is a progressive and single-item model. In particular, the progressive model means that an active (or informed) node will not be deactivated at any later step. The single-item model describes the propagation of a single item, as the friendship-enhancing event is unique on the platform in a specific period.

PROPOSED MODEL
As per the above-said description of IAD, we elaborate on the proposed model: Independent Cascade with Invitation (ICI), followed by conducting an empirical study to justify the design choices in ICI.

Formulation
User roles. We first define the following user roles to distinguish the components in IAD. In particular, the uninformed is a user who has not received an invitation to the event (i.e., an inactive user). Moreover, active users are classified into three roles: (i) invitee, a user who has received the event invitation from friends; (ii) acceptor, an invitee who has accepted the invitation; (iii) inviter, an acceptor who has sent invitations to friends.
Diffusion procedure. Given a social network G, a seed set S, and constants α and β, a diffusion instance of the ICI model unfolds in discrete steps, where the inviting process and further behavior conversions are independent. In particular, seed users are set to inviters, and other users remain uninformed at step 0. For any step t > 0, denote by V^(t−1)_inviter the set of users who become inviters at step t − 1. For a user v who is uninformed at step t − 1, the role transition at step t is as follows:
(1) Each friend inviter u ∈ V^(t−1)_inviter ∩ N^in_v has one chance to independently invite v with probability p_{u,v}. If some friend inviter u successfully invites v, then v becomes an invitee.
(2) If v becomes an invitee, then v becomes an acceptor with probability α, or remains an invitee.
(3) If v becomes an acceptor, then v turns into an inviter with probability β, or remains an acceptor.
A diffusion instance repeats these transitions at each step until no new inviters exist. Notably, for each active user, the roles invitee, acceptor, and inviter are nested in that order, e.g., an inviter is also an acceptor. In addition, ICI reduces to IC when α = β = 1.
Running example. Figure 1 illustrates a diffusion instance of ICI. Given an undirected graph with nodes v_1, v_2, ..., v_5, we first pick v_1 as the seed (inviter) at step 0 (Figure 1(a)). At step 1, v_1 successfully sends invitations to friends v_3, v_4, v_5 (Figure 1(b)). After being invited, all invitees flip a coin with head probability α. Among them, v_3 and v_5 get heads and become acceptors (Figure 1(c)). Subsequently, the same coin-flip operation occurs for each acceptor with success probability β, which results in v_3 becoming a new inviter (Figure 1(d)) and inviting v_2 (Figure 1(e)).
Model outputs. We focus on the acceptor role in ICI as it reflects user engagement w.r.t. the information and is paramount to many real-world applications, such as signing up to join LinkedIn [1] or logging in to play with others [48]. In light of the live-edge graph in [24], we define the invitation snapshot of G, reflecting the edge status and the user roles when an instance stops.
is the node set associated with an acceptor set V  ⊆ V  and an inviter set V  ⊆ V  , and E  ⊆ E is the set of inviting relationships.The sampling procedure of L ∈ Ω is as follows: (1) Include each edge (, ) ∈ E into E  with  , probability, i.e., Pr ( As each step is independent, the probability of sampling an L is Additionally, we define a reachable set Γ L (S) as the set of acceptors directly invited by S or reachable from S by a path of inviters in L, Based on Definition 3.1 and the principle of deferred decisions [36], we introduce two outputs for applications in Section 4.
Definition 3.2 (Accepting Spread). Given a G = (V, E, p), a seed set S, and ICI with constants α and β, let L ∈ Ω be any invitation snapshot in Definition 3.1. The accepting spread from S under ICI is defined as the expected number of acceptors in L:

Justifications
To justify the design of ICI, we collect the logs of a friendship-enhancing event TXG-A of Tencent's role-playing game, where users are from different isolated servers and only connected to others on the same server. Hence, IAD only happens among users on each stand-alone server. After preprocessing, TXG-A returns 20.4 thousand invitation tuples and 12.8 thousand acceptance tuples. In the sequel, we first show the existence of IAD and behavior conversions related to invitation and acceptance. Next, we elucidate the rationale for choosing IC as the base model. Finally, we explain the reason for separating IC's influence process into the independent inviting procedure and two stand-alone states. Notably, similar outcomes can be observed in other gaming datasets of Section 6.
Figure 2(a) displays the distribution of diffusion tree depths originating from all seed inviters in TXG-A. It reveals that the depth of the diffusion tree adheres to an exponential distribution, with only about 12% of diffusion trees exceeding a depth of three. A similar trend has been observed in other real-world diffusion scenarios [8,15,38]. In Figure 2(b), we report the distributions of conversion rates for acceptance and invitation behaviors within servers. Specifically, the conversion rate for the acceptance (resp. invitation) behavior signifies the proportion of acceptors among all invitees (resp. inviters among all acceptors) within each server. As shown in Figure 2(b), the conversion rates that a user transitions into an acceptor and further into an inviter tend to concentrate around 0.9 and 0.6, respectively. This observation highlights the presence of multi-stage behavior conversion, indicating that once informed, a user is likely to accept an invitation and subsequently engage as an inviter with certain probabilities.
Figure 2(c) reports the distribution of the number of invitations invitees receive and the number of invitations they receive after accepting. Specifically, we find that a user can receive invitations from multiple distinct friends, with about 11% of invitees being invited by more than two distinct friends. Furthermore, users continue to receive invitations from other friends even after accepting the invitation from one friend. Notably, among users invited more than once, 76% experience repeated invitations twice or three times after acceptance. This phenomenon can be explained by the fact that, in the context of the friendship-enhancing event, two source users are permitted to invite a common friend from their recommendation lists. Consequently, a target user may receive multiple inviting notifications from different friends. This observation suggests that the invitation procedure resembles the influence process of IC. Specifically, each user can be independently invited by friends who have become inviters. It is worth noting that we provide a quantitative comparison between the ground-truth and model-predicted diffusions in Section 6, demonstrating that the IC-predicted diffusion aligns more closely with the ground-truth than LT.
Unlike the IC model, which consolidates the inviting procedure and acceptance behavior into the influence probability of each edge, the ICI model differentiates these processes by positing that the transition from invitee to acceptor does not depend on the number of invitations received.To justify this distinction, we introduce the notation  (, ), representing the actual acceptance rate after being invited  times in server  on TXG-A, and compute  (, ) for each server  and  > 0. Subsequently, we leverage  (, 1) and different  Furthermore, we conduct a two-sample t-test, yielding a one-sided p-value of 0.016, signifying that the smaller error of ICI is statistically significant.Additionally, we evaluate this assumption through a sensitivity analysis, as elaborated in Section 6.Similar findings have also been reported by a recent study [47].

APPLICATIONS
This section outlines the utilization of ICI in various applications. Section 4.1 and Section 4.2 introduce the utilization of accepting spread (Definition 3.2) and accepting probability (Definition 3.3) for macroscopic and microscopic tasks, respectively. Finally, we apply the ICI model to the well-studied influence maximization problem. Following IC, we define the graph as G = (V, E, p), where the inviting probability p_{u,v} is associated with each edge (u, v) ∈ E.
For ease of exposition, we defer all proofs to Appendix A.

Macroscopic Tasks
Problem 1 (Cascade Estimation [4,26]).Given a social network G and a seed set S, the cascade estimation problem aims to predict the size and growth of the diffusion tree starting from S.
Problem 2 (Target Recommendation [48]). Given a social network G, a budget k, a source set V_s and a target set V_t, the objective of target recommendation is to select at most k target neighbors from V_t ∩ N^out_u for each source user u ∈ V_s, such that the likelihood of user engagement among all returned pairs is maximized.
Due to the #P-hardness [12] of evaluating σ_G(·), we leverage Monte-Carlo (MC) simulation to estimate the accepting spread, which simulates the discrete propagation steps of ICI starting from S and takes the average number of acceptors over r trials as the estimate. Initially, the estimate σ̂_G(S, α, β) is set to 0. For each of the r trials, at step t = 0, MC simulation adds the seeds in S to the invitee set V_invitee, the acceptor set V_acceptor, and the inviter set V^(0)_inviter. At each following step t, each user u ∈ V^(t−1)_inviter that became an inviter at step t − 1 flips a coin with head probability p_{u,v} to invite each uninformed friend v ∈ N^out_u \ V_invitee, and v is marked as an invitee if the result is heads. The new invitee v becomes an acceptor with probability α and is then included in V_acceptor. After becoming an acceptor, v further becomes an inviter with probability β and is then included in V^(t)_inviter. The trial terminates at step t if V^(t)_inviter = ∅, and the estimated spread σ̂_G(S, α, β) receives an increment of |V_acceptor|/r. The following theorem shows the correctness and time complexity. By MC simulation, the size and growth of the diffusion tree in Problem 1 can be estimated by the overall accepting spread and the number of acceptors at each step, respectively. Regarding Problem 2, inspired by the prior work [11], we take the estimated single accepting spread σ̂_G({v}, α, β) as the influence centrality of each user v, and recommend the friends with the top-k largest single accepting spreads.
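The MC procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released implementation: the function names, the dictionary-based graph encoding, and the explicit `rng` argument are hypothetical, and `alpha` and `beta` stand for the two conversion constants of ICI.

```python
import random

def simulate_ici(out_neighbors, prob, seeds, alpha, beta, rng):
    """Run one ICI diffusion instance and return the set of acceptors."""
    invitees = set(seeds)   # seeds hold all three roles at step 0
    acceptors = set(seeds)
    frontier = list(seeds)  # users who became inviters at the previous step
    while frontier:
        newly_invited = []
        for u in frontier:
            for v in out_neighbors.get(u, ()):
                # only uninformed friends can be invited
                if v not in invitees and rng.random() < prob[(u, v)]:
                    invitees.add(v)
                    newly_invited.append(v)
        frontier = []
        for v in newly_invited:
            if rng.random() < alpha:      # invitee -> acceptor
                acceptors.add(v)
                if rng.random() < beta:   # acceptor -> inviter
                    frontier.append(v)
    return acceptors

def estimate_accepting_spread(out_neighbors, prob, seeds, alpha, beta, trials, rng):
    """Average the number of acceptors over `trials` independent instances."""
    total = 0
    for _ in range(trials):
        total += len(simulate_ici(out_neighbors, prob, seeds, alpha, beta, rng))
    return total / trials
```

With `alpha = beta = 1` the sketch behaves like a plain IC simulation, and with `beta = 0` only the seeds ever send invitations.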

Microscopic Task
Problem 3 (Diffusion Prediction [7,14]).Given a social network G and a seed set S, the diffusion prediction problem aims to identify the users in G that are directly or indirectly influenced by S.
Solving Problem 3 by MC simulation resembles the solution above, with two distinctions for diffusion prediction. In particular, we initialize the estimate â_G(v, S, α, β) as 1 for each seed v ∈ S and 0 for all other v ∈ V\S. During each of the r simulation trials, if a user v becomes a new acceptor, the estimated value â_G(v, S, α, β) increments by 1/r. The following theorem shows the correctness and time complexity. It is worth noting that Problems 1-3 can be efficiently solved by MC simulation by setting r = 1,000-10,000 [7,24], which is sufficient for these tasks. In addition, the whole MC simulation procedure is only invoked once for Problems 1 and 3. As for Problem 2, the simulation can be invoked in parallel from target users.
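A per-user variant of the estimator can be sketched as follows. The function name and graph encoding are again assumptions for illustration. The sketch counts acceptances and divides by the number of trials at the end, which is equivalent to incrementing by 1/r per trial but avoids accumulating floating-point error.

```python
import random
from collections import Counter

def estimate_accepting_probability(out_neighbors, prob, seeds, alpha, beta, trials, rng):
    """Estimate, for each user, the probability of becoming an acceptor."""
    counts = Counter()
    for _ in range(trials):
        invitees = set(seeds)
        frontier = list(seeds)  # inviters created at the previous step
        while frontier:
            next_frontier = []
            for u in frontier:
                for v in out_neighbors.get(u, ()):
                    if v in invitees or rng.random() >= prob[(u, v)]:
                        continue  # already informed, or the invitation failed
                    invitees.add(v)
                    if rng.random() < alpha:     # invitee -> acceptor
                        counts[v] += 1
                        if rng.random() < beta:  # acceptor -> inviter
                            next_frontier.append(v)
            frontier = next_frontier
    estimate = {v: c / trials for v, c in counts.items()}
    estimate.update({s: 1.0 for s in seeds})  # seeds are acceptors by definition
    return estimate
```

Users that never accept in any trial are absent from the returned dictionary, so `estimate.get(v, 0.0)` gives their score.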

Influence Maximization
Problem 4 (Influence Maximization [24]). Given a social network G, an integer k, and a diffusion model M, the influence maximization problem asks for a seed set S with cardinality |S| = k to maximize the overall influence spread σ_G(S) under M.
The crux of taking ICI (resp. accepting spread) as the input model M (resp. the objective function) of Problem 4 is showing that the accepting spread is monotone and submodular. Specifically, denote the set function on nodes of G as f : 2^{|V|} → R, which is (i) monotone if f(S) ≤ f(T) for any S ⊆ T; and is (ii) submodular if f(S ∪ {v}) − f(S) ≥ f(T ∪ {v}) − f(T) for any S ⊆ T and v ∈ V\T. The following theorem shows that the accepting spread also satisfies monotonicity and submodularity.
Theorem 4.3. Given a social network G = (V, E, p) and ICI with constants α and β, the accepting spread function on any seed set S ⊆ V is monotone and submodular.
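Monotonicity and submodularity are what make the classical greedy hill-climbing strategy applicable: with an exact spread oracle, repeatedly adding the node with the largest marginal gain is known to give a (1 − 1/e)-approximation for such functions. The sketch below shows this generic greedy loop; it is one classical option, not necessarily the solver the authors use. The `spread` callable is a hypothetical stand-in for an accepting-spread estimate (e.g., obtained by MC simulation), and the usage example substitutes a toy coverage function so that the result is deterministic.

```python
def greedy_seed_selection(candidates, spread, k):
    """Pick up to k seeds by repeatedly adding the largest marginal gain."""
    seeds = []
    current = spread(seeds)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for v in candidates:
            if v in seeds:
                continue
            gain = spread(seeds + [v]) - current  # marginal gain of adding v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidates available
            break
        seeds.append(best)
        current += best_gain
    return seeds

# Toy monotone submodular objective: number of items covered by the chosen sets.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}

def covered(selected):
    return len(set().union(*(coverage[v] for v in selected)))

print(greedy_seed_selection(["a", "b", "c"], covered, 2))  # ['c', 'a']
```

Because each greedy round re-evaluates the spread for every candidate, plugging in an MC estimator can be expensive on large graphs.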
In light of Theorem 4.3 and the fact that ICI is orthogonal to influence maximization, a plethora of existing approximation solutions [25,43,44,50] can be applied to select a size- seed set S such that  G (S, , ) is (1 − 1/ − )-approximate.Considering that the worst-case complexity by MC simulation reaches state-of-the-art solutions [5,43,44] estimate the spread based on random Reverse-Reachable (RR) sets.To construct an RR set  L () under ICI, we first sample a snapshot L as per Definition 3.1, and then check whether the node  is an acceptor.If  ∈ V  , we will include  and all nodes directly pointing or reachable to  via a path of inviters into  L (), otherwise  L () = ∅.In practice, we do not need to materialize L, and can employ a breadthfirst search starting from  instead.By definition, for fixed  and S, we can obtain that  L () where  is randomly selected from V and denote R G as a set of random RR sets.Based on the proof in Borgs et al.

RELATED WORK
In this part, we illustrate how other typical diffusion models, namely CT-IC [10,17,31], IC-N [9], LT-C [4] and F-TM [2], extend IC and LT, and justify their differences from ours. We skip other variants [3,27,42] as the required features are unavailable in our problem, rendering them degraded to IC or LT. Other related work focuses on learning-based models for specific downstream tasks, such as inferring influence probabilities from known diffusion trees [2,3,6,7,14,17,39] and predicting the next-activated user by sequential models [23,40,45,46], which are outside the scope of this study.
CT-IC model. Continuous-Time IC (CT-IC) [10,17,31] extends IC by introducing time delays in information transmission. Once activated, users start to communicate with inactive neighbors using an independent meeting probability. If a meeting occurs at a step before the deadline, CT-IC follows the influence process of IC, providing one opportunity for user u to activate v. The spread in CT-IC is the expected number of active users before the deadline.
IC-N model. IC with Negative Opinions (IC-N) [9] introduces opinion diversity by distinguishing between positive and negative
LT-C model. In LT with Colors (LT-C), user ratings for a product lead to three active states: adopted, promoted, and inhibited. After randomly assigning seeds to these states, LT-C follows LT's influence process, but the threshold function for an inactive user v combines the active in-neighbor u's rating and the edge weight w_{u,v}. Upon activation, a user either (i) becomes adopted with a probability or (ii) serves as a message bridge, with a chance of (ii-a) becoming promoted with a probability or (ii-b) becoming inhibited otherwise.
LT-C defines the spread as the expected number of adopted users.
F-TM model. F-TM [2] also follows the influence process in LT but integrates more information into the threshold function, including the edge weight, the user's positive feeling w.r.t. each feature of the given item, and internal resistance to being influenced. Finally, the threshold function is wrapped into a logistic function.
Remarks.In contrast to ICI, none of the prior IC-based models considers inviter and adopter states.Among LT-based models, while LT-C distinguishes between awareness and adoption, it is still inadequate in capturing IAD due to the mismatch of influence process.

EXPERIMENTS
In this part, we first introduce the experimental settings and then evaluate the performance of the proposed model on cascade estimation and diffusion prediction tasks. All of our experiments are conducted on an in-house cluster consisting of hundreds of machines, each of which runs CentOS, and has 16GB memory and 12 Intel Xeon Processor E5-2670 CPU cores. For reproducibility, the source code is available at: https://github.com/jeremyzhangsq/ICI.

Experiments Setup
Datasets. We use four friendship-enhancing events from Tencent's online games and preprocess the logs of invitation relationships and user behaviors as explained in Section 2.2. Furthermore, we take the snapshot of G before the release time as the input graph, since G for a particular online game evolves when new users are registered, or friendships are modified. Notice that all datasets have been anonymized to avoid any leakage of private information.
Besides the datasets about invitation diffusion, we also choose two other types of diffusion datasets, Diggs [20] and Twitter [13]. In particular, the Diggs dataset contains the diffusion of vote behaviors w.r.t. a given story on the platform, and the Twitter dataset records the diffusion of retweet behaviors on Twitter about the announcement of the discovery of a new particle. For public datasets, we preserve the behaviors on the social connections and treat the user who first posted the story or tweet as the seed. The dataset statistics, including the graph, seeds, and actual spread, are shown in Table 1.
Models and parameter settings. We compare the proposed ICI model with six representative diffusion models as mentioned in Section 5: (i) IC-based models: IC [16], CT-IC [10,17,31], IC-N [9]; (ii) LT-based models: LT [18], LT-C [4] and F-TM [2]. Recall that LT-C and F-TM require extra information about user ratings and feelings for the product, respectively, which are not available in the provided datasets and are set to 1 for a fair comparison. Moreover, we follow the basic setting in [4,24,44,50] and assign 1/|N^in_v| to each edge (u, v) as the edge probability and weight for IC-based and LT-based models, respectively. For other parameters, we follow the default parameter settings for all competitors, and set the conversion constants in ICI to α = 0.9 and β = 0.6 as explored in Section 3.2. Notice that this setting is derived from TXG-A, the results of which may overstate ICI's actual capabilities.
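The edge setting above, where each edge receives the reciprocal of its target's in-degree, can be sketched as follows. The function name and the edge-list encoding are assumptions for illustration, and duplicate edges are collapsed before counting.

```python
from collections import defaultdict

def weighted_cascade(edges):
    """Assign 1 / in-degree(v) to every directed edge (u, v)."""
    unique_edges = set(edges)  # ignore duplicate edges
    in_degree = defaultdict(int)
    for _, v in unique_edges:
        in_degree[v] += 1
    return {(u, v): 1.0 / in_degree[v] for u, v in unique_edges}

# Conversion constants of ICI as set in the experiments.
ALPHA, BETA = 0.9, 0.6

probabilities = weighted_cascade([("a", "c"), ("b", "c"), ("a", "b")])
print(probabilities[("a", "c")])  # 0.5
```

Under this assignment the incoming weights of every node sum to exactly 1, which satisfies the LT constraint on edge weights.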

Cascade Estimation
In this section, we evaluate the performance of ICI and the competitors on the cascade estimation task (Problem 1). Given the actual seed set S, we conduct 1,000 Monte-Carlo simulations [7] to estimate the spread for each model. After that, we follow the prior work [4] and compute the Root Mean Square Error (RMSE) between the model-predicted spread and the ground-truth spread. Regarding the model-predicted spread, ICI employs the accepting spread in Definition 3.2, and each competitor utilizes the specific spread function as explained in Section 5. We report the average RMSE over 10 repeated trials and skip the standard deviation as it is always three orders of magnitude smaller than the average.
Spread estimation. We first report the RMSE of each model in predicting the overall spread starting from S. As shown in Table 2, the proposed ICI model outperforms all competitors on the invitation and other types of diffusion datasets. In particular, the RMSE score of ICI is up to 5× better than that of the best competitor CT-IC. Furthermore, we find that the IC-based models perform much better than the LT-based models in each invitation diffusion dataset, consistent with the observation illustrated in Section 3.2.
Growth estimation. We next evaluate the performance of each model in predicting the spread growth. Akin to the actual diffusions, the diffusion predicted by each model can also be preprocessed into a diffusion tree, as described in Section 2.2. We compare the RMSE between the number of predicted and true active users in each hop, where the active users in a hop have the same shortest distance from the seed set S.
As reported in Figure 3, ICI has the best RMSE in most hops of each dataset.This is because the multi-stage role transition makes spread converge faster than competitors as the hop increases.Notice that the RMSE of CT-IC dramatically decreases from hop 1 to 2, as the activation process of CT-IC is postponed by the communication probability.We also find that F-TM has a similar trend of RMSE as ICI, but the RMSE in the smaller hop is always worse than ICI.To explain, the logistic format threshold function makes numerous users infected at the earlier step and leaves almost no reachable inactive users at the later step.
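The two quantities compared in this section, the RMSE and the per-hop number of active users, can be computed as in the following sketch. The helper names and the `children` adjacency encoding of a diffusion tree are hypothetical, and the paper's exact aggregation over seeds and trials may differ.

```python
import math
from collections import deque

def rmse(predicted, actual):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

def hop_counts(children, seeds):
    """Count active users per hop, i.e., per shortest distance from the seeds."""
    distance = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:  # breadth-first search over the diffusion tree
        u = queue.popleft()
        for v in children.get(u, ()):
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    counts = {}
    for d in distance.values():
        counts[d] = counts.get(d, 0) + 1
    return counts
```

For growth estimation, `hop_counts` would be applied to both the predicted and the ground-truth trees, and `rmse` evaluated hop by hop.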

Diffusion Prediction
In this part, we evaluate the performance of different models in the micro-level diffusion prediction task (Problem 3). Given a social network G and an actual seed set S, let y_v ∈ {0, 1} be the label of v ∈ V, where y_v = 1 if v is directly or indirectly infected by any user in S and y_v = 0 otherwise, and let ŷ_v ∈ [0, 1] be the activation likelihood of v, which is the fraction of the 1,000 simulations in which v is activated. Following previous works [3,6,7,14], we repeat each approach 10 times and report the Area Under Curve (AUC) and Mean Average Precision (MAP).
Overall evaluation. We first report the average and the standard deviation of the AUC and MAP scores for the proposal and all competitors. In Table 3, the proposed ICI outperforms all competitors on TXG-A, TXG-B, TXG-C, TXG-D, and Twitter, and is the second-best approach on Diggs in terms of both evaluation metrics. For example, ICI is 6.3% and 22.4% better than the best competitor CT-IC on TXG-D in terms of AUC and MAP, respectively. In addition, ICI improves on the best competitor IC-N (resp. CT-IC) by 7.0% (resp. 17.2%) on Twitter in terms of AUC (resp. MAP), demonstrating that our proposal is effective in capturing various types of diffusion.
Case study. We next conduct a case study to compare the diffusion visualization induced by ICI and the leading competitor IC-N. We randomly select a server on TXG-D with 33 seeds as the test graph. We conduct 1,000 simulations, starting from the seeds, under both models. We use the state-of-the-art solution PPRviz [49] to visualize the diffusion trees generated by both models, where true (resp. false) positive infected nodes are marked in orange (resp. red). Figure 4 shows that both models produce a similar number of true positives. However, IC-N produces 30.3% more false positives than ICI, highlighting ICI's higher precision in diffusion prediction.
Ablation study. Lastly, we justify the effectiveness of the user roles defined in ICI. For a fair comparison, we extend the conventional IC model by folding the conversion rates into the influence probability, i.e., assigning each edge in E an influence probability equal to the product of the conversion rates divided by the size of the neighbor set |N| of the corresponding endpoint, and call this variant the IC+ model. The only difference between ICI and IC+ is that IC+ merges the operations in the role transitions of ICI into the social influence process. As reported in Table 3, we find that the new variant outperforms all competitors on all datasets except for Diggs, because it utilizes the information of behavior conversion. However, the IC+ model is still beaten by ICI in all

DEPLOYMENT
We have deployed the proposed ICI model for the seed selection and target recommendation scenarios in several online games of Tencent, as illustrated in the sequel. The system setting for the online deployment follows that in Section 6. Due to the network effect, we follow [41] and partition all users into communities with high connectivity and profile homophily. We then conduct online A/B testing that randomly assigns the live traffic in the same communities to a treatment group. Each approach is initially computed based on the graph instance ahead of the event and is then updated daily using the latest graph snapshot.

Seed Selection
Tencent's online gaming platforms often organize viral marketing events, where a set of influential players (called seeds) are selected and treated as initial lucky users with a virtual incentive. Each lucky user can invite a friend following the invitation mechanism introduced in Section 2.2. The friend also becomes a lucky user after accepting the invitation and playing with the inviter. This leads to the spread of the luck privilege throughout the social network. Accordingly, the seed selection is paramount to the effectiveness of the event.
We deploy (i) the degree centrality, (ii) the proposed ICI model, and (iii) the competing IC model to separately select 5,000 seeds for a viral marketing event on Tencent's battle royale game X with 227 million quarter-active users and 4 billion relationships. Specifically, degree centrality is a well-adopted baseline for various viral marketing events at Tencent [22], by which the users with the largest degree centrality are selected as seeds. Regarding the diffusion model ICI (resp. IC), we follow the influence maximization problem (Problem 4) and greedily select the seeds by the state-of-the-art solution OPIM-C [43], such that the spread of the selected seeds under ICI (resp. IC) is maximized. We evaluate each approach by (i) spread, the number of lucky users excluding seeds, and (ii) invite rate, the fraction of lucky users who invite friends. A higher spread and a higher invite rate indicate better quality. As reported in Figure 5, the approaches based on influence maximization are better than the degree centrality in both evaluation metrics, demonstrating the usefulness of the influence maximization problem in real-world viral marketing. Furthermore, ICI outperforms the competitors in both metrics. Notably, ICI improves on IC (resp. degree) by 15.6% (resp. 170%) in terms of spread, and by 15% (resp. 37.5%) in terms of invite rate.
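As a small illustration of the degree-centrality baseline and the two evaluation metrics described above, consider the sketch below. It is not the deployed pipeline: the helper names are hypothetical, degree is taken as out-degree in an adjacency-list graph, and the invite rate is assumed to be computed over all lucky users including seeds.

```python
import heapq


def top_k_by_degree(graph, k):
    """Select the k nodes with the largest out-degree as seeds.

    `graph` maps each node to the list of its friends (assumption:
    degree centrality is approximated by the length of that list).
    """
    return heapq.nlargest(k, graph, key=lambda node: len(graph[node]))


def event_metrics(lucky_users, seeds, inviters):
    """Return (spread, invite_rate) for one event.

    spread: number of lucky users excluding the seeds.
    invite_rate: fraction of lucky users who sent at least one invitation
    (assumption: the denominator includes the seeds).
    """
    lucky = set(lucky_users)
    spread = len(lucky - set(seeds))
    invite_rate = len(lucky & set(inviters)) / len(lucky) if lucky else 0.0
    return spread, invite_rate


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c", "d"], "b": ["c", "d"], "c": ["d"], "d": []}
    seeds = top_k_by_degree(toy_graph, 2)
    print(seeds, event_metrics({"a", "b", "c"}, seeds, {"a"}))
```

The model-based alternatives replace `top_k_by_degree` with a seed set returned by an influence maximization solver, while the metrics stay the same.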

Target Recommendation
Recall from Section 2.3 that a user can only invite target friends from a recommendation list of limited size during the friendship-enhancing event. Therefore, judiciously recommending target friends for each user is pivotal to the event's performance, which motivates the target recommendation task (Problem 2). We deploy (i) Intimacy [34,48], (ii) ICI, and (iii) IC to a monthly friendship-enhancing event on Tencent's role-playing game Y with 2.5 million monthly active users and 6.5 million relationships. Specifically, Intimacy is the well-adopted score in target recommendation, which records the number of historical activities and interactions between friends, e.g., co-playing and gifting. Following the explanation in Section 4.1, we estimate the spread starting from each candidate target under a diffusion model (i.e., ICI or IC) and take it as the influence centrality score of that target. We sort the scores in descending order and recommend the top-ranked targets, up to the list size. We evaluate the effectiveness of each treatment group by the click rate and the pay rate. In particular, the click rate is the fraction of acceptors that invite friends, and the pay rate is the fraction of invitees that pay for this event. Higher rates indicate better quality.
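The ranking step described above reduces to a top-k selection over per-target scores. The following sketch assumes the influence centrality scores have already been estimated elsewhere; `recommend_targets` and the example scores are hypothetical and only illustrate the sorting logic.

```python
def recommend_targets(friend_scores, list_size):
    """Return the highest-scoring friends, at most `list_size` of them.

    `friend_scores` maps each candidate friend to a precomputed score
    (e.g., an estimated spread used as influence centrality).
    Ties are broken by friend identifier so the output is deterministic.
    """
    ranked = sorted(friend_scores.items(), key=lambda item: (-item[1], item[0]))
    return [friend for friend, _ in ranked[:list_size]]


if __name__ == "__main__":
    scores = {"x": 1.5, "y": 3.0, "z": 2.0}
    print(recommend_targets(scores, 2))
```

Swapping the score source (Intimacy, IC, or ICI) changes only the input dictionary, which is what makes the three treatment groups directly comparable.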

CONCLUSIONS AND FUTURE WORK
In this work, we introduce a diffusion model, ICI, to capture the information dissemination process in the friend invitation scenario and evaluate its performance through extensive experiments on six different types of diffusion datasets. Our results show that ICI outperforms six state-of-the-art methods in both cascade estimation and diffusion prediction. Additionally, the deployment of ICI in seed selection and friend ranking scenarios results in significant improvements. In future work, it would be interesting to learn personalized model parameters for each user to enhance performance in other tasks, such as diffusion prediction.

Definition 3.3 (Accepting Probability). Given a social network G, a seed set S, and the ICI model with its parameters, let L ∈ Ω be any invitation snapshot in Definition 3.1 and I(·) be an indicator function. The accepting probability of a user is the expectation, over L ∼ Ω, of the indicator that the user belongs to Γ_L(S).

Figure 2: The histograms on TXG-A: (a) the depth of each diffusion tree; (b) the conversion rate from the acceptor to the inviter (cyan) and from the invitee to the acceptor (red); (c) the number of invitations invitees receive (cyan) and invitees receive after accepting (red).
[5], given a graph G and a collection R_G of random RR sets, we can derive that |V| · Λ_R(S) / |R_G| is an unbiased estimator of the spread of S, where Λ_R(S) is the number of random RR sets in R_G that intersect S, i.e., whose intersection with S is not ∅. Motivated by this connection, we can leverage existing solutions based on OPIM-C [43] to return a seed set satisfying a (1 − 1/e − ε)-approximation with probability at least 1 − δ in the expected time of O((k ln |V| + (1/ε²) ln(1/δ)) · (|V| + |E|)).
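The estimator |V| · Λ_R(S) / |R_G| can be illustrated with a short sketch. It assumes reverse-reachable (RR) sets sampled under a plain IC model with one uniform edge probability, not the ICI-specific sampling of the paper, and all function names are hypothetical.

```python
import random


def reverse_adjacency(graph):
    """Build in-neighbor lists; every node appears as a key."""
    rev = {}
    for u, outs in graph.items():
        rev.setdefault(u, [])
        for v in outs:
            rev.setdefault(v, []).append(u)
    return rev


def sample_rr_sets(graph, prob, count, seed=0):
    """Sample `count` RR sets; return them with the number of nodes.

    Each RR set starts from a uniformly random root and follows each
    incoming edge independently with probability `prob` (assumption).
    """
    rng = random.Random(seed)
    rev = reverse_adjacency(graph)
    nodes = sorted(rev)
    rr_sets = []
    for _ in range(count):
        root = rng.choice(nodes)
        rr, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for u in rev[v]:
                if u not in rr and rng.random() < prob:
                    rr.add(u)
                    stack.append(u)
        rr_sets.append(rr)
    return rr_sets, len(nodes)


def estimate_spread_rr(rr_sets, seeds, num_nodes):
    """Return |V| * (number of RR sets hit by the seeds) / (number of RR sets)."""
    seed_set = set(seeds)
    covered = sum(1 for rr in rr_sets if not seed_set.isdisjoint(rr))
    return num_nodes * covered / len(rr_sets)
```

Because the same collection of RR sets can score any candidate seed set, solvers in this family reduce seed selection to a maximum coverage problem over the sampled sets.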

Figure 3: The RMSE of estimating spreads in each hop.

Figure 4: The diffusion visualization on a server of TXG-D (seed: pink, true positive: orange, false positive: red).

Table 1: Dataset statistics (K = 10³, M = 10⁶).
Each seed in S is initially activated and becomes positive with a given probability or negative otherwise. Subsequently, when a user is activated by an in-neighbor, the user is converted as follows: (i) the user becomes negative if the in-neighbor is negative; (ii) otherwise, the user becomes positive with a given probability and negative otherwise. IC-N's spread is measured by the expected number of positive active users.
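The IC-N conversion rule described above can be sketched in a few lines. This is a simplified illustration, not the original IC-N specification: it assumes one uniform edge probability, a single probability `quality` shared by seeds and non-seeds, and that the first in-neighbor to activate a user decides its state. The names `simulate_ic_n` and `positive_spread` are hypothetical.

```python
import random
from collections import deque


def simulate_ic_n(graph, seeds, edge_prob, quality, rng):
    """Run one simulation and return a dict: node -> 'positive' or 'negative'."""
    state = {}
    queue = deque()
    for s in seeds:
        # Seeds turn positive with probability `quality`, negative otherwise.
        state[s] = "positive" if rng.random() < quality else "negative"
        queue.append(s)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v in state or rng.random() >= edge_prob:
                continue
            if state[u] == "negative":
                # Rule (i): a negative in-neighbor always yields a negative user.
                state[v] = "negative"
            else:
                # Rule (ii): a positive in-neighbor yields positive w.p. `quality`.
                state[v] = "positive" if rng.random() < quality else "negative"
            queue.append(v)
    return state


def positive_spread(graph, seeds, edge_prob, quality, runs=1000, seed=0):
    """Estimate the expected number of positive active users."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        state = simulate_ic_n(graph, seeds, edge_prob, quality, rng)
        total += sum(1 for s in state.values() if s == "positive")
    return total / runs
```

Only positive users count toward the spread, which is why a low `quality` value shrinks the estimate even when every edge fires.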

Table 3: The AUC (%) and MAP (%) of different models in diffusion prediction (the best is in bold and the second best is in italics).

Table 4: Performance of varying  on TXG-D.

Table 5: Performance of seed selection in Game X.
For example, the MAP score of ICI is 30.5% better than that of the IC+ model on Diggs. In addition, to validate the design choice of the independent acceptor role, we evaluate the performance of ICI and IC+ by varying the stand-alone probability. As reported in Table 4, as this probability decreases, ICI and IC+ become more distinguishable, and the advantage of ICI grows more significant. Specifically, when it equals 0.3, ICI improves on IC+ by 4.1% and 10.7% in terms of AUC and MAP, respectively, underscoring the necessity of having the stand-alone probability.

Table 6: Performance of target recommendation in Game Y.
Table 6 reports the performance of each solution in August and September. Most notably, the treatment group ICI outperforms all competitors on both metrics in both monthly events. In August, ICI improves on the best competitor Intimacy (resp. IC) by 20.3% (resp. 6.8%) in terms of invite rate (resp. pay rate). In addition, ICI improves on the best competitor IC (resp. Intimacy) by 6.2% (resp. 3.7%) in terms of invite rate (resp. pay rate) in September.