Maximizing the Diversity of Exposure in Online Social Networks by Identifying Users with Increased Susceptibility to Persuasion

Individuals may have a range of opinions on controversial topics. However, the ease of making friendships in online social networks tends to create groups of like-minded individuals, who propagate messages that reinforce existing opinions and ignore messages expressing opposite opinions. This decreases the diversity of messages to which users are exposed (diversity of exposure): users do not easily get the chance to be exposed to messages containing alternative viewpoints, and it is even more unlikely that they forward such messages to their friends. Increasing the chance that such messages are propagated implies that an individual's susceptibility to persuasion is increased, something that may ultimately increase the diversity of messages to which users are exposed. This article formulates a novel problem which aims to identify a small set of users for whom increasing susceptibility to persuasion maximizes the diversity of exposure of all users in the network. We study the properties of this problem and develop a method to find a solution with an approximation guarantee. For this, we first prove that the problem is neither submodular nor supermodular and then we develop submodular bounds for it. These bounds are used in the Sandwich framework to propose a method which approximates the solution using reverse sampling. The proposed method is validated using four real-world datasets. The obtained results demonstrate the superiority of the proposed method compared to baseline approaches.


INTRODUCTION
Online social networks allow users to create connections and broadcast messages expressing their opinions and beliefs much more easily than ever before. The effectiveness of these networks as a communication platform has essentially altered the way in which people form opinions and make decisions. However, as these platforms tend to connect primarily like-minded individuals, they often create so-called bubbles where users are presented with messages that predominantly match their existing opinions [2,39]. This phenomenon is known as social bubbles (or filter bubbles). It has been argued that the strength of one's opinion towards a controversial topic increases when interacting with like-minded friends [47]. Thus, social bubbles may reinforce users' existing opinions, polarize them and eventually place them into non-interacting groups with opposite opinions around a topic [8,22]. Polarization has detrimental effects on societies; communicating diverse viewpoints or reaching consensus is difficult in polarized societies [36]. As polarization is rooted in the lack of diversity of the messages circulated in social bubbles, it is the exposure of users to messages with diverse viewpoints that makes users aware of the stance of others in the network [28]. This awareness may smooth out differences and moderate the intensity of opposite opinions [34,35].
42:2 A. Zareie and R. Sakellariou
Breaking social bubbles to minimize polarization in social networks has recently attracted considerable research attention. Some studies adopt an opinion dynamics model to find a solution for moderating opposite opinions. In these studies, the problem is tackled by linking users with opposing opinions [5,6,19,25,49] or convincing a set of users to change their opinions [34,35,37,41,48]. Other studies adopt a spreading process for moderating opposite opinions. These studies [3,20,33,42,46] aim to maximize the diversity of messages to which users are exposed (diversity of exposure). For this purpose, a small set of influential users is identified to spread a set of messages with diverse viewpoints.
Motivated by the challenges associated with social bubbles and polarization, we address the problem of maximizing the diversity of exposure in social networks but from an entirely different and novel point of view. Consider a scenario where users with opposing opinions around a controversial topic spread messages to promote their opinion by influencing other users in the network; the opinion of each user can be obtained by applying opinion mining and sentiment analysis techniques [30]. Although the propagation of messages with diverse viewpoints may help break social bubbles and moderate polarization, the tendency of users not to forward messages that do not match their existing opinions is an obstacle to further propagation. An increase in users' susceptibility to persuasion may alleviate the effects of this obstacle. The property 'susceptibility to persuasion' refers to the extent to which a user is willing to accept messages received from their friends; susceptibility to persuasion can be increased by different strategies [7,16]. A problem which arises here is how to identify a small set of users whose increased susceptibility to persuasion maximizes the diversity of exposure in the network; this article addresses this problem. In this problem, a social network contains a set of users whose opinion around a controversial topic is modelled as a value in [0, 1], where 1 generally indicates a favourable stance. We assume a set of users (with opposing opinions) initiates the propagation of messages to spread their opinion. When a user receives a message, they are exposed to the opinion expressed by this message. Then, depending on the user's susceptibility to persuasion, the user decides whether to propagate the message further or not. The diversity of the opinions to which a user is exposed determines the diversity of exposure of this user. The diversity of exposure of a network refers to the sum of the diversity of exposure of all users in the network. The goal of this article is to propose an algorithm to identify a set of users whose increased susceptibility to persuasion maximizes the diversity of exposure in the whole network.
The contributions of this article are listed below:
• This article defines a novel problem to maximize the diversity of exposure from an angle hitherto not considered in the literature. The problem is to identify a set of users whose increased susceptibility to persuasion helps the propagation of messages with opposing viewpoints in a social network. To the best of our knowledge, this article is the first to define the problem of increasing susceptibility to persuasion to counteract the effect of social bubbles and polarization.
• We prove that this newly defined problem is NP-hard and show that an optimal solution to the problem cannot be approximated within a ratio of $(1 - 1/e + \epsilon)$ (for any $\epsilon > 0$) in polynomial time, unless P = NP. It is also proved that the problem is neither submodular nor supermodular.
• A lower bound and an upper bound for the defined problem are developed and the submodularity of the bounds is proved.
• To obtain an unbiased estimation of the problem, we describe a sampling method and apply the Sandwich framework [32] to get an approximate solution.
• A set of experiments is designed and carried out to evaluate the proposed method. The results of the experiments demonstrate the superiority of the proposed method over the compared baselines.
The rest of this article is organised as follows: We start by reviewing the literature in Section 2. Section 3 formulates the problem. The properties of the problem are examined in Section 4. The details of the proposed method to solve the problem are described in Section 5. Section 6 reports experimental results to evaluate the method. Finally, Section 7 concludes this article and suggests future directions for the problem.

RELATED WORK
The impact of online social networks on users' opinion formation has led to an analysis of how controversial debates take place in such networks [9,21]. Different studies have been dedicated to assessing the effect of social bubbles around these debates [15,23]. The problem of breaking social bubbles has been addressed using two underlying models: opinion dynamics models [13,17] and information diffusion models [26].
In opinion dynamics models, every user is exposed to the opinion of all its neighbours; every user updates its opinion gradually by taking the weighted average of its own opinion and those of the neighbours. In this model, based on the formation of a user's opinion under the influence of neighbours, two approaches are explored in the literature to break social bubbles and reduce polarization: linking users with opposite opinions [5,6,19,25,49] and changing the opinion of a set of users [34,35,37,41,48]. Changing the susceptibility to persuasion of a small set of users to maximize (or minimize) the opinion in a network towards a specific topic is discussed in [1] but without considering polarization or the overall impact on diversity. Furthermore, some studies [14,18] assess the problem from an adversary's side to maximize polarization in a network.
In information diffusion models, a small number of users are selected to spread messages in a network through interaction between the network's users. Spreading messages with diverse viewpoints may increase diversity of exposure in a network and break social bubbles. In [20], [42], and [46], the problem is defined as the identification of two separate sets of users to spread messages with two opposing viewpoints in the network with the aim of maximizing the number of users who are exposed to either both or none of the viewpoints. The problem is generalized in [3] by considering the identification of more than two separate sets of users to spread messages with opposing viewpoints with the aim of maximizing the number of viewpoints to which users are exposed. In [33], the authors aim to assign messages with diverse viewpoints to a set of users to maximize the diversity of exposure in a network. They discuss the problem of identifying a set of users and assigning them a set of messages with different viewpoints; the goal is to initiate the spread of messages with diverse viewpoints towards users. The authors prove the problem to be NP-hard, monotone, and submodular; they develop a sampling approach to solve the problem with an approximation guarantee.
The problem proposed in this article considers an information diffusion model as the underlying model. However, there is a key difference with previous work that aims to identify a number of users to spread messages with diverse viewpoints. In our article, we aim to identify a set of users who are exposed to different viewpoints. It is their increased susceptibility to persuasion that maximizes the diversity of exposure in the network. We believe that this assumption is realistic, since, in controversial debates, users share and spread their opinions and these opinions may be blocked by some users depending on how messages align with their opinion or conform to their biases.
The formulation of the problem in this article is related to the well-studied problem of influence maximization [26], which aims to identify a set of users to maximize the spread of a message in a network. This problem was first formulated as an optimization problem in [26]. The authors proposed two information diffusion models (linear threshold and independent cascade models) to extend a greedy solution for the problem. In order to improve the efficiency of this method, a sampling approach was proposed in [4] to approximate the optimal solution within a ratio; this sampling approach was further improved in [38], [44], and [45]. The information diffusion models are also applied in some studies [24,31] to maximize the opinion of users towards a specific topic. The problem discussed in this article differs from the abovementioned papers, as we aim to identify a small set of users for whom increasing the susceptibility to persuasion maximizes the diversity of exposure to messages. This problem has similarities with the problem of boosting information spread discussed in [11], [27], and [29]; this is defined as the problem of adding a set of edges or increasing the weight of a set of edges to maximize the number of users influenced by a given set of initial spreaders. However, the problem defined in this article is far more challenging, as it considers susceptibility to persuasion and multi-diffusion of messages with diverse viewpoints as well as the aim of maximizing diversity of exposure rather than simply maximizing the number of influenced users.
In the rest of this article, for the sake of simplicity, we often use susceptibility and exposure level to refer to susceptibility to persuasion and diversity of exposure, respectively. Note that we also use the terms opinion and viewpoint interchangeably.

PROBLEM FORMULATION
Consider a social network modelled by a weighted directed graph $G = (V, E)$, where $V$ is a set of nodes, corresponding to the users of the network, and $E$ is a set of edges, corresponding to the relationships between pairs of users. Assume every node $v_i \in V$ has an opinion $s_i$ towards a controversial topic, which can be quantified by a number in $[0, 1]$, where a higher value corresponds to a more favourable opinion towards the topic. Also, every node $v_i \in V$ has a certain degree of susceptibility to be persuaded by an opinion $s_r$, which is denoted by $w_{ii}^{(r)}$. Every edge $e_{ij} \in E$, from node $v_i$ to $v_j$, has a weight $w_{ij} \in [0, 1]$ indicating the spreading probability from $v_i$ to $v_j$; in this case, $v_j$ is an outgoing neighbour of $v_i$ and $v_i$ is an incoming neighbour of $v_j$. The notation $N_i^+$ and $N_i^-$ is used to denote the set of outgoing neighbours and incoming neighbours of $v_i$, respectively. Table 1 lists the most frequently used notations in this article.

Diffusion Model
The Independent Cascade (IC) model [26] is one of the most popular diffusion models used to simulate the spreading process in social networks. In this model, each node can be in either an active or an inactive state; a node exposed to a message is activated and forwards the message to its inactive outgoing neighbours. In the problem proposed in this paper, a node may be exposed to a message but is not activated when the node's opinion disagrees with the viewpoints expressed in the message. Thus, we extend the IC model to propose an extended IC (EIC) diffusion model which differentiates between exposed and active states. Given a set of initial spreaders $A$, EIC supposes that every spreader $v_i \in A$ initiates the spread of a message expressing an opinion; the initial spreaders with the same opinion $s_r$ form a cascade $C_r$. We use $C$ and $|C|$ to denote a set of cascades and the number of cascades, respectively. In EIC, a set of cascades $C$ independently spreads messages in discrete and successive steps; in every cascade $C_r \in C$, every node $v_j$ can be in an inactive, exposed or active state. Initially, in step $t = 0$, every initial spreader with opinion $s_r$ is activated by cascade $C_r$; all other nodes are set to the inactive state in all cascades. In every step $t > 0$, every node $v_i$, activated by cascade $C_r$ in step $t - 1$, goes to an inactive state after its attempt to forward the message (corresponding to cascade $C_r$) to every inactive node $v_j \in N_i^+$ with probability $w_{ij}$. If $v_i$ forwards the message to $v_j$, then $v_j$ is exposed to cascade $C_r$ (and consequently opinion $s_r$). Every node $v_j$ exposed to cascade $C_r$ is activated by the cascade with probability $w_{jj}^{(r)}$, denoting the susceptibility to persuasion of node $v_j$ by cascade $C_r$. This spreading process continues until there is no node in an active state in any cascade. At the end of the process, the exposure level of every node $v_j$ is determined based on the opinion of the cascades that the node is exposed to during the spreading process.

Table 1. Frequently used notations
Notation | Description
$v_i$, $s_i$ | a node $i$ and its opinion around a topic, respectively
$N_i^+$, $N_i^-$ | the set of outgoing and incoming neighbours of $v_i$, respectively
$e_{ij}$, $w_{ij}$ | an edge from node $v_i$ to node $v_j$ and the spreading probability of the edge, respectively
$w_{ii}^{(r)}$ | the degree of susceptibility of $v_i$ to be persuaded by an opinion $s_r$
$C$, $C_r$ | a set of cascades and a cascade spreading opinion $s_r$, respectively
$M_j$, $M_j^L$ | the set of opinions to which node $v_j$ is exposed, without persuading any node or when persuading a set $L$ of nodes, respectively
$f(M)$ | the exposure level of a node exposed to opinion set $M$
$B(L)$ | the expected exposure level of a network by selecting a set $L$ as the persuaded node set
$k$, $\alpha$ | the number of nodes for persuasion and the persuasion rate, respectively
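As a rough illustration of the EIC process described in this section, the following Python sketch simulates one run and records the opinions to which each node is exposed. The function name `simulate_eic`, the dictionary layouts, and the choice not to count an initial spreader as exposed to its own cascade are assumptions made for this example; they are not taken from the article.

```python
import random
from collections import deque

def simulate_eic(out_edges, suscept, cascades, rng):
    """One EIC run; returns {node: set of opinions the node was exposed to}.

    out_edges: {u: [(v, w_uv), ...]}   spreading probability on each edge
    suscept:   {(v, r): w_vv^(r)}      susceptibility of v to cascade r (default 0)
    cascades:  {r: (opinion s_r, [initial spreaders of cascade r])}
    """
    exposed = {}                                  # node -> set of opinions seen
    for r, (opinion, seeds) in cascades.items():  # cascades spread independently
        touched = set(seeds)                      # exposed or active in cascade r
        frontier = deque(seeds)                   # active nodes still to forward
        while frontier:
            u = frontier.popleft()
            for v, w_uv in out_edges.get(u, []):
                if v in touched:                  # only inactive nodes are targeted
                    continue
                if rng.random() < w_uv:           # message forwarded: v is exposed
                    touched.add(v)
                    exposed.setdefault(v, set()).add(opinion)
                    if rng.random() < suscept.get((v, r), 0.0):
                        frontier.append(v)        # v is persuaded and becomes active
    return exposed
```

On a chain 1 → 2 → 3 with edge weights 1, node 3 is reached only when node 2 has non-zero susceptibility, which is the blocking effect the model is meant to capture.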
In order to calculate the exposure level of node $v_j$ at the end of the spreading process, we apply Equation (1), which is inspired by information theory [43]. Suppose $M_j$ is a set containing the opinion of the cascades to which $v_j$ is exposed. First, we add values 0 and 1 (which are extreme opinions) to the set $M_j$ and sort the values in the set in ascending order. Then, the exposure level of node $v_j$ is calculated by function $f(M_j)$ in Equation (1),
where $n$ and $m_r$ denote the number of elements (values) and the $r$-th element in the sorted set $M_j$, respectively; the expression $(m_{r+1} - m_r)$ indicates the difference between two consecutive opinions in the sorted set. The value of $f(M_j)$ falls in $[0, 1]$, where a higher value expresses a greater diversity of the opinions in $M_j$ and accordingly a greater exposure level of node $v_j$.
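One way to realise an exposure function with the properties just described is the entropy of the gaps between consecutive opinions, scaled so that its maximum is 1. The sketch below assumes this form, and assumes a normalisation by $\log(|C| + 1)$; both are assumptions of this example rather than a statement of Equation (1). Under them, with two cascades, a node exposed only to opinion 0.33 scores about 0.58, a node exposed to 0.33 and 0.66 scores about 1, and a node exposed to nothing scores 0, which agrees with the values quoted for the Figure 2 counterexample later in the article. The function name and signature are hypothetical.

```python
import math

def exposure_level(opinions, num_cascades):
    """Entropy of the gaps between sorted opinions, scaled to [0, 1].

    opinions:     opinions of the cascades the node was exposed to
    num_cascades: |C|, used here (by assumption) to normalise the entropy
    """
    if num_cascades < 1:
        raise ValueError("at least one cascade is required")
    points = sorted(set(opinions) | {0.0, 1.0})    # add the extreme opinions 0 and 1
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        gap = hi - lo                              # m_{r+1} - m_r
        if gap > 0:
            total -= gap * math.log(gap)
    return total / math.log(num_cascades + 1)      # largest possible entropy maps to 1
```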

Problem Modelling
Given a weighted directed graph $G = (V, E)$, a set of initial spreaders $A$ forming a set of cascades $C$ with diverse opinions, an integer value $k$ and a value $\alpha \in (0, 1)$, the problem is defined as the identification of a set $L^*$ containing at most $k$ nodes (called the persuaded node set in the rest of this article) whose increased susceptibility to persuasion by $\alpha$ (called the persuasion rate in the rest of this article) maximizes the expected exposure level of the network (which is the sum of the exposure level of all nodes in the network). If $v_i \in L^*$, it means $v_i$ is selected for persuasion. This problem, called Maximizing the Diversity of Exposure by increasing Susceptibility to persuasion (MDES), is formally defined in Equation (2), where the function $B(L)$ is the expected exposure level of a network when the susceptibility of the nodes in $L$ to every cascade $C_r \in C$ is increased by $\alpha$ (i.e., $w_{jj}^{(r)} = w_{jj}^{(r)} + \alpha$ for all $C_r \in C$ and all $v_j \in L$). The value of $B(L)$ is calculated using Equation (3), where $M_j^L$ denotes a set containing the opinion of the cascades to which $v_j$ is exposed when the set $L$ is the persuaded node set and $f(M_j^L)$ denotes the exposure level of $v_j$ when the set $L$ is the persuaded node set; recall that $f(M_j^L)$ can be calculated using Equation (1).
To estimate the exposure level under the EIC diffusion model in what follows, we describe the concept of a sample. Given a graph $G = (V, E)$, we create a loop-graph $G' = (V, E')$ in which every node $v_i$ has, for each cascade $C_r$, a loop edge $e_{ii}^{(r)}$ associated with a weight $w_{ii}^{(r)} + \alpha$ if $v_i \in L$, otherwise with a weight $w_{ii}^{(r)}$. Recall that $w_{ii}^{(r)}$ denotes the susceptibility of node $v_i$ to opinion $s_r$. For the sake of simplicity, we indicate every edge in $E'$ by $e_{ij}$ with a corresponding weight $w_{ij}$; this edge may originate from node $v_i$ to a distinct node $v_j$ (i.e., $i \neq j$) or may originate from a node $v_i$ to itself (i.e., $i = j$). A sample $g$ of $G'$ is obtained by independently sampling every edge $e_{ij} \in E'$ at random with probability $w_{ij}$. Let $\Pr[g]$ be the probability that a sample $g$ is generated; $\Pr[g]$ is calculated by Equation (4). Let an indicator variable be $P_{ij}^{(r)}(g) = 1$ if there is a path from node $v_i$ to $v_j$ in sample $g$ such that for every node $v_k$ in this path $e_{kk}^{(r)} \in g$; otherwise, $P_{ij}^{(r)}(g) = 0$. The indicator variable $P_{ij}^{(r)}(g)$ indicates whether in sample $g$ node $v_j$ is reachable from $v_i$ for cascade $C_r$ (with opinion $s_r$). Given the persuaded node set $L$, the opinions to which node $v_j$ is exposed in sample $g$ are determined by Equation (5). Now, we can rewrite $B(L)$ as in Equation (6), where $\mathcal{G}$ denotes all possible samples of $G'$ and $f(M_j^L(g))$ denotes the exposure level of $v_j$ in sample $g$ when the set $L$ is the persuaded node set.
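For orientation, the verbal definitions in this section suggest the following forms for the problem statement, the expected exposure level, the sample probability and the sample-based rewriting of $B(L)$. These are a reading of the surrounding prose under stated assumptions, not a transcription of the article's Equations (2), (3), (4) and (6); in particular, taking $L$ to range over all of $V$ and writing the expectation over EIC outcomes are assumptions.

```latex
% Problem statement: a persuaded set of at most k nodes
L^{*} = \arg\max_{L \subseteq V,\ |L| \le k} B(L)

% Expected exposure level of the network for persuaded set L
B(L) = \mathbb{E}\Big[ \sum_{v_j \in V} f\big(M_j^{L}\big) \Big]

% Probability of a sample g when edges of E' are drawn independently
\Pr[g] = \prod_{e_{ij} \in g} w_{ij} \prod_{e_{ij} \in E' \setminus g} \big(1 - w_{ij}\big)

% B(L) written as a sum over all possible samples
B(L) = \sum_{g \in \mathcal{G}} \Pr[g] \sum_{v_j \in V} f\big(M_j^{L}(g)\big)
```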

PROPERTIES OF THE PROBLEM

Hardness and Approximation
In this section, the NP-hardness of the MDES problem is proved first; then, in order to propose a method which guarantees an approximation of the optimal solution, the monotonicity and submodularity of the problem are analysed. If a problem is monotone and submodular, a simple greedy algorithm can approximate the optimal solution within a ratio of $(1 - 1/e)$. This greedy algorithm selects a node set $L$ in $k$ iterations; in each iteration, the node $v_i$ whose addition to $L$ maximizes the exposure level of the network, i.e., $B(L \cup \{v_i\})$, is selected for persuasion and is added to $L$.
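The greedy procedure described above can be sketched generically as follows. The function name `greedy_select` and the `objective` callback (standing in for $B$ or any other set function) are hypothetical; how the objective is evaluated is left open here.

```python
def greedy_select(candidates, k, objective):
    """Add, k times, the candidate whose inclusion gives the largest objective value."""
    chosen = set()
    for _ in range(k):
        remaining = [v for v in candidates if v not in chosen]
        if not remaining:
            break                                   # fewer than k candidates exist
        best = max(remaining, key=lambda v: objective(chosen | {v}))
        chosen.add(best)
    return chosen
```

The $(1 - 1/e)$ guarantee applies only when `objective` is monotone and submodular, for example a coverage function.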

Theorem 1. The MDES problem is NP-hard under the EIC diffusion model.
Proof. We prove the theorem by a reduction from the k-CNF satisfiability (k-SAT) problem [10], which is well known to be NP-complete. Given a Boolean expression formed as a conjunction of clauses, the k-SAT problem aims to establish whether there is an assignment to the literals for which the expression is true.
Consider a restricted instance of MDES, as shown in Figure 1. In this instance, there are $n$ cascades $C_1, \dots, C_n$; for each cascade $C_r$, a set $X_r$ contains the outgoing neighbours of the nodes in cascade $C_r$, and the set $Y$ contains all the nodes in $V \setminus A$. Suppose the spreading probability is 1 on the edges and the susceptibility to persuasion to every cascade is 0 for every node in sets $\{X_1, \dots, X_n\}$, and suppose we can increase the susceptibility to 1 (i.e., $\alpha = 1$). We use the literal $x_i^{(r)} = 1$ to express that the $i$-th node in set $X_r$ is selected as a persuaded node; otherwise, $x_i^{(r)} = 0$. Every node $v_j \in Y \setminus X_r$ is exposed to cascade $C_r$ if and only if there is a node $v_i$ in set $X_r$ for which $x_i^{(r)} = 1$ and $v_j \in N_i^+$. This statement can be denoted as a clause over the literals $x_i^{(r)}$; the conjunction of these clauses over all cascades determines whether $v_j$ is exposed to all cascades or not. This conjunction can be extended to a Boolean expression that determines whether all nodes in $Y$ are exposed to all cascades or not. Now, the problem changes to whether there is a selection of nodes in $X_r$ for $r = 1, \dots, n$ so that the Boolean expression is true. This implies that the k-SAT problem can be reduced to the MDES problem; hence, the theorem is proved.

Theorem 2. The function B(L) is monotone under the EIC diffusion model.
Proof. A function $\delta(\cdot)$ is monotone if $\delta(S) \le \delta(T)$ for all $S$ and $T$ such that $S \subseteq T$. It is clear that adding a new node to the persuaded node set does not decrease the exposure level of any node $v_i \in V$. Formally, $B(S) \le B(T)$ for all $S \subseteq T$. This implies that $B(L)$ is monotone.
Theorem 3. The function $B(L)$ is neither submodular nor supermodular under the EIC diffusion model.
Proof. A function $\delta(\cdot)$ is submodular if $\delta(S \cup \{v_i\}) - \delta(S) \ge \delta(T \cup \{v_i\}) - \delta(T)$ for all $S \subseteq T$. We use a counterexample to show that MDES is not submodular. Take for example the graph shown in Figure 2; let us suppose nodes 1 and 8 (with opinion 0.33 and 0.66, respectively) are the initial spreaders, i.e., $A = \{1, 8\}$, and also suppose the weight of all edges is 1. Node 1 forms cascade $C_1$ and node 8 forms cascade $C_2$; assume the susceptibility to persuasion of all nodes $v_i \in V \setminus A$ to both cascades is zero (i.e., $w_{ii}^{(r)} = 0$ for all $v_i \in V \setminus A$ and all $C_r \in C$), and that we can increase the susceptibility of each selected node to 1 (i.e., $\alpha = 1$) for all cascades. The exposure level of a node is 0.58 if the node is exposed to only one cascade (see Equation (1)); clearly, the exposure level of a node is 1 if the node is exposed to both cascades and the exposure level of a node is 0 if the node is exposed to none of the cascades. Assume $S = \{\}$, $T = \{2\}$ and $v_i = 3$. In this case, $B(S) = 1.58$ (see Equation (3)), because node 2 is exposed to only cascade $C_1$, and node 7 is exposed to both cascades; none of these nodes are activated because their susceptibility to persuasion by both cascades is zero. Adding …
Since $B(L)$ is neither submodular nor supermodular, the greedy procedure cannot guarantee an approximation of the optimal solution for the MDES problem. In the next section, to explore the approximability of the MDES problem, we define submodular lower and upper bounds for the problem.
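For small instances, a claim that a set function is not submodular can be checked by brute force. The sketch below searches for a pair $S \subseteq T$ and a node whose marginal gain is larger on $T$ than on $S$. The helper names and the toy function used in the usage note are hypothetical and unrelated to Figure 2.

```python
from itertools import combinations

def marginal(fn, base, v):
    """Gain in fn obtained by adding v to base."""
    return fn(base | {v}) - fn(base)

def violates_submodularity(fn, ground):
    """Return (S, T, v) with S a subset of T and gain(v | S) < gain(v | T), or None."""
    items = list(ground)
    subsets = [frozenset(c) for n in range(len(items) + 1)
               for c in combinations(items, n)]
    for small in subsets:
        for big in subsets:
            if not small <= big:
                continue
            for v in items:
                if v in big:
                    continue
                if marginal(fn, small, v) < marginal(fn, big, v) - 1e-12:
                    return set(small), set(big), v
    return None
```

A function that pays off only when two nodes are both selected, such as `lambda s: 1 if {1, 2} <= s else 0`, is reported as a violation, which mirrors the intuition that persuading one node can raise the value of persuading another.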

Submodular Bounds
As can be seen from Theorem 3, each node has an exposure level in the original network (i.e., without persuading any node). According to this exposure level, persuading the node has an impact on the exposure level of the network. However, selecting a node for persuasion (node persuasion) may affect the exposure level of other nodes and result in changing the impact of their persuasion on the exposure level of the network. This effect of node persuasion, called the persuasion effect in the rest of this article, results in the non-submodularity of function $B$ in Equation (3). In this section, we develop a lower bound $\underline{B}$ and an upper bound $\overline{B}$ of function $B$ by ignoring the persuasion effect.

In order to ignore the persuasion effect to develop a lower bound $\underline{B}$, while selecting a new node $v_j$ for persuasion (adding to set $L$), the effect of the already selected nodes in $L$ on the exposure level of $v_j$ is ignored. This supposition counteracts the persuasion effect of the already selected nodes on the impact of persuading node $v_j$. In the same way, in order to get an upper bound $\overline{B}$, while selecting a new node $v_j$ for persuasion (adding to set $L$), it is supposed that all other nodes, i.e., $v_i \in V \setminus \{A \cup v_j\}$, have been already selected for persuasion; again, this supposition counteracts the persuasion effect of already selected nodes on the exposure level of $v_j$.

Theorem 4. The lower bound $\underline{B}$ and the upper bound $\overline{B}$ are submodular.
Proof. Given a node set $L$, suppose that every node $v_j \in V \setminus \{A \cup L\}$ is exposed to every cascade $C_r \in C$ with probability $p(j \mid L)$. To prove the submodularity of the function, we need to prove for every node $v_i$ and every set $L \subseteq T \subseteq \{V \setminus A\}$ that the following inequality holds:

$\underline{B}(L \cup \{v_i\}) - \underline{B}(L) \ge \underline{B}(T \cup \{v_i\}) - \underline{B}(T)$

We can write $p(j \mid L + \{v_i\})$ … Thus, we need to prove … Due to the definition of the lower bound, which ignores the effect of the already selected node set $L$ on the exposure level of $v_i$, the value of $p(j \mid \{v_i\})$ is equal in both sides of the inequality. Given the monotonicity property of the problem: … Thus, the submodularity of the lower bound $\underline{B}$ is proved. The submodularity of the upper bound $\overline{B}$ can be proved in a similar way.
In the next section, the upper and lower bounds are exploited to design a data-dependent scheme which guarantees an approximation of the optimal solution for the MDES problem.

PROPOSED METHOD
In order to propose a method to solve the MDES problem, we use the Sandwich framework [32], which exploits the bounds of the problem to guarantee an approximation solution. This framework demands an approximation solution for each of the lower bound, upper bound and objective functions of the problem. In Section 5.1, we provide some details of the Sandwich framework. Then, in Section 5.2, an approach to find an approximation solution for the lower bound, upper bound and objective functions of the problem is given.

The Sandwich Framework
In general, there is no standard procedure to guarantee an approximation solution for non-submodular optimization problems. Lu et al. [32] suggested the Sandwich framework to get an approximation solution for non-submodular problems for which we can find a submodular upper bound and/or lower bound. The framework applies a submodular lower bound and/or upper bound of the problem to obtain a data-dependent approximation solution to the problem. It first finds a solution for each of the lower bound, upper bound and objective functions. Then, the solution which maximizes the objective function is returned as the solution of the problem. How the Sandwich framework can be used to get an approximation solution for the MDES problem is shown in Algorithm 1. Algorithm 1, in lines 1-3, uses a sampling approach to efficiently find an approximation solution for each of the lower bound, upper bound and objective function. Since the sampling approach guarantees approximation solutions for both bounds and the objective, Algorithm 1 returns a solution $L$ that satisfies the data-dependent approximation bound of the Sandwich framework [32] with respect to $L^*$, the optimal solution which maximizes the objective function $B$ of the problem.
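The selection step of the framework, as described in this section, can be sketched in a few lines. The names `sandwich` and `solve` are hypothetical, and `solve` stands in for whatever routine produces a candidate set for a given function; this is an illustration of the idea, not a reproduction of Algorithm 1.

```python
def sandwich(solve, lower, upper, objective):
    """Solve for each of the three functions, then keep the set that scores
    best under the true objective."""
    candidates = [solve(lower), solve(upper), solve(objective)]
    return max(candidates, key=objective)
```

Because every candidate is re-scored with `objective`, a set found by optimising a bound is returned only if it actually outperforms the other two under the real objective.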

Finding Approximation Solutions
In this section, we first explain the sampling approach to determine an approximation solution.
Then, we describe the details about how a sample is generated and applied to determine the impact of persuading each node on the exposure level of the network.

The Sampling Approach
The authors in [4] proposed an efficient sampling approach to find an unbiased estimator of the objective function for the influence maximization problem. This approach includes two steps to generate a sample $g$: (i) in the first step, a target node $v_j \in V$ is randomly chosen; (ii) in the second step, starting from $v_j$, the diffusion process in a reverse direction is simulated using breadth-first search (BFS) to determine a set of nodes which can reach $v_j$; this set is called the reverse reachable set $RR$. Node $v_i$ covers set $RR$ if $v_i \in RR$; covering $RR$ by $v_i$ indicates that $v_i$ can influence $v_j$. By generating a set $\mathcal{G}$ of samples, the set of nodes which can cover the maximum number of the $RR$ sets is taken as the set with the most influential nodes. If the number of samples is sufficiently large, this sampling approach provides a solution with an approximation guarantee.
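The two-step procedure above, for the plain IC model, can be sketched as a reverse BFS in which each incoming edge is kept with its spreading probability. The function name and the `in_edges` layout are assumptions of this example; the EIC variant used for MDES additionally tracks loop-edge states and is not shown here.

```python
import random
from collections import deque

def reverse_reachable_set(in_edges, target, rng):
    """Nodes that can reach `target` in one random live-edge sample (plain IC).

    in_edges: {v: [(u, w_uv), ...]} lists, for each node v, its incoming edges.
    """
    rr = {target}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u, w_uv in in_edges.get(v, []):
            # Each incoming edge is examined once; it is live with probability w_uv.
            if u not in rr and rng.random() < w_uv:
                rr.add(u)
                queue.append(u)
    return rr
```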
In the MDES problem, given a loop-graph $G' = (V, E')$, a set of initial spreaders $A$ which forms a cascade set $C$, and a set $L$ of persuaded nodes, a sampling approach can estimate $B(L)$ (the impact of persuading the node set $L$ on the exposure level of the network). The sampling approach randomly chooses a target node $v_j \in V$. Then, the EIC diffusion process is simulated in the reverse direction to determine a sample $g$ and generate a reverse reachable set $RR$. According to the reverse reachable set $RR$, if $v_i \in \{A \cap RR\}$ and $s_i = s_r$, node $v_j$ is exposed to cascade $C_r$. We use Equation (5) to determine the set $M_j^L(g)$ and use Equation (1) to calculate the exposure level of $v_j$ in sample $g$. Let $\mathcal{G}$ denote a set of samples generated for $G'$; the notation $g \sim \mathcal{G}$ and $v_j \sim V$ indicates that sample $g$ and node $v_j$ are randomly picked from $\mathcal{G}$ and $V$, respectively.
Theorem 5. For any seed set … where $f(M_j^L(g))$ denotes the exposure level of target node $v_j$ in sample $g$ when $L$ is the persuaded node set.
… is an unbiased estimator of $B(L)$, where $\theta$ denotes the sampling size which is the number of samples in $\mathcal{G}$. In a similar way, this theorem can be proved to show that the lower and upper bounds of the problem can be estimated using a sampling approach. Thus, we utilize a sampling approach to guarantee an approximation solution for the lower bound, upper bound and objective function of the MDES problem.
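Assuming the estimator takes the form commonly used with reverse sampling, namely the average exposure level of the target node over the $\theta$ samples scaled by $|V|$, it can be computed as below. Both the form of the estimator and the function name are assumptions of this sketch.

```python
def estimate_exposure(sample_levels, num_nodes):
    """Scale the mean per-sample exposure level of the target node by |V|.

    sample_levels: one value f(M_j^L(g)) per sample g, each for a random target v_j
    num_nodes:     |V|
    """
    theta = len(sample_levels)                 # sampling size
    if theta == 0:
        raise ValueError("at least one sample is required")
    return num_nodes * sum(sample_levels) / theta
```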
Algorithm 2 shows the pseudo-code of the sampling approach. As the first step of the sampling approach, in lines 3-6, a set $\mathcal{G}$ containing $\theta$ samples is generated. As the second step of the sampling approach, in lines 8-10, a greedy approach is applied to iteratively select the node with the largest marginal gain based on the sample set $\mathcal{G}$. Note that $\delta$ can be $\underline{B}$, $\overline{B}$ or $B$, which correspond to the lower bound, upper bound and objective function of the MDES problem, respectively. The sample set $\mathcal{G}$ is used in these functions to determine the impact of persuading each node on the exposure level of the network, i.e., the marginal gain of the node (explained in Section 5.2.2).
If the sampling size $\theta$ is sufficiently large, the sampling approach can approximate a solution for the lower and upper bounds of the problem. To determine the sampling size, we apply the approach proposed by Tang et al. [44] which, given the values $\epsilon$ and $\gamma$, returns a $(1 - 1/e - \epsilon)$-approximation solution with probability $1 - |V|^{-\gamma}$. This approach initializes the sampling set $\mathcal{G}$ to empty and iterates at most $\log_2 |V| - 1$ times. In each iteration, a set of samples is generated and added to $\mathcal{G}$ to find a tight lower bound of the optimal solution. Then, the value of $\theta$ is determined based on the lower bound.

Generating a Sample
This section describes in detail how to generate a sample and determine the impact of persuading each node on the exposure level of the network. For each sample, we store the edges and the nodes of the sample. For each node $v_i$ in a sample, three vectors $I_i$, $T_i$ and $D_i$ with $|C|$ entries are stored. The $r$-th entry in vector $I_i$, corresponding to the loop edge with weight $w_{ii}^{(r)}$, indicates that if node $v_i$ is exposed to cascade $C_r$, the node can be in one of three states: (i) can be activated (live in cascade $C_r$); (ii) can be activated if persuaded (live-upon-persuasion in cascade $C_r$); or (iii) cannot be activated (blocked in cascade $C_r$). We use the values 1, 0 and −1 to express the three states: the loop edge is live, live-upon-persuasion or blocked, respectively. The $r$-th entry of vector $T_i$ corresponds to the paths from node $v_i$ to the target node for cascade $C_r$. The value $T_{ir} = 1$ if there is a path from $v_i$ to the target node such that every node on the path is live in cascade $C_r$. The value $T_{ir} = 0$ if there is a path from $v_i$ to the target node such that every node on the path is live or live-upon-persuasion in cascade $C_r$. Otherwise, $T_{ir} = -1$. The $r$-th entry of vector $D_i$ corresponds to the paths from an initial spreader in the $r$-th cascade to node $v_i$. The value $D_{ir}$ can be 1, 0, or −1, according to the nodes on the paths from the initial spreader in the $r$-th cascade to node $v_i$, which is determined in the same way described for the value $T_{ir}$.
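A minimal sketch of the three-valued loop-edge state follows. It assumes that a single uniform draw decides the state: below the current susceptibility the loop edge is live, within the next $\alpha$ it is live-upon-persuasion, and otherwise it is blocked. This thresholding, the constant names and the `combine` helper are assumptions of the example, since the pseudo-code of Algorithm 3 is not reproduced in the text.

```python
import random

LIVE, LIVE_UPON_PERSUASION, BLOCKED = 1, 0, -1

def sample_loop_state(w_rr, alpha, rng):
    """Draw the state of one loop edge with susceptibility w_rr and rate alpha."""
    x = rng.random()                     # uniform in [0, 1)
    if x < w_rr:
        return LIVE                      # activated even without persuasion
    if x < w_rr + alpha:
        return LIVE_UPON_PERSUASION      # activated only if the node is persuaded
    return BLOCKED

def combine(path_state, node_state):
    """State of a path extended by one node: the weakest node on it decides."""
    return min(path_state, node_state)
```

With this encoding, an entry such as $T_{ir}$ is the best `combine` value over all paths from $v_i$ to the target node.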
Algorithm 3 shows the pseudo-code to generate a sample д for a target node $v_j$. In lines 4-9, the algorithm initializes every node as live, live-upon-persuasion or blocked. Then, in lines 12-31, the BFS traversal algorithm is applied to simulate the EIC diffusion model in the reverse direction, from target node $v_j$ to the initial spreader nodes in set A. In this traversal, every live edge $e_{if}$ and every visited node $v_i$, together with its vectors ($T_i$, $I_i$, $D_i$), are calculated and stored in template sample дT. Then, the values in vector D of the stored nodes, which correspond to the paths from an initial spreader to the nodes, must be updated. For this purpose, the sample is traversed from the initial spreader nodes to the target node (lines 32-40). In this traversal, starting from the initial spreaders, the set of nodes which have no impact on the exposure level of the network is removed from дT to determine the sample д. Note that the function rand() in Algorithm 3 (lines 5 and 21) generates a random value in [0, 1] with uniform distribution.
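The random choices made during the reverse traversal can be sketched as follows. This is a hedged illustration, not the authors' Algorithm 3: the thresholds in `draw_loop_state` (live if the random value is below the loop-edge weight, live-upon-persuasion if it is below the weight plus α) are an assumption based on the description of persuasion as an increase of susceptibility by α, and all function names are hypothetical. The update `min(I_f, T_f)` mirrors the step in which a predecessor inherits the path status of the node it leads to.

```python
import random
from typing import List

LIVE, UPON_PERSUASION, BLOCKED = 1, 0, -1


def draw_loop_state(w_loop: float, alpha: float, rng: random.Random) -> int:
    """Draw the state of a loop edge for one cascade (assumed thresholds)."""
    rho = rng.random()  # uniform random value, as rand() in the text
    if rho < w_loop:
        return LIVE
    if rho < w_loop + alpha:
        return UPON_PERSUASION
    return BLOCKED


def edge_is_live(w_if: float, rng: random.Random) -> bool:
    """An edge e_if is kept in the sample with probability w_if."""
    return rng.random() < w_if


def propagate_T(I_f: List[int], T_f: List[int]) -> List[int]:
    """Path status of a predecessor v_i reached through node v_f:
    for every cascade r, T_ir = min(I_fr, T_fr)."""
    return [min(i, t) for i, t in zip(I_f, T_f)]
```

With a seeded generator such as `random.Random(7)`, repeated calls to `draw_loop_state(0.4, 0.3, rng)` give a reproducible sequence of states for experimentation.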
Consider a sample д ∈ G with a target node $v_j$ ∈ V. To calculate the impact of persuading each node and determine the node with maximum marginal gain (line 9 of Algorithm 2) when finding an approximation solution for the lower bound, the upper bound and the objective function, the following rules are used, respectively:
• For the lower bound, the rule used is: persuading node $v_i$ results in the exposure of $v_j$ to cascade $C_r$ if $D_{jr} = 0$, $I_{ir} = 0$, $T_{ir} = 1$ and $D_{ir} = 1$.
• For the upper bound, the rule used is: persuading node $v_i$ results in the exposure of $v_j$ to cascade $C_r$ if $D_{jr} = 0$, $I_{ir} = 0$, $T_{ir} \in \{0, 1\}$ and $D_{ir} \in \{0, 1\}$.
• For the objective function, the rule is the one for the lower bound; then, after adding the node $v_i$ with maximum marginal gain to L, every sample д ∈ G which contains node $v_i$ must be updated. After adding $v_i$ to L, a sample д is updated in three steps: (i) every live-upon-persuasion loop edge $w^{(r)}_{ii}$ is set to live (i.e., for every $C_r$ ∈ C, if $I_{ir} = 0$, $I_{ir}$ is set to 1); (ii) vector T of the nodes in the sample is updated using the BFS traversal algorithm starting from $v_i$ to the initial spreaders (in the same way as in lines 12-30 of Algorithm 3); (iii) vector D of the nodes in the sample is updated using the BFS traversal algorithm starting from $v_i$ to the target node (in the same way as in lines 31-39 of Algorithm 3).
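The lower-bound and upper-bound rules translate directly into two predicates. The sketch below is illustrative only; the function names are hypothetical and the vectors are plain Python lists indexed by cascade.

```python
from typing import Callable, Sequence

Vec = Sequence[int]
Rule = Callable[[Vec, Vec, Vec, Vec, int], bool]


def exposes_lower(D_j: Vec, I_i: Vec, T_i: Vec, D_i: Vec, r: int) -> bool:
    """Lower-bound rule: v_i must be the only node on the path that
    still needs persuasion (all-live paths on both sides of v_i)."""
    return D_j[r] == 0 and I_i[r] == 0 and T_i[r] == 1 and D_i[r] == 1


def exposes_upper(D_j: Vec, I_i: Vec, T_i: Vec, D_i: Vec, r: int) -> bool:
    """Upper-bound rule: paths through live-upon-persuasion nodes
    are also accepted on both sides of v_i."""
    return (D_j[r] == 0 and I_i[r] == 0
            and T_i[r] in (0, 1) and D_i[r] in (0, 1))


def gain_in_sample(D_j: Vec, I_i: Vec, T_i: Vec, D_i: Vec, rule: Rule) -> int:
    """Number of cascades to which target v_j becomes exposed in one
    sample if node v_i is persuaded, under the given rule."""
    return sum(rule(D_j, I_i, T_i, D_i, r) for r in range(len(D_j)))
```

Because every condition accepted by the lower-bound rule is also accepted by the upper-bound rule, the gain computed with `exposes_upper` is never smaller than the gain computed with `exposes_lower` on the same sample.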

EVALUATION

Settings
Datasets. We use four real-world datasets to evaluate the performance of our method. The properties of these datasets, namely the number of nodes |V|, the number of edges |E|, the graph density $D = |E| / (|V| \cdot (|V| - 1))$ and the average weight of the edges w, are reported in Table 2. The DHL dataset [12] is a network containing a set of users with an opinion around the Delhi legislative assembly elections of 2013. BRT, IPN and USE, collected by Garimella et al. [20], are networks containing a set of users with an opinion around Brexit, iPhone/Samsung cellphones and the US election, respectively. In these four datasets, the opinion of each user is estimated based on the probability that a user retweets the contents expressing different viewpoints. The weight of every edge $e_{ij}$ is set to $1/|N_j^-|$, where $N_j^-$ is the set of incoming neighbours of $v_j$ [26].

Methods. We denote our proposed method by the acronym SS (Sandwich-Sampling). To the best of our knowledge, there is no algorithm directly applicable to the MDES problem; thus, we use five baseline methods for comparison. These baseline methods apply simple strategies to determine a persuaded node set L containing k nodes: (i) Outgoing Degree (OD) selects the set of nodes with the maximum number of outgoing neighbours; (ii) Incoming Degree (ID) determines the incoming degree of every node $v_i$ as $ID_i = \sum_{v_j \in N_i^- \wedge v_j \in A} w_{ji}$ and selects the set of nodes with maximum incoming degree; (iii) Betweenness (BT) selects the set of nodes with highest betweenness on the paths from initial spreaders to the other nodes; (iv) Lower Extremity (LE), in which no node is persuaded; and (v) Upper Extremity (UE), in which all nodes are considered as persuaded. Clearly, the last two methods act as a lower bound and an (ideal) upper bound for the performance of the different methods.
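The quantities used in this setup are simple to compute from an edge list. The following sketch assumes a directed graph given as (source, destination) pairs and reads the edge weight as one over the number of incoming neighbours of the destination node; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (source, destination)


def graph_density(n_nodes: int, n_edges: int) -> float:
    """Density of a directed graph: |E| / (|V| * (|V| - 1))."""
    return n_edges / (n_nodes * (n_nodes - 1))


def edge_weights(edges: Iterable[Edge]) -> Dict[Edge, float]:
    """Weight of edge (i, j) is 1 / (number of incoming neighbours of j)."""
    edges = list(edges)
    preds: Dict[int, Set[int]] = defaultdict(set)
    for i, j in edges:
        preds[j].add(i)
    return {(i, j): 1.0 / len(preds[j]) for i, j in edges}


def id_scores(edges: Iterable[Edge], spreaders: Set[int]) -> Dict[int, float]:
    """ID baseline: for each node, the sum of the weights of incoming
    edges that originate from an initial spreader."""
    scores: Dict[int, float] = defaultdict(float)
    for (j, i), w in edge_weights(edges).items():
        if j in spreaders:
            scores[i] += w
    return dict(scores)


def top_k(scores: Dict[int, float], k: int) -> List[int]:
    """The k nodes with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Nodes without any incoming edge from a spreader do not appear in the dictionary returned by `id_scores`; a full baseline would treat their score as zero.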
Parameters. To determine the sampling size in the sampling approach (line 2 of Algorithm 2) of the proposed method, we set the values of γ and ϵ to 1 and 0.5, respectively, following [44]. In each experiment, the initial spreader nodes are selected randomly according to the outgoing degree distribution of nodes, as it is more likely that users with more followers spread messages. In order to estimate the susceptibility to persuasion of a user by a cascade, it makes sense to consider the similarity between the user's opinion and the viewpoint of the message associated with the cascade [40]. Thus, the susceptibility to persuasion of node $v_j$ by cascade $C_r$ is set to , where $s_j$ and $s_r$ denote the opinion of $v_j$ and the viewpoint of the message associated with cascade $C_r$, respectively.

Evaluation Strategies. Three sets of experiments are conducted to assess the performance of the proposed method. In the first set, the effectiveness of the proposed method in terms of maximizing the exposure level of the network is evaluated and compared against the baseline methods. In the second set, the efficiency of the proposed method in terms of running time is evaluated and compared against the baseline methods. Finally, in the third set, the quality of the solutions approximated by the Sandwich framework of the proposed method is investigated. All experiments are repeated for 20 different, randomly generated sets of initial spreaders A.

Experimental Results
Effectiveness. In the first set of experiments, for each one of the 20 sets of initial spreaders, A, each method is used to select a set L containing k nodes and increase the susceptibility to persuasion of these nodes by a value α. Then, the EIC diffusion model (defined in Section 3) is applied to simulate the spreading process and determine the set of cascades to which every node $v_j$ is exposed (i.e., $M_j^L$). For every method, the expected exposure level of the network, B(L), is calculated according to Equation (3). In order to increase confidence in the results obtained by the spreading process, we repeat the EIC model 100 times for each set L. The mean exposure level over the 100 runs of the EIC diffusion model and the 20 different sets of initial spreaders A is taken as the impact of the method. As the scale of the impact of each method varies greatly with the values of |A|, k and α, we report the impact ratio of each method to make it easier to compare different methods. The impact ratio of every method X is calculated as $B_X(L) / B_{LE}(L)$, where $B_X(L)$ and $B_{LE}(L)$ indicate the impact of method X and the impact of method LE (Lower Extremity, see previous section), respectively. A greater value of the impact ratio indicates a higher effectiveness of method X.
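The averaging behind the impact ratio can be written in a few lines. In this sketch, `exposure[a][t]` is assumed to hold the exposure level measured in run t of the diffusion model for the a-th set of initial spreaders; the data layout and the function names are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence


def impact(exposure: Sequence[Sequence[float]]) -> float:
    """Mean exposure level over all runs and all sets of initial
    spreaders (every set is assumed to have the same number of runs)."""
    return mean(mean(runs) for runs in exposure)


def impact_ratio(exposure_x: Sequence[Sequence[float]],
                 exposure_le: Sequence[Sequence[float]]) -> float:
    """Impact of method X divided by the impact of the LE baseline,
    in which no node is persuaded."""
    return impact(exposure_x) / impact(exposure_le)
```

A ratio of 1 means that persuading the selected nodes did not change the exposure level compared with persuading nobody; larger values indicate a more effective selection.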
Three different experiments are conducted in this section to assess the effect of varying each of |A|, k and α on the effectiveness of methods.
In the first experiment, we evaluate the effect of varying the number of initial spreaders (|A|) on the impact ratio of the methods. For this purpose, the parameters k and α are set to 30 and 0.3, respectively. The number of initial spreaders varies from 10 to 60. The results obtained from this experiment are shown in Figure 3. Clearly, UE results in the best impact ratio that can be achieved (by persuading all nodes). Our proposed method, SS, outperforms all other baseline methods, except in the case of the DHL network, where OD results in a better impact ratio for |A| = 20 and |A| = 40. This may be due to the high density and the small size of DHL, which allow OD to take advantage of nodes with a large number of outgoing neighbours and expose them to a larger number of cascades.
The second experiment assesses the effect of varying the value of k (the number of persuaded nodes) on the impact ratio of the methods. For this purpose, the parameters |A| and α are set to 30 and 0.3, respectively. The value of k varies from 10 to 60. Figure 4 shows the results obtained from this experiment. As expected, the exposure level of a network increases as the value of k increases. In all cases, SS outperforms all other methods. Note that, once again, the results of OD are very close to the results of SS for the DHL network. The figure suggests that, regardless of the method, as the value of k increases the rate of increase of the exposure level in the two large networks (USE and IPN) is smaller than the rate of increase in the two small networks (DHL and BRT). This may mean that in large networks a correspondingly large number of persuaded nodes should be used to increase the exposure level significantly.
The third experiment evaluates the effect of varying the value of the persuasion rate, α, on the impact ratio of the methods. In this experiment, the parameters |A| and k are both set to 30. The value of α varies from 0.1 to 0.9. Figure 5 summarizes the results obtained from this experiment. Interestingly, the figure suggests that the exposure level of the network increases significantly as the value of α increases from 0.1 to 0.5, whereas the increase is less pronounced once α exceeds 0.5. This means that persuading users up to a certain extent may be sufficient to increase the exposure level of the network. Once again, SS outperforms all other baseline methods (with the exception of UE, which, as mentioned, is an ideal upper bound). Note that, once again due to the structure

criterion (either no node is persuaded or all nodes are persuaded, respectively) their running time is negligible and they are not included. The results suggest that SS does not have a prohibitively high running time compared to other methods; for example, BT takes longer to run in three of the four networks, because BT relies on betweenness, which requires calculating the shortest paths. It is also interesting that, although the USE network is larger than BRT and IPN, the running time of SS in BRT and IPN is higher. The reason may be that in BRT and IPN, due to the greater average weight of edges, messages spread more widely, which implies that SS needs to generate more samples to find a solution.
Approximation Quality of Sandwich Framework. Recall the approximation ratio yielded by the Sandwich framework (which was reported in Equation (7)). The approximation ratio relies on how well the quantity $B(\overline{L}) / \overline{B}(\overline{L})$ has been approximated, where $\overline{B}$ denotes the upper bound and $\overline{L}$ the solution found for it. Note that this ratio takes values between 0 and 1, with higher values suggesting better approximation quality. An experiment is carried out to measure this ratio using the parameter α = 0.3, values |A| = 30 and |A| = 60, and varying k from 10 to 60 in steps of 10. Figure 7 shows the results obtained on the four networks. In all networks, as the value of k increases, the quality also increases, which suggests that the gap between the upper bound and the objective function of the proposed framework decreases for greater values of k. The approximation has the lowest quality in DHL compared to the other networks. In general, the results suggest that the approximation quality is higher in larger networks; for instance, the quality is higher than 0.7 in the two largest networks, IPN and USE.

Theorem 4. The function B(L) is submodular under the EIC diffusion model.

ALGORITHM 1: The Sandwich Framework
Data: Loop-graph G(V, E), value α, the number of nodes for persuasion k and initial spreader set A
Result: Set L containing k nodes
find $\underline{L}$ as an approximation solution for the lower bound (Algorithm 2 (G, α, k, $\underline{B}$, A));
find $\overline{L}$ as an approximation solution for the upper bound (Algorithm 2 (G, α, k, $\overline{B}$, A));
find μ as a solution for the objective function (Algorithm 2 (G, α, k, B, A));
let L = $\arg\max_{\sigma \in \{\underline{L}, \mu, \overline{L}\}} B(\sigma)$;
return L;
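The selection step of the Sandwich framework can be illustrated with a small Python sketch. It is not the authors' implementation: `greedy_select` is a plain greedy stand-in for Algorithm 2 (which uses reverse sampling), and the score functions passed to `sandwich` are hypothetical callables mapping a node set to a number.

```python
from typing import Callable, FrozenSet, List

Score = Callable[[FrozenSet[int]], float]


def greedy_select(score: Score, universe: List[int], k: int) -> FrozenSet[int]:
    """Stand-in for Algorithm 2: add, k times, the node whose
    inclusion yields the highest score (requires k <= len(universe))."""
    chosen: FrozenSet[int] = frozenset()
    for _ in range(k):
        remaining = [v for v in universe if v not in chosen]
        best = max(remaining, key=lambda v: score(chosen | {v}))
        chosen = chosen | {best}
    return chosen


def sandwich(objective: Score, lower: Score, upper: Score,
             universe: List[int], k: int) -> FrozenSet[int]:
    """Solve for the lower bound, the upper bound and the objective,
    then keep the candidate that scores best on the true objective."""
    candidates = [
        greedy_select(lower, universe, k),
        greedy_select(upper, universe, k),
        greedy_select(objective, universe, k),
    ]
    return max(candidates, key=objective)
```

The point of the design is that all three candidates are judged by the original objective, so the returned set is never worse than the one obtained by optimizing the objective directly, while the bound-based candidates carry the approximation guarantee.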

ALGORITHM 2: Determine an approximation solution
Data: Loop-graph G(V, E), value α, the number of nodes for persuasion k, function δ and initial spreader set A
Result: Set R containing k nodes
1 G = {};
2 determine sampling size θ; // (based on the given values ϵ and γ)
3 while |G| < θ do
4     choose a random target node $v_j$;
5     determine a sample д for the target node

13 <$v_f$, $T_f$, $I_f$, $D_f$> ← QT.remove();
14 foreach $v_i \in N_f^-$ do
15     if (rand() < $w_{if}$) then
16         add $e_{if}$ to sample дT;
17         for r = 1 to |C| do
18             $T_{ir}$ = min($I_{fr}$, $T_{fr}$);
19             $D_{ir}$ = −1;
20             $I_{ir}$ = −1;
21             ρ = rand();
               … ii + α) then
25                 $I_{ir}$ = 0;
26         if ($v_i \notin A$) then
27             add <$v_i$, $T_i$, $I_i$, $D_i$> to sample дT;
28             QT.add(<$v_i$, $T_i$, $I_i$, $D_i$>);
29         else
30             set $D_{ir}$ = 1 (node $v_i$ is an initial spreader in cascade $C_r$);

Fig. 3. Impact ratio of methods with respect to the number of initial spreaders (|A|).

Fig. 4. Impact ratio of methods with respect to the number of persuaded nodes (k).

Fig. 5. Impact ratio of methods with respect to the persuasion rate (α).

Fig. 6. The running time of different methods.

Fig. 7. Quality of the solutions approximated by the Sandwich framework.

Table 2. The Properties of the Datasets Used in the Experiments