Predicting Information Pathways Across Online Communities

The problem of community-level information pathway prediction (CLIPP) aims at predicting the transmission trajectory of content across online communities. A successful solution to CLIPP holds significance as it facilitates the distribution of valuable information to a larger audience and prevents the proliferation of misinformation. Notably, solving CLIPP is non-trivial as inter-community relationships and influence are unknown, information spread is multi-modal, and new content and new communities appear over time. In this work, we address CLIPP by collecting large-scale, multi-modal datasets to examine the diffusion of online YouTube videos on Reddit. We analyze these datasets to construct community influence graphs (CIGs) and develop a novel dynamic graph framework, INPAC (Information Pathway Across Online Communities), which incorporates CIGs to capture the temporal variability and multi-modal nature of video propagation across communities. Experimental results in both warm-start and cold-start scenarios show that INPAC outperforms seven baselines in CLIPP.


INTRODUCTION
Background. Social media users form communities based on their interests, beliefs, ethnicity, and geographical location [76,79]. These communities are prevalent on popular social platforms such as Reddit, WhatsApp, and Telegram, enabling users to connect with like-minded individuals as well as consume and disseminate information in an interactive manner. As communities grow in size, they become hubs of information flow, facilitating the exchange of information across communities. Existing research has shown that online communities interact with and influence one another [19,47,52,96,100].
As information spreads from one community to another, it can rapidly reach all members of the new community. While individual posts and hyperlinks may propagate in varying patterns, the underlying pathways on which information propagates remain relatively stable [23,84]. Their stability is partially due to the behavior of common users who repeatedly spread information among the same communities, creating a reinforcing effect on the underlying information pathways.
The fast-paced evolution of social media has accelerated the spread of information, including a variety of content types ranging from news articles and commercial advertisements to harmful content such as online rumors [91,117], fake news [109,115,118], hate speech [128], and political bias [44]. The unmoderated spread of such content can cause adverse social impacts. For example, the COVID-19 pandemic has led to the formation and growth of multiple online communities, such as the subreddits r/CoronavirusUS, r/COVID-19Positive, and r/COVID19, where users discuss various topics related to the pandemic. These communities are interconnected, with similar topics and user groups, and thus have a significant influence on each other. Sometimes misinformation proliferates in online communities, such as the unfounded claim that 5G technology can spread the virus [1,95]. Despite a lack of scientific evidence, this conspiracy theory gained traction in several online communities, including r/conspiracy, r/5G, r/CoronavirusUS, and r/COVID19, causing unwarranted fear and concern among the public.
The Community-Level Information Pathway Prediction (CLIPP) problem seeks to predict the transmission trajectory of information among online communities. CLIPP is of significant importance as it enables prediction of the communities where information, including problematic content, is likely to emerge and spread. Such a capability can provide numerous benefits across a wide range of applications. Efficient prediction of misinformation spread with CLIPP can guide intervention strategies. For advertising, CLIPP can refine strategies and maximize the efficacy of marketing campaigns by increasing the visibility of information and providing insights into the communities where the target audience is most active.
Challenges. Solving CLIPP is challenging. First, community-to-community influence is usually unknown [23,87], and the mechanisms of interactions between communities and how they impact users remain hidden [52]. Different communities may have different norms, values, and communication patterns that influence the temporal patterns of information diffusion [111]. In this case, we only observe which communities new content propagates to and when the propagation takes place. The underlying community influence, i.e., who influences the propagation, remains unknown. Most existing works focus on predicting information diffusion at the user level (i.e., microscopic influence) [58,84]. Meanwhile, existing datasets [73,74,98,128] only contain limited information about community structures, making it difficult to study cross-community information spread.
Second, the spread of information is characterized by a complex and dynamic diffusion environment [63]. Posts contain multi-modal signals, such as text, images, and videos [4,8,38]. Diffusion patterns vary across content types. For example, misleading news and inflammatory microblogs spread faster and wider than true information [28,39,99]. Niche content is usually shared within a few narrow-interest communities, while broad-interest content creates far-reaching cascades and reaches several disparate communities [83,100,101]. Understanding these propagation patterns is essential to predicting information spread across communities.
Our Work. In this work, we investigate the dynamics of community-level information flow while jointly addressing the challenges of a complex diffusion environment and a continuously evolving information ecosystem.
We choose Reddit as the platform for studying community-level information diffusion since it provides numerous communities, named "subreddits," that are dedicated to specific topics or interests. Towards this goal, we collect two large-scale, multi-modal datasets that enable us to study the community-level diffusion of visual content for information pathway prediction. Based on these datasets, we identify distinct temporal patterns of information sharing using the inter-activity time distribution, infer macroscopic community-to-community influence, and construct novel community influence graphs (CIGs).
We design INPAC, or Information Pathway Across Online Communities, a dynamic graph-based method to predict community-level information pathways using CIGs and the content's multi-modal information (visual features and channel metadata). INPAC integrates structure, content semantics, and temporal information by utilizing

DATASET AND PROBLEM
2.1 Dataset Description
In this study, we aim to study the spread of visual content across communities on social media. To this end, we collect a massive amount of visual content from YouTube and long-term community activity from Reddit. The reasons for selecting these two platforms are as follows:
• YouTube is one of the most widely used video-sharing platforms. It has over 2.56 billion users and provides a venue for users to upload, share, and view videos.
• Reddit is one of the largest social platforms for content creation, rating, and sharing. It allows users to interact in a variety of communities (i.e., subreddits). Reddit is an ideal platform for studying the propagation of online visual content such as YouTube videos because of its vast and diverse user base as well as its open-source nature and community structures.
As the first step, we collected 54 months of historical Reddit posts from January 2018 to June 2022 via PushShift. We removed any posts that did not contain valid URLs and retained URLs associated with valid YouTube videos, resulting in 5,723,910 posts and 3,737,191

Problem Formulation
We formulate the CLIPP problem as follows: given a video and the sequence of subreddits in which it has been posted, predict the next community in which the video will be posted at a given time. Formally, we define a posting of a video as a video link appearing on a subreddit, either as a standalone post or as part of a longer post. A posting instance is represented as a 4-tuple $p_{ij} = (v_i, u_j, s_j, t_j)$, where $v_i$ is a video posted by a user $u_j$ in an online community $s_j$ at time $t_j$.
The posting sequence for $v_i$ is defined as a list of posting instances $S_i = \{(v_i, u_j, s_j, t_j)\}_{j=1}^{n}$, which indicates the dissemination trajectory of length $n$ across communities for the video $v_i$. Then, our problem can be defined as follows:
Problem 1 (Information Pathway Prediction). Given a video $v_i$, its posting sequence $S_i = \{(v_i, u_j, s_j, t_j)\}_{j=1}^{n}$ of length $n$, and a target timestamp $t'$, our model outputs a ranked list of communities $\{s_k\}$ indicating the communities in which $v_i$ is most likely to appear at time $t'$. Table 3 summarizes the notations used in this paper.
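The posting instance and posting sequence defined above map naturally onto a small data structure. The following is a minimal sketch, not the authors' code: the class name `Posting`, the helper names, and the use of Unix timestamps are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Posting:
    """One posting instance (video, user, community, time). Names are hypothetical."""
    video: str       # video identifier
    user: str        # user who posted the link
    community: str   # subreddit where the link appeared
    time: float      # posting time (assumed here to be a Unix timestamp)


def posting_sequence(postings: List[Posting], video: str) -> List[Posting]:
    """Return the chronologically ordered postings of a single video."""
    return sorted((p for p in postings if p.video == video), key=lambda p: p.time)


def history_before(sequence: List[Posting], target_time: float) -> List[Posting]:
    """Keep only the postings observed strictly before the target timestamp."""
    return [p for p in sequence if p.time < target_time]
```

A predictor for Problem 1 would take the output of `history_before` together with the target timestamp and return a ranked list of community identifiers.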

THE PROPOSED FRAMEWORK: INPAC
3.1 Overview
In this work, we aim to study the propagation of online visual content on social media. To this end, we propose a dynamic graph framework, INPAC, based on Community Influence Graphs (CIGs), which learns the dynamics of cross-community information flow and accurately predicts information pathways. As shown in Figure 1,

Community Influence Modeling
Given a community (e.g., a subreddit), INPAC learns its embedding such that the embedding preserves its influence on other communities during information propagation. We infer the influence relationships between communities using content sharing patterns in those communities. Specifically, a video is usually shared in communities that have similar topics. For example, in Table 2, the video "Practical Greeting Phrases for Chinese New Year" is shared within a set of subreddits related to language learning, such as r/learnchinese and r/learnmandarin. To model this, we create a novel influence network by leveraging the video's temporal interaction patterns.
Influence Graph Construction. In the context of CLIPP, community-level influence is defined as the presence of causal relationships between postings of a video in two different communities. This can happen when two communities share a common group of users.
To infer the influence exerted by one community on another, we employ the sequence of communities $\{s_1, s_2, \ldots\}$ in which a video $v_i$ is posted. Assuming that users require a certain amount of time to engage with online content, the interval between the appearances of a video $v_i$ in two communities $s_1$ and $s_2$ serves as an indicator of the influence of $s_1$ on the appearance of the video in $s_2$. If a video
To model the propagation sequence of a video, we first identify its concurrent sharing events, where the propagation of the video takes place within a brief time period, referred to as a session, in the same or different communities. To this end, one needs to decide whether two shares are within the same session. A straightforward approach is to set a threshold time limit, such as one hour or one day, as is common in session-based recommender systems [6,27,61,69,110]. However, this ad-hoc use of a time limit is insufficient, as the appropriate limit can vary across datasets, videos, and platforms [37,78].
We note that consecutive sharing of a video can occur due to the same user or different users, resulting in differing sharing patterns and motivations. Therefore, we create two distributions of the time difference between consecutive shares of each video $v_i$: (1) $\Delta_{\text{Same}}$, representing the time intervals between consecutive shares of $v_i$ by the same user; (2) $\Delta_{\text{Diff}}$, representing the time intervals between the first shares of $v_i$ by different users. Figure 2 illustrates an example of a video's consecutive sharing in several communities over time by three users. From Figure 2, we can observe how the two time intervals $\Delta^{\text{same}}_1$, $\Delta^{\text{same}}_2$ for the same user $u_1$ as well as the two intervals $\Delta^{\text{diff}}_1$, $\Delta^{\text{diff}}_2$ for different users $u_1$, $u_2$, $u_3$ are computed. For $\Delta_{\text{Same}}$, it is important to consider that a user's multiple postings of the same video in different communities should not be viewed as one community influencing another. This is because users usually post the same content in various venues to enhance its visibility and attract more "likes" [20,40,116]. This is not indicative of a natural flow of content from one community to another.
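The two interval distributions described above can be computed per video with a few lines of code. The sketch below follows the textual definitions only; the function name and the `(user, community, time)` tuple layout are assumptions made for illustration.

```python
from collections import defaultdict


def inter_share_intervals(postings):
    """Compute same-user and different-user intervals for ONE video.

    postings: iterable of (user, community, time) tuples (hypothetical layout).
    Returns (delta_same, delta_diff) as two lists of time differences.
    """
    times_by_user = defaultdict(list)
    for user, _community, t in postings:
        times_by_user[user].append(t)

    # Same-user intervals: gaps between consecutive shares by the same user.
    delta_same = []
    for times in times_by_user.values():
        times.sort()
        delta_same.extend(later - earlier for earlier, later in zip(times, times[1:]))

    # Different-user intervals: gaps between the FIRST shares of distinct users.
    first_shares = sorted(times[0] for times in times_by_user.values())
    delta_diff = [later - earlier for earlier, later in zip(first_shares, first_shares[1:])]
    return delta_same, delta_diff
```

With three users, as in the Figure 2 example, this yields two different-user intervals, and a user with three shares contributes two same-user intervals.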
Thus, we only utilize $\Delta_{\text{Diff}}$ to identify community-level influence. Specifically, we plot the distribution of $\Delta_{\text{Diff}}$ across the sharing events of all videos, as shown in Figure 3, where the $x$-axis represents the time interval in seconds on a base-10 logarithmic scale, and the $y$-axis indicates the percentage. Then, we fit a Gaussian distribution to $\Delta_{\text{Diff}}$ and find that the distribution has a mean of 6.844 and a standard deviation of 0.823 on the logarithmic scale. Based on this finding, we determine the cutoff time for partitioning sessions using $\Delta_{\text{Thres}} = \mu - \eta\sigma$, where $\mu$ and $\sigma$ are the fitted mean and standard deviation, and $\eta$ is a hyperparameter that represents the confidence level for determining concurrent shares. When the time difference between two postings exceeds $\Delta_{\text{Thres}}$, the later posting is considered to be influenced by the former.
Now, we construct the community influence graph (CIG) $\mathcal{G}_{S_i}$ with respect to $S_i$ based on the threshold $\Delta_{\text{Thres}}$. Each node in $\mathcal{G}_{S_i}$ indicates a community $s_j$, and a directed edge from $s_a$ to $s_b$ indicates that $s_b$ is influenced by $s_a$. Specifically, if two shares of $v_i$ from different users occur within $\Delta_{\text{Thres}}$, they are considered concurrent postings in the same session and not influenced by each other. Otherwise, a directed edge is added from $s_a$ to $s_b$ for $t_a < t_b$ in $\mathcal{G}_{S_i}$. Furthermore, when $v_i$ is simultaneously shared by the same user in two different communities $s_a$ and $s_b$, a bi-directional edge is added between these communities to reflect their mutual influence as a result of overlapping users.
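The construction rules above can be sketched as follows. This is one possible reading rather than the authors' implementation, and it makes several labeled assumptions: the cutoff is the fitted mean minus a multiple of the standard deviation on the log10 scale, converted back to seconds; all pairs of postings are compared (not only adjacent ones); "simultaneous" same-user sharing means a gap within the cutoff; and same-user postings farther apart than the cutoff produce no edge.

```python
from itertools import combinations


def session_threshold(mu, sigma, eta):
    """Cutoff in seconds from a Gaussian fit on log10(seconds).

    Assumes the cutoff has the form mu - eta * sigma on the log scale.
    """
    return 10 ** (mu - eta * sigma)


def build_cig_edges(postings, threshold):
    """Build the directed edge list of a CIG for ONE video.

    postings: list of (user, community, time) tuples (hypothetical layout).
    Returns a list of (source, destination) pairs; repeated pairs are kept,
    so the result describes a multigraph.
    """
    edges = []
    ordered = sorted(postings, key=lambda p: p[2])
    # combinations() on the time-sorted list yields (earlier, later) pairs.
    for (u_a, s_a, t_a), (u_b, s_b, t_b) in combinations(ordered, 2):
        if s_a == s_b:
            continue  # no self-loops in this sketch
        gap = t_b - t_a
        if u_a == u_b:
            # Assumption: same-user cross-posting within the cutoff counts as
            # "simultaneous" and yields a bi-directional edge.
            if gap <= threshold:
                edges.append((s_a, s_b))
                edges.append((s_b, s_a))
        elif gap > threshold:
            # Different users, beyond the session cutoff: earlier influences later.
            edges.append((s_a, s_b))
    return edges
```

Postings by different users that fall within the cutoff are treated as concurrent and contribute no edge, which is what distinguishes this graph from a fully connected one.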

Message Aggregation. After the construction of $\mathcal{G}_{S_i}$, the graph is transformed from a multigraph to a weighted graph by merging multiple edges with the same source and destination nodes. Let $\mathcal{E}_{ab}$ denote the set of edges between $s_a$ and $s_b$. The new edge weight $w_{ab}$ is calculated as a logarithmic function of the number of merged edges, $|\mathcal{E}_{ab}|$.
As $\mathcal{G}_{S_i}$ consists of a number of periphery nodes, such as inactive online communities with few propagations, long-range dependencies should be considered to learn distinct node representations. To this end, we leverage the propagation scheme of APPNP [21], which is based on the personalized PageRank algorithm [2]. APPNP adds a probability of teleporting back to the root node, which ensures that the PageRank score encodes the local neighborhood of every node and mitigates the oversmoothing issue.
Then, we obtain the embedding matrix $\mathbf{S}^{(l)}$ at layer $l$ for the communities involved in the $i$-th propagation sequence $S_i$, where $\mathbf{S}^{(0)}$ is the initial embedding matrix for all $s_j \in \mathcal{G}_{S_i}$, $\hat{\mathbf{A}}$ and $\mathbf{D}$ are the adjacency matrix and the diagonal degree matrix, respectively, and $\alpha \in [0, 1)$ is the teleport probability.
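For readers unfamiliar with APPNP, the propagation it performs can be sketched in plain Python. The update below is the standard rule from the APPNP paper, in which each step mixes the normalized neighborhood average with the initial embeddings; whether INPAC uses exactly this normalization is an assumption, and the function name, dense list-of-lists layout, and default values are illustrative only.

```python
def appnp_propagate(adj, s0, alpha=0.1, num_layers=4):
    """Personalized-PageRank-style propagation (standard APPNP update).

    adj: dense weighted adjacency matrix as a list of lists (n x n).
    s0:  initial embeddings as a list of lists (n x d).
    Each step computes (1 - alpha) * A_norm @ S + alpha * S0, where
    A_norm = D^{-1/2} (A + I) D^{-1/2} and D holds the row sums of A + I.
    """
    n = len(adj)
    dim = len(s0[0])
    # Add self-loops so every node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / ((deg[i] ** 0.5) * (deg[j] ** 0.5)) for j in range(n)]
            for i in range(n)]

    s = [row[:] for row in s0]
    for _ in range(num_layers):
        s = [[(1 - alpha) * sum(norm[i][k] * s[k][c] for k in range(n)) + alpha * s0[i][c]
              for c in range(dim)]
             for i in range(n)]
    return s
```

Because a fraction `alpha` of the initial embedding is re-injected at every step, a node's representation never drifts fully toward the graph-wide average, which is the property that mitigates oversmoothing for periphery nodes.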
During training, we derive a probability distribution over all communities, $P(s_{n+1} \mid S_i, \mathcal{G}_{S_i})$, which indicates the most likely community for the next share of $v_i$. This requires both the current status of the sharing and the global information about $\mathcal{G}_{S_i}$. The current status can be represented using the latest posting event, encoded in $\mathbf{s}_{|\mathcal{G}_{S_i}|}$. For global information, we leverage soft attention to derive $\beta_j$, the importance of each community in the posting sequence, where $\mathbf{w}_1 \in \mathbb{R}^d$ is a trainable parameter and $\sigma(\cdot)$ is the sigmoid activation function.
Finally, we compute the probability by taking a linear transformation over the concatenation, where $\|$ is the concatenation operator and the concatenated input comprises all community embeddings in the sessions.

Video Content Modeling
Given a video, INPAC encodes its visual content into a low-dimensional feature vector. The content modeling component of INPAC can utilize a diverse range of encoders. Here, we note that online visual content is highly diverse in terms of topics, languages, and subject matter. Therefore, the titles, descriptions, and metadata of these videos, such as channel information, can provide valuable insights into their content. These additional data sources can be leveraged to better categorize and understand the content of videos. We thus utilize the titles, descriptions, and channel information as the static features for each video. Specifically, inspired by the success of pre-trained language models in natural language understanding [10,43,102,103], we encode the title and description of each video $v_i$ into a feature vector $\mathbf{v}_i \in \mathbb{R}^d$ based on a multilingual version of MiniLM [104]. Similarly, we encode each video's channel $c_{\phi(i)}$ into a feature vector $\mathbf{c}_{\phi(i)}$, where $\phi(\cdot): V \to C$ maps each video to the channel that posts $v_i$. Then, the two feature vectors are aggregated into a joint representation
$\tilde{\mathbf{v}}_i = \mathrm{Aggr}(\mathbf{v}_i, \mathbf{c}_{\phi(i)})$. (5)
Here, a wide variety of aggregation schemes can be applied, including addition, concatenation, and element-wise multiplication, to obtain the joint representation. In Section 4.3, we investigate the impact of using different aggregation schemes for $\mathbf{v}_i$ and $\mathbf{c}_{\phi(i)}$.
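The three aggregation schemes named above differ only in how the two vectors are combined. A minimal sketch on plain Python lists follows; the function name and the `scheme` argument are hypothetical.

```python
def aggregate(video_vec, channel_vec, scheme="concat"):
    """Combine a video embedding and a channel embedding into one vector.

    scheme: "add", "multiply" (element-wise), or "concat".
    """
    if scheme == "concat":
        # Concatenation doubles the dimensionality and needs no matching sizes.
        return list(video_vec) + list(channel_vec)
    if len(video_vec) != len(channel_vec):
        raise ValueError("add/multiply require vectors of equal length")
    if scheme == "add":
        return [a + b for a, b in zip(video_vec, channel_vec)]
    if scheme == "multiply":
        return [a * b for a, b in zip(video_vec, channel_vec)]
    raise ValueError(f"unknown aggregation scheme: {scheme}")
```

Addition and multiplication keep the embedding size fixed, whereas concatenation changes the input width of every downstream layer, which is one practical reason to compare the schemes empirically.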

Dynamic Modeling
In the dynamic modeling component, INPAC models the temporal variability of each video's propagation across communities, obtaining temporal embeddings of videos and communities. Here, we note that a video can be shared multiple times within a short amount of time [60]. Inspired by continuous-time dynamic graphs (CTDGs) [65,67,88,124], we design a dynamic modeling module that provides a robust representation of the video sharing process and better handles the bursty nature of information sharing.
First, we leverage the temporal graph network (TGN) [88] and represent our dynamic network as a pair $(\mathcal{G}_{t_0}, \mathcal{O})$, where $\mathcal{G}_{t_0}$ is the initial state of the dynamic network represented as a static graph and $\mathcal{O}$ is a set of graph events with timestamps. In INPAC, we consider two types of graph events: node additions (i.e., the emergence of new videos and communities) and edge additions (i.e., a video is posted in an online community).
Input Encoding. The input embeddings $\mathbf{x}_{v_i}(t)$ and $\mathbf{x}_{s_j}(t)$ are raw feature representations for each video $v_i$ and community $s_j$, respectively. We leverage the embeddings derived in Sections 3.2-3.3 as the raw node embeddings. Namely, $\mathbf{x}_{v_i}(t) = \tilde{\mathbf{v}}_i$ for video $v_i$ and $\mathbf{x}_{s_j}(t) = \mathbf{s}^{(L)}_j$ for community $s_j$, where $\mathbf{s}^{(L)}_j$ is the representation of $s_j$ at the final layer in Equation 2.
Time Encoding. Similar to [88,92,112], the time encoding function $\Phi(\cdot): \mathbb{R} \to \mathbb{R}^d$ maps a continuous timestamp to the $d$-dimensional vector space, where $\mathbf{w}_2, \mathbf{b}_1 \in \mathbb{R}^d$ are learnable parameters.
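A time encoding of this kind is commonly implemented as a learnable cosine feature map, as in TGAT and TGN. The sketch below assumes that cosine form; the exact function used by INPAC is not recoverable from the text, and the names `time_encoding`, `weights`, and `biases` are illustrative.

```python
import math


def time_encoding(t, weights, biases):
    """Map a scalar timestamp to a d-dimensional vector.

    Assumes the cosine form cos(w_k * t + b_k) used by TGAT-style encoders;
    weights and biases stand in for the learnable d-dimensional parameters.
    """
    if len(weights) != len(biases):
        raise ValueError("weights and biases must have the same length")
    return [math.cos(w * t + b) for w, b in zip(weights, biases)]
```

Each output dimension oscillates at its own frequency, so time gaps of very different magnitudes (seconds versus months) remain distinguishable in the encoded vector.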
Temporal Memory. As in [88], to track the propagation state of each node, $v_i$ or $s_j$, at any timestamp, each node has a memory vector, $\mathbf{h}_{v_i}(t)$ or $\mathbf{h}_{s_j}(t)$, that stores its interaction history in a compressed format. The memory of each node is initialized to zero and updated after each graph event. Given a node addition event of $v_i$, $v_i$'s message $\mathbf{m}_{v_i}(t)$ at time $t$ is computed as the concatenation of $v_i$'s raw features and its memory $\mathbf{h}_{v_i}(t')$, i.e., $v_i$'s memory from time $t'$, the time of the previous interaction involving $v_i$. In the same manner, we obtain each community $s_j$'s message $\mathbf{m}_{s_j}(t)$ at $t$ given $s_j$'s event.
For an edge addition event involving $v_i$ and $s_j$, we compute the edge's message $\mathbf{m}_{v_i}(t)$ with respect to $v_i$ at $t$ and, similarly, the edge's message $\mathbf{m}_{s_j}(t)$ with respect to $s_j$ at $t$.
During batch training, multiple events in the same batch can be associated with the same nodes. Therefore, we aggregate the multiple messages of video $v_i$ and community $s_j$ from $t_1$ to $t_b$ through mean pooling, thus obtaining $\bar{\mathbf{m}}_{v_i}(t)$ and $\bar{\mathbf{m}}_{s_j}(t)$ as in [88].
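The mean-pooling step can be illustrated with a short helper that groups the messages of a batch by node and averages them per dimension. The `(node_id, vector)` pair layout and the function name are assumptions for this sketch.

```python
from collections import defaultdict


def mean_aggregate(messages):
    """Average all messages addressed to the same node within one batch.

    messages: iterable of (node_id, vector) pairs (hypothetical layout).
    Returns a dict mapping node_id to the per-dimension mean vector.
    """
    grouped = defaultdict(list)
    for node_id, vector in messages:
        grouped[node_id].append(vector)
    return {
        node_id: [sum(column) / len(vectors) for column in zip(*vectors)]
        for node_id, vectors in grouped.items()
    }
```

Each node then receives exactly one aggregated message per batch, regardless of how many events involved it, which keeps the subsequent memory update well defined.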
Based on these messages, the memory embeddings of $v_i$ and $s_j$ are updated upon each event involving $v_i$ and $s_j$, respectively. During prediction, we pass the representations $\mathbf{h}_{v_i}(t)$ and $\mathbf{h}_{s_j}(t)$ through multiple GNN layers to aggregate the features of each node from its neighbors, where $\tilde{\mathbf{v}}_i$ and $\mathbf{s}_j$ are the transformed representations of $v_i$ and $s_j$. The aggregation function $f(\cdot, \cdot)$ can be chosen from a wide range of GNN operators, such as GCN [50], GraphSAGE [29], TransformerConv [89], and GIN [113]. In practice, we employ a 2-layer Graph Attention Network (GAT) [97].

Training
We employ element-wise multiplication to calculate the score $\hat{y}_{ij}$ between each video $v_i$ and each community $s_j$ at time $t$. We train our model using the Bayesian Personalized Ranking (BPR) loss [86], which encourages the prediction of an observed interaction to be greater than that of an unobserved one. Here, $(i, j^+, j^-, t)$ denotes an example in the pairwise training data: $j^+$ indicates that a sharing of $v_i$ is observed in community $s_{j^+}$, and $j^-$ indicates an unobserved one. Furthermore, for the training of the community influence graph, we use the next-item prediction objective. Given each $\mathcal{G}_{S_i}$, the loss function $\mathcal{L}^{i}_{\text{CE}}$ is defined as the cross-entropy between the predicted and ground-truth community that will propagate $v_i$ at the next timestamp, where $\mathbf{y}_{n+1} \in \mathbb{R}^{|\mathcal{S}|}$ is a one-hot vector that encodes the ground-truth community interacted with at the next timestamp. In the overall optimization objective, $\Theta$ denotes all trainable model parameters, and $\lambda_1$ and $\lambda_2$ are hyperparameters in INPAC.
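The pairwise ranking objective can be sketched as below. The BPR term is the standard negative log-sigmoid of the score difference; reading "element-wise multiplication" as an element-wise product followed by a sum (i.e., a dot product) is an assumption, and all function names are illustrative.

```python
import math


def score(video_vec, community_vec):
    """Score a video-community pair.

    Assumption: the element-wise product is summed into a scalar (dot product).
    """
    return sum(a * b for a, b in zip(video_vec, community_vec))


def bpr_loss(pos_score, neg_score):
    """Standard BPR term: -log(sigmoid(pos_score - neg_score)).

    Written as a numerically stable softplus of the negated difference.
    """
    diff = pos_score - neg_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

The loss shrinks toward zero as the observed community is scored increasingly higher than the unobserved one, and grows roughly linearly when the ordering is reversed.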

EVALUATION
In this section, we conduct experiments to answer the following evaluation questions (EQs): (4) Representation Learning on Temporal Graphs, including TGAT [112], and TGN [88].

Metrics.
We measure the models' performance using three widely adopted metrics in the field of ranking systems: (1) Recall@$K$, which measures the proportion of relevant items (i.e., ground truth) that are retrieved among the top-$K$ items; (2) normalized discounted cumulative gain (NDCG@$K$), which evaluates the ranking quality of the top-$K$ items, with a score of 1 assigned to the ideal ranking; (3) mean reciprocal rank (MRR), which computes the average reciprocal rank of the top-ranked relevant item. In this paper, we set $K$ to 5 and 10. Our evaluation procedure follows the established method [17,34,62] by randomly selecting 100 communities with no observed propagations of the video and ranking the test item among the 100 items. Additionally, we exclude any existing interactions in the training set from the test set.
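For a single test instance with one ground-truth community ranked against sampled negatives, the three metrics reduce to simple functions of the ground-truth rank. The sketch below assumes that single-positive setting and breaks ties in favor of the positive item; the function names are illustrative.

```python
import math


def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the ground-truth item among the sampled negatives.

    Ties are resolved in favor of the positive item (an assumption).
    """
    return 1 + sum(1 for s in neg_scores if s > pos_score)


def recall_at_k(rank, k):
    """With one relevant item, Recall@K is 1 if it appears in the top K."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank, k):
    """With one relevant item, the ideal DCG is 1, so NDCG@K = 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def reciprocal_rank(rank):
    """Per-instance reciprocal rank; MRR is its mean over all test instances."""
    return 1.0 / rank
```

Averaging each function over all test instances gives the reported Recall@K, NDCG@K, and MRR values.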

Implementation Details.
We implemented INPAC in PyTorch [81] and PyG [18]. For a fair comparison, we set the embedding size to 64 in all methods, including INPAC, and perform Xavier initialization [22] on the model parameters. We use the Adam optimizer [49] with a batch size of 256. For the baseline models, the hyperparameters are set to the optimal values reported in the original papers. For all models, we search the learning rate over $\{10^{-4}, 3 \times 10^{-4}, 10^{-3}, 3 \times 10^{-3}, 10^{-2}\}$ and select the best setting. We set $\alpha = 0.1$, $\eta = 3$, $\lambda_1 = 1$, and $\lambda_2 = 10^{-3}$. $L$, the number of GNN layers in Community Influence Modeling (Section 3.2), is set to 4.

Overall Performances
We conducted comparative experiments on 2 datasets to demonstrate the superiority of INPAC over the 7 baselines. To this end, we grouped the videos into warm-start and cold-start videos. We define warm-start and cold-start videos as videos with at least 2 postings and exactly 1 posting in the training phase, respectively. Furthermore, the number of videos posted in each community follows an imbalanced distribution. For instance, in the small dataset, more than 20% of videos were posted on the two most popular subreddits. Since it is relatively trivial to make predictions for such popular subreddits, we split subreddits into popular (i.e., top 25 percentile subreddits where YouTube videos are posted most frequently) and non-popular (i.e., the rest of the subreddits). The results are partitioned with respect to whether the target community is a popular subreddit or a non-popular subreddit.
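The two partitions described above can be derived from the training postings alone. The sketch below assumes postings are given as `(video, subreddit)` pairs and interprets "top 25 percentile" as the top quarter of subreddits ranked by posting count; both the layout and the function names are illustrative.

```python
import math
from collections import Counter


def split_videos(train_postings):
    """Split videos into warm-start (>= 2 postings) and cold-start (exactly 1)."""
    counts = Counter(video for video, _subreddit in train_postings)
    warm = {video for video, c in counts.items() if c >= 2}
    cold = {video for video, c in counts.items() if c == 1}
    return warm, cold


def popular_subreddits(train_postings, top_fraction=0.25):
    """Return the most frequently posted-to subreddits (top quarter by default)."""
    counts = Counter(subreddit for _video, subreddit in train_postings)
    ranked = [subreddit for subreddit, _count in counts.most_common()]
    cutoff = max(1, math.ceil(len(ranked) * top_fraction))
    return set(ranked[:cutoff])
```

Test instances can then be bucketed by whether the video is in the warm or cold set and whether the target subreddit is in the popular set, giving the four result groups reported below.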

Warm-Start Prediction.
Tables 4(a)-(b) show the results of warm-start prediction for popular and non-popular subreddits, respectively, on the Large dataset. The results for the Small dataset can be found in Appendix B. We observe that INPAC consistently and significantly outperforms all baselines on both datasets for both groups of subreddits. On the Large dataset, INPAC outperforms the best baseline by 5.1% on NDCG@5 and 4.4% on MRR for the popular communities, and by 6.8% on NDCG@5 and 5.8% on MRR for the non-popular communities. On the Small dataset, INPAC outperforms the best competitor by 8.6% and 7.5% on the two metrics for popular communities, and by 12.9% and 18.8% for non-popular communities, respectively. Our results demonstrate the effectiveness of INPAC in the task of CLIPP. Moreover, we observe that representation learning methods on temporal graphs (i.e., TGAT and TGN) outperform all other baselines. This observation underscores the importance of considering temporal information in predicting information pathways.

Cold-Start Prediction.
As the content sharing network evolves, the emergence and spread of new content to a diverse range of communities presents considerable challenges for CLIPP, particularly in cold-start scenarios where the historical propagation of videos is absent. Thus, the prediction problem becomes: given a video that has only 1 propagation, how can we predict its second propagation? Tables 5(a)-(b) show the performance of the seven baselines and INPAC for popular and non-popular subreddits, respectively, on the Large dataset. The results for the Small dataset can be found in Appendix B. We observe that INPAC achieves even greater performance improvements in the cold-start scenario through its inductive reasoning capability, consistently outperforming all competitors on both datasets for both groups of subreddits. Moreover, from Table 5(a), we observe that when the cold-start videos are propagated to popular communities, predicting these flows is relatively straightforward for all the models, including INPAC. On the other hand, the results in Table 5(b) show that predicting the flow of cold-start videos to less popular communities is a more challenging task. Despite this, INPAC still shows the best performance. These results encourage further investigation into such flows, which we consider to be a potential area of future work.

Ablation Studies
We validate the effectiveness of the design choices in INPAC. In Section 3.2, we designed a way to construct community influence graphs (CIGs) by considering the time at which videos were propagated in communities. To evaluate this design, we made 4 variants of INPAC: INPAC-Seq connects the community nodes sequentially, i.e., we create a directed edge from $s_a$ to $s_b$ if they are adjacent in the corresponding propagation sequence $S_i$. INPAC-FC establishes connections in a fully-connected manner, meaning that an edge is created between $s_a$ and $s_b$ if $s_a$ precedes $s_b$ in $S_i$. INPAC-G adopts the graph construction method of CIG as suggested by the GAINRec model [71]. INPAC-C omits any content information about the video and its channel. Specifically, the video embedding $\mathbf{v}_i$ and the channel embedding $\mathbf{c}_{\phi(i)}$ in Eq. (5) are randomly initialized.
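The difference between the sequential and fully-connected variants is easiest to see on a short propagation sequence. The sketch below follows the textual descriptions of INPAC-Seq and INPAC-FC only; it assumes the sequence lists each community once, in posting order, and the function names are illustrative.

```python
from itertools import combinations


def seq_edges(communities):
    """Sequential variant: connect each community to the next one in the sequence."""
    return list(zip(communities, communities[1:]))


def fc_edges(communities):
    """Fully-connected variant: connect each community to every later one."""
    return list(combinations(communities, 2))
```

For a sequence of length n, the sequential variant yields n - 1 edges while the fully-connected variant yields n(n - 1)/2, which illustrates why the latter is more prone to spurious connections between communities that merely co-occur in long sequences.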
From Figure 4, we observe that INPAC-Seq exhibits the lowest performance. This result can be attributed to the limitations of the sequential connection method, which fails to capture the underlying influence relationships between communities as manifested by the sharing events. On the other hand, INPAC-FC performs better than INPAC-Seq in terms of Rec@5 and Rec@10. However, the fully-connected method can potentially lead to spurious correlations. Perhaps surprisingly, INPAC-C outperforms INPAC-Seq, INPAC-FC, and INPAC-G on most metrics, suggesting that the model can still achieve remarkable performance in the absence of content features, given that the Community Influence Graphs (CIGs) are properly constructed and modeled. This has broader implications for its applicability to other types of information with less available content, such as short online posts or URLs to misinformative websites. The superior performance of INPAC-C highlights the importance of the construction of the CIG in our approach. The CIG captures the interactions and influence patterns among communities, which is a crucial aspect when modeling the spread of information in online social networks. By focusing on the underlying social structures, our method is able to identify and predict the propagation of information more effectively than by relying solely on content features. Overall, the method employed by INPAC achieves the best performance, demonstrating the effectiveness of our graph construction approach.
Regarding the CIGs with multiple clusters, we analyzed the differences between the clusters and the factors that contributed to the video spreading across different clusters. In Figures 5(a) and 5(c), the videos were first posted in highly active communities. As the videos gained visibility over time, they spread to different clusters of communities. For instance, in Figure 5(a), the video was initially shared in r/AskScienceDiscussion, a community focused on in-depth scientific discussions, which aligned with the video's original purpose. Subsequently, as the video gained popularity, it was shared by distinct users in highly active COVID-19 related communities such as r/CoronavirusUS and r/China_Flu. Furthermore, the video also sheds light on the poor living conditions of animals in produce markets, where animals are confined in stacked cages and subjected to unsanitary conditions, evoking sympathy among viewers regarding animal welfare. As a result, the video was shared in 5 topically similar communities related to vegetarianism and animal welfare, including r/Vegan, r/VeganActivism, r/PlantBasedDiet, r/AnimalRights, and r/animalwelfare. In fact, the same group of users spread the video to multiple semantically similar communities, potentially due to overlapping interests. Our INPAC model successfully models these correlated sharing behaviors as a 5-clique.

Analysis of CIG
On the other hand, the CIGs in Figures 5(b) and 5(d) exhibit a single cluster. We manually examined how these videos spread to communities with less obvious topical similarities. For instance, in Figure 5(d), the video first appeared in subreddits like r/WorshipTaylorSwift, a popular subreddit centered around the famous singer Taylor Swift, which directly relates to the posted video. Subsequently, the video propagated to multiple semantically distinct communities at different time periods. These communities included r/terracehouse, a subreddit about the reality TV show Terrace House, where users compared Taylor Swift's songs with the show's theme song and other famous singers' songs. Another example is r/NoStupidQuestions, a subreddit for discussing a wide range of curious questions, where a user shared this video and questioned people's obsession with Taylor Swift. Our key findings are as follows:
• Initially, online content tends to be shared within communities that closely match its topic. As the content gains popularity, it gradually spreads to multiple communities with a broader range of topics.
• Content is shared within topically similar communities in a short period, regardless of whether it is shared by the same user or different users. This observation aligns with previous studies [90] that found faster information diffusion among topically similar communities and slower diffusion among topically distant ones.
• There exist "super spreaders" on online platforms who actively engage in and disseminate content across multiple topically diverse communities. For example, we identified a user who played a significant role in spreading the video in Figure 5(a) across vegetarian-related subreddits. This user has posted a total of 118 YouTube videos, with 67 shared in vegetarian-related communities. Another similar observation from Figure 5(c) is a user who actively contributed to communities about emotions, philosophy, Marvel Comics, and anime before eventually spreading the video among depression-related subreddits.

RELATED WORKS
5.1 Information Diffusion
Modeling the spread of information in online social networks has been a challenging task. Previous works have investigated information diffusion on social media [16,23], prediction of popularity [3], social influence [58,84], and topological analysis of follower networks [54,93] for information sharing. While these studies cover a broad spectrum of social interactions in online communities, they generally focus on user-level influence and interactions. Research has shown that the dissemination of information within a community is different from that at the individual level [5,75,80,125]. In this sense, diffusion models have been used to understand the spread of ideas, information, and influence on social and information networks [59,77]. Our study differs from these prior studies in that it focuses on community-level interactions.

DISCUSSION AND CONCLUSION
Inference of community influence pathways can provide important information about the structure and dynamics of online platforms and the resulting information flow on those platforms. This work created a community influence graph and utilized it in a dynamic graph framework, INPAC, to predict the flow of YouTube videos across Reddit communities (subreddits). Limitations of this work include: (i) studying only YouTube-Reddit data and (ii) the difficulty of validating the inferred influence graph. Future work includes alternate approaches to generate and validate influence graphs, creation of new dynamic graph models to predict information flow, and the use of multi-platform data.

A DISCUSSION
A.1 Difference between CLIPP and Recommendation Problems
A.1.1 Distinct Underlying Dynamics. In recommendation problems, the focus is on user behavior, as it largely reflects their interests, making it crucial to model user preferences accurately for precise recommendations. Group recommendation models [119] suggest items based on the combined preferences of users in a group, whereas sequential recommendation models [42,110] concentrate on individual users' preferences and the extent to which item attributes align with those preferences. On the other hand, the CLIPP problem encompasses a combination of factors that influence a user's decision to post a video within a community, where different users can share the video in different communities. An information-sharing event within a community is subject to factors such as user interests, community characteristics, and the relationship between the community and the information being shared. For example, a piece of information can be posted in an online community for the following reasons:
• Community members find the information valuable and wish to share it with others, driven by internal factors such as interest or altruism;
• Some users who originally do not belong to the community want to promote their product or service to a wider audience;
• Users with malicious intent seek to spread false or misleading information.
A.1.2 The User Behaviors to be Modeled are Different. The CLIPP problem requires a simultaneous understanding of multiple users' behavior. One video can be shared by different users in different communities with completely different motivations. For example, the same video can be shared in one community by one user with positive intent (e.g., promoting the video) and in another community by a different user with negative intent (e.g., criticizing the video). Yet, the goal is still to predict the next community in which the video will appear.
A.1.3 The Goals of the Problems are Different. The primary objective of the CLIPP problem is to model information flow across online communities rather than to create a recommender system. Although our proposed INPAC approach can be adapted for sequential recommendation, its primary focus is on capturing the complex interactions between users, communities, and information. Our experimental results (Tables 4-5) demonstrate that existing recommendation models were not designed to address the CLIPP problem and have inherent limitations when applied to it.
A.1.4 The Datasets are not Directly Transferable. As the first three points suggest, existing recommender system datasets, such as LastFM, MovieLens, and Goodreads, are not directly applicable to solving the CLIPP problem, as they lack information about clearly defined online communities and the sharing of information across those communities. This discrepancy highlights the need for distinct datasets that capture the complex dynamics specific to the CLIPP problem. In summary, although there may be some overlap between the methods used in recommender systems and the CLIPP problem, they are fundamentally different problems that require distinct approaches to model the unique interactions between users, communities, and information sharing.

A.2 Extension to Other Types of Features
Our proposed framework can be extended to handle other complex types of information, such as images and audio. We outline the simple modifications required to accommodate these data formats. Specifically, in Section 3.3, we can add the appropriate encoders for image or audio instead of using the encoder for video content.

Below are potential encoders for image and audio content:
Image: To handle images, we can incorporate a variety of image encoders, such as CNN [55], ResNet [32], or Vision Transformer [14,114], which will convert each input image into a fixed-dimensional feature vector. This vector can then be fed into our current model architecture as an input for community prediction.
Audio: To accommodate audio data, there are several options for encoding audio into a fixed-dimensional representation, including MFCC-based models [68], LSTM [24], or Transformer-based models such as Conformer [25]. The choice of encoder would depend on the specific characteristics of the audio data, the acceptable level of computational cost, and the desired level of representation.
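The encoder swap described above can be illustrated with a minimal sketch. It is not the INPAC implementation: the names `ModalityEncoder`, `EMBED_DIM`, `image_features`, and `audio_features` are hypothetical, the feature extractors are toy stand-ins for real backbones (e.g., ResNet or Conformer), and the fixed random projection stands in for a trained layer. The only point it makes is that each modality-specific encoder emits a vector of the same size, so the downstream community-prediction model does not need to change.

```python
import random
from typing import Callable, Dict, List, Sequence

EMBED_DIM = 8  # hypothetical shared embedding size expected downstream


class ModalityEncoder:
    """Wraps a feature extractor and projects its output to EMBED_DIM."""

    def __init__(self, extract: Callable[[Sequence[float]], List[float]],
                 feat_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.extract = extract
        # Fixed random linear projection; a trained layer would replace this.
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
                        for _ in range(EMBED_DIM)]

    def __call__(self, raw: Sequence[float]) -> List[float]:
        feats = self.extract(raw)
        return [sum(w * f for w, f in zip(row, feats)) for row in self.weights]


def image_features(pixels: Sequence[float]) -> List[float]:
    """Toy stand-in for an image backbone: mean, min, and max intensity."""
    return [sum(pixels) / len(pixels), min(pixels), max(pixels)]


def audio_features(samples: Sequence[float]) -> List[float]:
    """Toy stand-in for an audio front end: mean energy and peak amplitude."""
    energy = sum(s * s for s in samples) / len(samples)
    return [energy, max(abs(s) for s in samples)]


# One encoder per modality; all of them produce EMBED_DIM-sized vectors.
ENCODERS: Dict[str, ModalityEncoder] = {
    "image": ModalityEncoder(image_features, feat_dim=3, seed=1),
    "audio": ModalityEncoder(audio_features, feat_dim=2, seed=2),
}


def encode(modality: str, raw: Sequence[float]) -> List[float]:
    """Dispatch raw content to the encoder registered for its modality."""
    return ENCODERS[modality](raw)
```

Under these assumptions, `encode("image", pixels)` and `encode("audio", samples)` both return lists of length `EMBED_DIM`, which is the property the rest of the pipeline would rely on.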

Figure 1: The overview of our proposed INPAC framework, which consists of static modeling, including video content and community influence modeling, as well as dynamic modeling.

Figure 3: Distribution of Δ Same (Left) and Δ Diff (Right) for videos on the Large dataset.

Figure 4: Performances of different methods for constructing the community influence graph (CIG) on the Small dataset.

Figure 5 presents the visualization of Community Influence Graphs (CIGs) for 4 videos with different topics (Section 3.2). Each video was propagated in exactly 20 communities. The node colors and sizes in the graphs represent the node degrees, while edge colors indicate the edge weights. We observe that CIGs generated from different videos demonstrate diverse connectivities and structures. We categorized the CIGs into two groups: (1) CIGs with multiple clusters, such as Figures 5(a)(c); and (2) CIGs with a single cluster, such as Figures 5(b)(d). Regarding the CIGs with multiple clusters, we analyzed the differences between the clusters and the factors that contributed to the video spreading across different clusters. In Figures 5(a)(c), the videos were first posted in highly active communities. As the videos gained visibility over time, they spread to different clusters of communities. For instance, in Figure 5(a), the video was initially shared in r/AskScienceDiscussion, a community focused on in-depth scientific discussions, which aligned with the video's original purpose. Subsequently, as the video gained popularity, it was shared by distinct users in highly active COVID-19 related communities such as r/CoronavirusUS and r/China_Flu. Furthermore, the video also sheds light on the poor living conditions of animals in produce markets, where animals are confined in stacked cages and subjected to unsanitary conditions, evoking sympathy among viewers regarding animal welfare. As a result, the video was shared in 5 topically similar communities related to vegetarianism and animal welfare, including r/Vegan, r/VeganActivism, r/PlantBasedDiet, r/AnimalRights, and r/animalwelfare. In fact, the same group of users spread the video to multiple semantically similar communities, potentially due to overlapping interests. Our INPAC model successfully models these correlated sharing behaviors as a 5-clique.
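To make the 5-clique observation concrete, the following minimal sketch stores a CIG as a weighted adjacency map and checks whether a set of communities is fully connected. It is illustrative only: the edge weights are placeholder values, the graph is treated as undirected for simplicity (the paper's CIGs may be directed), and the helper names `build_graph` and `is_clique` are hypothetical rather than part of INPAC. Only the subreddit names are taken from the discussion of Figure 5(a).

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]
Graph = Dict[str, Dict[str, float]]


def build_graph(weighted_edges: Dict[Edge, float]) -> Graph:
    """Build an undirected weighted adjacency map from an edge dictionary."""
    graph: Graph = {}
    for (u, v), weight in weighted_edges.items():
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def is_clique(graph: Graph, nodes: Iterable[str]) -> bool:
    """Return True if every pair of the given nodes is connected by an edge."""
    return all(v in graph.get(u, {}) for u, v in combinations(list(nodes), 2))


vegan_cluster = ["r/Vegan", "r/VeganActivism", "r/PlantBasedDiet",
                 "r/AnimalRights", "r/animalwelfare"]

# Placeholder weights: every pair in the cluster is linked, plus one
# unrelated edge between two other communities named for Figure 5(a).
edges = {(u, v): 1.0 for u, v in combinations(vegan_cluster, 2)}
edges[("r/AskScienceDiscussion", "r/CoronavirusUS")] = 0.5

graph = build_graph(edges)
# Node degree, which the figure encodes as node size and color.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
```

In this toy graph, each community in `vegan_cluster` has degree 4 and `is_clique(graph, vegan_cluster)` holds, which is what a 5-clique means in the text above.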

Figure 5: Community Influence Graphs (CIGs) of 4 different videos, all of which were propagated in exactly 20 communities. (a) How Wildlife Trade is Linked to Coronavirus; (b) Black Myth: Wukong - Official 13 Minutes Gameplay Trailer; (c) Thought experiment "BRAIN IN A VAT"; (d) Taylor Swift - ME! Node sizes and colors indicate the node degrees. Edge colors indicate the edge weights.

Table 1: Statistics of our datasets.
Contributions. Our main contributions are as follows:
• Novel Multi-modal Datasets and Analysis: We collect two large-scale, multi-modal datasets to study community-level diffusion of visual content for information pathway prediction. We identify distinct temporal content sharing patterns that are used to infer community-to-community influence graphs.
• Information Pathway Prediction Framework: To solve CLIPP, we propose INPAC, a dynamic graph framework based on CIGs that learns from multi-modal data and the dynamics of the interactions between users and communities.
• Experimental Evaluation: We demonstrate the effectiveness of the INPAC framework and its design choices through experiments in various scenarios, e.g., prediction of cold-start and warm-start videos on communities of varying popularity. INPAC reaches performance improvements of up to 18.8% on MRR, 13.8% on NDCG@5, and 6.2% on Rec@5.
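For readers unfamiliar with the ranking metrics named above, the sketch below computes MRR, Rec@k, and NDCG@k under a simplifying assumption: each test instance has exactly one ground-truth next community, and the model returns a ranked list of candidate communities. The paper's exact evaluation protocol may differ, and the function names here are illustrative.

```python
import math
from typing import List, Sequence


def rank_of(ranked: Sequence[str], target: str) -> int:
    """Return the 1-based rank of target in ranked, or 0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return position
    return 0


def mrr(ranks: List[int]) -> float:
    """Mean reciprocal rank; a missing target (rank 0) contributes 0."""
    return sum(1.0 / r for r in ranks if r > 0) / len(ranks)


def recall_at_k(ranks: List[int], k: int) -> float:
    """Fraction of instances whose target appears in the top k."""
    return sum(1 for r in ranks if 0 < r <= k) / len(ranks)


def ndcg_at_k(ranks: List[int], k: int) -> float:
    """NDCG@k with one relevant item per instance, so the ideal DCG is 1."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if 0 < r <= k) / len(ranks)


# Three toy instances: target ranked 1st, ranked 3rd, and not retrieved.
ranks = [
    rank_of(["r/Vegan", "r/pics"], "r/Vegan"),
    rank_of(["r/pics", "r/news", "r/popheads"], "r/popheads"),
    rank_of(["r/pics", "r/news"], "r/french"),
]
```

With these toy ranks of 1, 3, and 0, MRR is (1 + 1/3 + 0)/3, Rec@5 is 2/3, and NDCG@5 is (1 + 1/log2(4) + 0)/3 = 0.5.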

Table 2: Examples of cross-community information flow in our datasets. A video is usually shared on a set of semantically similar subreddits. "→" indicates the temporal order of the sharing.

Table 3: Notations used in this paper.

Table 6: Performances on the Small dataset for warm-start videos.