Bridging Performance of X (formerly known as Twitter) Users: A Predictor of Subjective Well-Being During the Pandemic

The outbreak of the COVID-19 pandemic brought the perils of misinformation on social media to the fore. By amplifying the speed and reach of trustworthy information, influential social media users have helped counter the negative impact of this flood of misinformation. In this article, we use the COVID-19 pandemic as a representative global health crisis and examine its impact on these influential users’ subjective well-being (SWB), one of the most important indicators of mental health. We leverage X (formerly known as Twitter) as a representative social media platform and conduct the analysis with our collection of 37,281,824 tweets spanning almost two years. To identify influential X users, we propose a new measurement called user bridging magnitude (UBM) to evaluate the gain in speed and breadth of information transmission due to their sharing. With our tweet collection, we reveal that influential users suffered more significant mental distress during the COVID-19 pandemic. Building on this observation, through comprehensive hierarchical multiple regression analysis, we are the first to discover the strong relationship between individual social media users’ subjective well-being and their bridging performance. We then extend bridging performance from individuals to user subgroups. The new measurement allows us to conduct a subgroup analysis according to users’ multilingualism and confirm the bridging role of multilingual users in COVID-19 information propagation. We also find that multilingual users not only suffered from a much lower SWB during the pandemic, but also experienced a more significant SWB drop.


INTRODUCTION
The recently concluded COVID-19 pandemic is widely regarded as an unprecedented global public health crisis. Given its disastrous cost to the global economy and people's health, it is critical to take stock of lessons for future large-scale infectious diseases of a similar type, including effective approaches both to fighting the virus itself and to combating the infodemic caused by the virus. The term "infodemic" describes the perils of misinformation during disease outbreaks, mainly on social media [10,25]. Apart from accelerating virus transmission by distracting social reactions, the infodemic increased cases of psychological conditions such as anxiety, phobia and depression during the pandemic [14].
In response to the infodemic, governments and healthcare bodies put in place a series of effective social media-based countermeasures to spread trustworthy information and overcome the negative impact of misinformation. To promote the speed and breadth of information spread, individual users with a large number of followers were invited to share messages [2,57]. Meanwhile, various user subgroups, including healthcare professionals and social activists, also voluntarily relayed information they considered useful through their social media accounts. All these users in effect act as bridges on social media, delivering information to the public, although their bridging performance varies. We use the term "bridging performance" to describe how quickly and widely information can diffuse on social media because of a user's sharing.
Subjective well-being (SWB) evaluates individuals' cognitive (e.g., life satisfaction) and affective (i.e., positive and negative) perceptions of everyday lives [31]. The global decrease in the public's SWB since the outbreak of the COVID-19 pandemic has been widely recognised. The SWB of various sub-populations, such as immigrants and healthcare workers [18,29], has been studied extensively. Such research identifies the vulnerable populations that deserve special attention. Despite the contribution of influential social media users in combating the infodemic, no work has examined and analysed their psychological status. In this article, we use the COVID-19 pandemic as representative of future global health crises, and make the first attempt to study the negative impacts of a global pandemic on the subjective well-being of this specific subgroup of people.
Two challenges have to be overcome before this analysis can be conducted. First, no measurements exist to accurately quantify social media users' actual bridging performance in promoting the spread of COVID-19 related information. The existing measurements, used in crisis communications and marketing, identify influential users based on their social relations. Their effectiveness is found to deteriorate when capturing users' real bridging performance, especially in the COVID-19 pandemic [47]. Take healthcare professionals on X (formerly known as Twitter) as an example. Their professional endorsement significantly promotes the popularity of the tweets they share [47]. However, they are usually not among the super tweeters who have thousands of followers. The second challenge is how to obtain the SWB levels of a sufficiently large group of social media users whose bridging performance is simultaneously available.
We address the identified challenges by leveraging the information outbreak on social media triggered by the COVID-19 pandemic and the recent advances of deep learning in text mining. Although the methods we develop to solve these two challenges can be used independently for each research problem, we use them as two indispensable components to achieve our research objective, which is to analyse the mental health status of influential users on social media. To address the first challenge, we start by proposing a new measurement based on information cascades, instead of relying only on social relations between users, to comprehensively quantify individual users' influence in diffusing COVID-19 related information. We also extend the measurement to quantify and compare the bridging performance of user subgroups. Regarding the second challenge, we take advantage of the success of deep learning in Natural Language Processing (NLP) and quantitatively evaluate individual users' SWB with the sentiments of their posts. Recent studies [32] have illustrated the advantages of social media posts in extracting subjective well-being, especially with data-driven methods. In this article, we follow the same data-driven methodology and adopt state-of-the-art text embedding based on transformers. Compared to traditional machine learning models with manually pre-defined features, it allows us to automatically learn representative features of textual posts.

Our contributions. We make use of X as a representative social media platform considering its increasing popularity during the pandemic. It became a major source for the public to acquire COVID-19 related information, especially during the first few months after the outbreak of the pandemic. We collect data from X originating from the Greater Region of Luxembourg (GR), a cross-border region centred around Luxembourg and composed of adjacent regions of Belgium, Germany and France. One important reason for selecting this region is the dense connections among its international residents from diverse cultures, a distinctive feature of a global financial centre. Moreover, the countries involved were among the first to administer COVID-19 vaccines. Our collection spans over 2 years from October 2019 to the end of 2021, including 3 months before the pandemic. Our contributions are summarised as follows:
- We propose a new measurement to capture the actual bridging performance of individual social media users in diffusing COVID-19 related information. Unlike existing measurements based on social connections, it is directly derived from information diffusion history. This measurement allows for identifying the accounts of influential health professionals and volunteers, which are usually missed in previous studies.
- Through deep learning-based text embedding methods, we implement a classification model that accurately extracts the sentiments expressed in social media messages. With the sentiments of posts, we quantitatively estimate the SWB of individual users and discover that influential individual users are more negatively affected in their SWB during the pandemic.
- Through a hierarchical multiple regression model, we reveal for the first time that users' SWB has a strong negative relationship with their bridging performance in COVID-19 information diffusion, but a weak relationship with their social connections.
- We identify the insufficiency of our bridging performance measurement for individual users in comparing the bridging performance between user subgroups. We thus extend it to the level of user subgroups. With the new measure, we re-confirm the bridging role of multilingual users in information diffusion during the pandemic and reveal the more drastic adverse impact of the pandemic on their SWB. This complements pre-pandemic studies claiming that monolingual users have lower SWB due to their language boundaries [43,50].
Our research provides policy-makers with an effective method to identify influential users and user subgroups in the fight against infodemics. In addition, we highlight the need to pay particular attention to the mental health of people who are actively involved in transmitting information.

Structure of our article. We discuss the previous research related to our work in Section 2 and describe the collection of the data used in our analysis and its main characteristics in Section 3. Section 4 describes the pre-processing we conducted on our data before the analysis, i.e., how to calculate information cascades, extract sentiments of textual messages and measure individual users' SWB. Our major contribution is presented in Section 5, where we focus on individual users, presenting our bridging performance measurement and its strong negative relation with users' SWB. In Section 6, we extend our research to subgroup analysis by measuring subgroups' bridging performance and conducting a case study on multilingual users. Section 7 concludes the article with its limitations and directions for future work.

RELATED WORK

Impact of the Pandemic on Subjective Well-being
The negative impact of the COVID-19 pandemic on mental health and well-being has been acknowledged as a global health concern. Statistics have been reported on various subgroups of people with different demographic characteristics. More than 4 in 10 adults in the US suffered from symptoms of anxiety and depressive disorders in January 2021, compared to 1 in 10 before the pandemic. Women and young people are found to respond more negatively to the loneliness caused by social distancing [41,49]. In Japan, the monthly suicide rate increased by 37% for women and 49% for children and adolescents, while the average rate increased by 16% [49]. Vulnerable subgroups such as migrants [44,56] and refugees [17] have also been found to deserve more attention due to their relatively poor economic conditions. As an indispensable part of mental health, subjective well-being has also drawn public attention. Multiple factors have been found to be associated with its decline after the onset of the pandemic, such as country-specific pandemic severity [20] and satisfaction with health [27].
In this article, we concentrate on the SWB of a specific group of people who are actively involved in information spread on social media and have not been studied in the literature. Furthermore, we examine whether their contribution to COVID-related information diffusion can be used as an effective predictor of their SWB.

Measuring Bridging Performance
Measuring the spreading capabilities of individual social media users is an essential component in many research lines such as influence maximisation [9,34] and information diffusion prediction [8,24]. Influence maximisation aims to find a set of seed users who collaboratively optimise the diffusion of a piece of information, while information diffusion prediction infers the final popularity of a message from a short-term early observation. Both need to estimate users' capabilities of promoting information spread. The methods can be classified into two categories. The first category makes use of observed data to quantify and compare users' "centrality", while the second leverages diffusion models (statistical models, e.g., independent cascade [9], or deep learning [8,24,34]) to infer the most likely popularity of messages posted by given individual users. Each has its own benefits. For instance, the first approach is computationally efficient and explainable, while the second may better capture effective indicators but lacks interpretability. We focus on the first approach, which quantifies users' bridging performance with explainable measurements on available data observations.
A considerable amount of literature has been published quantifying users' bridging performance based on social connections to identify amplifiers in social media. We can divide the measurements into two types. The first type implicitly assumes that influential users are likely to hold certain topological properties on social networks, such as large degrees, strong betweenness centrality or community centrality [21,23,58]. Degrees only capture the number of users' local social connections without considering their overall position in the whole social network. Betweenness centrality [58] measures the importance of a user in connecting other users in the network, while community centrality [23] measures a user's importance in connecting communities. Measurements have also been proposed to fuse multiple topological properties, such as degree, ego-betweenness centrality and eigenvector centrality, into an overall evaluation [30]. It has been noticed that such measurements are usually inefficient to compute in sparse networks where node degrees do not follow power-law distributions [22]. The second type assumes that influential users tend to be more reachable from other users through random walks. PageRank [42] and its variant XRank [53] are among the representative benchmarks of this type. PageRank is calculated only with network structures, while XRank additionally takes into account topic similarities between users. Both types of measurements have been widely applied in practice, from public health crisis communication [38] to online marketing [36]. However, recent studies have pointed out that they may not truly capture users' actual bridging performance in information diffusion during a specific public health crisis [38]. One of our goals in this article is to define new measurements that can capture social media users' bridging performance in information diffusion during global public health crises like COVID-19 in real-world social networks.

Subjective Well-being Extraction
Subjective well-being is used to measure how people subjectively rate their lives both in the present and in the near future [13]. Many methods have been used to assess subjective well-being, from traditional self-reporting methods [12] to recent ones exploiting social media [55].
Studies have cross-validated the trustworthiness of social media as a complementary data source for public opinions [7,32,39]. Chen et al. [7] extracted vaccine hesitancy from X during the COVID-19 pandemic and empirically illustrated its consistency with social surveys across regions and over time. Compared against the Gallup-Sharecare Well-Being Index survey, a classic reference used to investigate public SWB, SWB extracted from social media has been shown to be a reliable indicator of SWB [32]. X-based studies usually calculate SWB as the overall scores of positive or negative emotions (i.e., sentiment or valence) [31]. Sentiment analysis has developed from the original lexicon-based approaches [4] to data-driven ones, which ensure better performance [31]. We adopt the recent advances of the latter approaches, and make use of the pre-trained XLM-RoBERTa [40], a variant of RoBERTa [37], to automatically learn the linguistic representation of textual posts. As deep learning models, RoBERTa and its variants have been shown to outperform traditional machine learning models in capturing the linguistic patterns of multilingual texts [3].

THE GR-EGO X DATASET
In this section, we describe how we build our X dataset, referred to as GR-ego. In addition to its large number of active users, we have two further considerations for selecting X (formerly called Twitter) as our data source. First, the geographical addresses of posters are attached to tweets and thus can be used to locate users. Second, tweet status indicates whether a tweet is a retweet. If so, the corresponding original tweet ID is provided. Together with the time stamps, this lets us track the diffusion process of an original tweet. Our GR-ego dataset consists of two components: (i) the social network of GR users recording their following relations; (ii) the tweets posted or retweeted by GR users during the pandemic. We follow three sequential steps to collect our GR-ego dataset. Table 1 summarises its main statistics.
Step 1. Metadata collection. The purpose of this step is to collect seed users in GR who actively participate in COVID-19 discussions. Instead of directly searching by COVID-19 related keywords, we make use of a publicly available dataset of COVID-19 related tweets for efficiency [5]. Restricted by X's privacy policies, this dataset only consists of tweet IDs. We extract the IDs of tweets posted between October 22nd, 2019, which is about three months before the start of the COVID-19 pandemic, and December 31st, 2021. Then, with these IDs, we download the corresponding tweet content. On X, geographical information, i.e., the locations of tweet posters and of original users if tweets are retweeted, is either maintained by X users or provided directly by their positioning devices. We prefer the device-input positions, and only use user-maintained ones when such positions are unavailable. Due to the ambiguity of user-maintained positions, we leverage the geocoding APIs Geopy and ArcGIS Geocoding to regularise them into machine-parsable locations. With regularised locations, we filter the crawled tweets and only retain those from GR. In total, we obtain 128,310 tweets from 8,872 GR users.
Step 2. Social network construction. In this step, we search for GR users starting from the seed users and construct the GR-ego social network. We adopt an iterative approach to gradually enrich the social network. For each seed user, we obtain his/her followers and only retain those who have a mutual following relation with the seed user, because such users are more likely to reside in GR. We then extract the new users' locations from their profile data and regularise them. Only users from GR are added to the social network as new nodes. New edges are added whenever users already in the network have following relationships with the newly added users. After the first round, we continue with the newly added users, adding their mutually followed friends that do not yet exist in the current social network. This process continues until no new users can be added. Our collection takes 5 iterations before termination. In the end, we take the largest weakly connected component as the GR-ego social network.
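The iterative expansion in Step 2 can be sketched as follows. This is a minimal illustration rather than the authors' crawler: the in-memory `follows` mapping stands in for calls to the X API, and `in_region` is a hypothetical predicate standing in for the location regularisation and GR filtering.

```python
def mutual_friends(user, follows):
    """Users that `user` follows and that follow `user` back."""
    return {v for v in follows.get(user, set()) if user in follows.get(v, set())}


def build_network(seeds, follows, in_region):
    """Grow the network round by round until no new in-region user is found.

    follows[u] is the set of accounts u follows (hypothetical stand-in for the API);
    in_region(u) says whether u's regularised location lies in the target region.
    """
    nodes = set(seeds)
    frontier = set(seeds)
    while frontier:
        new_users = set()
        for u in frontier:
            for v in mutual_friends(u, follows):
                if v not in nodes and in_region(v):
                    new_users.add(v)
        nodes |= new_users
        frontier = new_users  # the next round only visits newly added users
    # Keep every following relation whose two endpoints are both in the network.
    edges = {(u, v) for u in nodes for v in follows.get(u, set()) if v in nodes}
    return nodes, edges


def largest_weak_component(nodes, edges):
    """Largest weakly connected component (edge direction is ignored)."""
    neighbours = {u: set() for u in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for v in neighbours[stack.pop()]:
                if v not in component:
                    component.add(v)
                    stack.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

Restricting each round to the newly added users mirrors the description above and keeps the number of profile look-ups proportional to the final network size.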
Step 3. COVID-19 related timeline tweets crawling. In this step, we collect tweets originally posted or retweeted by the users in our dataset. These tweets will be used to extract users' SWB. Thus, the collected tweets are not limited to those relevant to the COVID-19 pandemic. Due to the constraints of X, it is not tractable to download all the users' past tweets. We select a sufficiently large number of representative users who actively participated in retweeting COVID-19 related messages, and then crawl their historical tweets. In detail, we choose 14,756 users who (re)tweeted at least three COVID-19 related messages. With the newly released X API, which allows for downloading 500 tweets of any given month for each user, we collect 37,281,824 tweets posted between October 22nd, 2019 and December 31st, 2021. This period also contains the last three months before the pandemic was officially declared. In order to advance future related research, we release the IDs of our collected tweets on GitHub; they can be downloaded from https://github.com/NinghanC/SWB4Twitter.

Multilingualism of the dataset. One specific characteristic of our dataset is its multilingualism, originating from the international nature of the GR region. Figure 1 presents the distribution of tweets in the top 15 most used languages. We can see that the collected tweets are composed in very diverse languages. The distribution of languages is consistent with that of GR inhabitants' nationalities and the corresponding official languages.

DATA PROCESSING

Cascade Computation
A cascade records the process of the diffusion of a message. It stores all activated users and the time when they are activated. In our dataset, a user is activated in diffusing a message when he/she retweets the message. In this article, we adopt the widely accepted cascade tree to represent the cascade of a message [51].
The first user who posted the message is regarded as the root of the cascade tree. Users who retweeted the message but received no further retweets comprise the leaf nodes. Note that a tweet quoting another tweet is also considered a retweet of the quoted message. An edge from $u$ to $u'$ is added to the cascade if $u'$ follows $u$ and $u'$ retweeted the message after $u$, indicating that $u$ activated $u'$. If several of the users whom $u'$ follows retweeted the message, meaning $u'$ may have been activated by any of them, we select the one who retweeted last as the parent node of $u'$. Figure 2(b) shows a cascade on the social network in Figure 2(a). In this example, user $u_4$ can be activated by the messages retweeted by either $u_1$ or $u_3$. Since $u_3$ retweeted after $u_1$, we add the edge from $u_3$ to $u_4$, indicating that the retweet of $u_3$ activated $u_4$.
We denote the root node of a cascade $C$ by $r(C)$. We call a path that connects the root and a leaf node a cascade path, which is in fact a sequence of nodes ordered by their activation time. For instance, $(u_1, u_3, u_4)$ is a cascade path in our example, indicating that the diffusion of a message started from $u_1$ and reached $u_4$ in the end through $u_3$. In this article, we represent a cascade tree by the set of its cascade paths.
For our study, we follow the method in [35] to construct tweet cascades. Recall that when a tweet's status is "Retweeted", the ID of the original tweet is also recorded. We first create a set of original tweets with all the ones labelled in our metadata as "Original". Second, for each original tweet, we collect the IDs of users who have retweeted the message. Finally, we generate the cascade for every original tweet based on the following relations in our GR-ego social network and the retweeting time stamps. We eliminate cascades with only two users, where the message is retweeted just once. In total, 614,926 cascades are built and the average size of these cascades is 7.13.
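The parent-selection rule and the notion of a cascade path can be sketched in a few lines. This is an illustrative sketch, not the construction procedure of [35]: the function names and the input layout are hypothetical, and attaching a retweeter who follows nobody in the cascade directly to the root is an assumption made here to keep the tree connected.

```python
def build_cascade(root, root_time, retweets, follows):
    """Return {child: parent} for the cascade of one message.

    retweets: list of (user, timestamp) pairs for this message.
    follows[u]: set of accounts that u follows.
    """
    activation = {root: root_time}
    parent = {}
    for user, time in sorted(retweets, key=lambda r: r[1]):
        if user in activation:
            continue  # keep only the first retweet of each user
        # Followees already activated; all of them acted before `user`.
        candidates = [v for v in follows.get(user, set()) if v in activation]
        # Parent is the followee who retweeted last; root fallback is an assumption.
        parent[user] = max(candidates, key=activation.get) if candidates else root
        activation[user] = time
    return parent


def cascade_paths(root, parent):
    """All root-to-leaf paths of the cascade tree, each as a tuple of users."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    paths, stack = [], [(root,)]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)
        for kid in kids:
            stack.append(path + (kid,))
    return paths
```

With the relations of the running example ($u_4$ follows both $u_1$ and $u_3$, and $u_3$ retweets after $u_1$ posts), `build_cascade` assigns $u_3$ as the parent of $u_4$, and `cascade_paths` yields $(u_1, u_3, u_4)$ among the paths.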

Sentiment Analysis
Previous works [59] leverage user-provided mood (e.g., angry, excited) or status to extract users' sentiment (i.e., positive or negative) and use it to approximately estimate affective subjective well-being. However, such information is not available on X. We instead refer to the sentiments expressed in textual posts to extract users' SWB. In this article, we treat sentiment extraction as a tri-polarity sentiment analysis for short texts and classify a tweet as negative, neutral or positive. In order to deal with the multilingualism of our dataset, we benefit from the advantages of deep learning in sentiment analysis [3], and build an end-to-end deep learning model to conduct the classification. Figure 3 depicts the general architecture of the model. Our model is composed of three components. The first component uses a pre-trained multilingual language model, i.e., XLM-R [11,40], to calculate the representation of tweets. In addition to its improved accuracy, one important reason for selecting XLM-R is its capability to handle up to 100 languages, including low-resource ones [11]. The representations are then sent to the second component, a fully-connected ReLU layer with dropout. Let $z$ be the representation of a textual message. Then the output of this layer is $\mathrm{ReLU}(W \cdot z)$ where $W \in \mathbb{R}^{d \times 3}$ ($d$ is the length of $z$). The last component is a linear layer applied to the second component's outputs with sigmoid as the activation function. We use cross-entropy as the loss function and optimise it with the Adam optimiser.

Model training and testing. We train our model on the SemEval-2017 Task 4A dataset [45], which has been used for sentiment analysis on COVID-19 related messages [15]. The dataset contains 49,686 messages, each annotated with one of three labels, i.e., positive, negative and neutral. We shuffle the dataset and take the first 80% for training and the remaining 20% for testing. We assign other training parameters following the common principles in existing works. We run 10 epochs with a maximum string length of 128 and a dropout ratio of 0.5. On the test set, we achieve an accuracy of 70.09% and a macro-average F1 score of 71.31%.
Despite its effectiveness in classifying SemEval-2017 Task 4A data, in order to check whether such performance persists on our GR-ego dataset, we construct a new testing dataset. This dataset consists of 500 messages, 100 for each of the top 5 most popular languages. We hire two annotators to manually label the selected tweets, and the annotated labels are consistent between them with Cohen's Kappa coefficient $\kappa = 0.93$. When applied to this annotated dataset, our trained model achieves a similar accuracy of about 87%.

Analysing our GR-ego dataset. Before applying our sentiment classification model to our GR-ego dataset, we clean tweet contents by removing all URLs and mentioned usernames. Figure 4 summarises the statistics obtained from user timeline tweets before and during the pandemic. The statistics are consistent with previous studies. For instance, users tend to become more negative after the outbreak of the COVID-19 pandemic [18,29].
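For readers unfamiliar with the agreement statistic reported for the two annotators, Cohen's Kappa compares the observed agreement $p_o$ with the agreement $p_e$ expected by chance from each annotator's label frequencies, $\kappa = (p_o - p_e) / (1 - p_e)$. The following is a minimal sketch of that standard definition; the function name and the toy labels in the usage note are illustrative and are not taken from the annotated dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies of each annotator.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, with annotator A giving `["pos", "pos", "neg", "neg"]` and annotator B giving `["pos", "neg", "neg", "neg"]`, the observed agreement is 0.75, the chance agreement is 0.5, and the function returns 0.5.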

Measuring SWB
We extend the definition proposed in [59] to measure the level of subjective well-being of users based on the sentiment expressed in their past tweets. Specifically, we extend it from bi-polarity labels, i.e., negative and positive affect, to tri-polarity labels with neutral sentiment, by multiplying the bi-polarity SWB by a scalar that reflects its trustworthiness.

Definition 1 (Social media subjective well-being value (SWB)). We use $N_p(u)$, $N_{neg}(u)$ and $N_{neu}(u)$ to denote the number of positive, negative and neutral posts of a user $u$, respectively. The subjective well-being value of $u$, denoted by $swb(u)$, is calculated as .
If all messages are neutral, then $swb(u)$ is 0.

Discussion. Note that (i) consistent with [59], we focus on affective SWB (i.e., positive and negative) in this article, while ignoring its cognitive dimension; (ii) users' SWB is evaluated based on their original messages: originally posted tweets and quotations; (iii) for tweets quoting other messages, only the user's own text is considered, without the quoted message. As retweets may not explicitly include users' subjective opinions, we exclude them from the SWB calculation.
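The formula of Definition 1 is not reproduced in this text, so the following is only a sketch of one form that is consistent with the description: a bi-polarity score $(N_p - N_{neg}) / (N_p + N_{neg})$ multiplied by a scalar. Both the bi-polarity form and the choice of scalar (the share of non-neutral posts) are assumptions made here for illustration; the function name and arguments are hypothetical.

```python
def swb(n_pos, n_neg, n_neu):
    """Illustrative tri-polarity SWB value in [-1, 1].

    ASSUMPTION: bi-polarity score scaled by the share of non-neutral posts.
    This is not confirmed as the paper's formula.
    """
    polar = n_pos + n_neg
    if polar == 0:
        return 0.0  # all messages neutral (or no messages): SWB is 0
    bipolar = (n_pos - n_neg) / polar       # bi-polarity score in [-1, 1]
    weight = polar / (polar + n_neu)        # assumed "trustworthiness" scalar
    return bipolar * weight
```

Under these assumptions, a user with 3 positive, 1 negative and 4 neutral posts obtains a bi-polarity score of 0.5, which the scalar 0.5 reduces to 0.25. The all-neutral case returns 0, matching the behaviour stated after the definition.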

BRIDGING PERFORMANCE OF INDIVIDUAL USERS IN INFORMATION DIFFUSION AND ITS RELATION WITH SWB
We devote this section to measuring the bridging performance of individual users in the diffusion of COVID-19 related information. Through experimental validation and manual analysis, we validate the effectiveness of our proposed measurement in identifying influential social media users.

Measuring Individual User Bridging Performance
We evaluate individual users' overall performance in the diffusion of observed COVID-19 related messages. As a user can participate in diffusing multiple messages, we first focus on her/his importance in the diffusion of one single message and then combine her/his importance in all messages into one single measurement. Intuitively, we consider a user more important in diffusing a message when his/her retweeting behaviour activates more users, or leads to a given number of activated users with fewer subsequent relays, e.g., retweets on X. In other words, a more important user promotes the popularity of the information or accelerates its transmission.
Given a cascade path $S = (u_1, u_2, \ldots, u_n)$, we use $S^*(u_i)$ ($1 \le i < n$) to denote the subsequence composed of $u_i$ and the nodes after it, i.e., $(u_i, u_{i+1}, \ldots, u_n)$. For any $u$ that does not exist in $S$, we have $S^*(u) = \varepsilon$, where $\varepsilon$ represents the empty sequence with length $|\varepsilon| = 0$. In the following, we define how to quantify a user's contribution to the diffusion of a given message as a transmitter.
Definition 2 (Cascade bridging value). Given a cascade tree $C$ and a user $u$ ($u \neq r(C)$), the cascade bridging value of $u$ in $C$ is calculated as
Note that our purpose is to evaluate the importance of users as transmitters of messages. Therefore, the concept of cascade bridging value is not applicable to root users, i.e., the message originators.
Example 1. In Figure 2(b), $u_3$ participated in two cascade paths, i.e., $S_1 = (u_1, u_3, u_4)$ and $S_2 = (u_1, u_3, u_6)$. Thus, $S_1^* = (u_3, u_4)$ and $S_2^* = (u_3, u_6)$. We then have $\alpha$
In Definition 2, we do not simply use the proportion of users activated by a user in a cascade to evaluate her/his bridging performance, because that proportion only captures the number of activated users and ignores the speed of the diffusion. Take $u_2$ in Figure 2(b) as an example. According to our definition, $\alpha_C(u_2) = 0.25$, which is smaller than $\alpha_C(u_3)$. This is due to the fact that $u_2$ activated two users through two re-transmissions while $u_3$ only used one. However, if we only considered the proportion of activated users, the values of these two users would be the same.
With a user's bridging value calculated in each cascade, we define user bridging magnitude to evaluate her/his overall importance in the diffusion of the observed messages. Intuitively, we first add up the cascade bridging values of a user in all the cascades he/she participated in, and then normalise the sum by the maximum number of cascades any single user participated in. This method captures not only the bridging value of a user in each cascade, but also the number of cascades she/he participated in. As a result, a user who is more active in sharing information is considered more important in information diffusion.
Definition 3 (User bridging magnitude (UBM)). Let $\mathcal{C}$ be a set of cascades on a social network and $U$ be the set of users that participate in at least one cascade in $\mathcal{C}$. A user $u$'s user bridging magnitude (UBM) is calculated as .
With this measurement, we can compare the bridging performance of any two users, and learn which one plays a more important role in information diffusion. Note that UBM measures users' bridging performance in general information diffusion and is not specifically customised to COVID-19 related information.
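The aggregation described for UBM (sum a user's cascade bridging values, then divide by the largest number of cascades any user took part in) can be sketched as below, together with the suffix $S^*(u)$ of a cascade path. The cascade bridging values are taken as given inputs, since their formula is not reproduced in this text. The function names and input layout are hypothetical, and counting participation only over cascades in which a user has a bridging value (i.e., as a non-root transmitter) is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def suffix_from(path, user):
    """S*(u): the part of a cascade path from `user` onwards, or () if absent."""
    if user not in path:
        return ()
    return tuple(path[path.index(user):])


def user_bridging_magnitude(cascade_values):
    """Aggregate per-cascade bridging values into one score per user.

    cascade_values: one dict per cascade, mapping each transmitting user to
    her/his cascade bridging value in that cascade (given as input here).
    """
    totals, counts = defaultdict(float), Counter()
    for values in cascade_values:
        for user, value in values.items():
            totals[user] += value
            counts[user] += 1
    if not counts:
        return {}
    # Normalise by the maximum number of cascades any single user took part in.
    max_participation = max(counts.values())
    return {user: total / max_participation for user, total in totals.items()}
```

As an illustration with made-up values, if a user has bridging values 0.5 and 0.5 in two cascades and another user has 0.25 in one cascade, the maximum participation is 2, so the scores are 0.5 and 0.125 respectively. The more active user is ranked higher even where per-cascade values are comparable.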

Validation of UBM
As discussed above, we make use of observed cascades instead of network topologies to estimate the capacity of individual users to promote the diffusion of COVID-19 related information. To test its effectiveness, we face a challenge common to social science research, i.e., the absence of ground-truth data. In our case, it is the lack of real influential users that are consistent with widely accepted principles (if there exist any). To overcome this challenge and illustrate its advantages against existing measurements, we evaluate our UBM measurement from two perspectives. First, we use the auxiliary information associated with our collected tweets to empirically validate whether the influential users identified by UBM can improve the speed and popularity of information diffusion. Second, we examine the profiles of the identified influential users and cross-validate whether their composition is consistent with previous social studies in the literature. Positive conclusions from the above two perspectives can give us a reasonable level of confidence in the effectiveness of our new measurement.
Empirical evaluation. We compare the effectiveness of our UBM to five widely used topology-based measurements in the literature, i.e., in-degree, PageRank [42], XRank [53], betweenness centrality [21] and community centrality [21]. We randomly split the set of cascades into two subsets. The first subset accounts for 80% of the cascades and is used to calculate the bridging performance of all users. Then we select the top 20% of users with the highest bridging performance under every adopted measurement and use the other subset to compare their actual influence in information diffusion.
We adopt three measurements to quantitatively assess the effectiveness of UBM and the benchmarks. The average number of activated users per minute evaluates the efficiency of the information diffusion: the more users activated in a minute, the faster information can be spread when it is shared by influential users. The average number of activated users counts the users who received the information after the retweeting behaviour of an identified influential user. It is meant to evaluate the expected wideness of the spread once an influential user retweets a message. The percentage of impacted users gives the proportion of users that have ever received a message due to the sharing behaviours of identified influential users. This measurement is to compare the overall accumulated influence of all the selected influential users. Note that every adopted measurement only focuses on one possible observable consequential factor of influential users, e.g., efficiency or wideness, and thus cannot be used as the overall estimation of a user's bridging performance. In addition, not all information used is available in practice, such as the temporal information in our validation.
We show the results of UBM and other benchmark measurements in Table 2. We can observe that it takes less time on average for the influential users identified according to UBM to activate an additional user, with 0.104 users activated per minute due to their retweets. With 23.81 users activated on average, UBM finds users whose retweeting reaches over 35% more users than those identified by the benchmarks. Finally, the top 20% influential users identified by UBM spread their shared information to 71% of the users in our dataset, which exceeds that of the best benchmark by about 15%. From the above analysis in terms of the three measurements, we can see that our UBM can successfully identify influential users whose sharing on social media manages to promote the wideness and speed of the diffusion of COVID-19 information.
Manual analysis. In order to understand the profiles of the influential users calculated by the measurements, we select the top 30 users with the highest bridging performance under each measurement. We identify four types of user profiles: private, media, politicians and emergency management agencies (EMA). Figure 5 shows the distributions of their profiles. We can observe that the distributions vary due to the different semantics of social connections captured by the measurements. For instance, due to the large number of followers, X accounts managed by traditional media are favoured by in-degree. This obviously underestimates the importance of accounts such as those of EMAs in publishing pandemic updates. With reachability and importance in connecting users and communities considered, more accounts of politicians and EMAs stand out. The proportion of private accounts also starts to increase. When UBM is applied, in addition to a similar proportion of EMA accounts, the percentage of private accounts becomes dominant. This is unique compared to other adopted measurements. A closer check discovers that 10 out of the 11 private accounts belong to health professionals and celebrities. This is consistent with the literature [28] which highlights the importance of health professionals and individuals in broadcasting useful messages about preventive measures and healthcare suggestions in the COVID-19 pandemic. The manual analysis of identified influential users' profiles unveils that UBM is consistent with the measurements considering user connectivity, and more interestingly confirms the important role of health professionals in information diffusion during the pandemic.
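The hold-out protocol of the empirical evaluation can be sketched as follows. This is a minimal illustration under our own assumptions, not the code used in the study: a held-out cascade is assumed to be given as a child-to-parent mapping over retweeters, "activated" users are interpreted as the descendants of an influential retweeter in that cascade, and the names `split_cascades`, `top_users`, `descendants` and `evaluate` are hypothetical. The per-minute rate is omitted because it additionally requires retweet timestamps.

```python
import random


def split_cascades(cascades, train_fraction=0.8, seed=0):
    """Randomly split cascades into a scoring subset and a held-out subset."""
    shuffled = list(cascades)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def top_users(scores, fraction=0.2):
    """Return the top `fraction` of users ranked by a score (e.g., UBM or in-degree)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:max(1, int(len(ranked) * fraction))])


def descendants(parent, user):
    """Users reached through `user` in a cascade given as child -> parent (assumed format)."""
    reached = set()
    for node in parent:
        current = node
        while current in parent:
            current = parent[current]
            if current == user:
                reached.add(node)
                break
    return reached


def evaluate(test_cascades, influential, all_users):
    """Average number of activated users per influential retweet, and share of impacted users."""
    counts, impacted = [], set()
    for parent in test_cascades:
        # Only retweeters (the keys of the mapping) count as sharing events.
        for user in set(parent) & influential:
            reached = descendants(parent, user)
            counts.append(len(reached))
            impacted |= reached
    average_activated = sum(counts) / len(counts) if counts else 0.0
    return average_activated, len(impacted) / len(all_users)
```

In this sketch, the same `evaluate` call would be repeated with the top-20% set produced by each benchmark score, so that all measurements are compared on the same held-out cascades.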

Impact of the Pandemic on SWB of Influential Users
With the proposed SWB measurement in Definition 1, we study how users' subjective well-being changes due to the outbreak of the COVID-19 pandemic. As our target is the SWB of natural persons, the accounts of organisations and bots should be excluded from our analysis. We make use of existing methods/tools to identify these two types of accounts. We detected about 1,204 organisation accounts, which are about 8.16% of the selected active users, with the methods proposed in [52]. We use Botometer [46] to detect bot accounts and only 131 users are classified as bots. In total, we removed 1,333 users (as 2 users are identified as both bot and organisation) from our collected dataset.
Fig. 6. SWB changes after the outbreak of the pandemic.
We calculate the UBM values of the remaining users and sort them in descending order. Then we select the top 20% and the bottom 20% of users to compare the responses of these two groups to the pandemic. For each group, we calculate users' SWBs according to their posts before and after the outbreak of the pandemic to capture the changes. Note that we only consider the users with more than 5 posts in each time period.
In Figure 6, we show the SWB distributions of the two user groups. On average, the users with high UBM have a positive SWB of 0.13 before the pandemic while the users with low UBM are negative. The SWB of both user groups decreases after the outbreak, but the SWB of the top 20% users drops more significantly. Specifically, their SWB falls by 0.38, which is twice as much as that of the bottom 20% users. The lowest value of the top 20% users' SWB slightly decreases after the outbreak, while the lowest value of the bottom 20% of users does not change significantly. Note that the minimum values here do not include outliers that lie outside the box whiskers. This indicates that the top 20% users become even more negative than the bottom 20% users, in terms of mean and minimum values. To sum up, the pandemic causes more negative mental impacts on the social media users who play a more important bridging role in transmitting COVID-19 related information.
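The group comparison above can be illustrated with a short sketch. It assumes that per-user SWB values for the two periods have already been computed (Definition 1 is not reproduced here), and that the post-count filter is applied after the top and bottom 20% are selected; the text does not fix this order, so that choice, like the function name `mean_swb_drop`, is our assumption.

```python
from statistics import mean


def mean_swb_drop(ubm, swb_before, swb_after, posts_before, posts_after,
                  fraction=0.2, min_posts=6):
    """Mean SWB drop (before minus after) for the top and bottom `fraction` of users by UBM.

    Users need more than 5 posts (min_posts=6) in each period to be included.
    """
    ranked = sorted(ubm, key=ubm.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    groups = {"top": ranked[:k], "bottom": ranked[-k:]}
    drops = {}
    for name, users in groups.items():
        eligible = [u for u in users
                    if posts_before.get(u, 0) >= min_posts
                    and posts_after.get(u, 0) >= min_posts]
        drops[name] = (mean(swb_before[u] - swb_after[u] for u in eligible)
                       if eligible else None)
    return drops
```

A larger value for the "top" group than for the "bottom" group corresponds to the pattern reported in Figure 6.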

Relation between SWB and Bridging Performance
We conduct the first attempt to study whether a user's bridging performance has a relationship with the SWB changes of the users actively participating in the diffusion of COVID-19 related information. In addition to UBM and the five benchmark measurements used in Section 5.2, we consider two additional variables: out-degree and activity. Out-degree is used to check whether the number of accounts a user follows correlates with SWB changes. The activity variable evaluates how actively a user engages in the online discourse and is quantified by the number of messages he/she posted. In order to isolate the impacts of these variables, we adopt the method of hierarchical multiple regression [48]. The intuitive idea is to check whether the variables of interest can explain the SWB variance after accounting for some variables.
To check the validity of applying hierarchical multiple regression, we conduct first-line tests to ensure a sufficiently large sample size and independence between variables. We identify that the variables corresponding to community centrality and XRank fail to satisfy the multi-collinearity requirement. We thus ignore them in our analysis. The ratio of the number of variables to the sample size is 1:1,917, which is well below the requirement of 1:15 [48]. This indicates the sample size is sufficient.
In Table 3, we show the results of the analysis. In the first stage, we input the variables related to network structures, i.e., in-degree, out-degree, PageRank and betweenness centrality. The combination of the variables can explain 4.80% of the SWB variance (F = 4.672, p < 0.05). Note that an F-value greater than 4 indicates the linear equation can explain the relation between SWB and the variables. This demonstrates that there exists a positive relationship between the topology-based variables and SWB, but this relationship is rather weak. A closer check on the t-values shows that out-degree is irrelevant to SWB and the remaining three variables are weakly related. In the second stage, we add the variable of activity to the model. After controlling all the variables of the first stage, we observe that user activity does not significantly contribute to the model, with a t-value of 0.396. This suggests that user activity is not a predictor of SWB. In the third stage, we introduce UBM to the model. The addition of UBM, with the variables in the previous two stages controlled, reduces the R value from -0.335 to -0.603. UBM contributes significantly to the overall model with F = 167.32 (p < 0.001) and increases the predicted SWB variance by 25.1%. Together with the t-value of -11.684 (p < 0.001), we can see there exists a strong negative relation between UBM and SWB, and UBM is a strong predictor for SWB.
Discussion. To conclude, the results illustrate that UBM is strongly related to SWB, while in-degree, PageRank and betweenness centrality are weakly related. This difference further shows that UBM can more accurately capture users' behaviour changes after the outbreak of the pandemic while topology features remain similar to those before the pandemic. This may be explained by recent studies [29]: once regarded as a change in life after the pandemic outbreak, this extra bridging responsibility in diffusing COVID-19 related messages is likely to be associated with lower life satisfaction.
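The stage-wise logic of hierarchical multiple regression can be made concrete with the standard F-change statistic for nested linear models. The sketch below only illustrates the arithmetic: the function name `f_change` is ours, and the sample size and predictor counts in the usage note are hypothetical placeholders rather than values from Table 3. The last lines check the reported gain in explained variance directly from the two R values quoted in the text.

```python
def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R^2 gain when predictors are added in a later stage.

    n is the sample size; k_reduced and k_full are the numbers of predictors
    in the earlier and later stage, respectively.
    """
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    return ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)


# R values reported in the text before and after adding UBM.
r2_before_ubm = (-0.335) ** 2
r2_with_ubm = (-0.603) ** 2
delta_r2 = r2_with_ubm - r2_before_ubm
print(round(delta_r2, 3))  # 0.251, i.e., the 25.1% gain in explained variance
```

For example, with hypothetical values `f_change(0.10, 0.35, n=100, k_reduced=5, k_full=6)`, a single added predictor that raises R² from 0.10 to 0.35 yields an F-change of about 35.8.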

COMPARING THE BRIDGING PERFORMANCE OF USER SUBGROUPS
With the above discussion, we have shown how to identify individual influential users. In practice, subgroup analysis is also an important analysis methodology to compare how people respond differently to an intervention or event. Samples are divided into multiple subsets, e.g., according to their demographic characteristics [27]. Regarding bridging performance, our purpose can be to understand the different roles played by various user subgroups. Straightforward methods consist of aggregating users' UBM in each group into the overall bridging performance of the subgroup. In this section, we discover the deficiency of such methods and propose a new measurement to fix it. In order to validate the effectiveness of our measurement, we conduct a subgroup analysis based on users' multilingualism which successfully confirms previous studies about the potential bridging role of multilingual users in social networks. In addition, we analyse the SWB drops of multilingual and monolingual users after the onset of the pandemic and observe the same relation between SWB and the bridging performance among subgroups.

Measuring User Subgroup Bridging Performance
Our UBM measurement focuses on the level of users and evaluates their overall performance across all observed information cascades. It does not compare the relative bridging performance between different user subgroups within the cascades. We take the following example to clarify this deficiency.
Example 2. Suppose $u_2$ in Figure 2(b) is the only multilingual user. According to Definition 2, we learn that $\alpha_C(u_2) = 0.25$, and as a multilingual user, $u_2$ plays a more important role in diffusing the message than all monolingual users except for $u_3$ with $\alpha_C(u_3) = 0.44$. In this example, without a unified standard, we still cannot determine which group of users plays a more important role in this cascade.
Similar to UBM, we want a unified single estimation of the influence of subgroups for total-order comparison between subgroups. Our core idea is to first evaluate subgroups' bridging performance in each cascade and then aggregate them into a final measurement. One notable challenge is the imbalanced sizes of subgroups. In practice, people with certain socio-demographic profiles only account for a minority of the general population. Another challenge is how to address the balance between subgroup activeness and the magnitudes of influences in participated cascades. Averaging the bridging performance of subgroups may incidentally understate the bridging performance of certain subgroups that are more active in diffusing information and participate in larger numbers of cascades. To address these two challenges, we use the number of cascades in which a subgroup dominates the other subgroups in terms of bridging performance to compare subgroups' influences. Suppose the set of users $U$ is divided into multiple subsets, i.e., $\mathcal{S} \subset 2^U$ and for any two distinct $S_1, S_2 \in \mathcal{S}$, we have $S_1 \cap S_2 = \emptyset$. Note that $2^U$ denotes the power set of $U$.
Given a cascade $C$, we calculate an integrated value through a function $\gamma$ from the bridging values of the users in each subgroup, denoted by $\alpha_C^S$ for any $S \in \mathcal{S}$. The integrated value can be the mean, median or maximum. Formally,
$$\alpha_C^S = \gamma\left(\{\alpha_C(u) \mid u \in S,\ u \in C\}\right).$$
The integration function $\gamma$ should be determined according to real application scenarios. We consider a user subgroup $S$ playing a more important bridging role in a particular cascade $C$ than $S'$ if $\alpha_C^S > \alpha_C^{S'}$. We use the notion SBM to quantify the importance of a user subgroup as a whole in information diffusion.
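Definition 4 (Subgroup bridging magnitude (SBM)). Let $\mathcal{C}_S = \{C \in \mathcal{C} \mid \exists u \in S,\ u \in C\}$ be the set of cascades involving at least one user in subgroup $S$. The subgroup bridging magnitude (SBM) of the user subgroup $S$ is calculated as follows:
$$\mathrm{SBM}(S) = \frac{\sum_{C \in \mathcal{C}_S} \mathbb{1}\left(\forall S' \in \mathcal{S} \setminus \{S\}:\ \alpha_C^S > \alpha_C^{S'}\right)}{\max_{S' \in \mathcal{S}} |\mathcal{C}_{S'}|}$$
where $\mathbb{1}(\cdot)$ is an indicator function which returns 1 when the given proposition is true and 0 otherwise.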
The introduction of the normalisation factor is to ensure the range of SBM is between 0 and 1.
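To make the counting idea behind SBM concrete, the following sketch computes, for each subgroup, the number of cascades in which it dominates all other subgroups and divides it by the largest number of cascades any subgroup participates in. It is an illustration under stated assumptions rather than a reference implementation: each cascade is assumed to be given as a mapping from user to bridging value, a subgroup absent from a cascade is assumed to be dominated by any subgroup that is present, and the function name `subgroup_bridging_magnitude` is hypothetical.

```python
from statistics import mean, median


def subgroup_bridging_magnitude(cascades, subgroups, gamma=max):
    """Compute an SBM-style score for every subgroup.

    cascades:  list of dicts mapping user -> bridging value in that cascade.
    subgroups: dict mapping subgroup name -> set of users (pairwise disjoint).
    gamma:     integration function over a list of values (max, mean or median).
    """
    wins = {name: 0 for name in subgroups}
    involved = {name: 0 for name in subgroups}
    for cascade in cascades:
        integrated = {}
        for name, members in subgroups.items():
            values = [value for user, value in cascade.items() if user in members]
            if values:
                integrated[name] = gamma(values)
                involved[name] += 1
        for name, value in integrated.items():
            # Assumption: subgroups absent from the cascade count as dominated.
            if all(value > other for rival, other in integrated.items() if rival != name):
                wins[name] += 1
    normaliser = max(involved.values(), default=0)
    return {name: (wins[name] / normaliser if normaliser else 0.0) for name in subgroups}


if __name__ == "__main__":
    toy_cascades = [
        {"u1": 0.10, "u2": 0.25, "u3": 0.44},
        {"u2": 0.50, "u4": 0.20},
        {"u1": 0.30},
    ]
    toy_groups = {"multilingual": {"u2"}, "monolingual": {"u1", "u3", "u4"}}
    for gamma in (max, mean, median):
        print(gamma.__name__, subgroup_bridging_magnitude(toy_cascades, toy_groups, gamma))
```

Because every subgroup is scored against the same normaliser, the resulting values stay between 0 and 1 and remain comparable across subgroups of very different sizes.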

Validation of SBM
Due to the lack of ground truth of subgroups' influences, we make use of a well-established result in social science and test whether we could cross-validate it with our measurement and data collection. The fact that a consistent conclusion is reached can help validate the effectiveness of our measurement. Existing studies highlight the bridging role of multilingual users in network connectivity [26] and between communities speaking different languages [16]. With regard to information diffusion, it has been shown that non-native English speakers have a higher influence than native English users [33] and multilingual users play a special role in cross-lingual diffusion [1].
In the following, we first make use of one inherent characteristic of our tweet dataset, multilingualism, to construct two user subgroups and compare their bridging performances based on SBM. After this cross-validation, we continue to study the SWB of subgroups and see whether the conclusion on individual users also applies to subgroups.
Defining multilingualism and extracting user socio-demographic features. No consistent definition exists for the multilingualism of a person. We follow a conservative approach to determine multilingual users based on language usage frequency. We only consider active users who posted more than 5 messages to ensure sufficient evidence. If a user posted or retweeted tweets in more than two languages, we select the language with the most tweets as his/her main language. If the messages of the main language make up less than 60%, we consider the user as multilingual. Otherwise, the user is considered as monolingual. This conservative criterion helps exclude most monolingual users who just infrequently or accidentally retweet or cite information in languages other than their mother tongue. In our dataset, about 37% of users are labelled as multilingual.
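The labelling rule above can be sketched in a few lines. The sketch takes the per-tweet language codes of one user as input and applies the two thresholds stated in the text (more than 5 messages, main-language share below 60%). The function name `classify_user` and the returned labels are hypothetical, and the rule is applied purely through the main-language share, which is our reading of the criterion.

```python
from collections import Counter


def classify_user(tweet_languages, min_posts=6, main_share_threshold=0.60):
    """Label a user from the language codes of his/her posted or retweeted messages.

    Returns None for users with 5 or fewer messages (insufficient evidence).
    """
    if len(tweet_languages) < min_posts:
        return None
    counts = Counter(tweet_languages)
    _main_language, main_count = counts.most_common(1)[0]
    main_share = main_count / len(tweet_languages)
    return "multilingual" if main_share < main_share_threshold else "monolingual"
```

For instance, a user with nine messages in one language and a single message in another keeps a main-language share of 90% and is therefore labelled monolingual, which is the intended effect of the conservative threshold.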
In Figure 8, we show the average number of monthly tweets posted by a user before and during the pandemic. We can see that users became more active on X after the outbreak of the pandemic and multilingual users are more willing to participate in the online discourse. These observations are consistent with existing studies [26], implying the reliability of our multilingualism definition.
Experimental evaluation. We use the same measurements evaluating UBM in Section 5.2 to quantify the speed and popularity of COVID-19 related information diffusion after the retweeting of both subgroups. The results are summarised in Table 4. We can see that on average, every minute the retweets of multilingual users can activate about 2.7 times as many users as monolingual users. Moreover, each multilingual user can activate 3.84 users, which is 50% more than monolingual users. Although multilingual users account for only 37% of the collected users, they can impact in total almost 92% of the users as a subgroup, which is about 10% more than monolingual users.
From the above discussion, we can see that multilingual users actually play a more important bridging role in diffusing COVID-19 related information during the pandemic. Both the experimental evaluation and the literature illustrate again the validity of our dataset. In the following, we will check whether our SBM measurement can capture this special role of multilingual users in our dataset.
In Table 5, we list the SBM values calculated with our GR-ego dataset. Note that the third column depicts the number of cascades in which the corresponding subgroup has a larger bridging value than the other subgroup. In our analysis, we instantiate the integration function $\gamma$ with maximum, median and mean. The obvious observation from Table 5 is that according to SBM, multilingual users have a subgroup bridging magnitude of 0.78 on average under all three integration functions, which is more than twice that of monolingual users. The above analysis shows that our SBM measurement can successfully capture the specific bridging role of multilingual users.

Analysing SWB Changes of User Subgroups
The correlation discovered in the previous section implies that multilingual users would suffer lower SWB due to their bridging role in diffusing COVID-19 related information. We proceed to validate this inference by comparing the SWB of multilingual users before and after the pandemic outbreak. Figure 9 presents the SWB values of multilingual and monolingual users before and after the onset of the pandemic. From the box plot on the left, we can see that multilingual users on average behaved more positively than monolingual users before the pandemic. This is consistent with previous studies conducted in different language regions where multilingualism is generally associated with better subjective well-being [43,50]. Similar to recent COVID-19 related works [29], we can observe that the outbreak of the pandemic lowers the SWBs of all users, but multilingual users' SWB dropped more drastically and became even lower than monolingual users' SWB. The right part of Figure 9 shows the distribution of users in each SWB category. We clearly see that 25.8% of the multilingual users are consistently negative before the pandemic, which is about one third less than that of monolingual users (37.7%). During the pandemic, in both the multilingual and monolingual groups, a large number of users changed from the positive category to the negative category due to the adverse impact of the pandemic. Positive multilingual users decrease from 20% to 5.4% while positive monolingual users drop to 12.8%. Negative multilingual users increase by 200% while negative monolingual users increase by just 26%.
It is natural to argue that the different reactions of subgroups of users by their multilingualism may also be present in other subgroup splittings and thus the observation discussed above will not be as insightful as claimed. We split social media users according to three other features: country, age and gender, and show the corresponding results in Figures 10, 11 and 12, respectively. We use the multi-modal deep neural model M3 [52] to infer the age and gender of X users, which makes the inference based on users' account name, screen name, self-description and profile image. We consider three age groups: 19-29, 30-39 and ≥ 40. To deal with the multilingualism of texts, we first translate all texts into English word by word and select the 3,000 most frequent words to calculate users' embedding. We construct a sample dataset of 100 randomly selected users to test its performance on our collected X data. Two annotators are asked to manually annotate the samples' ages and genders. The annotated labels are highly agreed upon between the two annotators with large Cohen's Kappa coefficients (κ = 0.95 for gender and κ = 0.81 for age). When tested on our sample dataset, the M3 model achieves a Macro F1 score of 0.92 and an accuracy of 0.91 for age classification. For gender classification, the Macro F1 score is 0.78 and the accuracy is 0.75.
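For readers unfamiliar with the agreement statistic used above, Cohen's Kappa corrects the observed agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of the standard formula; the function name `cohens_kappa` and the toy labels in the usage note are ours and do not come from the annotated sample.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

With the toy labels `["f", "f", "m", "m"]` and `["f", "f", "m", "f"]`, the observed agreement is 0.75, the chance agreement is 0.5, and the resulting Kappa is 0.5.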
We can observe the decrease of SWB in all subgroups of all three group splittings after the outbreak of the pandemic. However, the magnitudes of the decrease are similar between subgroups. This is completely opposite to the fact observed from the monolingual and multilingual subgroups of users. This implies that multilingualism is actually an effective indicator of larger SWB suffering when a pandemic occurs.
Discussion. From the above analysis, we conclude that the outbreak of the pandemic imposes a more adverse impact on multilingual users and they reacted more negatively than monolingual users. This finding complements the existing pre-pandemic studies. It is claimed that monolingual users suffer from a lower SWB in an international environment due to their language barriers. Our results show that during a global pandemic like COVID-19, the influence of language barriers does not constitute a factor for SWB with the same level of importance as that before the pandemic.

CONCLUSION AND LIMITATION
In this article, we concentrated on the performance of social media users and their sharing behaviour in promoting the popularity of COVID-19 related information. By proposing two new measurements, we quantify the bridging performance of both individual users and subgroups of users. With X data collected from an international region, we showed that the influential users and subgroups suffer a larger decrease in their subjective well-being. With our measurement of individual users, we conducted the first research to reveal the strong negative relationship between a user's bridging performance in information diffusion and his/her SWB during the pandemic. With our bridging performance measurement on subgroups, we re-confirmed the bridging role of multilingual users in diffusing information on social media and illustrated the negative relation between multilingual users' SWB and their bridging performance. This finding complements existing studies about multilingual people's subjective well-being before the pandemic in the sense that the impact of language barriers on SWB becomes less significant during a global health crisis like the COVID-19 pandemic. Our research provides a cautious reference to public health bodies that some individual users and subgroups can be mobilised to help spread health information, but special attention should be paid to their psychological health.
Limitations and our future work. This article has a few limitations that deserve further discussion. First, we only focused on the affective dimension of subjective well-being while noticing its multi-dimensional nature. This allows us to follow previous SWB studies to convert the calculation of SWB to sentiment analysis, but does not comprehensively evaluate users' cognitive well-being, such as life satisfaction. In our following research, we will attempt to leverage more advanced AI models to investigate cognitive aspects such as happiness and anger. Second, extracting SWB from users' online disclosure inevitably incurs bias compared to social surveys, although it supports analysis of an unprecedentedly large number of users. Third, socio-demographic information of users is not taken into account in this article. It is known that SWB varies among different socio-demographic groups, and such variation may have an impact on our results. Currently, deep learning-based models exist for socio-demographic inference. In our future work, we will extract users' socio-demographic information such as age, gender and income to ascertain whether the regression results will change due to the variations of socio-demographic information. Last, we notice that the region we targeted may introduce additional bias in our results. In continuing work, we will extend our study to a region of multiple European countries and cross-validate our findings with other published results in social science.
Ethical considerations. This work is based completely on public data and does not contain private information of individuals. Our dataset is built in accordance with the FAIR data principles [54] and the X Developer Agreement and Policy and related policies. Meanwhile, there has been a significant number of studies on measuring users' subjective well-being through social media data. It has become a consensus that following the terms of service of social media networks is adequate for respecting users' privacy in research [19]. To conclude, we have no ethical violation in the collection and interpretation of data in our study.


Fig. 3. Architecture of our sentiment classification model.
as a set of cascade paths. For instance, the cascade in Figure 2(b) is represented by the following set $\{(u_1, u_2, u_7, u_8), (u_1, u_3, u_4), (u_1, u_3, u_6)\}$. For our study, we follow the method in [35] to construct tweet cascades. Recall that when a tweet's status is "Retweeted", the ID number of the original tweet is also recorded. We first create a set of original tweets with all the ones labelled in our metadata as "Original". Second, for each original tweet, we collect the IDs of users who have retweeted the message. At last, we generate the cascade for every original tweet based on the following relations in our GR-ego social network and their retweeting time stamps. We eliminate cascades with only two users where messages are just retweeted once. In total, 614,926 cascades are built and the average size of these cascades is 7.13.
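The cascade construction described above can be sketched as follows. The attachment rule is our simplifying assumption and not necessarily the exact procedure of [35]: each retweeter is linked to the most recent earlier participant that he/she follows, and falls back to the original author when no followed participant is found. The names `build_cascade` and `cascade_paths` are hypothetical.

```python
def build_cascade(author, retweets, follows):
    """Infer a child -> parent mapping for one original tweet.

    retweets: list of (user, timestamp) pairs for the retweets of the tweet.
    follows:  dict mapping user -> set of users that he/she follows.
    """
    parent = {}
    seen = [(author, float("-inf"))]
    for user, timestamp in sorted(retweets, key=lambda item: item[1]):
        followed = follows.get(user, set())
        candidates = [(u, t) for u, t in seen if u in followed]
        if candidates:
            # Assumption: the latest followed participant is the source.
            parent[user] = max(candidates, key=lambda item: item[1])[0]
        else:
            # Assumption: fall back to the original author.
            parent[user] = author
        seen.append((user, timestamp))
    return parent


def cascade_paths(author, parent):
    """Represent a cascade as the set of its root-to-leaf paths."""
    children = {}
    for child, source in parent.items():
        children.setdefault(source, []).append(child)
    paths = set()

    def walk(node, prefix):
        prefix = prefix + (node,)
        if node not in children:
            paths.add(prefix)
        for child in children.get(node, []):
            walk(child, prefix)

    walk(author, ())
    return paths
```

Applied to a toy input that mirrors Figure 2(b), where `u2` and `u3` follow `u1`, `u7` follows `u2`, `u8` follows `u7`, and `u4` and `u6` follow `u3`, the two functions return the three paths listed in the text.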

Fig. 5. Profile distribution of the top 30 accounts with highest bridging performance.

Table 1. Statistics of the GR-ego Dataset

Table 2. Comparison of Bridging Performance with Benchmarks

Table 4. Multilingual and Monolingual Users Bridging Performance Comparison

Table 5. The SBM of Multilingual and Monolingual Users: Multilingual Users Perform Dominantly Better with Respect to all the Three Integration Functions
Fig. 9. Distribution of SWB values before and during COVID-19 by multilingualism.