The Great Ban: Efficacy and Unintended Consequences of a Massive Deplatforming Operation on Reddit

In the current landscape of online abuses and harms, effective content moderation is necessary to cultivate safe and inclusive online spaces. Yet, the effectiveness of many moderation interventions is still unclear. Here, we assess the effectiveness of The Great Ban, a massive deplatforming operation that affected nearly 2,000 communities on Reddit. By analyzing 16M comments posted by 17K users during 14 months, we provide nuanced results on the effects, both desired and otherwise, of the ban. Among our main findings is that 15.6% of the affected users left Reddit and that those who remained reduced their toxicity by 6.6% on average. The ban also caused 5% of users to increase their toxicity by more than 70% of their pre-ban level. Overall, our multifaceted results provide new insights into the efficacy of deplatforming. As such, our findings can inform the development of future moderation interventions and the policing of online platforms.


INTRODUCTION
Within online platforms, content moderation is needed to maintain safe and inclusive social environments by mitigating the spread of problematic content and harmful behavior. It fosters user trust and safety, thereby upholding ethical standards and contributing to the overall flourishing of healthy online communities [10]. Hence, platform policies are enforced through moderation interventions [30]. A platform can apply many possible interventions, ranging from short warning messages [18] and informative labels [22,32] up to the removal of large amounts of content or users [16,28]. However, in spite of the growing reliance on content moderation, there is still a limited understanding of the general effectiveness of moderation interventions, which impairs the efficacy of current regulatory efforts. Indeed, recent research has shown that some interventions yielded mixed effects [14,29] or no effects at all [8], and some even resulted in unintended and undesired consequences [1,23]. For these reasons, it is paramount to conduct thorough assessments of the outcomes of recent moderation interventions as a fundamental preliminary step to the planning and development of future effective solutions.
Out of all the possible interventions, the removal (i.e., banning) of users, communities, and content, a moderation practice called deplatforming, is by far the most frequently adopted [13]. A famous example is the banishment of Donald Trump from Facebook and X (formerly Twitter) in 2021 [26].
Other notable examples are the deplatforming of some toxic influencers on X [16], the removal of accounts involved in coordinated inauthentic behaviors on X [6], and the permanent shutdown of racist, sexist, and generally hateful communities on Reddit [5,11]. In 2020, Reddit carried out a massive deplatforming operation that involved the ban of around 2,000 subreddits, on the grounds that they promoted hate against groups based on identity and vulnerability. Among the banned communities were the very popular subreddits r/The_Donald and r/ChapoTrapHouse, with hundreds of thousands of subscribers. This event is commonly referred to as The Great Ban and is currently one of the biggest bans in the history of social media. Despite its impact on multiple communities and on a large number of users within and beyond Reddit, its effects remain under-explored. Existing studies on The Great Ban have primarily focused on assessing writing style changes [31], which do not answer the fundamental question of whether the ban was effective at curbing toxic behavior. The few studies that sought to analyze the degree of toxicity after the ban did so only for a few of the affected subreddits [28]. Moreover, the majority of such studies investigated community-level effects, overlooking user-level reactions, which are nonetheless clearly relevant to the moderation goal of thwarting particularly problematic users and behaviors [29].
Research questions. Here, we contribute to filling these knowledge gaps by carrying out a large-scale quantitative analysis of the changes in toxic behavior exhibited by those users who participated in the 15 most popular subreddits shut during The Great Ban. We analyze 16M comments shared by almost 17K users over 14 months, answering the following research questions.
• RQ1: Was The Great Ban effective at reducing toxicity? The affected subreddits were banned due to hateful and toxic speech. However, previous studies showed that botched interventions can exacerbate, rather than mitigate, toxic behaviors. Here we evaluate the effectiveness of The Great Ban at reducing user toxicity.
• RQ2: Did The Great Ban cause any undesired side effects to some users? In other words, were there users who became much more toxic after the intervention? When evaluating the outcomes of a moderation action, it is important to consider whether the action made a subset of users resentful, thus leading to marked increases in their toxicity. Such extreme reactions can also occur in the presence of an overall (i.e., platform- or community-level) reduction in total toxicity, which mandates analyses at the user level. Here we investigate the presence and quantify the extent of toxic reactions to The Great Ban, as an indicator of potential shortcomings in the intervention.
Main findings. Based on our answers to the above RQs, our study yields the following novel findings:
• Overall, The Great Ban caused 15.6% of the affected users to abandon Reddit. Those who remained decreased their toxicity by 6.6%, on average.
• Despite this modest overall reduction in toxicity, a non-negligible fraction of users became much more toxic. For example, 5% of users increased their toxicity by more than 70% of their pre-ban level.
• The presence of resentful users who increased their toxicity was widespread across the analyzed subreddits.
Overall, our study provides a comprehensive account of the effects of The Great Ban. It surfaces and describes an undesired side effect of the intervention, drawing attention to the delicate balance entailed by the moderation of heterogeneous communities. As such, our results can inform future moderation strategies and the development of effective interventions.

RELATED WORK
We summarize and critically discuss recent literature on the evaluation of moderation interventions, starting from those works that are most similar to our present study.

Deplatforming
Despite the relevance and the extent of The Great Ban, few works delved into a systematic evaluation of its effects.
Among them is the study by Trujillo et al. (2021), which analyzed activity and linguistic changes in the 15 most popular subreddits affected by the ban. They found that top users suffered the largest decreases in activity and that community response was heterogeneous between subreddits, and even between users of a subreddit [31]. Here we complement this work by assessing effects in terms of toxicity, rather than activity and language. Other works evaluated deplatforming effects in a subset of the subreddits affected by The Great Ban, or in other subreddits altogether. Chandrasekharan et al. [5] and Saleem and Ruths [25] examined the repercussions of the bans on r/fatpeoplehate and r/coontown, revealing that a substantial number of users departed from Reddit following the interventions. Among those who stayed, a notable reduction in hate speech was observed. However, they also found a considerable portion of users who migrated to other subreddits, doubling their posting activity [4]. Instead, Horta Ribeiro et al. [14] assessed the impact of deplatforming across multiple platforms, concentrating on the migration of users from banned subreddits to newly established platforms. Their findings indicated a significant decline in user activity on the new platforms. However, they also found a subset of users who increased their toxicity and radicalization. Deplatforming was also studied on platforms other than Reddit. For example, Mekacher et al. [20] studied ban-induced migrations from Twitter to Gettr, finding that politically polarized users are less toxic on fringe platforms, as they are exposed to fewer out-group interactions. Cima et al. [6] analyzed massive bans carried out by Twitter to counteract coordinated inauthentic behaviors. Finally, Jhaver et al. [16] examined Twitter's ban on multiple toxic influencers, revealing a general reduction in conversations about these figures, accompanied by decreased activity and toxicity among their supporters. Nonetheless, they also found a fraction of users who greatly increased activity and toxicity.
Overall, this body of work shows that moderation interventions frequently cause a combination of desired and undesired effects, and that effects vary between different interventions and types of users. To this end, our work leverages and extends previous knowledge by evaluating the effects of The Great Ban (a massive, yet essentially unexplored, moderation event) across the dimension of comment toxicity. Our results provide a picture of the effectiveness of the ban on the 15 most popular affected subreddits.

Soft moderation
So-called soft interventions emerged as an alternative to deplatforming, addressing the concerns about censorship and the loss of free speech that often accompany content and user removals [32]. Trujillo and Cresci (2022, 2023) studied quarantines and restrictions: soft interventions that often precede community bans on Reddit. They studied moderation outcomes on r/the_donald, revealing a general reduction in activity and toxicity, at the cost of increased political polarization and decreased factuality of shared news [28,29]. Reddit's quarantine of r/the_donald was also studied by Chandrasekharan et al. [4] and Shen and Rosé [27], who concluded that the intervention did not produce meaningful changes, neither in terms of misogynistic and racist comments nor regarding engagement and internal dynamics. Another type of soft intervention is the attachment of warning labels to disputed posts, whose effects were assessed in terms of perceived credibility and obtained engagement. In detail, Pennycook et al. [23] found that the presence of some posts with warning labels increases the perceived credibility of all posts without labels, including false ones not yet debunked. Zannettou [32] found instead that tweets with warning labels were often replied to, to further debunk the claims. However, this resulted in the flagged tweets circulating more and obtaining more engagement than the undisputed ones. Finally, Katsaros et al. [18] carried out an A/B test on Twitter to evaluate the effectiveness of using warning messages to prompt users who are about to post toxic tweets. Their results show that the intervention was overall effective at reducing the posting of toxic tweets. Nonetheless, a small minority of users edited their tweets to make them more toxic after being exposed to the warning message. The above literature on soft moderation interventions corroborates that on deplatforming, confirming that each intervention can elicit both desired and undesired effects. Overall, this body of work underscores the need for further research to evaluate the impact of little-studied interventions.

DATASET
Our dataset for this study comprises 16M Reddit comments shared by 16,828 distinct users who participated in at least one of the 15 most popular public subreddits, in terms of daily active users, shut during The Great Ban [31], as reported in Table 1. Even though Reddit administrators originally published an obfuscated list of the most popular banned subreddits, previous work deciphered it for those with over 2,000 daily active users [31], resulting in the 15 subreddits used both therein and herein. The composition of our dataset is illustrated in Figure 1 and the procedure adopted to build it is described in the following.
Dataset construction. We initially collected all comments posted between December 2019 and June 2020 in each of the 15 popular subreddits, resulting in 8M comments shared by 194K distinct users. To collect the comments we used the Pushshift data dumps [2] available through Reddit torrents. The data covers 30 weeks (i.e., 7 months) prior to The Great Ban, which allows for establishing a suitable baseline for the activity of the affected users before the moderation intervention [29]. Notably, we could not collect any data between May and June 2020, due to several subreddits having halted their activity, or being banned, before Reddit's public announcement of The Great Ban on June 29, 2020.
As commonly done in the literature, we obtained a representative set of users for the considered subreddits by constraining our analysis to core users, namely users who participated regularly in at least one subreddit [3,28]. We defined core users as those who posted at least one comment each month between December 2019 and March 2020. Additionally, we filtered out bots (i.e., clearly automated accounts) by discarding all accounts that posted at least two different comments at the exact same time. For this, we used a time delta of at least 1 second between comments, which guarantees we do not inadvertently filter out authentic users [15]. We also manually validated a random sample of 1,000 comments to verify that manifest bots were effectively excluded from the dataset. After these filtering steps, we ended up with 2.2M comments by 16,828 core users. Hereafter, we refer to this portion of our dataset as IN-BEFORE since it involves activity within the banned subreddits before the ban.
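The two filtering steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the `(author, created_utc, body)` tuple layout, the function name `select_core_users`, and the toy data are assumptions made for the example, and timestamps are truncated to whole seconds to mirror the 1-second resolution mentioned in the text.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Months in which a core user must have commented at least once (from the text).
REQUIRED_MONTHS = {(2019, 12), (2020, 1), (2020, 2), (2020, 3)}

def select_core_users(comments):
    """comments: iterable of (author, created_utc, body) tuples (hypothetical schema)."""
    months = defaultdict(set)     # author -> set of (year, month) with activity
    bodies_at = defaultdict(set)  # (author, second) -> distinct comment bodies
    for author, created_utc, body in comments:
        when = datetime.fromtimestamp(created_utc, tz=timezone.utc)
        months[author].add((when.year, when.month))
        bodies_at[(author, int(created_utc))].add(body)
    # Bot heuristic: at least two different comments posted in the same second.
    bots = {author for (author, _), bodies in bodies_at.items() if len(bodies) >= 2}
    # Core users: active in every required month and not flagged as bots.
    return {a for a, m in months.items() if REQUIRED_MONTHS <= m and a not in bots}

def ts(year, month, day):
    """Helper for the toy data: noon UTC on the given day, as a Unix timestamp."""
    return datetime(year, month, day, 12, tzinfo=timezone.utc).timestamp()

sample = [
    ("alice", ts(2019, 12, 5), "a"), ("alice", ts(2020, 1, 5), "b"),
    ("alice", ts(2020, 2, 5), "c"), ("alice", ts(2020, 3, 5), "d"),
    ("bob", ts(2019, 12, 5), "a"), ("bob", ts(2020, 1, 5), "b"),  # inactive in Feb and Mar
    ("spam_bot", ts(2019, 12, 5), "x"), ("spam_bot", ts(2019, 12, 5), "y"),  # same second
    ("spam_bot", ts(2020, 1, 5), "x"), ("spam_bot", ts(2020, 2, 5), "x"),
    ("spam_bot", ts(2020, 3, 5), "x"),
]
core = select_core_users(sample)
print(core)
```

In this toy sample only `alice` survives: `bob` misses two of the required months, and `spam_bot` posts two different comments within the same second.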
Providing a fair evaluation of the effects of The Great Ban involves matching comparable datasets before and after the intervention. However, no activity exists within the banned subreddits after the intervention, since the subreddits were permanently shut. Therefore, evaluating the effects of the intervention must involve the analysis of user activities outside of the banned subreddits. For this reason, we collected all comments made by the 16,828 core users outside of the 15 banned subreddits across a wide time frame spanning 7 months before and after The Great Ban, as shown in Figure 1. We obtained around 13.8M comments from 16,540 distinct users. We labeled data related to user activities before the ban as OUT-BEFORE and that after the ban as OUT-AFTER, as reported in Table 1.

RQ1: Effectiveness of The Great Ban
In this section, we assess the effectiveness of The Great Ban at reducing toxic behaviors.
Abandoning users. Table 1 highlights a difference of 2,577 users (15.6%) between the OUT-BEFORE and OUT-AFTER datasets, corresponding to users who became inactive on Reddit after The Great Ban. Given that such users were consistently active before the ban, but did not post a single comment in the 7 months following it, we conclude that they abandoned Reddit, possibly migrating to other platforms [14]. This result thus highlights a first straightforward effect of The Great Ban.
[Table 2 caption, beginning not recovered] For the latter, toxicity scores are computed both before (BEF) and after (AFT) the ban. The ABA vs BEF and BEF vs AFT columns show the effect sizes and the statistical significance levels of the differences in toxicity. The ABA vs BEF column shows that users who abandoned the platform were more toxic than those who remained. The BEF vs AFT column shows that users who remained active experienced a modest toxicity reduction after the ban.
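The abandonment share reported above follows directly from the dataset sizes given in the text. The short check below reproduces that arithmetic; the variable names are ours and purely illustrative.

```python
users_out_before = 16_540  # distinct users active outside the banned subreddits (from the text)
abandoned = 2_577          # users with no comments in the 7 months after the ban (from the text)

remaining = users_out_before - abandoned
share_abandoned = abandoned / users_out_before

print(remaining)                 # 13963
print(f"{share_abandoned:.1%}")  # 15.6%
```

The 13,963 remaining users form the matched set used in the user-level analysis.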
In order to draw more insights into this finding, we compare the pre-ban subreddit-wise toxicity of the abandoning users with that of the users who remained active on the platform. Table 2 shows that in 14 out of 15 subreddits, users who later abandoned the platform were more toxic than those who remained after the ban. In other words, toxic users were more likely to abandon the platform after the intervention than less toxic ones. For 12 subreddits the difference in toxicity between abandoning and remaining users is statistically significant (p < 0.05), according to a non-parametric Mann-Whitney test for unpaired data. Moreover, in 12 subreddits, abandoning users have larger mean absolute deviation (MAD) scores than remaining users, indicating greater variability in toxicity among the former user group.
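To make the two quantities used in this comparison concrete, the sketch below computes a mean absolute deviation and the Mann-Whitney U statistic for two unpaired groups. It is an illustration under assumptions: the toxicity values are invented, the deviation is taken around the mean (the text does not say which center is used), and no p-value is computed. In practice a statistics library, for instance SciPy's `scipy.stats.mannwhitneyu`, would supply the significance level.

```python
from statistics import fmean

def mean_absolute_deviation(scores):
    """Average absolute distance from the mean (assumed center)."""
    center = fmean(scores)
    return fmean([abs(s - center) for s in scores])

def mann_whitney_u(x, y):
    """U statistic of x vs y: number of pairs with x_i > y_j, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# Invented per-user toxicity scores for one subreddit (not data from the study).
abandoning = [0.30, 0.10, 0.50, 0.22]
remaining = [0.12, 0.08, 0.20, 0.15]

u = mann_whitney_u(abandoning, remaining)
# Fraction of cross-group pairs in which the abandoning user is the more toxic one.
dominance = u / (len(abandoning) * len(remaining))

print(u, dominance)
print(mean_absolute_deviation(abandoning), mean_absolute_deviation(remaining))
```

With these toy values the abandoning group is both more toxic (13 of 16 pairs) and more variable, which is the pattern the paragraph describes for most subreddits.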
Remaining users. In addition to causing some users to abandon the platform, the ban might also have caused toxicity changes in those users who remained. Table 2 reports subreddit-wise toxicity scores for the matched set of remaining users, before and after the ban. As shown, the remaining users were on average less toxic after the ban. Specifically, when aggregating results for each subreddit, users from all 15 subreddits slightly decreased their toxicity. The decrease is statistically significant for 11 subreddits (p < 0.1), according to a non-parametric Wilcoxon test for paired data. Notably, effect sizes for the comparison between remaining users before vs after the ban are smaller than those of the comparison between abandoning vs remaining users. In fact, after The Great Ban, the overall toxicity decreased by 6.57% on average, a modest amount. Moreover, for 13 out of 15 subreddits, toxicity MAD values are larger after the ban than before. This result suggests that the ban increased the degree of variability in the behavior of the remaining users, which we further investigate in the following.
User-level effects. So far, we provided platform- and community-level results, finding that The Great Ban caused 15.6% of core users to abandon the platform. Furthermore, those who remained exhibited a modest average toxicity reduction of 6.57%. To provide a thorough assessment of the intervention, here we also investigate user-level effects by computing the toxicity changes experienced by each of the 13,963 users who remained active on Reddit after the ban. The central panel of Figure 3 presents a slope chart of all user-level toxicity changes, independently of the subreddit in which users participated. Each line corresponds to a single user. Line slopes encode the amount of toxicity reduction or increment. Rising lines are colored with different shades of red depending on their slope, and denote users who increased their toxicity after the ban. Conversely, decreasing lines are blue-colored and denote users who decreased their toxicity. Figure 3 also includes marginal boxplot toxicity distributions for remaining users before (left-hand side of the slope chart) and after (right-hand side) the ban. The leftmost boxplot presents the toxicity distribution for users who abandoned the platform. Finally, the bottom panel shows the distribution of user-level toxicity changes as a beeswarm plot. This latter plot is useful for highlighting outliers and for studying the two tails of the distribution, i.e., those related to marked toxicity increases (red dots, right-hand side of the beeswarm plot) and decreases (blue dots, left-hand side).
The three boxplots of Figure 3 confirm the overall toxicity trends observed in Table 2. Abandoning users have the largest median toxicity, followed by remaining users before the ban. Out of the three user groups, the remaining users after the ban have the lowest median toxicity. The slope chart in Figure 3 depicts a majority of lines with relatively small positive or negative slopes. These correspond to users who exhibited small toxicity changes, either increases or decreases, after the ban. At the same time, however, the slope chart also features a remarkable number of steep lines, which are related to users who exhibited large toxicity changes. Specifically, we note an overwhelming majority of red-colored steep lines. This implies that, among users who exhibited large toxicity changes, the vast majority increased their toxicity. This result qualitatively describes an undesired effect of The Great Ban, which caused a non-negligible minority of users to become resentful, thus exhibiting much more toxic behaviors. The points where the lines of the slope chart intersect the y axis on the right-hand side of the plot represent the toxicity of the users after the ban. The distribution of the post-ban toxicity scores is also depicted in the right-hand side boxplot. As shown in Figure 3 and as anticipated from the MAD values in Table 2, there is more variability in user toxicity after the ban. Figure 3 clarifies that this increased variability is caused by the increased toxicity of the resentful users.
Figure 3 provides aggregated results for all subreddits. However, the same set of visualizations can also be used to assess the consequences of The Great Ban among the users of a single subreddit. In turn, this is valuable for identifying common patterns and possible differences between the subreddits. We thus repeated the analysis for users of each individual subreddit. The comparison between the different subreddits confirmed previous results, according to which the majority of users who underwent substantial toxicity changes after the ban increased their toxicity. Nonetheless, this behavior was more pronounced in certain subreddits while lacking in others. Among the subreddits where the effect was particularly pronounced are r/the_donald and r/consumeproduct, as visible from the large majority of steep red lines in the slope charts of Figure 4, and from the long right tails of the corresponding beeswarm plots. Conversely, r/soyboys and r/hatecrimehoaxes are subreddits whose participants did not experience marked toxicity changes and for which no extremely toxic behavior was measured post-ban.

RQ2: Extreme user reactions to The Great Ban
In answering RQ1 we found that The Great Ban caused a modest overall reduction in toxicity. At the same time, however, we also qualitatively discovered a fraction of users who became resentful and greatly increased their toxicity after the ban. We are now interested in quantitatively assessing the extent of this issue across the different subreddits.
Let $t_i^{\mathrm{BEF}}$ and $t_i^{\mathrm{AFT}}$ be the toxicity of the $i$-th user before and after the ban. Then, $\Delta_i = t_i^{\mathrm{AFT}} - t_i^{\mathrm{BEF}}$ is the change in toxicity for the same user. For clarity, the beeswarm plot in Figure 3 shows the distribution of $\Delta_i$ for all users. In detail, we measured that 5% of all users exhibited a $\Delta_i > 0.1$. This result is particularly relevant in light of the median toxicity pre-ban, which was 0.137 as reported in Table 2. In other words, this result implies that 5% of users increased their toxicity by more than 70% of their pre-ban level. The overall change in toxicity in a given subreddit with $N$ participants can be computed as $\Delta = \sum_{i=1}^{N} \Delta_i$. Then, in order to separately weigh the contribution of the two tails of the beeswarm plots, we compute summations that only consider positive or negative $\Delta_i$. For example, we quantify the contribution of the right tail (i.e., the one related to toxicity increases) of the distribution of toxicity changes in a subreddit as $\Delta^{+} = \sum_{i=1}^{N} \Delta_i$, with $\Delta_i > 0$. Similarly, we quantify the contribution of the left tail as $\Delta^{-} = \sum_{i=1}^{N} |\Delta_i|$, with $\Delta_i < 0$. The left panel of Figure 5 shows the balance between the contributions to the overall toxicity change in each subreddit brought by users who increased ($\Delta^{+}$, red-colored) vs those who decreased ($\Delta^{-}$, blue-colored) their toxicity. As shown, the contributions of the two tails are relatively balanced for the majority of subreddits. In 12 out of 15 subreddits, the decreases in toxicity slightly outweigh the increases, which results in the modest overall decrease in toxicity that we already noted in RQ1. Toxicity increases outweigh decreases in r/debatealtright and r/imgoingtohellforthis2, while r/the_donald has perfectly balanced contributions.
[Figure 5 caption fragment] Based on this criterion, we found no outliers for r/hatecrimehoaxes, which is thus omitted from the right panel figure.
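The per-user change and the two tail contributions defined above can be sketched in a few lines. The dictionary layout, the function name `tail_contributions`, and the toy scores are assumptions for illustration only. The last lines reproduce the arithmetic behind the "more than 70%" statement using the two figures quoted in the text.

```python
def tail_contributions(before, after):
    """before/after: dicts mapping user -> mean toxicity (hypothetical layout).

    Returns the per-user changes plus the summed increases (right tail)
    and the summed absolute decreases (left tail).
    """
    deltas = {u: after[u] - before[u] for u in before if u in after}
    increases = sum(d for d in deltas.values() if d > 0)
    decreases = sum(-d for d in deltas.values() if d < 0)
    return deltas, increases, decreases

# Invented scores for three users of one subreddit (not data from the study).
before = {"u1": 0.10, "u2": 0.20, "u3": 0.15}
after = {"u1": 0.30, "u2": 0.15, "u3": 0.15}

deltas, increases, decreases = tail_contributions(before, after)
overall_change = sum(deltas.values())  # equals increases - decreases
print(increases, decreases, overall_change)

# A change of 0.1 relative to the pre-ban median of 0.137 (both from the text).
relative_increase = 0.1 / 0.137
print(f"{relative_increase:.0%}")  # 73%
```

A user whose toxicity is unchanged, like `u3` here, contributes to neither tail.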
In order to specifically investigate the behavior of outlier users (those who experienced marked changes in toxicity), we recompute the previous summations by only considering users whose change in toxicity exceeds a given threshold $\tau$: $|\Delta_i| > \tau$. This allows us to focus on those users who experienced extreme behavioral changes, be they increases or decreases in toxicity. For example, the right panel of Figure 5 shows the results for $\tau = 0.25$: in most subreddits the contributions of those who increased or decreased their toxicity to the overall subreddit toxicity are very imbalanced. The only exceptions are r/gendercritical, which presents relatively balanced contributions (55% vs 45%), and r/hatecrimehoaxes, for which no user experienced a change in toxicity exceeding the threshold $\tau = 0.25$. Moreover, in 11 subreddits toxicity increases greatly outweigh decreases.
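Restricting the tail sums to outlier users amounts to adding a threshold filter on the absolute change. The sketch below, with an invented list of changes and a hypothetical function name, returns the share of the outlier contribution due to increases and to decreases, and returns `None` when no user exceeds the threshold (the situation reported for r/hatecrimehoaxes).

```python
def outlier_shares(deltas, threshold):
    """Shares of the outlier contribution due to increases and decreases.

    Only users with |delta| > threshold are considered. Returns
    (increase_share, decrease_share), or None when there are no outliers.
    """
    up = sum(d for d in deltas if d > threshold)
    down = sum(-d for d in deltas if d < -threshold)
    total = up + down
    if total == 0:
        return None
    return up / total, down / total

# Invented per-user toxicity changes for one subreddit (not data from the study).
deltas = [0.40, 0.30, -0.30, 0.05, -0.10]

shares = outlier_shares(deltas, threshold=0.25)
print(shares)  # roughly (0.7, 0.3): increases dominate among outliers
print(outlier_shares([0.05, -0.10], threshold=0.25))  # None: no outliers
```

Expressing the two contributions as shares of their sum is one way to obtain percentages comparable to the 55% vs 45% split quoted for r/gendercritical; the exact normalization used for the figure is an assumption here.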
To generalize these findings, we repeated the analysis by varying the value of the threshold $\tau$ from 0 to 1, with a step of 0.01. Results are shown in Figure 6, where each line corresponds to a subreddit and reports the % difference $\Delta^{+} - \Delta^{-}$ between the contributions of the two tails, for each value of $\tau$. In the figure, only a few notable subreddits are highlighted, while the rest are grey-colored so as to reveal the overall trend.
As shown, even for small values of the threshold $\tau$, the vast majority of lines rise steeply. In particular, in 12 out of 15 subreddits the outlier users exhibit large increases in toxicity. The subreddits r/imgoingtohellforthis2 and r/wojak are those featuring the largest toxicity contributions by outlier users. On the contrary, outliers in r/hatecrimehoaxes and r/oandaexclusiveforum show marked toxicity reductions, while r/chapotraphouse remains overall stationary. However, the spike observed in r/imgoingtohellforthis2, as well as the decline of r/oandaexclusiveforum, may be attributed to the relatively small number of users (fewer than 100). In summary, the results presented in this section highlight that the presence of resentful users who became much more toxic after The Great Ban was not localized within a single or a few subreddits. On the contrary, despite the fact that only a minority of users experienced marked toxicity increases, such adverse reactions were pervasive across most of the analyzed subreddits, which is indicative of a systemic phenomenon.
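The threshold sweep behind this analysis can be sketched as a loop over thresholds from 0 to 1 in steps of 0.01. The code below is a hedged illustration: the changes are invented, the function name is hypothetical, and normalizing the difference by the total outlier contribution (to express it as a percentage) is our assumption, since the text does not spell out the normalization.

```python
def sweep_thresholds(deltas, step=0.01):
    """Yield (threshold, percent_difference) for thresholds 0.00, 0.01, ..., 1.00.

    percent_difference is 100 * (increases - decreases) / (increases + decreases)
    among users with |delta| > threshold, or 0.0 when there are no such users.
    """
    n_steps = round(1 / step)
    for k in range(n_steps + 1):
        threshold = k * step
        up = sum(d for d in deltas if d > threshold)
        down = sum(-d for d in deltas if d < -threshold)
        total = up + down
        yield threshold, (100 * (up - down) / total if total else 0.0)

# Invented per-user toxicity changes for one subreddit (not data from the study).
deltas = [0.40, 0.30, -0.30, 0.05, -0.10]

curve = list(sweep_thresholds(deltas))
print(len(curve))  # 101 threshold values
print(curve[25])   # threshold 0.25, difference close to 40
```

A curve that rises with the threshold, as in this toy example, indicates that the more extreme the change considered, the more it is dominated by toxicity increases.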

DISCUSSION
Our results shed light on the complex effects of The Great Ban, a paramount example of deplatforming that involved around 2,000 subreddits. Among our main findings is that 15.6% of the affected users abandoned Reddit after the ban. Those who remained on the platform reduced their toxicity by 6.6% on average. At the same time, however, around 5% of all users markedly increased their toxicity. The presence of such resentful users was widespread across the analyzed subreddits. These nuanced results cover new ground on the effects of The Great Ban and, more broadly, on adverse reactions to moderation interventions. Our results also lend themselves to multiple considerations about the design and deployment of effective moderation as well as about the challenges of regulating online platforms.
Effectiveness of the moderation. In the literature, the effectiveness of content moderation actions has been primarily assessed in terms of the changes that the moderation caused to the activity and toxicity of the affected users [5,16]. In this regard, our study revealed that a considerable share of toxic users abandoned the platform while the others exhibited a modest reduction in toxicity and a marked reduction in activity. At first glance, these results appear to be indicative of a successful moderation. However, it is necessary to scrutinize these findings closely in light of potential unintended consequences. For example, the departure of some of the most toxic users from the platform suggests a form of displacement rather than a resolution of the toxicity issue. As recent research pointed out, these users have likely migrated to other online spaces [14], raising concerns about the displacement of their toxic interactions instead of their mitigation. These worries are compounded by evidence that users who migrate after facing restrictions on a platform subsequently engage in even more toxic and aggressive behavior [14]. In addition, user churn and diminished activity levels post-ban might pose challenges for Reddit, as online platforms thrive on user engagement and interactions for generating revenues [28]. Consequently, the apparent success of The Great Ban in mitigating toxicity must be interpreted with caution, considering the potential negative impact on the broader online ecosystem (due to user migrations) and on the platform's economic model (due to user churn and reduced activity). Future endeavors should aim to strike a balance, devising strategies to curb toxicity without inducing abandonment or substantial decreases in user activity. The quest for effective moderation should align with the overarching goal of cultivating healthier online communities without compromising the safety of other platforms or economic viability, which could otherwise disincentivize platforms from carrying out scrupulous moderation.
Divergent reactions to moderation. Our analysis also revealed that, in spite of a modest overall reduction in toxicity, a non-negligible minority of users exhibited large toxicity increases. This result has important implications for the assessment and development of moderation interventions. First, it sheds light on the complexity of user reactions to content moderation, which is a largely underexplored area that requires further investigation [17]. In addition, it surfaces the need for a more nuanced and personalized approach to content moderation. A generic intervention such as The Great Ban, which involved thousands of subreddits and tens of thousands of users, may not effectively address the diverse motivations and behaviors exhibited by the affected users. In our work, this was exemplified by the minority of resentful users who greatly increased their toxicity. Understanding the factors contributing to such divergent responses is paramount for developing effective moderation strategies. Future research and practical applications should delve into user profiling, considering individual characteristics, past behavior, and contextual factors to effectively tailor moderation interventions [7]. Moreover, the migration of a subset of toxic users and the widespread presence of users experiencing heightened toxicity raise questions about the potential radicalization effect of moderation and its unintended contribution to the amplification of echo chambers [14]. They suggest that some users may react negatively to certain moderation actions, possibly leading to more extreme behaviors. These observations highlight the delicate balance required in content moderation, where the aim should be not only to reduce toxicity locally, but also to prevent inadvertent consequences that might exacerbate polarization or radicalization in some user groups.
Limitations. Our study is based on a large historical dataset of Reddit comments shared by 17K users during a long observation window of 14 months centered around The Great Ban. Due to the nature of our dataset, our findings may be specific to Reddit and the context of The Great Ban. Therefore, caution is needed in generalizing the results to other online platforms or different moderation interventions. Similarly, online platforms are dynamic environments that are subject to continuous changes in user behavior, community norms, and platform policies. Our study covers a specific snapshot in time which, albeit relatively long, limits the extent to which our results carry over to different time periods and, in part, our ability to account for long-term or evolving trends. Moreover, toxicity scores of comments were obtained via the machine learning model Detoxify, which may itself have introduced bias into the evaluation of toxicity. Another limitation of our work lies in its observational nature, which hinders the possibility of accounting for external events or changes in the broader online ecosystem that may have influenced user behavior independently of The Great Ban. For this reason, care is needed in establishing causal relationships from the findings presented herein. Future work could adopt more sophisticated causal inference techniques, such as difference-in-differences or interrupted time series. However, finding suitable control subreddits and taking into account exogenous events remains a significant challenge [28]. Finally, our dataset lacks comprehensive information about user demographics, motivations, or contextual factors. Accounting for these aspects could provide a more nuanced and actionable interpretation of our results. In this regard, future research could investigate the unexplored interplay between user characteristics and the outcomes of moderation, also as a preliminary step towards the development of targeted and personalized moderation interventions [7].
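As an illustration of the difference-in-differences idea mentioned among the limitations, the following is a minimal sketch and not an analysis performed in this study. It assumes that a comparable control group of users unaffected by the ban is available and that both groups would have followed parallel trends without the intervention. The function name `did_estimate` and all scores are hypothetical.

```python
from statistics import mean

def did_estimate(treated_before, treated_after, control_before, control_after):
    """Two-group, two-period difference-in-differences on mean toxicity.

    Each argument is a list of toxicity scores in [0, 1].
    Assumption: treated and control groups share parallel trends.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    # The control change approximates what would have happened anyway.
    return treated_change - control_change

# Hypothetical scores, for illustration only.
effect = did_estimate(
    treated_before=[0.30, 0.34],
    treated_after=[0.26, 0.30],
    control_before=[0.20, 0.22],
    control_after=[0.21, 0.23],
)
print(f"estimated effect on mean toxicity: {effect:+.3f}")
```

A negative estimate would indicate that toxicity in the treated group fell relative to the control group. The estimate is only as credible as the choice of control group, which is the difficulty noted above.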
Ethical considerations. This research contributes to a deeper understanding of the impact of content moderation, shedding new light on the complexities of user reactions to moderation interventions. This knowledge can inform future developments of effective and nuanced moderation strategies aimed at curbing online toxicity while minimizing unintended consequences. To this end, our work draws attention to the trade-off between the common good and the good of a minority. In our work, this is exemplified by the ethical dilemma faced by moderators who must decide whether to enforce interventions that could harm a minority of users by making them resentful, in order to provide a modest benefit to the broader community.

CONCLUSIONS
The Great Ban was a massive deplatforming operation enforced to shut down toxic communities on Reddit. To evaluate its effectiveness, we analyzed 16M comments shared over the course of 14 months by 17K users affected by the ban. Our results reveal that 15.6% of the affected users abandoned Reddit after the ban and that those who remained reduced their toxicity by 6.6% on average. Despite this modest toxicity reduction, 5% of users increased their toxicity by more than 70% of their pre-ban level. The presence of such resentful users was widespread across the analyzed subreddits rather than concentrated in a few.
Overall, our study provides new and nuanced insights into the effectiveness of The Great Ban, including its undesired consequences. As such, it can inform the development of future and more effective moderation interventions and the policing of online platforms. Specifically, future work could extend our present results by delving deeper into the relationship between user characteristics and the outcome of moderation interventions. In turn, this would pave the way to the development of targeted or personalized interventions that could mitigate the undesired effects of moderation actions, such as those discussed in this work. Another promising avenue of future research is the development of predictive models for the outcome of moderation interventions. Such models would make it possible to estimate the likely effects of an intervention before it is applied, supporting the strategic planning and enforcement of moderation actions.

Figure 1: Timeline depicting the periods of data collection and analysis. The collected data span two periods of 7 months each, centered around The Great Ban. It is noteworthy that the IN-BEFORE dataset has no content from May 2020 onward, indicating that the activity in the banned subreddits halted before the official intervention date.

Figure 3: User-level toxicity change after The Great Ban, for each active user. The slope chart in the central panel reveals a majority of red-colored rising lines, corresponding to a large number of users who drastically increased their toxicity. The beeswarm plot in the bottom panel confirms this effect, presenting more users in the right red-colored tail of the distribution than in the left blue-colored one. Boxplots present marginal distributions for abandoning and remaining users, before and after the intervention.

Figure 5: Subreddit-wise changes in toxicity obtained by summing all individual decreasing and increasing contributions, for all users (left panel) and outlier users (right panel). Outlier users are those whose change in toxicity exceeds the threshold of 0.25. Based on this criterion, we found no outliers for r/hatecrimehoaxes, which is thus omitted from the right panel.
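
The aggregation described in this caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the study: it assumes one (subreddit, toxicity before, toxicity after) record per user and assumes that the 0.25 threshold applies to the absolute value of the individual change. The function name `subreddit_changes` and the sample records are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 0.25  # outlier threshold reported in the caption

def subreddit_changes(records, outliers_only=False):
    """Sum individual toxicity decreases and increases per subreddit.

    records: iterable of (subreddit, toxicity_before, toxicity_after),
    one entry per user. Assumption: a user is an outlier when the
    absolute change exceeds THRESHOLD.
    """
    totals = defaultdict(lambda: {"decrease": 0.0, "increase": 0.0})
    for subreddit, before, after in records:
        delta = after - before
        if outliers_only and abs(delta) <= THRESHOLD:
            continue  # keep only outlier users in this mode
        key = "increase" if delta > 0 else "decrease"
        totals[subreddit][key] += delta
    return dict(totals)

# Hypothetical users, for illustration only.
records = [
    ("r/example_a", 0.10, 0.50),
    ("r/example_a", 0.40, 0.30),
    ("r/example_b", 0.60, 0.20),
    ("r/example_b", 0.20, 0.25),
]
all_users = subreddit_changes(records)
outliers = subreddit_changes(records, outliers_only=True)
```

A subreddit with no outlier users simply does not appear in the second result, which mirrors the omission of r/hatecrimehoaxes from the right panel.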

Figure 6: Contribution of the outlier users to the increase/decrease of toxicity in each subreddit. Outlier users are defined as those whose individual change in toxicity exceeds the threshold. As shown, in 12 out of 15 subreddits the outlier users caused large toxicity increases.

Table 1: Dataset composition. Rows are ordered by number of active users after the ban. Data in IN-BEFORE is related to user activities within the banned subreddits before the ban took place. Data in OUT-BEFORE and OUT-AFTER are related to user activities outside the banned subreddits, before and after the ban respectively.

Estimates of the effects of the ban are obtained by comparing the OUT-BEFORE and OUT-AFTER datasets. Lastly, we enriched our dataset by computing a toxicity score for each collected comment.
Annotating toxicity: Detoxify vs Perspective API. Google's Perspective API and Detoxify are among the state-of-the-art toxicity classifiers [9,19]. Perspective API was developed by the Jigsaw team at Google and currently represents the de facto standard for toxicity detection, both in production content moderation environments and in academia [21,24]. The service is offered as a Web API that, given a piece of text, outputs several scores of offensiveness, including two indicators of toxicity and severe toxicity defined in the [0, 1] range. Detoxify is an open source deep learning toxicity classifier that also outputs the toxicity and severe toxicity indicators [12]. Due to its convenience and competitive performance, it has recently seen frequent use [9,19]. The advantage of Detoxify over Perspective API lies in the possibility of installing and running it locally, without incurring the limitations of a Web API (i.e., rate limits or quotas). Given the large number of comments to annotate in our dataset, we computed toxicity scores with Detoxify. Nonetheless, we first validated our choice by comparing the outputs of Perspective API and Detoxify on a stratified random sample of 10K comments extracted from the IN-BEFORE portion of our dataset. Figure 2 presents the results of this comparison, for both the toxicity and severe toxicity indicators. As shown, we found a strong positive Pearson correlation between the two methods, which supports the use of Detoxify. Then, with respect to the two provided indicators, for our subsequent analyses we relied on the toxicity indicator because it is the one on which the two methods agree the most, with a Pearson correlation of 0.872 vs 0.796 for severe toxicity.
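
The agreement check described above amounts to computing a Pearson correlation between two lists of scores assigned to the same comments. The following is a minimal sketch of that computation, assuming the scores from the two classifiers have already been collected. The helper `pearson_r` and the sample values are hypothetical and do not reproduce the data or code of this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toxicity scores for the same five comments,
# one list per classifier (illustration only).
scores_a = [0.02, 0.10, 0.45, 0.80, 0.95]
scores_b = [0.05, 0.08, 0.50, 0.70, 0.99]
r = pearson_r(scores_a, scores_b)
print(f"Pearson correlation: {r:.3f}")
```

Running the same computation once per indicator (toxicity and severe toxicity) would show on which indicator the two classifiers agree more.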

Table 2: Subreddit-wise median toxicity scores for users who abandoned Reddit after the ban (ABA) and for those who remained. Columns: aband.before (ABA), remain.before (BEF), remain.after (AFT).