Working Together (to Undermine Democratic Institutions): Challenging the Social Bot Paradigm in SSIO Research

Unlike most other forms of coordinated, inauthentic behavior occurring online, the goals of state-sponsored information operations, or SSIOs, are often complex and multifaceted. These goals range from flooding conversations with a certain narrative, to increasing the public's engagement with news sources of questionable quality, to stoking tensions between ideologically opposed groups to weaken public trust. The prevailing theoretical framework for understanding SSIOs is to treat them as a social botnet: a behaviorally homogeneous cluster of coordinated activity. However, the social bot framework is both at odds with some of the behaviors observed in early SSIOs and more broadly with the wide swathe of goals these operations set out to accomplish. To examine the fit of the social bot framework in the SSIO context, we develop a novel bag-of-words based method for clustering and describing user activity traces. Applying this method to a comprehensive repository of SSIOs conducted on Twitter over the last decade, we find that SSIOs violate both the core assumption of the social bot framework, and how it is operationalized in practical work. Instead, we find that SSIOs exhibit a clear division of labor and propose cooperative work with social roles as a more effective theoretical framework for understanding SSIOs. Through applying this framework, we find that the roles that SSIO agents take on have become more stable and simple over time, which holds substantial implications for developing methods for detection of these operations in the wild.


INTRODUCTION
For over a decade, the dominant framework for describing and detecting inauthentic, coordinated activity online has been the "social botnet": a group of automated or semi-automated accounts which mimic human behaviors to evade detection, while altering or influencing the flow of information available online [28]. Due to the multiple communities studying this activity, the term "social bot" encompasses a wide array of behaviors, goals, and levels of automation [17]. However, across the many communities studying social bots, a central feature of their definitions is the assumption of behavioral homogeneity: that social bots that work together exhibit similar behaviors to one another, reflective of the goal of their deployment [32]. Due to this assumption and the framework's popularity, social bot detection deployed in the last decade has largely converged on methods that identify groups of coordinated social bots, rather than individual ones [17]. Since its inception, this framework has become fairly ubiquitous, and is used to understand everything from spammers and fake followers, to more longitudinal forms of manipulation: state-sponsored information operations [74].
State-sponsored information operations (SSIOs) are conducted or sanctioned by governments or political groups to further both domestic and international geopolitical objectives [24] using a mix of automated and genuine user activity online [10]. One of the earliest, and perhaps most successful [34], of these operations was the Russian-backed Internet Research Agency (IRA) campaign, which interfered in the U.S. presidential election of 2016. By posing as U.S. citizens, these agents paid for political ads, helped cultivate narratives, and even organized physical rallies [11]. More recently, Russia has demonstrated the efficacy of deploying information operations in tandem with real-world wartime operations in its 2022 war in Ukraine. By deploying a massive propaganda network in the early months of the war [61], Russian operatives reached millions of users, primarily in India, South Africa, and the United States [29], in an effort to shape global opinion and sentiment on the war. More generally, recent studies have shown that this kind of activity is only becoming more common, with at least 70 countries being targeted by such operations in 2019 [10]. These operations pose substantial risk to democratic institutions worldwide, with their goals ranging from influencing elections [5,13,27], to increasing the public's engagement with low-quality news sources [8,60,63], to stoking tensions between ideologically opposed groups to weaken public trust [4,16,68]. With the social bot framework seeing early success in the analysis of the Russian IRA campaign [6], it cemented itself as the framework of choice for understanding this kind of activity.
While the social bot framework has always been resonant in SSIO research, even early work seemed to indicate that it might not be a perfect fit. Analysis of the IRA operation found that it exhibited multiple unique behaviors at once [33], while researchers studying an operation active in Syria in 2012 found that its various automated agents displayed a division of labor [1]. More recently, an operation active during the United Kingdom's European Union membership referendum of 2016 was found to exhibit clear tiers or clusters of agents, with some focusing on content generation, while others attempted to more directly engage with genuine users [8]. This division of labor and taking-on of different roles suggests that SSIOs often violate the core assumption of the social bot framework, behavioral homogeneity: that agents within the same operation exhibit similar behaviors to one another. If this is the case, detection methods that rely on this assumption, and thus lack generalizability across different behavioral patterns [25], will likely fail to detect large portions of these operations, finding only the most similar clusters of agents. However, these deviations from the social bot framework stem from case studies of individual operations. As of yet, there have been no longitudinal studies across SSIOs to determine if these are just a few unique departures from the social bot framework or if they expose a more fundamental difference between social bots and SSIOs.
In this paper, we fill this gap by analyzing one of the most comprehensive SSIO datasets currently available [53]. Specifically, we focus on the following research questions as we investigate the appropriateness of the social bot framework in this context:
• RQ1a: To what degree does the theoretical assumption of the social bot framework (behavioral homogeneity) hold for state-sponsored information operations on Twitter?
While the assumption of behavioral homogeneity is core to the notion of what it means to be a social bot, in practice, this assumption is operationalized in a slightly different way. For the purposes of detection, social bots don't need to be completely behaviorally homogeneous so long as they are more similar to one another than authentic users are similar to one another. We refer to this quality throughout the paper as relative self-similarity. Since most practical work operates under this assumption, rather than the actual assumption of behavioral homogeneity, we also ask:
• RQ1b: To what degree does the practical operationalization of the social bot framework (relative self-similarity) hold for state-sponsored information operations on Twitter?
• RQ2: What kinds of behavioral patterns are exhibited by state-sponsored information operations and how have these evolved over time?
We initially attempted to answer these research questions using a classical approach of Digital DNA and Longest Common Substring (LCS) curves, proposed by Cresci et al. [20]. However, we find it to be ill-suited to our task of analyzing SSIOs. Specifically, the substring-based clustering mechanism is both not scalable and overly sensitive to minor noise in the behavioral pattern of users. This prompted us to augment Cresci et al.'s Digital DNA method with a bag-of-words embedding [30], enabling the use of more scalable, robust clustering techniques. Using our method, we first cluster agents by operation to see if the majority are put in the same cluster, testing to what degree the theoretical assumption of the social bot framework holds (RQ1a). Next, we compare the similarity of the embedded agents against a sample of the "genuine" Twitter crowd to see if they are more similar, testing the practical operationalization of the framework (RQ1b). To answer our final research question, we apply a heuristic approach to broadly classify these clusters as clearly focused on specific behaviors or having no focus in behavior. We use these classifications to trace the evolution of the overall composition of SSIOs over time (RQ2). Having established a theoretical basis for replacing the social bot framework with that of cooperative work, the final component of our work explores this lack of fit from a practical perspective. Using our novel method for clustering users, we diagnose the "coverage" that popular social bot detection techniques achieve in the SSIO context.
Through our analysis, we find that the social bot framework does not fully describe the behavior of SSIOs. What's more, our method reveals that the characteristics observed in the "Brexit" operation [8] seem to hold across most operations: that agents engage in a division of labor, with different groups serving different roles in the operation. From a practical perspective, we find that due to this division of labor, detection techniques from all major "waves" of social bot research [19] offer only partial coverage of SSIOs, with each wave of techniques specializing in the detection of a particular level of coordination and performing markedly worse in detecting the others.
The contributions of this work include:
1) A novel, bag-of-words based method for identifying behaviorally similar clusters of users on Twitter and describing their core activities.
2) The first empirical analysis of the degree to which the social bot framework fits SSIOs, en masse.
3) Empirically grounded recommendations for a more comprehensive theoretical framework for describing SSIOs: cooperative work with social roles. This framework both more accurately models the division of labor we discover in our analysis, and still allows social bot detection tools to be applied, albeit selectively and at a finer granularity than is currently standard practice.
From this framework, and through the use of our novel clustering method, we offer practical guidance on how current detection methods could be more effectively leveraged to detect SSIOs in the wild. We conclude this paper with a discussion of the implications and potential future directions for research that our work raises in this domain.

BACKGROUND
In the subsections below, we first define state-sponsored information operations and the social bot framework, along with instances of operations that problematized the application of the social botnet as a framework. We then introduce the theories of cooperative work and social roles as an alternative framework to social bots. Finally, we introduce sequence analysis as a means of analyzing the fit of these two frameworks in the SSIO context.

State-Sponsored Information Operations
In their 2019 report on state-sponsored information operations (SSIOs), Bradshaw and Howard define SSIOs as social media manipulation, performed or sanctioned by governments or political actors, in order to "manufacture consensus, automate suppression, and undermine trust in the liberal international order" [10]. With information in the name, the primary tactic employed by SSIOs is the weaponization of three different categories of information: propaganda, misinformation (spreading unintentionally false information), and disinformation (spreading intentionally false information) [67]. In general, SSIOs are more commonly used by authoritarian regimes for controlling information domestically, while only a handful of countries possess the resources and experience to conduct operations to influence global audiences [10]. However, there is evidence of knowledge sharing and diffusion amongst these groups, with documented training for how to conduct an SSIO happening in Russia [45], India [10], and China [14].
An important aspect of SSIOs is that, by design, these operations extend beyond the agents that originally conduct the operation, and include "unwitting agents" [9]: individuals who are not necessarily coordinated and might even be unaware of the role they play in the operation (what we might consider to be authentic users). In this paper, we are interested in the suitability of the social bot framework specifically for SSIOs. As such, we focus our analysis specifically on the accounts that have been determined to be coordinating by Twitter's Safety team, and as a result, do not explore any behavior which might be exhibited by unwitting agents.

The Social Bot Paradigm
Defining social botnets is difficult due to the multiple communities who study them. Research on the more technical side of this work generally focuses on automation and the specific algorithms used by social bots [31,42]. In contrast, social science work tends to focus more on the political and social repercussions of social bots [71], as well as their communication styles [35] and interaction with others [38]. The works that bridge these different perspectives into a singular definition or concept, such as that conducted by Cresci [17], Gorwa and Guilbeault [31], or Grimme et al. [32], all find that the term social bot is an overloaded one, used to describe a variety of structures that strive to mimic human behavior. However, all this work points to several key characteristics that describe social bots:
• social bots exhibit either full automation or partial automation supplemented with human operation
• groups of social bots (social botnets) are built to achieve a specific goal or goals
• social botnets are made up of agents that act in a coordinated or synchronized fashion to amplify their effectiveness
• social bot behaviors have gotten more nuanced over time as a means of avoiding detection
Together, these characteristics inform the core assumption of the social bot framework: that individual bots within a social botnet exhibit similar behaviors to one another. This similarity is a reflection of their automation, the shared goal(s) they are meant to achieve, and the synchronicity of their actions. Throughout this paper, we refer to this assumption as behavioral homogeneity. In practice, though, the assumption of behavioral homogeneity is operationalized slightly differently: social bots exhibit a higher degree of similarity to each other than authentic users exhibit to one another. We refer to this operationalization as relative self-similarity. This practice is largely driven by the fact that detection methods that look for groups of users that are suspiciously similar often outperform detection methods focused on finding individually suspicious accounts [19].
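To make the operationalization concrete, the following minimal sketch tests relative self-similarity on toy data. It assumes that each account's behavior has already been summarized as a numeric vector and that similarity is measured as mean pairwise cosine similarity; both choices, and all function names, are illustrative assumptions rather than the measure used in this paper.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs (needs >= 2 vectors)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def is_relatively_self_similar(group, baseline):
    """True if the group is more similar to itself than the baseline is to itself."""
    return mean_pairwise_similarity(group) > mean_pairwise_similarity(baseline)


# Toy behavior vectors (hypothetical activity counts per account).
suspected_group = [[5, 1, 0], [6, 1, 0], [5, 2, 0]]
authentic_sample = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]
result = is_relatively_self_similar(suspected_group, authentic_sample)
```

Note that this test can pass even when the group is far from behaviorally homogeneous, since it only requires the group to be more self-similar than the baseline.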
However, as social bots continue to evolve in order to evade detection, the research communities studying them have also had to change their understanding of what constitutes suspicious similarity and how to detect it. Cresci et al. dubbed the major milestones of this ever-changing understanding as the Three Waves of social bot evolution [19]. In the first wave, social bot similarity was easy to identify, due to both their low reputation and their obvious signs of automation and spamming [73]. In contrast, the second wave of social bots appeared as much more reputable (due to following and engaging with one another) and made significant efforts to avoid spammy or repetitive behaviors [17], instead coordinating through much more complex or longitudinal behaviors. To better detect this new wave of social bots, supervised classifiers were trained on social bots found in the wild to identify more nuanced forms of relative self-similarity, such as behavioral and social-graph-based features [73]. In the current third wave of social bot research, the adversarial nature of this space is felt most acutely, as these social bots are carefully crafted to avoid commonly identified characteristics of previous waves, making them appear more similar to humans than to past versions of social bots. Through this careful design, third wave social bots are able to coordinate actions and behaviors more plainly, as they avoid the other common markers of social bot behavior [17]. This problematized the use of supervised detectors trained on older social bot data, and led to an explosion of unsupervised group-based detection approaches, which aimed to automatically detect unique similarities between accounts on a case-by-case basis [17]. While these group-based approaches have become quite ubiquitous, various case studies throughout the decade have demonstrated that the social bot framework in general might not be a perfect fit for large-scale manipulations, like SSIOs. We explore some of these case studies and findings in the following section, and engage with the consequences throughout this paper.

A Critique of the Social Bot Paradigm in SSIO Research
In addition to its resonance in other lines of work, the social bot framework became popular in SSIO research thanks to its efficacy in making sense of some of the first high-profile SSIOs online. During both the 2014 Ukrainian Revolution of Dignity [21] and the 2016 U.S. presidential election [13], the social bot framework was quickly applied to understand the behaviors and goals of these campaigns [26,33]. Its immediate efficacy helped cement it as a useful frame for understanding SSIO activity. However, some of these early case studies also revealed meaningful deviations from the social botnet frame. Hegelich and Janetzko found that the IRA operation conducted during the 2014 Ukrainian revolution exhibited multiple behaviors at once, including mimicking genuine users, boosting hashtag visibility, and amplifying messages through mass retweeting [33]. Relatedly, Abokhodair et al. found that in an operation conducted during the 2012 Syrian civil war, major tasks related to production and distribution of content were split between different groups of agents [1]. More recent case studies, such as Bastos and Mercea's analysis of an operation targeting the so-called "Brexit" discussion in the U.K. [8], found a similar distribution of work. These examples, drawn from the SSIO literature, suggest that distribution of work and individualized roles might be a more salient feature in SSIOs, which the social bot frame does not account for.
These case studies also contextualize the findings of Bradshaw and Howard in their analysis of significant sponsors of SSIOs in 2019. By systematically analyzing news articles, scientific reports, and investigating several case studies, they found that individual operations active in 2019 utilized multiple account types, messaging types, and communication strategies [10]. Taken together, these trends towards diversity in construction and functionality problematize the usage of the social bot framework in two key ways. If groups of agents within an operation exhibit different behaviors from one another, then:
1) From a theoretical perspective, a key assumption of the coordinated social bot framework (that SSIOs are behaviorally homogeneous) is violated.
2) From a methodological perspective, since most current detection techniques lack generalizability [17,25], and are tuned to detect a specific type of social bot (one specific behavioral pattern), these detection methods will likely only find a subset of agents, leaving the other groups undiscovered.
Perhaps the most telling observation of the current state of social bot research comes from Cresci's reflection on a decade of social bot research, where he argues that a significant challenge in this work is that most researchers are working with sparse, obsolete data resources, that "hardly cope with the rapid evolution of malicious accounts" [17]. To address this, in this paper we analyze recent SSIO activity published through Twitter's continuous data sharing initiative [53]. Cresci proposes that this data archive could "enable the next wave of research" in this domain [17]. Yet, relatively few researchers have analyzed any of the campaigns in this archive, and none have generalized across the full set of 34 campaigns available. Thus, in this work we investigate the social bot framework in the SSIO context by analyzing, for the first time, the entirety of the most comprehensive and up-to-date archive of SSIOs on Twitter.

An Alternative to Social Bots: Cooperative Work and Social Roles
As described above, previous case studies of SSIOs have indicated that the social bot framework might not be a perfect fit, and even that they may violate core assumptions of how social bots function and behave [1,8,33]. If this is the case, and SSIOs are not behaviorally homogeneous, then it is essential for us to identify a more comprehensive theoretical framework of coordinated behavior that makes room for social bots, while also describing how differently behaved actors might work together. One such model is that of cooperative work, as described by Schmidt and Bannon: that individual agents, as they are not omnipotent or omniscient, are interdependent and must cooperate to get work done [58]. Cooperative work can manifest in a myriad of ways, including the use of "specialized activities of multiple workers" that make use of "different specialized tools, techniques, or routines" [56]. Due to the interdependence of cooperative work, and the distributedness of the agents involved, the key way that agents coordinate is through articulation work, a form of work that manages these distributed activities and includes a spectrum of mechanisms, such as organizational structures, plans and schedules [57], and standard operating procedures [66]. We see evidence of this kind of articulation work in more well-known SSIOs, such as the Russian Internet Research Agency (IRA) operation in 2016, where leaked documents demonstrated that agents were directed on how many accounts and posts they were expected to manage daily, given suggested subject matter for those posts, and even told what kinds of profanity or abuse were allowed on specific sites [11,59].
The most important implication of applying the cooperative work frame to SSIO activity is that the primary mechanism for coordination amongst these differently behaved groups, articulation work, is not (fully) visible to anyone outside of the operation. Articulation work is both formal and informal, taking place over many communication channels [58], and as a result, this coordination is not visible from activity traces alone. This is in stark contrast to earlier, simpler inauthentic activity on social media that popularized the use of the social bot frame [17]. In lieu of directly observing the coordination of cooperative work, as articulation work is mostly invisible to those outside the organization and its communication channels [58], we instead are able to observe the results of this coordination: the specialized roles that agents take on [56]. These roles are the key way that agents are able to contribute to the goals of an organization without fully understanding or being aware of those goals [70]. Roles become even more important to understand in the context of high-tempo activities, such as political crises, when organizations, such as SSIOs, emerge quickly and dynamically [39,70].
As a lens for understanding SSIO activity, perhaps the most important implication of the cooperative work frame and social role theory is that SSIO research needs to shift the unit of analysis from entire operations (social botnet-based work) to the specialized roles that they are composed of. This investigation of roles, rather than top-level organizing structure, has already seen success in understanding misinformation- or propaganda-producing communities on social media. In a longitudinal study of three strategic information operations, Starbird et al. found that the work conducted in these operations was often interdependent in nature, where different types of formal and informal workers made unique contributions to the operation and were often reliant on each other [64]. Likewise, Phadke and Mitra identify five roles commonly seen in extremist group activism: educators, solicitors, flamers, motivators, and sympathizers. Further, they find that some of these roles, especially those more central in the hierarchy of the movement, are more stable throughout its course than others [50]. In this paper, we extend the findings on behavioral heterogeneity of the above research to a wider, more generalizable range of SSIO activity through the use of cooperative work and social roles as an investigative lens. Since the most direct form of coordination (articulation work) is not captured by the activity traces, we instead focus on the activity SSIO agents exhibit, turning to sequence analysis for uncovering their roles.

Sequence Analysis in the Study of Online Behavior
Within the last decade, researchers in the CSCW community have begun to more fully embrace the descriptive power of sequence-based analysis for answering questions related to collaborative behaviors. Some of the earliest proponents for sequence analysis in CSCW were Keegan et al., who argued that sequence analysis could be used, amongst other things, to effectively identify sub-classes of work and how these evolve over time [36]. At nearly the same time, Cresci et al. proposed similar benefits of sequence analysis in the domain of social bot research. Describing these sequences as digital DNA, Cresci et al. proposed that sequences could be used to identify groups of suspiciously behaviorally similar users for the purpose of social bot detection [18]. By encoding different activities of interest at the tweet level as a letter, a user's Twitter activity stream could be encoded as a "DNA strand". These strands could then be measured against other users', with heavily coordinating accounts having far longer common substrings than authentic users. This method has the benefit of being quite flexible, as the "alphabet" used to create the strand can be exchanged or combined with other alphabets to describe even more complex behavior [18].
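The encoding and comparison described above can be sketched as follows. This is a minimal illustration, not Cresci et al.'s implementation: the letter assignments in `TYPE_ALPHABET` and the function names are assumptions made for the example, and the comparison is a plain dynamic-programming longest common substring between two strands.

```python
# Hypothetical tweet-type alphabet (letter assignments are an assumption).
TYPE_ALPHABET = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_strand(activity_types):
    """Encode a chronological list of activity types as a DNA-like string."""
    return "".join(TYPE_ALPHABET[t] for t in activity_types)


def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by strings a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths ending at the previous char of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


user_a = encode_strand(["tweet", "retweet", "retweet", "reply", "tweet"])
user_b = encode_strand(["reply", "retweet", "retweet", "reply", "retweet"])
shared = longest_common_substring(user_a, user_b)
```

Here the two toy accounts share the three-action run "TTC" (retweet, retweet, reply). A single differing action inside an otherwise identical run would split the shared substring in two, which illustrates how sensitive substring matching is to noise.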
While sequence analysis has seen a lot of use in social bot research, one of its key benefits is currently underutilized: the sheer breadth of modifications and techniques that have been developed in other domains [36]. In particular, a substantial amount of Natural Language Processing (NLP) research in recent years has been dedicated to developing methods for representing sequences numerically [23,49,72]. The earliest, and certainly the simplest, of these methods is the bag-of-words (BOW) model: given a vocabulary of all known words, a BOW model represents text as a vector of how many times each word in the vocabulary is present in the text [30]. While this model is fairly coarse, ignoring any order or structure in the text, it has endured as one of the most popular representation methods due to its simple implementation, high interpretability, and impressive performance for a variety of tasks, both NLP-related and not [51]. Additionally, given its popularity, there is substantial literature on how to boost BOW performance and scalability [43,75]. In this paper, we demonstrate the further utility of the BOW model in the SSIO research context. Specifically, we use it to extend the Digital DNA concept, allowing us to both cluster and describe the behavioral profiles of SSIO agents operating across the world for the last decade.
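As a minimal sketch of the BOW representation applied to behavioral strands, the example below counts how often each vocabulary entry occurs in a strand. Treating each single character as a "word", along with the three-letter vocabulary and the function name, is an assumption made for illustration.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in tokens, ignoring order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary: one "word" per activity letter.
vocabulary = ["A", "C", "T"]

vector_1 = bag_of_words(list("ATTCA"), vocabulary)
vector_2 = bag_of_words(list("TACTA"), vocabulary)  # same actions, new order
```

Both strands map to the same vector because the model discards ordering. That coarseness is what makes the representation tolerant of small perturbations in a sequence, and the fixed-length vectors can be passed to standard clustering methods.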

METHODS
To answer our research questions, we utilize all 34 SSIOs available on the Twitter Information Operations archive [53], as well as a random sample of "authentic" Twitter users, collected via the sampling technique proposed by Liang and Fu [41]. We attempt to answer RQ1a (to what degree does behavioral homogeneity hold for SSIOs on Twitter?) using the Digital DNA and LCS approach proposed by Cresci et al. [18]. While this method proved extremely useful for studying early social bots in the wild, we find that it is too sensitive to noisy behavioral patterns, and not scalable to large datasets, making it ill-equipped to handle the longitudinal coordination of SSIOs. This prompts us to propose our own, more robust adaptation of Cresci et al.'s Digital DNA approach, using the BOW model to simplify comparison and clustering of behavior. We use this BOW-embedded Digital DNA to answer RQ1a and RQ1b (to what degree does relative self-similarity hold for SSIOs on Twitter?). To answer RQ2 (what kinds of behavioral patterns are exhibited by state-sponsored information operations and how have these evolved over time?), we apply a heuristic approach for classifying clusters of agents to broadly classify their activity and trace their usage over time.

State-Sponsored Information Operations. SSIO behavior is represented using the accounts and tweets available on the Twitter Information Operations archive [53]: a continuous data sharing initiative that began in October 2018. This archive, at the time of writing, contains the complete Twitter activity of accounts tied to 34 different SSIOs. The countries that sponsored these operations, and the countries they targeted, are shown in Figure 1. In total, these operations produced roughly 207 million tweets authored by 78,151 accounts. While the reporting of these operations began in 2018, many of the operations were active before then, with some of the oldest operations having accounts that were active in 2009. It is worth noting that some (but not all) of the accounts that were active in the early 2010s are likely hijacked accounts, or genuine accounts that were purchased, stolen, etc. by operation agencies as an easy way to skirt suspicious account creation checks done by Twitter [10]. It is also important to note that the criteria used by Twitter for identifying these accounts as being part of an SSIO are not made publicly available. In this paper, we do not go into detail on the suspicious behaviors that these operations exhibited that may have led to their detection. However, this information is available in the blog posts written alongside data releases by the Twitter Safety Team [54].

Baseline Dataset.
In order to compare the behaviors exhibited by accounts participating in the SSIOs described above to some baseline of "authentic behavior" on Twitter for RQ1b, we manually collected a representative sample of users via the methodology proposed by Liang and Fu [41]. We began by generating 100,000 random numbers in the range 0 to 5 billion and then searched for these Twitter IDs using the Twitter REST API. This returned 46,005 valid Twitter accounts. We then used the REST API to retrieve the users' most recent tweets, retweets, etc. up to the limit of 3200. This left us with a collection of roughly 6.5 million tweets. We imposed no restrictions on when or where tweets were produced, what language was used to produce them, or how many times a user produced tweets.
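The ID-generation step can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, IDs are drawn without repetition from the half-open range [0, 5 billion), and a seed is fixed only for reproducibility. The API lookup itself is omitted, since the exact requests used are not described here.

```python
import random


def sample_candidate_ids(n_ids, upper_bound, seed=None):
    """Draw n_ids distinct random integers in [0, upper_bound)."""
    rng = random.Random(seed)
    ids = set()
    while len(ids) < n_ids:  # re-draw on the (rare) duplicate
        ids.add(rng.randrange(upper_bound))
    return sorted(ids)


candidates = sample_candidate_ids(100_000, 5_000_000_000, seed=0)

# Share of candidate IDs that resolved to valid accounts, from the reported counts.
hit_rate = 46_005 / 100_000
```

From the reported counts, roughly 46% of the candidate IDs corresponded to valid accounts.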

Describing Behavior with Digital DNA. In order to address our first research question, and investigate the behavioral homogeneity of SSIOs, we make use of Cresci et al.'s sequence-based identification of unique behaviors, which they refer to as Digital DNA [18]. A key benefit of Cresci et al.'s method is that it allows for the combination of multiple alphabets to describe multiple dimensions of behavior. We choose to utilize a combination of three alphabets, shown in Figure 2, to describe behavior.

Fig. 1. SSIO sponsor and target countries. Each flow represents an operation, its thickness representing the number of agents involved, and its color representing whether the target of an SSIO was domestic (blue) or foreign (yellow). Most countries explicitly target a single country (often their own), while a few countries with more extensive counterintelligence infrastructure (Russia, Iran) have multiple targets, both foreign and domestic.

Fig. 2. The alphabets used to encode state-sponsored information operation agent behavior. Every tweet is described using a three-character sequence, relating to the type of activity (first alphabet), the time since last action (second alphabet), and what type of tool was used to author the tweet (third alphabet).
We use the first alphabet, which describes the type of tweet and is drawn directly from Cresci et al.'s work, as it does a satisfactory job of classifying the primary types of activities on Twitter. Since early SSIO research [1,33] has established that temporality is an important dimension of basic SSIO functionality, we use the second alphabet to encode the scale of time elapsed between the current activity and the most recent previous activity. Finally, as Bradshaw and Howard's 2019 report suggests that solely human-operated accounts and solely automated accounts are far more common than hybrid/cyborg accounts [10], we use the third alphabet to investigate the role of automation in SSIO activity. Specifically, we categorize tweets as using standard, automated, or custom publishing tools (referred to as sources hereafter) as follows: the first author searched on a web browser for every source attribute found in our dataset (as the Twitter Information Operations archive currently does not provide the URL of the source app). Apps that exist in the native Twitter ecosystem, such as Twitter for mobile devices or the Twitter Web Client, were coded as "standard", while apps that explicitly mentioned automated posting, scheduled posting, or control of multiple accounts, such as dlvr.it or Zapier.com, were coded as "automated". Any source apps that did not advertise automation or coordination as a functionality, or that did not return any websites when searched, were labeled as "other".
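The encoding described above can be sketched as follows. The letters, the six time-gap bins, and their boundaries are assumptions chosen for illustration (Figure 2 holds the actual alphabets); only the three activity types and the three source categories come from the text. Treating an account's first activity as having the longest possible gap is also an assumption.

```python
from datetime import datetime, timedelta

# Illustrative letters; the real alphabets are defined in Figure 2.
TYPE_CODES = {"tweet": "A", "reply": "C", "retweet": "T"}
SOURCE_CODES = {"standard": "S", "automated": "X", "other": "O"}
# Hypothetical upper bounds (in seconds) for five bins, plus a catch-all.
TIME_BINS = [(60, "s"), (3600, "m"), (86400, "h"), (604800, "d"), (2592000, "w")]
LONGEST_GAP_CODE = "l"

def time_code(gap_seconds):
    """Map the gap since the previous activity to a single character."""
    for upper_bound, code in TIME_BINS:
        if gap_seconds < upper_bound:
            return code
    return LONGEST_GAP_CODE

def encode_account(activities):
    """Encode (timestamp, type, source) tuples, sorted by time, as 3-char words."""
    words, previous = [], None
    for timestamp, kind, source in activities:
        gap = (timestamp - previous).total_seconds() if previous else float("inf")
        words.append(TYPE_CODES[kind] + time_code(gap) + SOURCE_CODES[source])
        previous = timestamp
    return words

t0 = datetime(2019, 9, 1, 12, 0, 0)
example = encode_account([
    (t0, "tweet", "standard"),
    (t0 + timedelta(seconds=30), "retweet", "automated"),
    (t0 + timedelta(hours=2, seconds=30), "reply", "other"),
])
```

Each account thus becomes a sequence of three-character words, one per activity, which is the input to both the LCS-based and the BOW-based analyses.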

Clustering Digital DNA Sequences with LCS-Curves.
In order to answer RQ1a and determine whether an SSIO is behaviorally homogeneous, we utilize clustering of behavioral strands. Specifically, if the majority of an SSIO's agent behavioral sequences, encoded using the alphabets described above, are put into a single cluster, then we can assume that the SSIO is behaviorally homogeneous. In contrast, if these sequences are divided into multiple clusters, with no one cluster representing a clear majority, then we would claim that the SSIO is behaviorally heterogeneous. To cluster sequences based on similarity, we borrow from Cresci et al.'s later work with Digital DNA, where Longest Common Substring (LCS) curves were used to detect subgroups of highly similar users out of a mix of social bots and genuine users [20]. Specifically, the discrete derivative of the LCS curve can be used to separate users into a behaviorally homogeneous cluster (or clusters) and a noise cluster, where simple peaks in the derivative of the LCS curve indicate the boundaries of separation.
However, the power of LCS to cluster comes at a high computational cost. A major limitation of the LCS method is that it requires the use of a generalized suffix tree, which is not scalable to large datasets [17]. In order to use LCS on our collection of SSIOs,¹ only the most recent 2,000 activities of each SSIO agent were used to construct a Digital DNA sequence. The most recent activities, rather than a random set, were used because temporality is a key feature of Digital DNA, and without temporality, the longest common substring is meaningless. Even with this substantial adjustment, this method was unable to scale to the three largest operations in our dataset: the Saudi Arabian operation in December 2019, the Turkish operation in June 2020, and the Serbian operation in April 2020. We found that even when adjusting this approach to only consider the most recent 500 activities for these three operations, the resulting generalized suffix tree became too large to keep in RAM. Thus, these three operations are excluded from our analysis of RQ1a using this method.
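To make the LCS curve and its discrete derivative concrete, the sketch below computes both by brute force on toy sequences. It is not the generalized-suffix-tree implementation discussed above: it enumerates every substring, so its memory use grows quadratically with sequence length, which illustrates why the real method struggles to scale. The function names are hypothetical, and each character stands in for one encoded activity.

```python
from collections import defaultdict

def lcs_curve(sequences):
    """Map k to the length of the longest substring shared by at least k sequences."""
    owners = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                owners[tuple(seq[start:end])].add(idx)
    # Longest substring shared by exactly n sequences.
    best = defaultdict(int)
    for sub, who in owners.items():
        best[len(who)] = max(best[len(who)], len(sub))
    # A substring shared by n sequences is also shared by any k <= n of them,
    # so take a running maximum from the largest k downwards.
    curve, running = {}, 0
    for k in range(len(sequences), 1, -1):
        running = max(running, best[k])
        curve[k] = running
    return dict(sorted(curve.items()))

def largest_drop(curve):
    """Return the k at which the curve falls most steeply (discrete derivative)."""
    ks = sorted(curve)
    drops = {k: curve[prev] - curve[k] for prev, k in zip(ks, ks[1:])}
    return max(drops, key=drops.get)

toy = ["ABCABC", "ABCABD", "XBCABY", "QQQQ"]
curve = lcs_curve(toy)
boundary = largest_drop(curve)
```

In this toy case the curve drops sharply when the fourth sequence is included, so the first three sequences form the similar group and the fourth is treated as noise.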

Measuring Social Bot Assumptions with Digital DNA, BOW, and k-means
3.3.1 Describing Behavior with BOW-embedded Digital DNA. The results of our initial investigation of RQ1a give us some indication that our hypothesis is true, or in other words that the core assumption of the social botnet framework does not hold for SSIOs. However, we do not have a high degree of confidence in these results, as Cresci et al.'s method is unable to scale to all operations in our dataset and identifies a suspiciously small number of agents per operation as coordinating. To address this question with greater confidence, we apply the robust Bag of Words (BOW) model to numerically embed Digital DNA sequences into a more workable space. Specifically, we embed the sequences that were created with the alphabets in Figure 2. Since the combined alphabet contains 54 words (the product of the sizes of the three alphabets), each SSIO agent's behavioral sequence is mapped to a 54-dimensional vector. Each component of this vector represents the relative frequency of a "word" in the dictionary, where a word is a 3-tuple made of one character from each of the three alphabets. The benefits we reap from this approach are: (1) by working with vectors, as opposed to sequences, we are able to scale this approach to all operations in our dataset, (2) vectors are easier to work with than sequences and can thus be clustered using much simpler methods than LCS-curves, and (3) BOW vectors are much more interpretable than common substrings, meaning that we can summarize the behavior of the clusters we identify. These benefits come at the cost of nuance: the BOW model throws away the ordering of sequences, making it less sensitive to minor deviations in behavior, but also losing some signal in the process [30]. However, this has proved to be an acceptable trade-off in other domains [51], and prior work has found that LCS-curves are limited in utility due to their inability to scale [44] and intense sensitivity to minor differences between two sequences [37]. As such, we find that the benefits of the BOW model outweigh the drawbacks, and move forward in answering our research questions with BOW vectors.
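The embedding step can be sketched as follows. The individual alphabet sizes (3, 6, and 3) and the letters used are assumptions made so that the vocabulary has the 54 words stated in the text; the actual alphabets are those in Figure 2.

```python
from collections import Counter
from itertools import product

# Assumed alphabets: 3 activity types x 6 time-gap bins x 3 source categories.
TYPES, GAPS, SOURCES = "ACT", "smhdwl", "SXO"
VOCAB = ["".join(word) for word in product(TYPES, GAPS, SOURCES)]

def bow_vector(words):
    """Return the relative frequency of each vocabulary word in a sequence."""
    if not words:
        return [0.0] * len(VOCAB)
    counts = Counter(words)
    return [counts[word] / len(words) for word in VOCAB]

vector = bow_vector(["AsS", "AsS", "TmX", "ChO"])
```

Because the components are relative frequencies, accounts with very different activity volumes remain comparable, and the order of activities no longer affects the representation.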

3.3.2 Clustering BOW-embedded Digital DNA. Since we are skeptical of the results of our analysis using LCS-curves to answer RQ1a, we revisit this question, now using our more robust BOW-based approach. We use the same approach as before, and utilize clustering to indicate whether an SSIO is behaviorally homogeneous. Specifically, we look for the presence of a single cluster containing a majority of the agents, indicating homogeneity, or a collection of smaller clusters containing only some of the agents, indicating heterogeneity. However, since we are now working with BOW vectors, as opposed to sequences, we are able to use more common clustering techniques. In particular, k-means is one of the most common clustering methods used with BOW models [51], and so we follow this precedent, testing values of k in the range 0 to 20, and using elbow detection on the plot of within-cluster sum of squares for each k to pick the optimal k value for each SSIO.
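The clustering step can be illustrated with a small, dependency-free sketch. Several choices here are assumptions rather than the authors' implementation: the deterministic farthest-first initialization, the elbow rule (largest second difference of the within-cluster sum of squares), and the range of k, which starts at 1 in this sketch because k-means is undefined for k = 0.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-first initialization (deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss

def elbow_k(wcss_by_k):
    """Pick the k where the WCSS curve bends most (largest second difference)."""
    ks = sorted(wcss_by_k)
    bends = {cur: wcss_by_k[prev] - 2 * wcss_by_k[cur] + wcss_by_k[nxt]
             for prev, cur, nxt in zip(ks, ks[1:], ks[2:])}
    return max(bends, key=bends.get)

# Toy data: three tight groups of agents in a 2-dimensional space.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (0, 10), (0, 11), (1, 10)]
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 7)}
chosen_k = elbow_k(wcss_by_k)
labels, _, _ = kmeans(points, chosen_k)
```

On real data the same loop would run over each SSIO's 54-dimensional BOW vectors, and the share of agents in the largest resulting cluster is what indicates homogeneity or heterogeneity.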
An additional advantage of this BOW-based method for clustering behavior, as opposed to sequence-based ones, is interpretability, demonstrated in Figure 3. Specifically, the original approach used by Cresci et al. relied on common substrings, and thus substrings are the only way of describing clusters. In contrast, our BOW-based approach clusters agents based on the relative frequency of "words" from our alphabet (Figure 2), which can then be directly used to describe the primary behavior(s) of agents in a specific cluster. For example, looking at the heatmap of the Ecuadorian SSIO of September 2019 in Figure 3, we can see that the largest cluster of agents (cluster 6) primarily retweets using standard tools, but infrequently engages in other activities, though still using standard tools. In contrast, cluster 2 exhibits a clear focus on replying to other users via custom tools exclusively. This ability to describe behavioral themes of clusters becomes particularly useful in understanding behaviors expressed in clusters across operations, in the service of answering RQ2.

3.3.3 Measuring SSIO Relative Self-Similarity. While clustering BOW vectors allows us to measure behavioral homogeneity in SSIOs, we adopt a slightly more sophisticated measure for determining whether agents in an SSIO are more relatively self-similar than genuine Twitter users in order to answer RQ1b. To investigate relative self-similarity, we first find the centroid for each SSIO, as well as the centroid of the baseline Twitter dataset. We then calculate the distance from all points in each SSIO to that SSIO's centroid, combine all of these distances into one distribution, and compare it to the distribution of distances from points in the baseline dataset to the baseline's centroid (Figure 4A). Specifically, we make this comparison using a one-sided Mann-Whitney U (MWU) test. We use this non-parametric test because we are interested in comparing two distributions that are not normally distributed. We make this shift from clustering to centroid-point distance measures because the more nuanced understanding of the assumption of relative self-similarity does not necessarily require all agents to have the same behavior (i.e. belong to the same cluster), but only that their behaviors are more similar than the spectrum of behaviors exhibited by the genuine crowd.
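The centroid-distance comparison can be sketched as below. Euclidean distance is an assumption (the text does not name the metric), and the p-value uses a normal approximation without tie correction, which is adequate for illustration but not a substitute for a statistics library. The pairwise loop is quadratic and only suitable for toy inputs.

```python
import math

def distances_to_centroid(vectors):
    """Euclidean distance of every vector to the group's centroid."""
    n = len(vectors)
    center = [sum(col) / n for col in zip(*vectors)]
    return [math.dist(v, center) for v in vectors]

def mann_whitney_u_less(x, y):
    """One-sided test that values in x tend to be smaller than values in y.

    Returns (U, p), where U counts the pairs in which x exceeds y
    (ties count one half); a small U supports the alternative.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z) under the null
    return u, p

# Hypothetical distance samples: a tight group versus a dispersed crowd.
tight_group = [0.1, 0.2, 0.3]
dispersed_crowd = [0.4, 0.5, 0.6, 0.7]
u_stat, p_value = mann_whitney_u_less(tight_group, dispersed_crowd)
```

A small p-value here would indicate that the first group sits closer to its own centroid than the second, which is the sense in which a group is "more relatively self-similar".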
In addition to this overall test of relative self-similarity, we make pairwise comparisons of each SSIO individually to the baseline distribution (Figure 4B). To this end, we use a Kruskal-Wallis test (KWH), with Bonferroni-corrected, one-sided Mann-Whitney post hoc analysis to test whether SSIO distribution medians are smaller (more relatively self-similar) than the baseline distribution median. Similar to our choice of MWU, we choose the KWH test because we are interested in comparing samples from a non-normal distribution. This gives us a more fine-grained way of measuring how well the practical operationalization of the social bot framework fits SSIO activity.
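The omnibus statistic and the multiple-comparison correction can be sketched as follows. The H statistic below omits the tie correction, and the significance level passed to the Bonferroni helper is an assumption; the p-values in the usage example are made up for illustration.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        for _, g in pooled[i:j]:
            rank_sums[g] += average_rank
        i = j
    weighted = sum(r * r / len(values) for r, values in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

def bonferroni_significant(p_values, alpha=0.01):
    """Flag the p-values that stay below alpha after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flags = bonferroni_significant([0.0001, 0.004, 0.2])
```

With one post hoc comparison per SSIO, the corrected threshold for the 34 operations would be the chosen significance level divided by 34.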

Describing Behavioral Cluster Complexity
In order to describe the complexity of behavior expressed in an SSIO agent cluster (RQ2), we adopt the following heuristic: for each cluster identified using k-means, we calculate the average proportion of each column of the BOW vectors in that cluster. These averages correspond to the average proportion of SSIO agent activity in that cluster that is dedicated to one activity. We take the sum of the two largest column averages and then classify clusters using that sum and the heuristic presented in Table 1.

Fig. 4. [...] We then do the same for the genuine crowd (marked with △). We then combine all SSIO distributions into one, representing the overall spread of SSIO similarity in our dataset. We then compare this to the genuine distribution using a one-sided Mann-Whitney U test. In Part B (right), we compare each SSIO distribution separately to the genuine crowd using a one-sided Mann-Whitney U (MWU) test with Bonferroni correction.
The cutoffs of this heuristic were chosen to distinguish how clear a majority the two most common activities make up in a given cluster. Clusters with a clear focus have a near single-minded orientation towards a specific task, such as automatically tweeting or retweeting every couple of seconds, in line with the behavior of first wave social bots [19]. Clusters with a semi-clear or even diffuse focus are more akin to what is described in the third wave of social bot literature, where individual agents do direct their main activity towards one task (making groups of them detectable), but engage in a variety of others to obfuscate their purpose and make themselves appear more similar to humans than early spam bots [17]. Clusters which lack any focus at all are perhaps the trickiest to orient in the social bot literature, but seem to be best classified as second wave social bots, which go to great lengths to drop any indication of spammed or repetitive behavior. We choose to look at the two largest columns, as these effectively correspond to the primary function of a cluster and either the secondary function or the activity it performs in order to obscure its primary function.
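The heuristic can be sketched as below. The numeric cutoffs are placeholders invented for this example; the actual values are those given in Table 1.

```python
# Hypothetical cutoffs on the top-two share; Table 1 holds the real ones.
CUTOFFS = [(0.9, "clear focus"), (0.7, "semi-clear focus"), (0.5, "diffuse focus")]

def top_two_share(cluster_vectors):
    """Sum of the two largest column averages of a cluster's BOW vectors."""
    n = len(cluster_vectors)
    column_means = [sum(col) / n for col in zip(*cluster_vectors)]
    return sum(sorted(column_means, reverse=True)[:2])

def classify_cluster(cluster_vectors, cutoffs=CUTOFFS):
    """Label a cluster by how much of its activity its top two words cover."""
    share = top_two_share(cluster_vectors)
    for threshold, label in cutoffs:
        if share >= threshold:
            return label
    return "no focus"

# Toy clusters over a 5-word vocabulary.
focused = [[0.9, 0.1, 0.0, 0.0, 0.0], [0.7, 0.2, 0.1, 0.0, 0.0]]
unfocused = [[0.2, 0.2, 0.2, 0.2, 0.2]]
focused_label = classify_cluster(focused)
unfocused_label = classify_cluster(unfocused)
```

In the first toy cluster the two most common words account for about 95% of activity, so it falls in the most focused category under these placeholder cutoffs.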

FINDINGS
The goal of this study is to determine whether the social bot framework is appropriate for the SSIO context. Specifically, we investigate whether this framework's key assumption (behavioral homogeneity) and the way that it is commonly operationalized in practical work (relative self-similarity) hold for SSIOs found on Twitter over the last decade. Overall, we find that neither of these holds in the SSIO context, largely because SSIOs are not composed of homogeneous agents, but instead display a clear division of labor, where clusters of agents in an operation focus on specific activities. By measuring the complexity of these agent clusters, we find that while nuanced social bots are commonly employed, simple spamming agents (more akin to what is found in the first wave of social bot research) that primarily focus on tweeting and retweeting are just as common. Temporally, third wave-like agents have become less common in recent years, with simpler first wave-like agents (with a clear behavioral focus) becoming more prevalent.

SSIOs are not Behaviorally Homogeneous (RQ1a)
In this section, we first present our initial investigation of this research question using Cresci et al.'s proposed method of using LCS-curves [20]. After highlighting the ways that this method falls short in the SSIO context, we move on to answering this question using our novel method of embedding Digital DNA using the BOW model.

Measuring SSIO Behavioral Homogeneity with LCS-curves.
Using LCS-curves to identify coordinating agents in SSIOs, we find that for over half of the operations in our dataset,² less than 5% of the operation is found to be behaviorally homogeneous. Even when considering the operation with the largest proportion of behaviorally homogeneous accounts, the Ghana operation in March 2020, only 30% of the operation is determined to be behaviorally homogeneous (Figure 5).
To further show this lack of homogeneity, we calculate the proportion of accounts in an SSIO that have a longest common substring (LCS) of at least 10 (Figure 6). This cutoff is a full order of magnitude lower than the LCS for social bot accounts found in Cresci et al.'s original analysis [20]. Even under this incredibly relaxed criterion for considering accounts to be coordinating, the majority of SSIOs in our dataset have less than 30% behavioral homogeneity amongst their agents.
While these results give us some indication that SSIOs do not satisfy the social bot assumption of behavioral homogeneity, we lack confidence in them. The suspiciously small proportion of agents identified as behaviorally homogeneous above suggests that the method proposed by Cresci et al. [20] does not translate well in this context because of its reliance on substrings. We can see this more clearly by visualizing the median Digital DNA length for each SSIO, shown in Figure 7.
From Figure 7, we can see that for the majority of SSIOs in our dataset, the median agent Digital DNA length is quite small (under 200). These small sequences make finding long common substrings unlikely. This is not to mention the extreme sensitivity that LCS-based approaches have to minor deviations between two strings [37], making long common substrings even more unlikely in this context. Comparatively small sequences, sensitivity to minor deviations between strings, and inability to scale all make this LCS-based approach ill-suited for answering our research questions. As such, we turn to using our BOW-based method to re-answer RQ1a, and use it throughout the remainder of this paper.

Fig. 5. Histogram of the proportion of agents found to be behaviorally homogeneous using LCS-curves (x-axis: Proportion of Agents in Largest Cluster, using LCS-curves). The majority exhibit nearly no homogeneity, and none exhibit a majority of their agents being homogeneous.

4.1.2 Measuring SSIO Behavioral Homogeneity with BOW Vectors and K-means. Using our novel, BOW-based method to identify coordinating agents in SSIOs, we find that the median proportion of behavioral homogeneity across SSIOs is 31.8%. In other words, across SSIOs in our dataset, the largest cluster of agents generally is composed of less than a third of the total number of agents participating in the operation (Figure 8).

[Figure 8; x-axis: Proportion of Agents in Largest Cluster, 0.0 to 1.0]
For a clear majority of SSIOs in our dataset, the cluster of behaviorally homogeneous agents makes up less than 40% of agents participating in the operation. Only six operations exhibit greater behavioral homogeneity than this, with an outlier being the Russian operation in October 2020, with 80% behavioral homogeneity amongst its agents.
These results provide further evidence that SSIOs do not exhibit high levels of behavioral homogeneity. With the exception of four operations in our dataset, no SSIO has a majority of its agents exhibiting behavioral homogeneity. Beyond this, we find that SSIOs exhibit a clear division of labor, with SSIOs in our dataset being split into an average of 7.0 clusters (SD = 1.5). Furthermore, an individual cluster of an SSIO typically contains only about 14.3% of the agents participating in that operation on average. We explore these results in greater detail when answering RQ2 below.
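The homogeneity measure used in this section reduces to the share of agents that fall in the largest cluster. A minimal sketch, with hypothetical function names and a strict-majority threshold taken from the definition in the methods:

```python
from collections import Counter

def largest_cluster_share(labels):
    """Fraction of an operation's agents that fall in its largest cluster."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def is_homogeneous(labels):
    """An operation counts as homogeneous if one cluster holds a majority."""
    return largest_cluster_share(labels) > 0.5

# Toy cluster assignments for two hypothetical operations.
split_operation = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
dominated_operation = [0] * 8 + [1] * 2

# Sanity check on the reported averages: with 7 clusters per operation,
# the mean cluster holds 1/7 of the agents.
mean_share_with_seven_clusters = 1 / 7
```

The last line shows why the two reported averages agree: an operation split into 7 clusters has a mean cluster share of 1/7, or about 14.3%.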

SSIOs are not Relatively Self Similar (RQ1b), but Agent Clusters Are
The results above leave us with enough confidence to claim that SSIOs are not behaviorally homogeneous. While this problematizes the use of the social bot framework in the SSIO context, it does not necessarily pose a practical problem so long as SSIOs exhibit greater relative self-similarity than the authentic Twitter crowds that they operate in. In order to investigate relative self-similarity, we first combine all of the SSIO user-to-centroid distributions and compare this to the genuine crowd user-to-centroid distribution, shown in Figure 9.
We find that genuine crowd user-to-centroid distances are not larger than those of the combined activity of all SSIOs in our dataset (U = 1694177217, p = 1.0), meaning that SSIO activity, as a whole, is not more relatively self-similar than authentic activity. To break down this result more finely, we now analyze pairwise comparisons between each SSIO distribution individually and the genuine crowd distribution. After establishing that these SSIOs and the genuine crowd are not drawn from the same distribution (H = 12722.0, p < 0.01), we find that only five of the 34 SSIOs in our dataset have significantly smaller median user-to-centroid distances than the genuine crowd. In other words, only five operations in our dataset exhibit greater relative self-similarity than the genuine Twitter crowd and satisfy the operationalization of the social bot framework. These five operations are the Russian operation in June 2020, the (second) Venezuelan operation in January 2019, the joint U.A.E. and Egyptian operation in September 2019, the Cuban operation in October 2020, and the Serbian operation in April 2020 (all of which have p < 0.01). This result means that in addition to the majority of SSIO agents not being behaviorally homogeneous in a given operation, SSIOs generally do not even exhibit greater relative self-similarity than the genuine Twitter crowd.

While SSIOs overall do not exhibit relative self-similarity, we get a different picture when we consider SSIO agent clusters, as opposed to the full operations. If we calculate centroids, and then user-to-centroid distances, by cluster instead of by SSIO, we can measure the relative self-similarity of the individual agent clusters that SSIOs are composed of. In Figure 10, we show the combined distribution of SSIO cluster user-to-centroid distributions compared to the genuine crowd user-to-centroid distribution. When now considering SSIO clusters of agents, we find that genuine user-to-centroid distances are larger than those of the combined activity of SSIO clusters (U = 2794889693, p < 0.01). In essence, this means that while SSIOs violate the assumption of relative self-similarity, the individual clusters of agents that participate in them typically do not, and might be more readily detected by techniques that leverage this assumption.

[Figure 10: User-to-Centroid Distances of SSIO Clusters compared to Genuine Users; y-axis: Count of Records; series: Genuine Crowd, SSIO Clusters]
Fig. 10. Distribution of user-to-centroid distances for genuine users, and a sample of the combined distribution of all SSIO cluster user-to-centroid distributions (samples taken for visual clarity).

SSIOs are Increasingly Making Use of Simple, First Wave-like Agent Clusters (RQ2)
To analyze the complexity of activity undertaken by SSIO agents, we apply the heuristic approach described in Section 3.4 to describe SSIO agent clusters as being either clearly focused, semi-clearly focused, diffusely focused, or not focused on a set of the two most common activities for that cluster. The proportion of clusters of each type active each year is shown below in Figure 11.
As can be seen in Figure 11, year after year, SSIOs have made greater use of clearly focused groups of agents, what we would consider to be more akin to first wave social bots. Similarly, we can see a decline in semi-clearly focused agents year after year, which exhibit traits more in line with third wave social bots. This demonstrates why understanding the temporal trends of SSIO evolution by investigating its division of labor is so critical: at a time when social bot research is focusing more and more on understanding the behavior of third wave social bots [17], the state of SSIO development seems to be moving in the opposite direction, simplifying more and more of the process through the use of first wave-like social bots and simple coordination.

[Figure 11: Proportion of Agent Cluster Types by Year; series: Clear Focus, Semi-Clear Focus, Diffuse Focus, No Focus]
Fig. 11. The proportion of agents active each year by cluster type. While semi-clearly focused agent clusters, what we would consider to be more social-bot-like, were incredibly common early on, the last several years have seen a trend towards more clearly focused clusters, which are more akin to simple bots or trolls. We also see that throughout the decade, diffusely focused clusters without a clear purpose make up a considerable proportion of clusters active in any given year.
As clearly and semi-clearly focused agent clusters make up the clear majority of activity exhibited by SSIOs each year, it is worth investigating the specific core activities that these clusters take up in more detail. Specifically, we look at the co-occurrence of the two most common behaviors across clearly and semi-clearly focused agent clusters, shown in Figure 12.
The most striking result in Figure 12 is that for the 184 clusters labeled as either clearly focused or semi-clearly focused, 174 of them exhibit incredibly simple core behaviors, either focusing on spammy tweeting or retweeting (occurring every couple of seconds to minutes, or at least daily). Additionally, these 174 clusters only utilize one tool (either standard, automated, or other 3rd party) for their core activity, rather than a combination of them. Furthermore, out of the entire group of focused and semi-focused clusters, only 28 exhibit two different activities as their core behavior (e.g. both tweeting and replying) and only 4 exhibit the use of multiple tool types (e.g. both standard and automated tools) in their core behavior. Since we are only looking at the two most common behaviors, this does not mean that these clusters don't exhibit multiple behaviors or use multiple tools, but it does mean that for the majority of their work, they stick to one activity and one tool. In contrast to the newest waves of social bots found online, these activities are far more similar to early first wave social bots [17], which utilize very simple spamming behavior, and do not readily make use of hybridized accounts.

[Figure 12: heatmap; axis label: Second Most Common Behavior]
Fig. 12. Heatmap of the co-occurrence of top SSIO agent behaviors measured across focused and semi-focused clusters for all SSIOs in the dataset. Thick grid lines separate behaviors that use different tools, while thin grid lines within the main diagonal separate behaviors that engage in different actions. For the majority of clusters identified in our dataset, the two most common behaviors for a given cluster utilize the same tool types to perform the same activity.

PRACTICAL IMPLICATIONS OF SSIOS AS COOPERATIVE WORKERS: ASSESSING "COVERAGE" OF SOCIAL BOT DETECTION TECHNIQUES
At this point, we have demonstrated that the social bot framework is an inappropriate one for studying SSIO activity, because SSIOs as a whole violate both the core theoretical assumption of the framework and the way this assumption is operationalized in social bot detection work. The heterogeneous, clustered nature of SSIOs (Section 4.3) suggests to us that these operations are better described as instances of cooperative work with social roles. In this section, we provide further evidence that cooperative work is a more fitting top-level framework by using it to quantitatively identify strengths and weaknesses of social bot detection techniques in the SSIO context.
The vast majority of works presenting new social bot detection techniques use common machine learning measures of accuracy [46]. However, from a cooperative work perspective, understanding what proportion of agents are labeled correctly is only part of the picture. Equally important is understanding what types of agents are identified by the method and which aren't. A key benefit of our BOW-based method for clustering SSIO agents is that in addition to being able to identify the key activities of a cluster, we can apply simple heuristics to broadly classify their complexity and level of coordination. In this section, we make use of these broad classifications to measure what we refer to as "coverage" of a host of social bot detection techniques. Coverage, simply, is the true-positive rate of a technique across each one of the four levels of coordination/complexity produced by the heuristic presented in Section 3.4. Separate from measures of accuracy or precision, coverage allows us to determine whether a detection technique performs equally well across all of the levels of coordination and complexity present in current day SSIOs (i.e. whether its performance is generalizable across the whole campaign), or whether it is attuned to detect one level of coordination (i.e. it performs particularly well for certain kinds of clusters, but not others).
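Coverage as defined above can be computed with a few lines of code. This is a minimal sketch: the record format and the level names are assumptions, and every record is taken to describe a known SSIO agent, so each detector flag is a true positive.

```python
from collections import defaultdict

def coverage(records):
    """Per-level true-positive rate.

    records: iterable of (level, flagged) pairs, one per known SSIO agent,
    where level is the complexity class of the agent's cluster and flagged
    says whether the detector labeled the agent as a bot.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for level, flagged in records:
        totals[level] += 1
        hits[level] += bool(flagged)
    return {level: hits[level] / totals[level] for level in totals}

# Hypothetical detector output for five agents.
example_records = [
    ("clear", True), ("clear", True), ("clear", False),
    ("none", False), ("none", True),
]
example_coverage = coverage(example_records)
```

A detector with similar values at every level generalizes across the operation, while one with a high value at a single level is specialized.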
With this notion of coverage in mind, we now describe which social bot detection methods we chose to investigate and why. We root our exploration of detection techniques in Cresci's concept of the Three Waves of social bot evolution [19]. Since each of these waves brought new considerations for researchers developing detection techniques, we identify detection techniques as belonging to a certain wave of social bot research based on whether they aim to detect: simple spam and automated behaviors (wave one), individual social bots through primarily supervised means (wave two), or groups of social bots primarily through unsupervised means (wave three) [17]. We largely leverage longitudinal reviews of social bot detection work, such as those presented by Orabi et al. [46] and Latah [40], to classify which wave detection techniques belong to.
In total, we selected six techniques, two per wave of social bot research. These six techniques cover the main strategies employed in social bot research over the last decade. The six techniques that we chose, as well as our rationale for including them, are presented in Table 2. Implementation details for these methods can be found in the Appendix. Of note, our selection is missing some notable detection techniques due to three restrictions we imposed on our exploration. First, similar to the methodology of Echeverría et al. [25], we avoid detection techniques which make extensive use of social graph information. In addition to being time-consuming to collect, this data is not accessible for any of the SSIOs in our dataset, as all of their constituent accounts have been suspended. In a similar vein, we also avoid all detection techniques that require real-time use of the Twitter API, because all SSIO accounts in our analysis are suspended and not accessible via the API. While necessary, this restricts us from using some of the most common detection techniques, such as Botometer [22] or DeBot³ [15]. Finally, we avoid any detection techniques that heavily utilize metadata that we do not have access to through the SSIO Twitter archive, such as membership in lists or geographic location. While it is possible that these features would be useful for finding SSIO activity "in the wild", it is impossible to ground that argument in any analysis done on the Twitter Information Operations archive.
Having established our rationale for including the six detection techniques listed in Table 2, we move on to our analysis and discussion of their coverage of the different levels of coordination present in our SSIO data. Our findings on the coverage of these detection techniques are presented in Table 3. The most immediate result from the table, and perhaps the most striking, is that each of the three waves of social bot research seems to have yielded detection techniques that are tuned to detect a certain level of coordination. Wave one techniques specialize in detecting agents with a clear focus on a specific behavior, which makes sense considering that these techniques work by looking for obvious signs of automation or repetitive content. In contrast, wave two techniques swing to the other end of the spectrum and have the most coverage of non-focused agents, likely reflective of the intense focus that these techniques have on avoiding clear signs of automation [19]. Finally, wave three techniques cover the in-between, with their best coverage split between semi- and diffusely focused accounts. As wave three techniques are generally group-based detectors that find similarities on a case-by-case basis [17], they are best suited for finding accounts that are more explicit in their coordination than wave two-like users, but appear more genuine than wave one-like users.
It is important to note that a method having its best coverage on one level of coordination does not necessarily mean it has the best coverage of that level out of the six detection techniques. For instance, in Table 3 we see that while the wave one techniques perform their best on detecting clearly focused agents, Cresci et al.'s modified detector [19] has greater coverage of that same level than either wave one technique. The takeaway is not that one of these techniques is better than the other five. Rather, our key result from this analysis is that regardless of what detection technique one chooses from the social bot literature, when it is applied in the SSIO context it is almost certain to be biased towards detecting a specific type and complexity of coordination. This is further supported when looking at the difference between the most and least covered level of coordination for each technique. We can see that there is greater than an 11 percentage point difference between these two proportions in the best case scenario (Cresci et al.'s modified method [19]) and, at worst, well over 27 percentage points (Wang's technique [69]). This stark difference suggests that both classical and state-of-the-art social bot detection techniques are not one-size-fits-all, a conclusion supported by the findings of Echeverría et al. [25]. Rather than SSIOs in their entirety, individual social bot detection techniques are suited to detecting SSIO agent clusters of a certain type or complexity.
The key practical finding from our analysis here draws upon the fact that SSIOs are not self-similar, but that their constituent clusters are (Section 4.2). If SSIOs are not behaviorally self-similar, then attempting to apply a single social bot detection technique (which assumes this quality at its core) to find them is both theoretically and practically flawed. Instead, the appropriate unit of analysis that social bot detection techniques should be applied to is individual SSIO agent clusters. Given that these clusters exhibit varying levels of coordination from one another (Section 4.2), the most successful strategy for detecting them is to combine techniques that each specialize in detecting a specific level of coordination, similar in spirit to the approach proposed by Sayyadiharikandeh et al. [55]. We propose that our novel method for clustering SSIO agents and the heuristic that it enables are excellent tools for realizing this ensemble approach to SSIO detection. Alongside traditional measures of accuracy, coverage describes the ability of detection techniques to generalize across the variety of SSIO activity and identifies the types of agents that each technique is best and least suited for detecting. Using this metric as a guide, researchers can select a set of detection techniques that each specialize in detecting different complexities of coordination in order to identify SSIOs "in the wild".
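One way to use coverage as a selection guide is sketched below: compute each detector's gap between its best and worst covered level, then pick the detector with the highest coverage for each level. The detector names and coverage values are hypothetical and are not taken from Table 3; picking a single best detector per level is only one simple way to assemble an ensemble.

```python
def coverage_gap(level_coverage):
    """Difference between a detector's best and worst covered level."""
    return max(level_coverage.values()) - min(level_coverage.values())

def best_detector_per_level(coverage_table):
    """For each level, choose the detector with the highest coverage."""
    levels = next(iter(coverage_table.values())).keys()
    return {
        level: max(coverage_table, key=lambda name: coverage_table[name][level])
        for level in levels
    }

# Hypothetical coverage values for two made-up detectors.
coverage_table = {
    "detector_a": {"clear": 0.8, "semi": 0.5, "diffuse": 0.4, "none": 0.3},
    "detector_b": {"clear": 0.6, "semi": 0.7, "diffuse": 0.6, "none": 0.5},
}
gaps = {name: coverage_gap(cov) for name, cov in coverage_table.items()}
ensemble = best_detector_per_level(coverage_table)
```

In this toy table the first detector has the larger gap, marking it as the more specialized of the two, and the resulting ensemble assigns it only the level it covers best.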

DISCUSSION
In this paper, we have critically examined how the social bot framework holds up in the SSIO context. We showed that a majority of the time, SSIOs violate both the core assumption of the social bot framework (RQ1a) and the way that it is commonly operationalized in practical work (RQ1b). We demonstrated that a core reason for this departure from the framework is that SSIOs exhibit a division of labor, utilizing several clusters of agents, each of which performs a specific task. By classifying the complexity of these agent clusters (RQ2), we find that third wave-like agents were the most common type used at the beginning of the 2010s, but over time, they have been supplemented more heavily with agents that exhibit simpler spam behaviors. Perhaps most importantly, our analysis surfaces that SSIO agent clusters are the critical unit of analysis from a detection perspective. Furthermore, existing detection techniques are tuned to perform well in finding clusters of a certain complexity, at the cost of performance in finding the others (Section 5). The following section explores the implications of our findings in greater detail. We also recommend potential opportunities for future research directions inspired by our work.

SSIOs as Cooperative Workers with Social Roles
Early research in the SSIO space surfaced a few operations which, while studied using the social bot framework, exhibited significant deviations from the behavior assumed by this framework [1,8,33]. In this paper, we generalize these findings across a decade of SSIOs conducted on Twitter and find that SSIO behaviors violate both the core assumption of the social bot framework and the way that it is operationalized in practical work. Specifically, we find that these operations differ substantially from social bots in that they divide up tasks and activities across their agents, as opposed to every agent engaging in roughly the same set of activities to achieve the same goal [32]. Some agents specialize in producing content, others focus on distributing content through rapid retweeting using automated or even custom tools, and others still take on the role of engaging with other users, replying to other tweets periodically throughout the day.
Turning to cooperative work, it becomes clear that the roles found in SSIOs allow agents to contribute to the high level goals of an operation, without having to know what exactly those goals are [70]. We find that the roles exhibited in SSIOs are also a manifestation of cooperative work [56]: SSIO agent clusters typically specialize in a single activity and use a specific tool or technique, as shown in our analysis of commonly co-occurring behaviors (RQ2). SSIOs exhibiting cooperative work also helps explain the temporal trends we see in the usage of focused and semi-focused agents over time (Figure 11). When SSIOs were first being conducted at the beginning of the 2010s, the goals and tactics of these operations were intensely varied [24], reflective of the lack of prior examples on which to build. As time went on, knowledge was diffused both indirectly and explicitly [10,14,45], leading to successful tactics and meaningful goals becoming more crystallized [62]. When overarching goals become more stable, individual improvisation is deemphasized and roles take more stable shape [70]. In the context of SSIO activity, as the goals of SSIO sponsors have become more crystallized, the improvisation embodied in semi-focused clusters has become less needed and the stable social roles, embodied by clearly focused (spam-like) clusters, have become more emphasized. In short, we argue that the cooperative work framework, in conjunction with social roles, helps us to understand how unique agents are able to coordinate to successfully operate SSIOs, why these agents typically focus on one activity and toolset, and how third wave-like social bot usage has declined in SSIOs while being displaced by simpler spam behaviors.

Implications of the Cooperative Work Framework for SSIO Detection Research
By reframing SSIOs as cooperative work, rather than social botnets, the most important practical consideration that arises from our work is that SSIO agent clusters are the critical unit of analysis in detection work, rather than SSIOs in their entirety. Our analysis in Section 4.3 demonstrates that SSIO agent clusters exhibit varying levels of complexity and coordination from one another. This poses practical challenges in light of our findings in Section 5 that social bot detection techniques from across the last decade seem to be tuned to detect only one level of this behavioral complexity. In short, any single social bot detection technique, when applied to the detection of SSIOs, will disproportionately misclassify agents depending on how complex their behavior and coordination are. What's more, the current trends in social bot research seem unlikely to alleviate this problem, as research in this area focuses more and more on the detection of wave three-like social bots [17], while trends in SSIO composition seem to point towards SSIOs increasingly supplementing their operations with wave one-like agents (Figure 11).
Despite the hurdle our findings pose to SSIO detection efforts, we also find cause for optimism. From our results in Table 3, we can see that amongst the exemplar social bot detection techniques we have chosen, for each level of coordination (from clearly-focused to no focus), there is a technique whose greatest coverage is in that category. This indicates that the easiest path forward for detecting SSIOs, using the detection techniques we already have available, would be to use an ensemble of detectors, each of which is tuned to detect a different level of complexity. Our results in Table 3 offer insight into the kinds of detectors one might try to combine when detecting SSIOs (temporally-based spam detectors for finding clearly focused agents, content-based unsupervised techniques for finding diffusely focused agents, etc.). By combining these differently tuned techniques, we would achieve greater coverage across the entirety of the SSIO than any one of these techniques could offer on its own.
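To make the ensemble idea concrete, the sketch below combines the outputs of two detectors by taking the union of the accounts they flag, then compares per-level coverage for a single detector against the ensemble. This is only one simple way of combining detectors and is not a method evaluated in this paper; the detector names, account IDs, and level assignments are hypothetical, and coverage is again assumed to be the fraction of agents at a level that are flagged.

```python
def ensemble_flags(detector_outputs):
    """Union of the account IDs flagged by any detector in the ensemble."""
    flagged = set()
    for ids in detector_outputs.values():
        flagged |= set(ids)
    return flagged


def level_coverage(agent_levels, flagged):
    """Fraction of agents at each coordination level that were flagged."""
    totals, hits = {}, {}
    for agent, level in agent_levels.items():
        totals[level] = totals.get(level, 0) + 1
        hits[level] = hits.get(level, 0) + (agent in flagged)
    return {level: hits[level] / totals[level] for level in totals}


# Toy data (hypothetical): four agents at two coordination levels.
agent_levels = {
    "a1": "clearly focused", "a2": "clearly focused",
    "a3": "diffuse focus", "a4": "diffuse focus",
}

# Hypothetical detector outputs: sets of flagged account IDs.
detector_outputs = {
    "temporal_spam_detector": {"a1", "a2"},
    "content_unsupervised_detector": {"a3"},
}

single = level_coverage(agent_levels, detector_outputs["temporal_spam_detector"])
combined = level_coverage(agent_levels, ensemble_flags(detector_outputs))
print(single)
print(combined)
```

Because a union can only add flagged accounts, per-level coverage never decreases relative to any single member of the ensemble. False positives accumulate in the same way, however, so a voting or weighted combination may be preferable in practice.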
More generally, we see our novel method for SSIO agent clustering (and the heuristic presented in Section 3.4 that it enables) empowering SSIO detection work in two key ways. First, as more SSIO data is made available (whether that be through continued Twitter initiatives or the work of independent researchers), our methodology for measuring cluster complexity can be used to infer longitudinal trends in SSIO construction, as done in Figure 11. These insights would aid researchers in prioritizing what methods to choose when forming ensembles of detectors. For example, our current understanding of SSIO trends suggests that detection techniques suited for finding wave one-like accounts are becoming increasingly important, perhaps even more so than those suited to finding wave three-like accounts. Second, our methodology presented in Section 5 can be used to evaluate and aid in the development of new SSIO detection techniques. In essence, the Twitter information operations archive [53] serves as an ideal test set for new detection tools. Given our results from Section 5, we know that simply measuring the accuracy of novel techniques runs the risk of missing the potential bias of a technique towards a specific level of coordination. As such, our method for measuring cluster coverage can assess generalizability across the varied cluster activity of an SSIO and serve as a complement to more traditional machine learning measures of accuracy. In short, the methodology we present in this paper could be used to monitor longitudinal trends in how SSIOs are constructed, while also providing a means for researchers to evaluate and therefore develop detectors that generalize across the varied complexity of SSIO activity.

Limitations and Future Work
In this study, we have focused our attention on Twitter's Information Operations archive [53] in an effort to move beyond the obsolete and sparse datasets that commonly pose problems in social bot research [17]. While we feel that this is an appropriate means of studying SSIO activity, it is important to recognize that this approach does have important limitations. By tying our work to Twitter's archive, it is unclear how generalizable the method we propose in this work is to other social media platforms. More generally, as our work only explores SSIO activity on a single platform, it is also unclear how our method could be applied across multiple platforms at the same time. Given these limitations, a valuable direction for future work would be to analyze the fit of the cooperative work and social roles framework for SSIOs conducted on other social media platforms. In particular, Facebook appears to still be the platform of choice for those looking to wage information operations [10] and TikTok is becoming increasingly targeted in order to reach younger audiences [12]; as such, these would be the most impactful platforms to extend the cooperative work framework to. The impact of this future work would be multiplied if it could identify behavioral alphabets that work for all of these platforms, rather than just one. This would make comparing behaviors across platforms more achievable, and thus aid in combining data from multiple platforms to find more wide-reaching SSIOs. In particular, we believe that some of the content-based approaches that Starbird et al. [65] and DiResta et al. [24] have taken towards understanding multi-platform information operations could serve as an excellent starting point for developing more generalizable alphabets.
Additionally, a current limitation of our BOW embedded, digital DNA clustering method is that it assumes stationarity of the behavioral clusters. In other words, our method currently assumes that agents exhibit a similar behavioral pattern over the entire course of their lifetime, and precludes the possibility that agents might move from one behavioral cluster to another throughout the duration of the SSIO. Prior work has shown that genuine users in extremist communities tend to navigate multiple roles over time [50]. Since SSIOs commonly operate in these types of communities [7,9], it is reasonable to suspect that SSIO agents might also take on multiple roles over time. We believe that an exciting opportunity for future work to address this limitation would be to relax this assumption of behavioral stationarity. This would enable exploration of temporally meaningful segmentations of an agent's behavioral traces, to determine whether there is movement between behaviorally homogeneous clusters. In particular, work towards understanding the level of retention and churn in spam-like clusters, to gauge how stable they are over time, would be incredibly useful for informing future detection methods.
Finally, while our novel agent clustering method is able to derive insight from SSIO activity, our choices for how to encode agent activity are largely derived from previous work in the social bot literature [18]. As such, a useful direction for future work would be to develop a more SSIO-specific alphabetical encoding, particularly one that can generalize across social media platforms and information ecosystems [24]. Additionally, since much of the detection work that we build on in this work is informed by the social bot literature, and a key argument of our work is that clusters of SSIOs behave very similarly to different waves of social bots, our method cannot be used to distinguish between SSIO agent clusters and smaller, self-contained social botnets. We believe that one of the most straightforward means of filling this gap would be for future work to investigate the network dynamics between SSIO agent clusters. Specifically, if future work is able to identify ways in which SSIO agent clusters interact or are otherwise connected with each other, the presence or absence of these connections with other groups of users could indicate whether a detected set of accounts is an SSIO agent cluster, or simply a self-contained social botnet.
More generally, since our work has established the theoretical and basic practical support for describing SSIOs as cooperative workers with social roles, the primary direction for future research is to engage with the practical implications of this shift in framework. We believe that the most urgent direction for future work is to conduct richer statistical and empirical analysis of the SSIO agent clusters whose existence we identify in this work. We believe that this work could move our understanding of SSIO activity from general awareness of the division of labor that they engage in towards specific insight into the common social roles of these operations and how they contribute to operation longevity, reach, and ability to avoid detection.

CONCLUSION
In this paper, we have critically examined the appropriateness of the social bot framework in the context of state-sponsored information operations, or SSIOs. To this end, we develop a novel method, based on Digital DNA and extended using a BOW embedding, to cluster and describe behavior of SSIO agents. By applying this tool to one of the most comprehensive SSIO repositories in existence, we find that SSIOs violate both the core assumption of the social bot framework and the way this assumption is operationalized in practical detection work. Our findings indicate that this is due to the fact that SSIOs do not exhibit one core behavioral pattern, but instead utilize a division of labor, where different pools of agents serve different roles. We propose that cooperative work with social roles is a more comprehensive framework for understanding SSIO behaviors, while still allowing social bot detection techniques to be applied, simply at a different granularity and more selectively. Through the cooperative work lens, we find that as SSIO goals have become more clear and crystallized, SSIO roles have become simpler and more stable over time. Overall, our findings suggest that when applying social bot detection techniques in the SSIO context, agent clusters need to be understood as the unit of analysis, rather than SSIOs in their entirety. This insight not only directs how existing social bot techniques can be used in the SSIO context (via ensembles of several differently tuned detectors), but also suggests how future SSIO research could capitalize on clusters as units of analysis to develop better understanding of how SSIOs operate.

Fig. 3. A heatmap view of the agent clusters for the SSIO conducted by Ecuador in September 2019. Thick horizontal grid lines separate behaviors that use different tools, while thin horizontal grid lines separate behaviors that focus on different actions. Cluster identity is shown via color of the bottom bar. Agent focus is shown by cell color: the darker the cell, the more focused the agent is on that activity. Looking for dark horizontal bars reveals separation of SSIO agent clusters based on specific activities and tool types.

Fig. 4. A representation of how we measure relative self-similarity. In Part A, shown on the left, we calculate the centroids (shown in bold) of the BOW-embedded behaviors of all SSIOs in our dataset (marked with •). We then do the same for the genuine crowd (marked with △). We then combine all SSIO distributions into one, representing the overall spread of SSIO similarity in our dataset. We then compare this to the genuine distribution using a one-sided Mann-Whitney U test. In Part B (right), we compare each SSIO distribution separately to the genuine crowd using a one-sided Mann-Whitney U (MWU) Test with Bonferroni correction.

Fig. 6. Histogram of the proportion of agents in each SSIO that have a longest common substring of at least 10. Even under this more relaxed criterion, the majority of operations still exhibit relatively low levels of behavioral homogeneity.

Fig. 7. Histogram of the median Digital DNA length for all SSIOs except for the Venezuelan operation in June 2019 (excluded for visual clarity, as agents in this operation had a median Digital DNA length of over 12,000).The majority of operations in our dataset have a median agent Digital DNA length of under 200, which provides relatively small sequences to find common subsequences over.

Fig. 8. Histogram of the proportion of agents in the largest cluster of an SSIO (the proportion of agents exhibiting behavioral homogeneity). For a clear majority of SSIOs in our dataset, the cluster of behaviorally homogeneous agents makes up less than 40% of agents participating in the operation. Only six operations exhibit greater behavioral homogeneity than this, with an outlier being the Russian operation in October 2020, with 80% behavioral homogeneity amongst its agents.

Table 1. Classification for a cluster of SSIO agents based on what proportion of their behavioral sequences are captured in the two most frequent activities ( 1 and  2 ).
 1 +  2 ≤ 75%
Diffuse Focus Over a Range of Activities: 25% <  1 +  2 ≤ 50%
No Focus on Any Activity:  1 +  2 ≤ 25%
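The thresholding rule in Table 1 can be sketched in a few lines of Python. The two lower categories and their bounds (25% and 50%) follow the table as it appears here. The label for the band ending at 75% and the existence of a top band above 75% are assumptions, based on the "semi-focused" and "clearly focused" terminology used in the text. The function names, the single-character activity encoding, and the pooling of a cluster's activity symbols into one list are likewise hypothetical.

```python
from collections import Counter


def top_two_share(activities):
    """Share of a cluster's behavioral symbols taken up by its two most
    frequent activities.

    activities: iterable of activity symbols pooled over a cluster's agents
    (assumed input format).
    """
    counts = Counter(activities)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cluster has no recorded activity")
    return sum(count for _, count in counts.most_common(2)) / total


def classify_focus(share):
    """Map a top-two activity share onto a coordination level."""
    if share > 0.75:
        return "clearly focused"  # assumed top band (not visible in the table)
    if share > 0.50:
        return "semi-focused"     # assumed label; upper bound of 75% from the table
    if share > 0.25:
        return "diffuse focus"    # 25% < share <= 50%, as in the table
    return "no focus"             # share <= 25%, as in the table


# Toy clusters (hypothetical activity symbols).
spam_like = list("AAAABBBC")  # top two activities cover 7 of 8 symbols
mixed = list("AABBCDEF")      # top two activities cover 4 of 8 symbols
scattered = list("ABCDEFGH")  # top two activities cover 2 of 8 symbols

for cluster in (spam_like, mixed, scattered):
    share = top_two_share(cluster)
    print(share, classify_focus(share))
```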

Table 2. The social bot detection techniques that we selected to represent the space of the most popular research in this domain. Comprising both supervised and unsupervised techniques, this selection includes an equal balance of classical and historically important techniques and state of the art approaches published within the last two years.

Table 3. Coverage of each detection technique for the four broad levels of coordination we find in SSIOs. Each most covered level is highlighted in green. We can see clear separation in most covered level based on what wave of social bot research each technique belongs to.