Do We Run How We Say We Run? Formalization and Practice of Governance in OSS Communities

Open Source Software (OSS) communities often resist regulation typical of traditional organizations. Yet formal governance systems are increasingly being adopted among communities, particularly through non-profit mentor foundations. Our study looks at the Apache Software Foundation Incubator program and the 208 projects it supports. We assemble a scalable, semantic pipeline to discover and analyze the governance behavior of projects from their mailing lists. We then investigate the reception of formal policies among communities, through their own governance priorities and internalization of the policies. Our findings indicate that while communities observe formal requirements and policies as extensively as they are defined, their day-to-day governance focus does not dwell on topics that see the most formal policy-making. Moreover, formalization, be it dedicating governance focus or adopting policy, has limited association with project sustenance.


Introduction
An exemplary instance of online peer production [3], Open Source Software (OSS) has emerged as a multi-billion dollar informal industry supporting major contemporary tech enterprises, academia, and scientific research and development. Over the past three decades, the increasing stakes of OSS have paved the way for several non-profit OSS foundations providing standardized project support and governance frameworks to hundreds of projects, notable among them the Apache Software Foundation (ASF). These organizations serve OSS projects by providing mentoring, much-needed infrastructure (servers, centralized storage [90], etc.), legal aid around OSS licensing [57], and well-maintained technical support [22]. OSS foundations like the ASF have brought OSS into the mainstream, attracting large numbers of contributors and financial support [50].
OSS projects have often benefited from some degree of overarching coordination and governance [40,66]. Several of these foundations implement their own governance to manage projects and the developers they mentor. Written, well-laid-out formal policies steer and synchronize community operations, thus minimizing the costs of coordination and management [6,27]. At the same time, communities have often observed their own informal rules and normative codes to structure activities, assign responsibilities, utilize project resources, and ensure sustained development [45,88,14,50,52,32,40,30]. Consequently, community governance within foundation-mentored projects is a product of the foundation's policies, the project's own specific practices, and any interactions between those two sources of institutional structure. Hence, even among OSS projects from the same foundation, their decisions, actions, and ensuing interactions may reflect varied degrees of involvement with the centralized governance, as they may prefer to manage their community in their own fashion.
Non-profit OSS foundations are steadily rising, with one survey finding 101 active organizations that host over 1,600 OSS projects as of 2018 [33,34]. With mentored projects generally showing higher survival rates than independent communities [69,89], foundations are increasingly being viewed as a model to raise thriving projects producing usable, compliant software. Yet, OSS governance is not without its quirks and challenges [40,14,72]. While foundations may bolster communities with resources and support, the implications of such formalization for OSS have recently drawn significant research interest. Indeed, there have been instances where formal governance has produced little impact or has actually limited community flexibility and autonomy [57,72,35]. Hence, to assess the contribution of foundations towards OSS sustainability, we need to examine how they structure the mentored communities. We particularly look at how foundation policies are received in communities, as reflected through operations, as well as how they determine a project's governance focus.
The Apache Software Foundation Incubator (ASFI) was founded by the Apache Software Foundation (ASF) in 2002, in part to propagate Apache's approach to OSS governance, and has mentored over 300 projects ('podlings') since. Several non-profits require interested projects to undergo initiation through an incubation program to learn the ways and requirements of the foundation. ASFI also evaluates projects for performance and overall organizational fit throughout their incubation, before accepting ('graduating') them for continued support, or 'retiring' them from the foundation. Being cognizant of the importance of project self-governance, ASFI empowers every project [23] to oversee its own governance with Podling Project Management Committees (PPMC). These PPMCs act as the interface between project developers and the ASF. The foundation's commitment to project self-reliance raises a fundamental question: what is the relationship of each project's emergent governance structure to the formal policies representing governance across the foundation?
Our study focuses on community-level governance among mentored projects and how it relates to foundation-level policies. We leverage developer conversations from ASFI's public mailing lists. Compared to traditional approaches like surveys, interviews, or other forms of qualitative inference, retrieving behavioral measures from trace data is faster, convenient for replication across foundations, and less susceptible to reporting bias while offering more granular, real-time insight. We assess each project's governance efforts and resulting operational structuring through the routinized governed activities they perform. Next, we evaluate their policy internalization, i.e., the extent to which ASFI formal policies structure their community governance and frame their governed activities. We analyze how the extent of community governance efforts and policy internalization relate to ASFI's extent of regulation (number of rules) across different governance topics. Finally, we empirically investigate how community governance and its extent of formal policy internalization together explain its Incubator outcomes. Our contributions and findings are as follows:
1. We demonstrate a scalable approach, based on semi-supervised learning, to understand governance across peer-production communities, both its formal specification and lived instantiation.
2. A foundation-level analysis of ASFI projects shows that the extent of policy regulation (the number of rules structuring different governance topics) is not mirrored in practice through the extent of governed activity. Yet governed activity tends to be framed by policies in topics where they are extensively defined, as indicated through policy internalization among projects. Therefore, while communities show greater acknowledgment of formal policies in the topics where they are extensively laid out by the ASFI, such topics do not necessarily elicit more governance efforts from communities.
3. When it comes to sustaining the community and efficient development towards graduation, dedicating governance focus or internalizing policies from topics highly regulated/prioritized in formal policies had little association with the odds of success. All in all, formalized policies in OSS communities may not accurately reflect their underlying patterns of governance.

Related Work
Open Source governance includes all organizational structures and coordination mechanisms that regulate community interactions as well as product development. Prior work has extensively explored OSS community governance in terms of decision-making [30,89], assignment of tasks [14,50], managing developer roles and access [52,32], mentorship [1], code quality, review, and contribution [40,77], etc.
Community governance has been treated as an expansive, multi-level system of mutually interactive socio-technical networks [35,26]. Meanwhile, Schweik et al. studied OSS projects at scale on SourceForge and found governance structures to be generally informal and lean, with increased sophistication and formal rules as communities grew [67]. Similar findings are echoed by O'Mahony's work on the Debian Linux community's evolving governance [58].
Community-level analysis of Apache Incubator projects also found that more successful projects showed greater adoption and use of definitive rules and norms [89]. Heckmann et al.'s investigation of decision-making processes further found that in well-performing projects, developers and users participated more proactively in steering the course of the project [30].
Leadership is a crucial aspect of OSS governance, where developers with greater technical initiative, development prowess, and effective communication strategies generally emerge to fill administrative roles [31]. Analysis of decision episodes in communities found administrators to be critical drivers during the initial phases of a project [30]. Meanwhile, Atkisson specifically examined individual mentors of the Apache Incubator and found a significant correlation between who managed a project and its odds of graduation [2]. Investigation of communities on SourceForge found that while a sizeable fraction (around 15-20%) of successful projects comprised a stable community with dedicated users, the rest showed rapid growth and were often led by a 'benevolent dictator' [68,69].
Prior work has explored the challenges of OSS moderation. Attempts towards greater inclusiveness by enforcing community codes of conduct (CoCs) have often received limited engagement or been perceived as distractions from core development priorities [42]. Several studies have focused on interactions within foundation-led communities. A qualitative cost-benefit analysis of Apache Incubator policies found that the implementation efforts and payoffs are evenly balanced between projects and the ASF [71]. The implications of congruence/dissonance become particularly salient when it concerns software licensing. The rigor of the licensing requirements, including ASF's rights over individual contributions, has often seen varied reception and interpretation among OSS developers [57]. Sun's introduction of changes in the Netbeans licensing scheme threatened the collapse of the very project [35]. Stringent terms set by corporations supporting gated OSS communities often turned away sincere contributors or restricted usage of the product, thus hindering developer engagement and community health [72].
While prior work has focused on either foundations or community dynamics, few studies have empirically examined their mutual interactions as they unfold in real time [56,89]. Moreover, they have generally focused on a particular aspect of governance, such as licensing, through case studies of a select number of projects. We attempt to capture the multifacetedness of OSS governance (including but not limited to licensing, trademarks, documentation, committees, voting, etc.) and study hundreds of mentored projects. Motivated by collective action theory and behavior in communities of practice, we proceed to investigate the governance behavior of OSS communities around formalization.

Institutional Theory
OSS communities, generally comprising transient volunteer developers centered around a core of long-term contributors, organize in a decentralized fashion to create software for open use and distribution. This phenomenon has been framed in terms of the peer production of public goods, making OSS communities an increasingly important locus of online collective action research [3].
Institutions are defined as "... prescriptions that humans use to organize all forms of repetitive and structured interactions ..." [55]. For a collectively maintained resource such as an OSS community, governance includes all formal and informal rules for management and production, along with the mechanisms for such policy design, reform, and implementation [69,46].
OSS governance lies on the spectrum between purely self-interest-driven spontaneous governance ("the invisible hand") and intentional governance [16]. Polycentric governance refers to a condition where there are overlapping interests between multiple centers of authority [46,47,36]. This often implies varying degrees of interdependence and autonomy among concurrent governments. For example, while ASFI encourages projects to admit consistent contributors, the specific process of admission is left to each project community itself [71,90]. The dynamic nature of organizational fit is especially evident in decentralized [16,83], ideology-rich environments like OSS projects [75], notably as resource abundance varies [70]. It is our goal in this paper to study the extent of internalization of ASFI's governance in regular project operations, along with the different themes of its rules and policies, across graduated and retired projects.

Communities of Practice and Organizational Learning
OSS projects are essentially online communities of practice [39,5,54], where coordinated operations are studied in terms of routines. Routines stem from beliefs, cognitive scripts, habitual conventions as well as evolving norms as they translate into 'repeated patterns of actions' across appropriate settings [10,41]. These include management, standard operating procedures (e.g., workflows), or experiential strategies encoded into everyday activities and associated interactions [41,15,86]. Community routines may not be only technical, and may also emerge to coordinate developers through informal norms and social control [45,52,32]. For example, developers use their particular routines for managing and deploying builds, incorporating patches, testing, prioritizing issues, et cetera. Similarly, communities also perform a sequence of routines when it comes to more formal events like setting up committees, organizing conferences, and ratifying releases.
Routines are generally stable [9], until changes in organization, technology, development goals, or other events cause them to evolve [20,60,21]. OSS projects are dynamic and decentralized with fluid membership [82] and may thus be inventive and flexible in their norms [49,28]. Consider the following email from Apache Netbeans dated 9/13/2017. ASFI does not cover code management. Yet projects themselves usually choose between two approaches: review-then-commit (RTC) and commit-then-review (CTR). The example shows deliberation among Netbeans developers on their appropriateness and scope:
"different asf projects have different policies. the important part is that we should have a common understanding about our commit policy. there might e.g. be a branch for the next release where rtc (review then commit) is applied. that's useful when preparing a release or for maintenance releases we still actively maintain. and beside that we might have a 'future' branch (e.g. on master) or multiple feature branches where ctr (commit then review) is standard. most asf projects have the whole repo on ctr..."
Incubator policies are set up through pragmatic planning. The observed influence of the foundation's policies on a mentored project's routine operations indicates how its governance has been internalized in the community. The more community members discuss and describe activity in a way that resembles the framed policy, the more we can argue that members have internalized the formal description. At the same time, through learning and discovery [43], communities may also prefer their own procedures and protocols when Incubator policies are deemed less effective, inadequately defined [49,83], or fall short of their needs. This further motivates us to understand the impact of foundations on projects through policy internalization across sustained community practices.

Research Questions
Formal rules and policies are critical in shaping the basic structure and guiding activity in an organization [60,28]. Foundation Incubators implement systematic policies to coordinate and promote community engagement and productivity. These establish baseline standards and rules for participation across all the diverse member projects, may define certain roles and offices for leadership, assign responsibilities, as well as lay out the scope of various activities. At the same time, routines also reflect the project community's own implicit governance, i.e., informal beliefs, norms, codes of conduct, and other practices. Therefore, polycentric governance in foundation projects stems from governance among individual communities (Project Management Committees (PMCs), as well as all other informal rules and developer norms) alongside the ASFI itself. Situated in the backdrop of OSS-foundation polycentricity, this section presents our research questions, which look at community governance and policy internalization across the different aspects of ASFI governance.
The formalization of governance in traditionally volunteer-driven communities has been a contentious theme. OSS pioneer Eric Raymond observed that the "number of hoops", or too many formalized procedures and rules, may drive away potential skilled contributors [63,67]. Extensive regulation may introduce additional requirements and necessitate the enactment of institutional obligations. Therefore, communities may be expected to show more governed activity in domains that are heavily regulated, given their presumed importance in the ASFI ecosystem. As a result, we may expect a positive relation between the number of policies and the frequency of observed routine activities in a particular area of governance.
While there are concerns about redundant routines and overheads, lack of regulation may cause individuals/communities to draw upon larger social and cultural constructs for predictability. Such "tyranny of structurelessness" may perpetuate broader social inequalities [25]. The idea of "green tape" encapsulates the potential of policy to provide clarity and certainty, focus organizational attention, and convey legitimacy [17]. Implications may also extend to OSS formalization, whereby extensive yet well-designed policies may streamline rather than divert developer efforts. However, in domains where regulatory clarity is limited, greater project activity may become necessary to sustain development.
RQ1 explores how the extent of policy-making relates to the governance priorities and operations among mentored projects. We identify governance concerns/topics actively shared between the ASFI and its projects, through policy documents and extensive mailing lists across 208 communities. Since structuration from the mutual interaction of foundation policies and community governance determines the routine behavior of projects, we aggregate all similar activities from email conversations and examine their correlation with the topical distribution of ASFI policies.
RQ1: How does Incubator regulation relate to community-level governed activities across different governance topics?
Institutions manifest through the practice of routines formalized by such established rules [38]. As mentored projects increasingly internalize foundation policies, their operations are expected to be generally constrained and enacted through routines prescribed by such rules. Yet, community governance also requires the dynamic selection and adaptation of various other routines (Section 3.2). Therefore, we may expect variation in the influence of ASFI policy on governed activity, along the different governance concerns.
Well-designed rules seek to reduce uncertainty and can act as formulaic precedents to replicate success across mentored projects [49], or at least help standardize the provision for Incubator resources. Therefore, extensive regulation in a certain area of policy-making (i.e., more rules outlining a wide range of organizational possibilities) may induce greater adoption if it facilitates project functioning and improves efficiency.
On the other hand, activities and related exchanges in a topic may deliberate policy to only an extent, while their actual operations may reflect a marked departure from formal structure [83,19]. This may be especially true when certain institutional obligations are ceremonial or necessary to maintain affiliation with the ASFI but are less relevant in day-to-day development. If such is the case, the observable policy internalization among communities across different governance topics may not be correlated to the extent of policy overseeing the topic.
We might expect alignment between the amount of formal policy on a topic and how resulting policy prescriptions are internalized in practice. Organizations engage in many functions, some of which are more critical than others. More important functions may be marked by a greater amount of policy formalizing behavior and may elicit greater internalization, toward more compliant execution. On the other hand, if policy extent is driven more by the complexity than the criticality of a governance subject, then that complexity may paradoxically predict a greater quantity of policy, for its various cases, and also less internalization, as practitioners take license from that very complexity to exercise greater discretion in how they execute.
RQ2 explores how the extent of policy-making relates to the formal policy internalization among projects. For all topical governed activities, we measure policy internalization in terms of how discourse about those activities in general semantically reflects the policies formalizing those activities. Finally, we examine how such internalization varies with the extent of regulation across topics.
RQ2: How do the levels of policy internalization in governed activities relate to ASFI policy extent across different topics?
For an Incubator program to realize its goals, it is important to assess the association between its governance and project outcomes. At the same time, it becomes equally important for aspiring communities to understand behavior associated with communities that succeed in Incubator programs, particularly the extent of community governance as well as the impact of foundation governance on such operations.
ASFI lays down three primary criteria to determine if a project has potential and is capable of sustaining development: 1) there is community activity evidenced by at least two releases, 2) the releases are compliant with the Apache license, and 3) the committers of a project are drawn from at least three entities (companies, research groups, etc.) [24]. The remainder of the policies serve to help the project achieve those goals.
While RQ1 and RQ2 measure if there is a relationship between formal policy and community governance, RQ3 uses an externally valid measure of project outcomes to determine whether there should be a relationship, i.e., whether communities align governance focus or internalize policies in topics with more formal rules in order to successfully realize their objectives. In particular, it examines whether community governance efforts or the adoption of policies around formalization correlate with their graduation odds in the ASFI.
We pursue RQ3 through a project-level regression of all governed activities (frequency of structured, routine operations) among individual projects alongside the policy internalization among such operations (semantic similarity of governed activities to policies) against a binary measure of project success (graduation/retirement from the Incubator).
RQ3: How do governed activities and the extent of policy internalization relate to the success of projects?

Data and Methods

Governance Measures
We pursue two discursive measures of community governance from developer conversations in mailing lists, namely all governed activity and their internalization of Incubator policies. Traditionally public and open access, OSS mailing lists are key to collaboration as they promote transparent peer review [40] and solicit reciprocal contributions [57]. Unlike issue tracking and version control logs, these also contain exchanges beyond technical development, such as product planning, community management, ratification of major decisions, licensing, etc. Further, due to explicit ASF policies, all project activity is comprehensively archived across public mailing lists ("If it didn't happen on the mailing list, it didn't happen" [90]).
Prior work has extensively used organizational communications for understanding participant behavior and performance, including in OSS [35,89,31]. Li et al. used a grounded theoretic approach to understand the adoption and reception of community codes of conduct from developer exchanges. Affective features in developer messages have been used to predict leadership qualities among OSS developers [31], while Srivastava et al. studied enculturation and employee exit, where they treated individuals' linguistic divergence as a measure of cultural fit [74].
We described in Section 3.2 how routines reflect all prevailing governing norms among projects. We first identify the different governance concerns shared between projects and the Incubator by means of topic modeling of policies and conversations, and represent the following two measures by project and governance topic:
Governed Activity: The total number of recurring or routine activities about a governance topic, as discussed in a project's mailing list. Higher presence of governed activity indicates greater governance efforts to structure and routinize community operations. For example, if a community establishes a norm for ratifying releases, future releases will likely follow the established schema. In ASFI projects, such governance is a culmination of the foundation's policies as well as the underlying codes and norms of the community developers. Recurring activities are aggregated over their textual similarity.
Policy Internalization: This measure represents the extent to which governed activities are structured by ASFI policies.Therefore, higher policy internalization in governed activities indicates greater integration of the foundation into the community's governance.
Methods explored to operationalize internalization included evaluating direct compliance/entailment between an observed activity and policies. Such binary measurements were found to be insufficient to account for the drift between formally articulated statements (framed policies) and informal, practical discourse (conversations) or, importantly, to reflect graded changes along the rates of institutional diffusion [76]. For example, observations from initial developer discussions over a release vote, an ASF-specific requirement, to an actual voting event are important to understand the gradual internalization of governing institutions.
For a topical governed activity in a project, we measure policy internalization through its semantic similarity against policies within the respective topic. Semantic similarity is an assessment of meaningful and conceptual relationships between texts [37]. Measured on a continuous [0,1] scale, semantic similarity assigns text pairs higher (lower) values for agreement (contradiction) [7,85]. Moreover, semantic similarity can be used to quantify activities that are neutral but indicate institutional diffusion, through their degree of resemblance in how they invoke roles, designated responsibilities, and requirements outlined by a policy.
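A minimal sketch of how such a score could be computed for one governed activity is given below. It is illustrative only: the bag-of-words vectors are an assumed stand-in for the sentence embeddings a language model would supply, and the helper names (`embed`, `cosine`, `policy_internalization`) as well as the choice to average over a topic's rules are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two count vectors; lies in [0, 1] for non-negative counts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def policy_internalization(activity, topic_rules):
    """Mean similarity of one activity to the rules of its topic (hypothetical aggregation)."""
    if not topic_rules:
        return 0.0
    a = embed(activity)
    return sum(cosine(a, embed(rule)) for rule in topic_rules) / len(topic_rules)


# One rule taken from the policy example quoted in this paper; activities are invented.
rules = ["the ipmc must send a notice email to the board after a vote has finished"]
close = policy_internalization("we send a notice email to the board after the vote", rules)
far = policy_internalization("the build fails on windows with java 8", rules)
```

Under these assumptions, an activity that invokes the same roles and requirements as the rule scores higher (`close`) than an unrelated technical remark (`far`), which is the graded behavior a binary compliance check cannot express.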
Policy Extent: A foundation-level variable indicating the extent of ASFI's regulation across topics.It is represented as the frequency (count) of formal rules overseeing each governance topic, with higher values (number of rules) in a topic indicating greater ASFI regulation.
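To make the foundation-level comparison concrete, the sketch below counts rules per topic and relates that count to governed activity across topics. The topic labels and all counts are invented for illustration, and the use of Spearman's rank correlation is an assumption of the example rather than a statement of the statistic used in the analysis.

```python
from collections import Counter


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical topic label for each parsed rule (illustrative values only).
rule_topics = ["licensing", "licensing", "licensing", "voting", "voting", "releases"]
policy_extent = Counter(rule_topics)  # number of rules per topic

# Hypothetical governed-activity counts per topic (illustrative values only).
governed_activity = {"licensing": 40, "voting": 120, "releases": 300}

topics = sorted(policy_extent)
rho = spearman([policy_extent[t] for t in topics],
               [governed_activity[t] for t in topics])
```

In this toy setting the most regulated topic sees the least activity, so `rho` is strongly negative; with real data the sign and size of the coefficient are empirical questions.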

Project membership and activity
Projects in ASFI are diverse, and their governance and Incubator outcome may also be subject to community structure, activity levels, etc. Since we are interested in analyzing how governance behavior correlates to project sustainability, our analysis has to simultaneously control for project attributes, such as community size and development intensity. We incorporate four suitable covariates in our analysis: community size (committers), number of commits, code base size (lines of code; LOC), and finally the frequency of interaction among the project developers (developer emails) over project mailing lists.
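The project-level regression of graduation on governance measures and covariates can be sketched as follows. This is a minimal illustration on invented toy data, fitted with plain gradient descent instead of a statistics package; the two-column layout and the helper names (`fit_logistic`, `predict`) are assumptions of the example, and the control covariates would enter as further columns.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            feats = [1.0] + list(row)
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, feats))) - label
            for j, fj in enumerate(feats):
                grad[j] += err * fj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def predict(w, row):
    """Predicted probability of graduation for one project."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, [1.0] + list(row))))


# Invented toy data: [scaled governed activity, policy internalization] per project.
X = [[0.2, 0.1], [0.4, 0.3], [0.5, 0.2], [0.3, 0.8],
     [0.6, 0.7], [0.8, 0.6], [0.9, 0.9], [0.7, 0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = graduated, 0 = retired

w = fit_logistic(X, y)
odds_ratio_activity = math.exp(w[1])  # multiplicative change in odds per unit of activity
```

Exponentiating a coefficient gives the change in graduation odds associated with a one-unit increase in that measure, holding the other columns fixed.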

Datasets
We center our analysis of ASFI governance through a set of 234 comprehensive policies which were coded across the key ASFI documents and guidelines [71].These span multiple sources such as the official Apache Incubator policy manual, the community guide, the Podling Project Management Committee (PPMC) guide, the Apache cookbook, the mentorship guide, the graduation and retirement guides, and finally the release management guide.
In the ASFI, project incubation lasts up to several months followed by an assessment and a formal vote to decide on graduation into ASF for continued support or retirement. Yin et al. scraped all mailing lists across 269 Apache projects from when they joined the Incubator and up to their last day in the ASFI [90]. Since we solely focus on norms and activities within communities, we only retain the 'dev' (community developers) subdirectory emails across all projects. We exclude redundant content such as auto-generated emails for issues posted and resolved, and other development-related notifications (JIRA, GitHub), through source address-based filtering. Periodic emails were also circulated by the Incubator Project Management Committees (IPMC) or project mentors, which were formal, administrative, and generally concerned progress reporting. All such emails have a fixed format and were identified and filtered through string matching. This mitigates potential bias in measurements from superfluous policy content from the administration, as our subsequent analysis concerns governance-related behavior within and among community developers only.
For project-level covariates, we obtain commits, lines of code, and the number of active contributors. ASFI projects use GitHub, Subversion, or a combination of both for maintaining their codebase.
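The filtering steps above can be sketched in a few lines. The sender addresses, the subject marker, and the dictionary layout of an email are hypothetical placeholders chosen for the example; the actual address list and report format are not reproduced here.

```python
import re

# Hypothetical automated senders and report marker (placeholders, not the real lists).
BOT_SENDERS = {"jira@apache.org", "git@apache.org", "notifications@github.com"}
REPORT_PATTERN = re.compile(r"podling report reminder", re.IGNORECASE)


def keep_email(email):
    """Return True for 'dev' list emails that are neither automated nor periodic reports."""
    if email["list"] != "dev":
        return False  # retain only the community developers' list
    if email["sender"].lower() in BOT_SENDERS:
        return False  # source address-based filtering
    if REPORT_PATTERN.search(email["subject"]):
        return False  # fixed-format administrative emails, matched by string
    return True


emails = [
    {"list": "dev", "sender": "alice@example.org", "subject": "[VOTE] Release 2.1"},
    {"list": "dev", "sender": "jira@apache.org", "subject": "[jira] Created: ROL-1"},
    {"list": "commits", "sender": "bob@example.org", "subject": "svn commit: r123"},
    {"list": "dev", "sender": "mentor@example.org", "subject": "Podling Report Reminder - May"},
]
kept = [e for e in emails if keep_email(e)]
```

Only the first message survives in this toy input, since the others come from a notification address, a non-'dev' list, or match the periodic report marker.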

Extracting activities
Routines have been studied at multiple levels, from the most nuclear activities to complete processes. The most fundamental unit, the performance program [61,44], is defined as a 'chunk' of scripted activity, generally a routine in itself or part of a larger process. To capture organizational routines from ASF email discourse, email texts and policies were first tokenized into sentences through StanfordNLP's Stanza library [62]. We next turn our attention to extracting different activities from within these sentences. This serves several purposes. Firstly, most existing language models, including ones subsequently used, encounter complexity overheads and truncate long sentence inputs beyond a token length. Secondly, sentences can be compound, conveying multiple activities with their specific context, and possibly spanning different topics (Table 2). Therefore, decomposing sentences into granular units of analysis like performance programs allows depth and insight in subsequent analysis.
We decompose sentences while preserving their context. Context is important in understanding different routines and their place in the development ecosystem (e.g., 'Projects requesting Apache infrastructure' vs. 'Project Management Committee requesting progress report' or 'Projects issuing press release' vs. 'Resolve issues that are release blockers').
To attain fine-grained extraction of different activities and their context nested within sentences, we use semantic role labeling.
Semantic role labeling or SRL [37] is an NLP task that extracts roles (actors, direct or indirect objects, etc.) associated with an action (verb) along with other modifiers from a sentence. Additionally, SRL also extracts constituents with contextual information such as the time of act, manner, direction, goal, purpose, cause, etc. [4].
Original Policy: 'After a vote has finished, the ipmc must send a notice email to the board and then wait for 72 hours before inviting the proposed member'
Original Sentence: '( 1 ) I'll be away from my computer starting Friday and through the New Year, so I won't be able to do much to help if folks want to release 2.1 during that time ( not even testing ).' (Apache Roller, 12/21/2005)
After SRL and reconstitution:
'I'll be away from my computer starting Friday and through the New Year' (Schedules/Events)
'I won't be able to do much to help if folks want to release 2.1 during that time ( not even testing )' (Release Management)
Table 2: Capturing granularity: Sentences spanning multiple, thematically distinct operations. In this example, a developer shares their vacation timeline with the community in general, while also discussing implications for a tentative release. Topics indicated for each activity are inferred as described in Section 5.3.3.
We chose a BERT [18] based implementation of SRL [73] developed by AllenNLP on the Propbank annotation scheme. The model holds a state-of-the-art performance on the English Propbank (Newswire) as well as a test F1 score of 0.864 on the Ontonotes 5.0 dataset. We identify all possible semantic roles associated with each distinct verb from compound sentences. These SRL frames were reconstituted into distinct activities, by reordering the semantic roles and all other contextual arguments for each verb, along with their relative positions from the original sentence. The 723,863 developer emails in our data generated 2,248,950 expressions of activities.
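The reconstitution step can be illustrated with a small sketch. The two frames below are hand-written for the example policy quoted above, in a hypothetical dictionary format with token offsets; they do not reproduce the output schema of the SRL model, and the role labels are only indicative.

```python
def reconstitute(frame):
    """Join the spans of one verb frame in their original sentence order."""
    spans = sorted(frame["spans"], key=lambda s: s["start"])
    return " ".join(s["text"] for s in spans)


# Hand-written frames (hypothetical format), one per verb of the compound sentence.
frames = [
    {"verb": "send", "spans": [
        {"role": "V", "start": 9, "text": "send"},
        {"role": "ARG0", "start": 6, "text": "the ipmc"},
        {"role": "ARGM-MOD", "start": 8, "text": "must"},
        {"role": "ARG1", "start": 10, "text": "a notice email"},
        {"role": "ARG2", "start": 13, "text": "to the board"},
        {"role": "ARGM-TMP", "start": 0, "text": "After a vote has finished"},
    ]},
    {"verb": "wait", "spans": [
        {"role": "V", "start": 18, "text": "wait"},
        {"role": "ARG0", "start": 6, "text": "the ipmc"},
        {"role": "ARGM-MOD", "start": 8, "text": "must"},
        {"role": "ARGM-TMP", "start": 17, "text": "then"},
        {"role": "ARGM-TMP", "start": 19, "text": "for 72 hours"},
        {"role": "ARGM-TMP", "start": 22, "text": "before inviting the proposed member"},
    ]},
]

activities = [reconstitute(f) for f in frames]
```

Each verb thus yields one activity that carries its own actor and context, so the single policy description decomposes into a notification rule and a waiting-period rule.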
In governance research, rules are specified in terms of grammatical constituents representing the governing (committees, boards, etc.), the governed (e.g. committers), the activities they undertake, and the conditions they entail (e.g. voting before a release) [13]. Our policy reference data [71] comprised descriptive policies spanning multiple nested rules (Table 1). Therefore, SRL-based preprocessing was also extended to the policy documents, whereby the 234 policy descriptions from Sen et al. were parsed into 422 individual rules.
Finally, we conduct an additional preprocessing step. Developers often use mailing lists for technical discussions and clarifications. As a result, the lists often contain stack traces, logs, etc., which may be parsed as regular activities. We restrict our analysis to human-readable, standard English-language data, which can be compared and interpreted against governance policies such as those of ASFI. We detect and retain only English texts using a Hugging Face XLM-RoBERTa-base model [11] trained for language identification. This reduced the number of extracted activities to 2,029,691.

Governed Activities: Aggregating routines
As described in Section 3.2, routines are activities carried out time and again, under specific circumstances [10]. Unlike well-documented formal policies, routines are more dynamic and span activities dictated by emerging norms and operational priorities. Hence, it is extremely challenging to comprehensively codify activities in a community and train models that can discriminate routine behavior from non-routine behavior.
Importantly, we are interested in a pipeline that supports governance analysis across diverse online communities. Since routines are influenced by technological trends, the nature of the product, the specific community, the utilities involved, etc., inaccuracies may arise when extending a supervised model built specifically on ASFI data to other communities and foundations. Based on theoretical definitions of our construct of interest (i.e. governed activities are routine or 'recurring' operations), we leverage alternative learning methods compatible with our goals. We hereby describe our approach to discovering routines as similar activities in email data, through semi-supervised clustering.
We find similar ('recurring') activities through semantic similarity-based aggregation [64].Popular approaches to semantic representations include word level [51,59], sentence level [12,8,84], and more recently language model-based approaches which allow for more advanced representation learning for different semantic tasks.
The bi-encoder architecture was developed for computationally efficient semantic encoding of texts [64]. It involves training a Siamese network of two identical transformers to generate contextual encodings for two distinct text inputs. The averaged output from each transformer is then subjected to a cosine similarity loss objective. By the end of the joint fine-tuning, both transformers are capable of independently generating semantic embeddings for any given text input. Hugging Face [87] hosts multiple domain-specific bi-encoders. We use a general-purpose bi-encoder pre-trained on a domain-relevant corpus from Stack Overflow, a question-answer platform used especially by developers. All transformer-based experiments henceforth were conducted on a single Tesla T4 GPU.
Next, to aggregate the encoded texts, we use BERTopic [29]. It supports hierarchical density-based clustering (HDBSCAN) [48] for most Hugging Face bi-encoders, followed by topic modeling of the inferred clusters. To train the clustering model, we uniformly sample 100,000 activities out of the 2,029,691 activities previously extracted. Modeling activities across projects together allows for identifying and grouping them under a set of shared governance topics.
To cluster community activities intersecting with ASFI concerns, the 422 rules from ASF policies are passed as initial seeds to BERTopic. For best clustering results, we tuned the hyperparameters of BERTopic's HDBSCAN using the density-based clustering validity (DBCV) measure [53]. DBCV scores rate density-based models from -1 to +1, with higher values indicating better clustering quality. To find the hyperparameters returning maximum DBCV, we tune over the following HDBSCAN arguments: minimum cluster size and minimum samples. Higher values of the cluster size threshold might lead to the merging of clusters, while higher values of minimum samples promote denser clusters and more outliers. Both parameters were varied in combinations from 10 (a fraction of 0.0001 of the sample size) to 100 (a fraction of 0.001 of the sample size). Prior to clustering, BERTopic also uses Uniform Manifold Approximation and Projection (UMAP) for dimension reduction of embeddings. The number-of-neighbors parameter in UMAP decides the trade-off between preserving the global and local structure and was also varied between 10 and 100. We retain the model with the best relative DBCV score.
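The tuning loop described above can be sketched as a plain grid search. This is only an illustration under stated assumptions: `score_fn` is a hypothetical callable that would fit UMAP and HDBSCAN for one parameter combination and return its DBCV score, the step size of 10 is assumed (the text gives only the range), and `toy_score` is a made-up stand-in so that the sketch runs without any clustering library.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(
    score_fn: Callable[[int, int, int], float],
    min_cluster_sizes: Iterable[int],
    min_samples_values: Iterable[int],
    n_neighbors_values: Iterable[int],
) -> Tuple[Dict[str, int], float]:
    """Return the parameter combination with the highest validity score."""
    best_params: Dict[str, int] = {}
    best_score = float("-inf")
    combos = product(min_cluster_sizes, min_samples_values, n_neighbors_values)
    for mcs, ms, nn in combos:
        score = score_fn(mcs, ms, nn)
        if not -1.0 <= score <= 1.0:  # DBCV is bounded between -1 and +1
            raise ValueError(f"score {score} is outside the DBCV range")
        if score > best_score:
            best_params = {"min_cluster_size": mcs, "min_samples": ms, "n_neighbors": nn}
            best_score = score
    return best_params, best_score


def toy_score(mcs: int, ms: int, nn: int) -> float:
    # Made-up surface peaking at (50, 20, 30); NOT a real DBCV computation.
    return 1.0 - (abs(mcs - 50) + abs(ms - 20) + abs(nn - 30)) / 300


grid = range(10, 101, 10)  # 10, 20, ..., 100 (assumed step)
best_params, best_score = grid_search(toy_score, grid, grid, grid)
```

In practice `toy_score` would be replaced by a function that clusters the sampled embeddings and reports the validity of the resulting model; the exhaustive search itself stays the same.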

Topic modeling of governed activities
BERTopic finally applies TF-IDF across the dense clusters of governed activities to assign them topics. Words from the rules were used to suitably reweight the inverse document frequency of words. Topic coherence metrics [65] supported by Gensim [91] evaluate topic modeling performance on a scale of 0 to 1. Our final model shows a topic coherence C_v of 0.683, indicating strong topic correlation.
Policy documents often contain canonical descriptions of norms and processes that are dated and removed from practical operations [5,54]. After clustering, 106 of the 422 policy rules were disregarded as outliers, due to negligible mention in emails. A total of 42 distinct topics were identified between ASFI policies and email activities, and 211 topic clusters were discovered among all activities. In all, 493,008 activities were found to belong under these 42 governance topics from ASF. Final topic labels were assigned based on the assigned policies, the top keywords of each topic, and overall domain knowledge of ASFI.

Measuring institutional internalization
For governed activities under any ASFI governance topic, we measure the extent to which they reflect ASFI policies overseeing the same topic. Cross-encoders or poly-encoders [64] are standard language models for semantic comparison. They take the sentences or texts to be compared as simultaneous inputs and attend to them jointly for semantic scoring. Bi-encoders and cross-encoders are often used together for information retrieval and text ranking. While bi-encoders can encode individual sentences to support high-level clustering over large sets of text, cross-encoders are suitable for more precise, pairwise comparison between smaller sets of texts [80].
We use a DistilRoBERTa-base cross-encoder from Hugging Face which rates text pairs on a continuous scale of 0 to 1, with higher scores indicating greater similarity. The model demonstrated a Spearman rank correlation of 0.87 with respect to the human-annotated scores from the STS text similarity benchmark [7]. Using this cross-encoder, we compare every governed activity against all the rules assigned to the same governance topic to find the one it resembles most closely. The semantic similarity score between the governed activity and its closest policy rule represents the activity's extent of ASFI policy internalization. Consequently, we obtain internalization scores for all 493,008 governed activities under the 42 governance topics.
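The scoring rule above amounts to taking, for each governed activity, the maximum similarity over the rules of its topic. A minimal sketch follows, assuming a pluggable pairwise scorer that returns values in [0, 1]; `token_overlap` is a deliberately crude, hypothetical stand-in for the cross-encoder so that the example runs without a model, and the example texts are invented.

```python
from typing import Callable, List, Tuple


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a placeholder for a cross-encoder score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def internalization(
    activity: str,
    topic_rules: List[str],
    score: Callable[[str, str], float] = token_overlap,
) -> Tuple[str, float]:
    """Return the closest rule in the activity's topic and its similarity score."""
    if not topic_rules:
        raise ValueError("the topic must contain at least one rule")
    scored = [(score(activity, rule), rule) for rule in topic_rules]
    best_score, best_rule = max(scored, key=lambda pair: pair[0])
    return best_rule, best_score


# Invented example texts for a hypothetical voting topic.
rules = [
    "a vote must stay open for 72 hours",
    "reports are due every month",
]
closest_rule, score_value = internalization("the vote will stay open for 72 hours", rules)
```

Swapping `token_overlap` for a real cross-encoder changes only the `score` argument; the maximum over same-topic rules is the part that corresponds to the internalization measure.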

Analysis
RQ1 and RQ2 pursue an ASFI-level exploratory analysis of our governance measures along the policy extent. RQ1 compares the proportions of ASFI rules (level of regulation) and project-level governed activity across the topics, while RQ2 follows up by assessing the distribution of ASF policy internalization in activities.
Finally, for RQ3, we examine governance behavior among projects against their graduation or retirement from incubation. We fit a binomial generalized linear model (GLM), i.e. a logistic regression, of project-level measurements of governance and the covariates against their respective incubation outcome. We conduct our analysis through the GLM suite (regression, multicollinearity check, and validation of assumptions) supported by the statsmodels package in Python. LASSO-based variable selection is conducted prior to regression and inference, for which we use the group-lasso Python package. We set the significance level of our analysis at the standard p < 0.05.

RQ1: How does Incubator regulation relate to community-level governed activities across different governance topics?
As described in Section 3.1, we focus our analysis on governance topics shared between the ASFI and its mentored projects. We visualize ASF's policy extent against the distribution of governed activity along topics (Figure 2). The Pearson correlation between the two distributions was 0.23 (p = 0.13), indicating that how much governed activity communities perform in a topic is not significantly correlated with the amount of policy structuring that topic.
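For reference, the correlation coefficient used here (and again in RQ2) can be computed from two topic-level vectors as in the sketch below. The function is a textbook implementation of Pearson's r, and the input shares are invented for illustration; they are not the study's data, and the significance test is not reproduced.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Invented per-topic shares: fraction of rules vs. fraction of governed activity.
rule_share = [0.40, 0.30, 0.20, 0.10]
activity_share = [0.15, 0.35, 0.20, 0.30]
r = pearson_r(rule_share, activity_share)
```

A value of r near zero, as reported for RQ1, means that knowing how heavily a topic is regulated says little about how much governed activity it attracts.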

RQ2: How do the levels of policy internalization in governed activities relate to ASFI policy extent across different topics?
To explore RQ2, we additionally examine the distribution of internalization scores of governed activities conditioned on governance topics (Figure 3). Higher mean internalization scores indicate that, in a particular topic, the projects' practiced routines are more closely framed by formalized Incubator policy. We observe a trend of generally greater internalization with increasing policy extent: the Pearson correlation between the topic-wise policy extent and mean internalization scores was 0.744 (p < 0.001). In other words, areas of governance that receive more attention in formal policy also tend to be enacted by participants in a way close to the policy descriptions.

RQ3: How do governed activities and extent of policy internalization relate to the success of projects?
ASFI strives to build meritocratic communities and assesses projects' performance throughout the incubation time frame. As membership and activity levels undergo constant changes in OSS, we average the monthly measures of active committers, developer emails, and commit activity to capture their sustained levels. The code base variable was represented as the net size of the project repository in terms of overall lines of code (LOC) written by the project while in ASFI. Prior work on ASFI has shown that successful projects tend to graduate early [89], so we incorporate the total number of months spent by the project in the Incubator as one of the covariates. To similarly adapt our governance measures, we represent governed activity through the total number of routine activities observed across a project's mailing list during incubation. The overall policy internalization along a governance topic was similarly evaluated for every project, by averaging the scores across all of its governed activities. The resulting number of predictors was 89, including five covariates and the two distinct governance measures across each of the 42 topics. Six projects were dropped as their commit history was unmeasurable through our metrics tool, leading to 208 observations. Certain project mailing lists did not reflect governed activity under some of the topics, making the governed activity of that topic equal to 0.
There are 54 projects with 0 observed governed activity in at least one topic. Rather than dropping those observations entirely, we retained them in a way that minimizes the information added through imputation while preserving the information in the non-missing variables: unmeasured internalization scores were filled through iterative round-robin imputation supported by the Python package scikit-learn. This method of imputation, a Python implementation of MICE [81], is unbiased relative to other choices we could have made, such as assigning 0. Project-level covariates (committers, emails, codebase, and commit activity), as well as governed activity for every topic, were log-scaled to address skew and to facilitate comparison across projects of different scale. Subsequently, all variables were standardized to z-scores. We then addressed multicollinearity by removing all variables with a Variance Inflation Factor > 5. We then performed logistic LASSO-based variable selection with 5-fold cross-validation, tuning the hyperparameter on the log loss. After the multicollinearity tests and variable selection, we have a reduced set of 11 significant predictors.
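The scaling steps in this paragraph can be sketched as follows. Two details are assumptions rather than statements from the text: the log transform is written as log(1 + x) so that zero counts remain defined, and the z-score uses the population standard deviation. The input counts are invented.

```python
from math import log1p, sqrt
from typing import List, Sequence


def log_scale(values: Sequence[float]) -> List[float]:
    """Compress right-skewed counts; log(1 + x) keeps zero counts finite (assumed variant)."""
    if any(v < 0 for v in values):
        raise ValueError("counts must be non-negative")
    return [log1p(v) for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and unit (population) standard deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]


# Invented governed-activity counts for one topic across four projects.
counts = [0, 9, 99, 999]
standardized = z_score(log_scale(counts))
```

Applying the log transform before standardization keeps a handful of very large projects from dominating the scale of each predictor.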
We construct nested regression models, fitting four models to assess the contribution of different groups of variables (Table 3). These are the "baseline" model with only covariates as predictors (M1), a second model adding topical governed activity variables to the baseline (M2), a third model adding only policy internalization variables to the baseline (M3), and the final full model including all three groups of variables: baseline covariates, governed activity, and policy internalization measures (M4). For every model, we additionally checked for outlier influence using Cook's distance and found no data points with extreme influence (D > 1). The assumption of linearity in the log odds was validated using the Box-Tidwell test, whereby no interaction terms x * log(x) were found significant. We observe that the predictive performance and fit of the models improve with the step-wise addition of governance variables, a reassuring sign of valid model construction across the three types of variables. The full model M4 was found to be the most parsimonious (∆AIC = 23.05 relative to the second-best model), with goodness of fit at 0.648 (Tjur's pseudo-R²). Further, it showed a weighted F1 score and accuracy of 93.6% and 93.7% respectively.
We hereby report our findings based on M4. Factors that correlate positively with a project's chance of graduating include greater internalization of policies related to "Project configuration", "Graduation requirements/Maturity Model", and "Voting protocol/Timeline". Moreover, projects that govern patch-handling activities more, i.e. show more governed activity in "Patches", are associated with higher graduation odds. On the other hand, factors that correlate negatively with successful graduation include high internalization of "Project Wiki" and a higher volume of governed activity on Incubator reporting.
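The two fit statistics used to compare the nested models have simple closed forms, sketched below. Tjur's pseudo-R² is the difference between the mean predicted probability among observed successes and among observed failures, and AIC is 2k - 2 ln L for k estimated parameters. The outcomes, probabilities, and log-likelihoods in the example are invented, and reading ∆AIC as "second-best minus best" is an assumption about how the reported difference is defined.

```python
from typing import Sequence


def tjur_r2(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Mean predicted probability for y = 1 minus mean predicted probability for y = 0."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Invented example: 1 = graduated, 0 = retired, with fitted probabilities.
outcomes = [1, 1, 0, 0]
probabilities = [0.9, 0.7, 0.2, 0.4]
r2 = tjur_r2(outcomes, probabilities)

# Invented log-likelihoods for a smaller and a larger nested model.
delta_aic = aic(-60.0, 6) - aic(-40.0, 12)  # positive: the larger model is preferred
```

Because AIC penalizes each added parameter, a positive difference in favor of the larger model indicates that its extra variables improve fit by more than their complexity cost.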
We observe that neither governed activity around nor internalization of the five most highly regulated topics (those on committees, licensing, email communications, and releases) predicts project success. In fact, project success seems to be correlated mostly with the internalization of policies on topics that receive little attention in formal policy. This complements our overall finding that projects do not run how they say they run, and suggests that formal policies may not present the full picture of how communities govern to sustain themselves.
Our primary analysis is correlational and not causal. This is important to emphasize because our findings for the "Graduation Requirements" topic are probably a spurious but encouraging validity check: it is likely that the act of a project graduating and conducting necessary protocols explains the positive effect of internalization of graduation policies. Similarly, "Project Wiki" is composed of a policy that is only activated once the Incubator has voted to retire a project. The most likely explanation for its negative effect is that project retirement is causing policy enactment, not the other way around.
To check robustness and probe some unidirectional interpretations, we perform a post-hoc analysis in which we repeat all experiments with a modified policy dataset that excludes these confounding end-of-incubation policies, which apply only after a determination of graduation or retirement has been made. We focus this robustness analysis exclusively on policies that are relevant to the active incubation and growth phase of ASFI projects. Therefore, we removed 34 of the 234 policies that are generally applicable to projects post-graduation/retirement or only at the terminal stage of incubation (graduation vote, transferring trademarks, ceremonial protocols of graduation/retirement, etc.). For RQ1 and RQ2, we once again observe the previous trend, or lack thereof, between policy extent, governed activity, and internalization. For RQ3, we retain significant effects from three of the six variables that stood out in our original analysis. These include "Patches" (governed activity), "Incubator Reporting" (governed activity), and "Voting Protocols/Timeline" (internalization). As expected, we no longer observe the significant effect associated with "Graduation requirements", which comprised several policies (now removed) closely related to the graduation event, while "Project Wiki", which treated post-retirement project wrap-up, was not among the topics inferred from the reduced set of policies. Lastly, the topic "Project Configuration" no longer exerts a significant influence on project outcomes. Details are provided in Appendix A.

Findings
We find substantial differences between the policy-making attention of the ASFI and community governance across topics. Results from RQ1 (Figure 2) show that overall, policy extent has no significant correlation with the frequency of governed activities observed across topics. Yet through RQ2 (Figure 3), we observe that topics with higher policy extent see greater policy internalization. Therefore, while project governance efforts do not mirror the distribution of policy across governance topics, the internalization of policies is highly correlated with how much formal policy governs that topic.
In RQ3, where we test our governance constructs against project outcomes, we find that neither governed activity around nor policy internalization along the most highly regulated subjects predicts project outcomes. Also, most of the topics correlated with project success are relatively lightly regulated.
Domain knowledge of the ASF Incubator can help us further contextualize the results from RQ3 (Table 3). Rules from the 'Project configuration' topic oversee the steps and requirements for setting up ASF infrastructure. Higher internalization associated with more successful projects likely indicates that the development team is more experienced in navigating and utilizing ASF's resources.
Democratic communities and consensus building are encoded in ASF's functioning ('The Apache Way') and are a hallmark of the OSS movement generally. ASF requires project-level voting for approving releases, appointing members to the project PMC, admitting committers, etc. Observance of ASF's standard voting procedures likely indicates shared understanding and streamlined decision-making. Projects that have high internalization of ASF's policies regarding "Voting protocol/Timeline" are successfully hosting and running those votes according to ASF's policies, and mobilizing community participation.
We find a large negative relationship between the frequency of activities around "Incubator reporting" and the likelihood of graduation. We further investigate and find that projects generally discuss and work on reports only when they are due, except when they (1) miss a deadline and are assigned a new report date, (2) need to keep working to resolve issues in a submitted report, or (3) are struggling and are asked to report more often.
Projects often lag in reporting when their development stalls and the community is struggling. In such a situation, the ASFI intervenes actively and necessitates more efforts to motivate the projects to meet standards and resume compliance with Incubator requirements. Therefore the effect is likely associated with struggling projects and how the Incubator interacts with them. If this interpretation holds, the mechanism for our correlative findings is that an outside factor ("struggling project") is driving more reporting and reduced graduation chances.

Discussion
Our goal was to investigate the relationship between formal policies overseeing OSS communities and their actual self-organizing tendencies. OSS-supporting foundations create policies to encode their concerns and priorities. ASFI introduces formal hierarchies through various offices and committees to organize traditionally free-form OSS communities. They also include requirements to ensure standards of development and conduct among projects.
Governed activities or routine operations indicate the extent of community governance. Structured activities along a governance topic indicate how developers coordinate and conduct the bulk of their activities from the underlying beliefs and current needs. Therefore, more governed activities are expected as a community seeks to structure and routinize more of its operations.
As communities undergo formalization, their governance may be expected to reflect their overarching policy focus. The conventional perception of OSS formalization anticipates more institutional formalities and obligations (Section 3.2). This may be observed as increasing community attention on domains on which ASFI sets more rules, and ensuing routine activity from such structuring. RQ1 tests whether the attention of community governance aligns with that of formal policies across shared governance domains.
While governed activities reflect the extent of community governance across topics, we are also interested in how communities align formal rules and actual governance behaviors. In their efforts to structure activities, projects may choose formal policies, implement their own norms, or a combination of both (Section 3.2). RQ2 further examines if the extent of formal regulation is related to how community governance integrates them, as observable through the policy internalization of governed activities.
Our results from RQ1 (Figure 2) indicate that the extent of ASF's regulation does not, in general, seem to proportionally increase the intensity of "on-the-ground" governed operations. At the same time, our findings from RQ2 (Figure 3) suggest that through extensive policy-making along specific concerns, ASFI succeeds in using policy to orient community governance, which shows up through policy internalization in governed activity along domains with more extensively defined policies.
We reconcile the implications of the two approaches to understanding formalization. RQ1 dwells on convergence/divergence between the effective governance efforts of ASFI and communities, i.e. formulating, establishing, and implementing rules and norms to structure activities. Meanwhile, RQ2 examines to what extent community governance incorporates ASFI's formal policies: literally, how much communities internalize formal policy's framing of a governance issue. The positive correlation between internalization and policy extent likely indicates that governance topics that are extensively codified considerably structure governed activity. Yet results from RQ1 indicate that highly formalized governance topics elicit relatively less, or no more, governance effort from communities than those where fewer formal rules exist. In fact, in several crucial topics with limited regulation, projects exercise substantial governance efforts to sustain themselves. The takeaway is that the effect of more formalization in policy seems to be reflected less in the volume of governance activity it spurs, and more in how closely that activity hews to prescribed standards.
The ASFI's policy coverage is largely administrative, and it outlines appropriate protocols for governance concerns it deems important. Consequently, when projects engage in highly regulated domains, they respect and internalize such specifications. Therefore, while the focus of policy-making may not be reflected in the regular governance concerns of developers, policies still act as a layer of fundamental governance that is seamlessly integrated into communities. Simply put, developers respect policies that are evidently important and extensively specified, but they are also faced with other concerns beyond those where ASF largely institutes policies.
The ASFI's policies pay relatively less attention to the technical aspects that constitute communities' main governance activities (issues/patches, artifacts, etc.), suggesting that the foundation defers to the discretion and objectives of developers on these subjects. The generally lower policy internalization along core development concerns may also be explained by the fact that technical regulations in ASF are few, and are basic guidelines and expectations rather than specific conditions. We hence see considerable governed activity along some of these ('issues'/'patches'/'builds'), reflecting efforts to coordinate fluid communities, channel their contributions, adapt to emerging technology, and meet release targets.
RQ3 examines the association of self-governance and internalization of foundation policies with the objective success of projects (Table 3). It is based on the implicit assumption that projects will perform governance and adopt policies in a manner that helps them attain their objective, which is to graduate from the Incubator. The Incubator assesses projects based on the diversification of the community and the capability to produce compliant software and consistent releases. Interestingly, governance behavior around the more highly regulated governance topics does not stand out as a significant discriminant between graduated and retired projects.
Foundation policies may play a role in furthering development and facilitating coordination and consensus among communities, as analyses showed positive associations between the internalization of voting and infrastructure-use protocols and the odds of graduation. We also find some evidence that community initiative in less regulated governance areas supports project sustainability. Projects that coordinate submission and incorporation of patches more often are both building their community and improving their product, making them more likely to graduate. Such projects were likely able to compensate for the limited explicit technical governance by instituting their own routines to sustain development.
We have one significant finding around a highly regulated topic: Incubator reporting. We found a negative association between levels of governed activity around Incubator reporting and the odds of graduation. Reporting to the Apache Incubator is intended to motivate project performance as well as track their progress [90]. Therefore, it is interesting that more formalization is associated with a reduced likelihood of graduation for a highly regulated topic. As discussed in our findings, this effect from Incubator reporting likely does not imply a straightforward causal relation between formalization and success. It also presents a delicate situation for already struggling projects, as they are required to focus their governance more on the priorities of formal policy. This has sometimes proven to be especially burdensome for small projects. Apache Gossip is such an example, where the small community struggled with the overhead of implementing the regular reporting protocols set by the ASFI and was eventually retired.
All in all, communities are bound by foundation requirements, especially in domains that elicit a greater volume of formalization. At the same time, their actual governance concentrates on aspects distinct from the ones in which ASFI regulates the most. Importantly, we find limited support for the argument that projects should embrace formalization, be it in terms of aligning governance focus or internalizing policies in more regulated topics, in order to successfully realize their objectives. Therefore, written formal policies from OSS communities may not be a comprehensive account of how their actual governance unfolds.

Recommendations
Our findings may carry certain implications for community members in the ASF, or the OSS ecosystem more generally. For example, since policy internalization around project configuration and voting seems to correlate with project graduation, more formal policy on (or informal attention to) these topics may help projects succeed. However, we caution against too literal an interpretation of our findings for practice. Our results may be specific to ASF, and as we have seen, some of these effects are unlikely to have a straightforward causal interpretation.
Our most responsible recommendation from this research, for practitioners in technology policy in general and OSS in particular, is to be pragmatic about governance, be cognizant of organizational variability and uncertainty, and be watchful but permissive about letting projects drift in their interpretation of policy. This allows volunteer communities to focus on self-regulation, activity, and enforcement of issues that they identify as requiring more clarity or structure. By subordinating policy development processes to community will, foundations are poised to gain a policy design that is informed by the low-level daily experiences of contributors and enjoys the legitimacy of its membership.

Threats and Validity
The findings presented in this study apply only to the ASF. We hope that future replication across more organizations will enrich OSS governance research with more general insights. For the purposes of our study, we treat ASFI's standards for graduation as an evaluation of OSS success and viability. The ASFI's stated objectives and standards provide well-rounded criteria to assess the relation of governance behavior with viable and sustainable communities (Section 5.6). It should be noted, however, that projects sometimes have varied reasons for choosing to graduate or discontinue incubation. Reasons include but are not limited to their sense of cultural fit, or need for ASF's specific portfolio of support servers. Therefore ASFI graduation, while considered a respected and tested model of evaluation, may not generalize to a conclusive metric of OSS success.
Our work is based on large public mailing lists. While these are the central channels for ASFI projects, they also maintain private lists reserved for certain project business, including committer voting. These are restricted from public access and are currently beyond our scope. ASFI leadership discourages the use of these lists as much as possible, and they are typically only used for "personnel" matters such as if a contributor is breaking a project's code of conduct or to vote in new committers.
Our study rests on information extracted by semi-supervised learning. The choice of semi-supervised learning was largely motivated by our constructs (Section 5.3), the limits of supervised learning, and most importantly the aim of facilitating scalable organizational insight. Unsupervised/semi-supervised methods have known limitations and are particularly difficult to validate. Nevertheless, we tuned the performance of our clustering models using established measures such as clustering validity and NPMI-based topic coherence, and the very high values of R² that we report for our models are an encouraging sign that these constructs are credibly capturing important aspects of project governance activity.
We named the resulting topic clusters by examining the most frequent words used in them as well as the policies to which they were assigned. This qualitatively distills the essence of the clusters and makes it possible for us to interpret them for purposes of downstream analyses. Therefore, interpretations of topics and associated effects may vary across researchers, leaving room for reification. Through further checks, we find that the topics found in the main and supplementary analyses are largely, even if not perfectly, identified.
While we used domain-adapted language models wherever available, some tasks like semantic role labeling and semantic scoring were more specialized with limited models and datasets available. Annotating training data consistent with benchmark datasets is complicated for such tasks and limits the scope of the methodology for replicating results. We used models trained on standard benchmark datasets in such cases.
Certain project mailing lists did not reflect governed activity under all of the 42 different governance topics. This could be attributed to the extent of engagement or to varied priorities across projects. For example, resource object management routines are likely exclusive to Java-based projects. Moreover, HDBSCAN sets a lower threshold on cluster size (0.001% of sample size). This leaves open the possibility that minor routines are merged into clusters representing more general themes or are classified as outliers.
In Section 5.3, we explain the computing overheads and limits on input size for transformer-based language models, which often truncate broader text context in social interactions [89]. Moreover, we conduct a granular, performance frame-level analysis of community operations. We encountered a few cases in our dataset where extensive policies with multiple nested or bulleted conditions were truncated during intermediate preprocessing or parsing stages. Ongoing efforts at supporting longer context windows [79] for representation learning should expand the scope of language models for discourse analysis.

Conclusion
Open source software projects join foundations like the Apache Software Foundation despite the "anti-regulatory" tendency of many OSS developers. They do so because the standardized, streamlined governance systems that foundations operate provide clarity, best practices, mentorship, economies of scale, and lower administrative overhead. Yet OSS projects may simultaneously find themselves benefiting from formal structure and/or constrained by it to varying degrees.
While it is a widely accepted truism that governance in practice often differs from governance in form, demonstrating this at scale, and determining the manner in which formal depictions and ground behavior diverge, has been a challenge.
Articulating fundamental questions about governance practices through NLP methods, particularly language modeling, enables us to quantify the governance behavior of projects, including how they govern themselves and internalize formal policy.

Figure 1: Language modeling pipeline for extracting activities, aggregating routine governed behavior, and evaluating internalization.

Figure 2: Left: Distribution of ASFI policy extent across governance topics. Right: Distribution of governed activity of projects across different governance topics. Governed activity was not found to be significantly correlated with policy extent.

Figure 3: Left: Distribution of ASFI policy extent across governance topics. Right: Distribution of internalization scores within topics. Red and green markers indicate the median and mean, respectively. Internalization is observed to be higher in governance topics that are more regulated.
Stȃnciulescu et al. [78] extracted monthly performance metrics for 218 ASFI projects during their incubation. However, the tooling infrastructure they developed only supported mining software metrics from Git repositories. Moreover, Yin et al. mined project mailing lists up to Jan 2021, including ones that were mostly SVN-based, while the data of Stȃnciulescu et al. span projects from March 2003 up to May 2021. Given these differences, we based our study only on those projects that are common to both datasets. This yielded 214 projects for which both project measures and email data were available. There were also some differences in the way these data were collected. Yin et al. collected data in time windows of 30 days, whereas Stȃnciulescu et al. collected data on a monthly basis (calendar timestamps). To resolve this mismatch, we modified the collection timeline in the tool provided by Stȃnciulescu et al. to a 30-day time window, matching the dataset from Yin et al., and repeated the measurements for our variables of interest for these 214 projects.
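The mismatch between the two binning schemes can be illustrated with a short sketch. This is not the tooling of either dataset: the helper names `thirty_day_window` and `calendar_month_index`, and the example incubation start date, are hypothetical and serve only to show why the same event can fall into different periods under the two schemes.

```python
from datetime import date


def thirty_day_window(start, when, window_days=30):
    """Zero-based index of the fixed-length window containing `when`,
    counted from the (hypothetical) incubation start date."""
    return (when - start).days // window_days


def calendar_month_index(start, when):
    """Zero-based index of the calendar month containing `when`,
    counted from the month of `start`."""
    return (when.year - start.year) * 12 + (when.month - start.month)


# Hypothetical incubation start; events late in a month drift apart.
start = date(2003, 3, 1)
event = date(2003, 3, 31)
window_bin = thirty_day_window(start, event)    # second 30-day window
month_bin = calendar_month_index(start, event)  # still the first month
```

Because calendar months vary between 28 and 31 days, the two indices diverge over time, so measures from both sources need to be recomputed under one scheme before they can be joined per period.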