Collecting, Analyzing, and Acting on Intersectional, Longitudinal Data and Pass/Fail/Withdraw Rates in Computing Courses

We present the Center for Inclusive Computing's data collection and visualization system, which enables computing departments to track and visualize their enrollment and course outcome data intersectionally and longitudinally. The system tracks the impact of institutional changes in how computing (particularly the introductory sequence) is discovered and experienced by undergraduates, as measured by course outcome and persistence data. To date we have worked with and collected data from 52 U.S. computing departments. The collected data spans 2018 to the present and contains term-by-term, intersectional course enrollment and outcome data for CS 1-3, while also tracking declared majors and persistence to graduation. Drawing on our experience working with these universities, we present guidelines for the analysis of intersectional, longitudinal data alongside our recommendations for actionable next steps. We present three case studies grounded in an analysis of CS1, demonstrating how an institution can understand its own computing program and develop interventions, specifically with an eye toward broadening participation in computing.


INTRODUCTION
When academic institutions evaluate the status of a department or program, whether for research purposes or resource allocation, the evaluation often hinges on student outcomes as a measure of retention and a proxy for learning and satisfaction. For example, Pass/Fail/Withdraw (PFW) rates have been used in projects that range from assessing the impacts of face-to-face learning versus distance learning [31], to measuring the impacts of open educational resources [8], to evaluating the impacts of computing-specific programs and interventions [30]. Outcomes are often presented as aggregated numbers, either for all students in a class or for students of a certain racial or gender identity.
Over the last two decades, there has been increasing focus on evaluating statistics like pass rates intersectionally, that is, along dimensions of both race and gender simultaneously [25,26], to ensure that the experiences of any particular subgroup of the population are not hidden [13]. Indeed, in the key conclusions and recommendations of the 2021 National Academies report on "Transforming Trajectories for Women of Color in Tech" [22], the authors write: "...the lack of disaggregated data poses a major challenge to understanding the nuanced and specific needs of different subgroups of women of color."
Research has shown that Black women in computing have a consistently different experience than Black men and non-Black women [27,28], that all Asian students and Asian women in particular have different experiences than their peers, and that Latinx students have their own unique experience as well [18].
Another limitation in the published literature (and likely in practice) is that studies rarely span more than one term or one year [1,11]. Longitudinal tracking allows the evaluation of interventions to determine if they are susceptible to confounding variables, such as whether they depend on a particular instructor, because the instructor of a course can vary from term to term. Obtaining intersectional longitudinal data can be challenging for individual professors, and even for departments, because the demographic data for students is typically centrally held (and often closely guarded) by the university's institutional research office.
In this paper we present a system for collecting such data to track the status and health of a computing program or individual course. Implemented by the Center for Inclusive Computing (CIC) at Northeastern University,¹ the system has been used to collect intersectional, longitudinal data for CS 1-3 alongside declared majors and program completion rates term-by-term at 52 universities in the U.S. Drawing on our experience working with these universities, we present guidelines for the analysis of the data alongside our recommendations for actionable next steps. In the remainder of this paper, we first present the state of the art in data collection in the CS education literature. Next, we present our deployed system and data collection efforts, followed by the guiding questions that such data can answer. Finally, we illustrate analyses based on these questions with three case studies and present our conclusions.

RELATED WORK
In this section, we focus on surveys of how data is presented in the CS education literature; in Section 3.1 we review other national data collection efforts in the context of our design choices. In 2022, Oleson et al. conducted a review of trends in the collection, reporting, and use of demographic data in computing education research [24]. They found that 68% of the 510 papers evaluated left the method of collection unclear, 23% used ambiguous aggregate terms to describe populations (e.g., "underrepresented" or "diverse"), 35% had incomplete reporting of demographics, and only 10% fully reported both race/ethnicity and gender.² In a different study, Decker et al. [11] found that 72.5% of the 112 papers they inspected reported gender, whereas Oleson et al. found that only 32% of the papers they evaluated fully reported gender, that is, described all participants rather than just some. Indeed, the only demographic attribute that Oleson et al.'s work found to be fully reported a majority of the time was geographic location.
In terms of computing education, there has been a shift toward explicitly considering intersectional identity when designing and implementing programs [19,20]. While gender and race in isolation are important factors for student experience, qualitative work has affirmed that there are more complex interactions at play [9].
Previous work posits that collecting data intersectionally is difficult due to the designs of systems themselves; demographic classification systems are often built with an architectural assumption of mutually exclusive classification variables [24]. There are a small number of studies that explicitly grapple with the questions of how both race and gender affect students' experiences in computer science [15,18,28,32]. Recently, Pournaghshband and Medel took the even stronger stance that computer science not only needs to adopt an intersectional approach to analysis but that this approach should be intersectional along more than two dimensions [26]. Some of this work, such as Latulipe et al.'s 2018 study of retention in the computer science major, groups so-called under-represented minorities together, forming a binary majority-minority analysis [15]. Other studies, such as Xie et al.'s 2022 work on surfacing equity issues in large CS courses, allow students to select from many ethnicity options but then group Black, Indigenous, and students of color (BIPOC) together when presenting results [32].
¹ The CIC (www.cic.northeastern.edu) is a national effort to create systemic, sustainable change in U.S. universities to broaden participation in computing. The CIC works with and funds universities to make systemic changes to the way in which they offer their introductory CS sequence with the goal that true beginners to computing can discover, thrive, and persist [2-4, 17, 21]. Ensuring a pathway for true beginners is important for broadening participation in computing (BPC) because those who are true beginners are often from populations that have been historically marginalized in CS. As part of this work, as of Fall 2023, the CIC has conducted all-day site visits of the undergraduate programs at the computing departments of 54 U.S. universities.
² Statistics were not given on whether these variables were described as independent dimensions or intersectionally.
The many nuances of collecting and displaying gender identity data are discussed in depth in Oleson et al.'s work [24]. These nuances include the fairly recent recognition within western culture that gender is not a binary construct. Many data collection and storage systems make binary gender assumptions and do not accommodate changes in gender identity. These software and systems-based decisions are then imposed upon subsequent analyses. Such data nuance and restrictions apply to other demographic variables as well. For instance, when collecting racial identity data, ethnicity is typically overlooked except in outlying cases. An example of work that doesn't overlook ethnicity is Lewis et al.'s 2019 study that separated "Asian" into subcategories and described what these subcategories included [16]. Including a more nuanced understanding of race and ethnicity often involves a great deal of work on the part of the researchers, such as in Ko and Davis's work, which dealt with the complexity of race and ethnicity by supplementing racial categories with information about the languages participants spoke at home [14]. Rather than being the norm, these are outlying cases because they push back against default systems.
The system we present in Section 3 shows the degree of intersectional data possible to access in higher education without making architectural or widespread policy changes and without conducting additional surveying of students.

SYSTEM OVERVIEW
We collected intersectional data tracking student populations in CS 1-3 from 52 different higher-education institutions in the U.S. from Spring 2018 to Fall 2022, with the median institution having contributed data for 7 terms. Institutions submitted data as part of the CIC's grant requirements, with funding specifically earmarked for data collection. While faculty and departments are always able to make internal data requests, we have found that requests tied to external requirements and funding increase the likelihood and speed of access. Schools submit data for every term for CS 1-3 for the following variables:
(1) Count of students: non-negative integer
(2) Racial Identity: Hispanic/Latinx, any race; American Indian or Alaska Native, not Hispanic (AIAN); Asian, not Hispanic; Black or African American, not Hispanic; Native Hawaiian or Other Pacific Islander, not Hispanic (HPI); White, not Hispanic; Nonresident Alien; Race/ethnicity Unknown
(3) Gender Identity: Gender X/U, Men, Women
(4) Major: CS, Other Computing, Non-computing, Undeclared
(a) External Transfer: Students who transferred from another institution during the current academic year.
(5) Course Outcome: Pass, Fail, or Withdraw (PFW)
For each combination of these variables, schools enter the count of students (no missing data is allowed), thus ensuring the data is intersectional. For example, schools would submit that "7 Latina women passed CS1 in Fall 2020" instead of "16 Latinx students passed CS1 in Fall 2020" and "50 women passed CS1 in Fall 2020".
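To make the difference between intersectional and marginal counts concrete, the following Python sketch stores one row per combination of variables and then derives the marginal totals from those rows. The `Row` layout, the field names, and every count other than the "7 Latina women" figure are hypothetical; they are chosen only so that the marginals reproduce the 16 and 50 from the example, and they do not represent the CIC's actual submission format.

```python
from collections import Counter
from typing import NamedTuple


class Row(NamedTuple):
    """One intersectional cell: a count for a single combination of variables."""
    term: str
    course: str
    race: str
    gender: str
    outcome: str  # "Pass", "Fail", or "Withdraw"
    count: int    # non-negative integer; zeros are entered explicitly


# Hypothetical submission for students who passed CS1 in Fall 2020.
# Only the "7 Latina women" figure comes from the text; the rest is illustrative.
rows = [
    Row("Fall 2020", "CS1", "Hispanic/Latinx", "Women", "Pass", 7),
    Row("Fall 2020", "CS1", "Hispanic/Latinx", "Men", "Pass", 9),
    Row("Fall 2020", "CS1", "White", "Women", "Pass", 43),
    Row("Fall 2020", "CS1", "White", "Men", "Pass", 61),
]


def marginal(rows, field):
    """Sum intersectional counts over every variable except `field`."""
    totals = Counter()
    for row in rows:
        totals[getattr(row, field)] += row.count
    return dict(totals)


by_race = marginal(rows, "race")      # single-variable view by race
by_gender = marginal(rows, "gender")  # single-variable view by gender
```

The marginal totals can always be recovered from the intersectional rows, but the reverse does not hold: knowing only that 16 Latinx students and 50 women passed does not determine how many Latina women passed.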
CS 1-3 are the first three required programming courses of the CS major. These courses are typically not taken in the same term (e.g., none of them is Discrete Math). Typically these courses are introduction to programming, object-oriented programming, and data structures. If a department does not have a required third course in the CS major, then the department designates CS3 as the course that the majority of CS majors take after CS2. Our focus on the intro sequence stems from the observation that across the vast majority of our 52 schools, once a student passes CS3 they do not leave the major.
Table 1 describes the average aggregate characteristics across race and gender in the dataset as a whole for CS1 from 2018 through 2022. The fluctuation in the number of institutions each year is due to data collection starting in 2020, with participating schools submitting two years of historical data. As more schools joined the program over time, the number rose, and as schools left, the number dropped.
Note that the analyses presented in Section 4 omit Gender X/U, AIAN, and HPI students due to low representation within the dataset. Students with Unknown racial identity are omitted from this analysis for ease of presentation. We cannot release this dataset to the public due to our grant agreements with the 52 universities.

Other National Data Collection Efforts
Prior to designing our data collection, we conducted a landscape analysis of data collected by other organizations in an effort to avoid duplicating efforts. While we did find efforts that collected data intersectionally by race/ethnicity and gender, we found that the level of granularity we sought wasn't available. For example, the National Center for Women & Information Technology's (NCWIT) Tracking Tool collects applicants, acceptances, new enrollments, and declared majors along with attrition, retention, and completion data by major, broken down by gender and race and ethnicity. However, the tool does not collect term-by-term, course-level enrollment and outcome data (i.e., PFW). In addition, as noted in ACM's Retention in Computer Science Undergraduate Programs in the U.S. by Stephenson et al., there are some inconsistencies in the presentation of the intersectional data among universities [29]. We also examined whether the National Student Clearinghouse's Postsecondary Data Partnership (PDP) could be utilized [6]. PDP collects institutional and student-level data, including course enrollment and outcome data by gender and race/ethnicity. Unfortunately, at this time, program of study data (student major) is only available for the initial term of enrollment. This means that PDP is not able to track the difference in outcomes between majors and non-majors longitudinally. Both the Integrated Postsecondary Education Data System (IPEDS) [12] and CRA's Taulbee report [33] collect intersectional graduation data annually, and Taulbee also tracks majors, but neither collects term-by-term course outcome data.

Data Collection Limitations
Three primary issues place limitations on our data collection system: 1) the need to protect student privacy; 2) the lack of available Add/Drop data; and 3) the fact that the demographic data collected by universities uses the categories defined by the U.S. government. Because of student privacy, we cannot track any particular student's pathway through the major and instead can only track outcomes of a group as a whole longitudinally. As not all departments can track Add/Drop data, we restricted our data collection efforts to the data that all departments track. Racial identity categories come from IPEDS definitions. Similarly, gender identity is governed by current IPEDS recommendations and individual institutions' tracking systems. As of 2023, not all institutions allow students to update their records to a gender outside of man/woman.
Although a more granular view of intersectional demographics would have been possible using student-survey data, we prioritized being able to get a complete snapshot of the student population. Surveys come with a host of issues: low response rates, over-surveying, and dependence on individual departments to distribute surveys in a timely manner. Instead, we designed our system to collect data that all institutions already collect, ensuring that we get a complete, rather than sampled, snapshot and that institutional research offices can pull historic data as needed. Another limitation of our system is that we do not collect student data regarding disability or finances. Conversations with the schools we work with indicated that the institutionally held data on student disability and finances would not be accessible even to the requesting department or college.

CASE STUDIES AND GUIDING QUESTIONS
In this section we demonstrate example analyses that an institution might conduct given their own intersectional data. First, we present recommended questions and actionable next steps.
Q1: What are your PFW rates by term and intersectional identity? Are there any terms when one of these rates is particularly high/low? (See Sections 4.1, 4.2, and 4.3)
Next Steps: Identify commonalities for terms with low/high rates. Are these terms that were taught by a specific instructor? Is this instructor more suited to teaching upper-division versus lower-division courses? Do the terms have different course content or grading schema? Are there indications that different intersectional groups are having different experiences? For departments that do not enforce common assessment (same exams/assignments for all sections) [4], are there differences in outcomes among sections?
Q2: Do any trends emerge for specific intersectional identities? (See Sections 4.1 and 4.3)
The case studies that follow represent our experience working directly with three of our partner schools, along with their interventions and reflections.

Case Study: An Intervention that Reduces Withdrawal Rates
For the first case study, we focus on Q1, Q2, and Q4. In this example, we begin from a non-intersectional standpoint to look at aggregate PFW rates and then use an intersectional lens to bring increased understanding. This example showcases the identification of problematic withdrawal rates, the implementation of an intervention to counteract these rates by addressing the different levels of prior experience in a CS1 course, and the use of intersectional analysis to verify the success of the intervention. Interventions such as this one are based on broad prior work that shows addressing the difference in experience level is a key component of BPC efforts and student experience in CS1 [2,5,7,10,23].
Institution X, a large R1 research university in the U.S., had a pass rate of 0.7 in its CS1 course in Fall 2018. Figure 1 shows the evolution of this institution's pass rate in CS1 over four years. Looking more closely, Inst. X's fail rate was only 0.05 while its withdrawal rate was 0.25 in Fall 2018. This raises the questions: 1) why are students withdrawing? and 2) are all students having a similar experience? We focus on the second of these questions.
Inst. X analyzed the PFW rates of groups for all intersectional identities. When pass rates were low in 2018, women were withdrawing at higher rates than their male peers: 0.33 versus 0.24. Describing the rates by gender alone is not sufficient to answer the question "are all students having a similar experience?" Tables 2 and 3 show the evolution of withdrawal rates for women and men, respectively, separated by racial identity. These tables show that within gender identity, not all subgroups are having the same experience in CS1; both race and gender are critically important to consider.
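As an illustration of this kind of comparison, the following Python sketch computes PFW rates first by gender alone and then by race and gender together. All counts, and the `pfw_rates` helper itself, are hypothetical and are not Inst. X's data; the sketch only shows how a gender-level rate can mask differences among the subgroups it aggregates.

```python
from collections import defaultdict

# Hypothetical counts for one term of CS1: (race, gender, outcome) -> students.
# These numbers are illustrative only and are not Inst. X's data.
counts = {
    ("Asian", "Women", "Pass"): 30,
    ("Asian", "Women", "Fail"): 2,
    ("Asian", "Women", "Withdraw"): 8,
    ("White", "Women", "Pass"): 12,
    ("White", "Women", "Fail"): 2,
    ("White", "Women", "Withdraw"): 6,
    ("Asian", "Men", "Pass"): 45,
    ("Asian", "Men", "Fail"): 3,
    ("Asian", "Men", "Withdraw"): 12,
    ("White", "Men", "Pass"): 64,
    ("White", "Men", "Fail"): 4,
    ("White", "Men", "Withdraw"): 12,
}


def pfw_rates(counts, key):
    """Group counts with `key(race, gender)` and return outcome rates per group."""
    totals = defaultdict(lambda: {"Pass": 0, "Fail": 0, "Withdraw": 0})
    for (race, gender, outcome), n in counts.items():
        totals[key(race, gender)][outcome] += n
    rates = {}
    for group, by_outcome in totals.items():
        enrolled = sum(by_outcome.values())
        if enrolled == 0:
            continue  # no students in this group, so no rate is defined
        rates[group] = {o: n / enrolled for o, n in by_outcome.items()}
    return rates


# Aggregate view: one rate per gender.
by_gender = pfw_rates(counts, key=lambda race, gender: gender)

# Intersectional view: one rate per (race, gender) pair.
by_race_gender = pfw_rates(counts, key=lambda race, gender: (race, gender))
```

In this made-up term, the withdrawal rate for women overall is 14/60, while the two subgroups of women withdraw at 0.20 and 0.30, a difference that only the second view exposes.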
Having identified a problem (high withdrawal rates) and the groups of students most susceptible to the problem, Inst. X implemented an intervention in Spring '19. They bolstered curricular supports and developed a new program to better meet students with differing levels of prior coding experience. Figure 1 shows a remarkable improvement in pass rates overall. Beyond relying on verbal reports from people at Inst. X that this intervention was succeeding, we can confirm its success for all intersectional identities across race and gender. In Tables 2 and 3, we see the rapid decline of withdrawal rates across gender from Fall '18 to Spring '19. We further inspect Spring '19 by subdividing the data based on both gender and race and observe that the withdrawal rates for women plummeted. During this term, all populations of women with more than five students (Asian, Non-Resident, and White) experienced large drops. This is a good indication of the success of the intervention because, when it was implemented, all intersectional identities analyzed benefited and the variability among groups was greatly reduced. While men did not experience withdrawal rates as low as women,¹⁴ we see a significant reduction in the variability of experience among all intersectional groups.
It is important to continue tracking intersectional data over time to reveal whether an intervention is durable, that is, whether it "sticks". When we inspect the data after Spring '19, we can add nuance given the shifting populations of the course as a whole (rising populations of Black and Latina women in particular) as well as see changes within subgroups over time. In Table 3 in particular, we see that between Spring '20 and '22, Black men are withdrawing at a higher rate than their non-Black peers. This indicates an area for further improvement in the future. We see similar patterns for Black and Latina women in Table 2. All three of these groups have lower representation in CS1 at Inst. X than other intersectional identities, representing just 3.9% of the students in CS1 on average.
Table 3: Withdrawal rates at Inst. X by race for men
Looking at data in this way enables an institution to identify problem areas and answer guiding questions to identify solutions.
¹⁴ Initially, the intervention actively recruited women of all races, but did not exclude or discourage men. It now focuses on all students without coding experience.

Case Study: Sawtooth Graphs
In this case study, we focus on Q3. In our dataset, 14 of the 46 schools on a semester schedule demonstrated a significant sawtooth pattern in pass rate. That is, when comparing pass rates in fall terms versus pass rates in spring terms, the difference between the two was not only visually identifiable but statistically significant at p < 0.1. To run this analysis, we conducted two tests:
Pass rates. We conducted paired t-tests to determine if pass rates from fall terms were significantly different from pass rates from spring terms. This flagged 8 institutions.
Change in pass rates. We calculated the change in pass rate from the previous term for each term (e.g., a 0.91 in Fall '18 to a 0.71 in Spring '19 is a change of -0.20), then grouped fall changes together and spring changes together and conducted t-tests to determine if changes in pass rates were significantly different. This flagged 11 institutions, of which 5 had already been flagged by the pass rates test.
Comparing raw pass rates captures schools like Inst. Z, an R1 public university, where pass rates in fall terms are between 0.80 and 0.92 and pass rates in spring terms are between 0.68 and 0.76, as shown in Table 4. Schools like Inst. Y, a large R1 land-grant state school, are not flagged by raw pass rate differences because there is some overlap in pass rates between falls and springs at this institution. However, springs are improvements over falls, so comparing change in pass rates does capture schools like Inst. Y.
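The two tests can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the analysis code used for the dataset: the pass rates are hypothetical, the helper names `paired_t` and `term_changes` are ours, and the sketch stops at the paired t statistic because the standard library has no t distribution. A p-value could be obtained with, for example, `scipy.stats.ttest_rel` for the paired comparison or `scipy.stats.ttest_ind` for comparing the two groups of changes; the text does not specify which t-test variant was applied to the changes.

```python
import math
from statistics import mean, stdev

# Hypothetical pass rates for one institution, paired by academic year
# (fall term, following spring term). Not data from any partner school.
fall = [0.90, 0.88, 0.92, 0.86]
spring = [0.72, 0.74, 0.69, 0.75]


def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for a vs. b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def term_changes(rates):
    """Change in pass rate from the previous term, for a chronological list."""
    return [curr - prev for prev, curr in zip(rates, rates[1:])]


# Test 1: do fall pass rates differ from spring pass rates?
t_stat, dof = paired_t(fall, spring)

# Test 2: interleave the terms chronologically, compute term-over-term
# changes, then split them by the season of the term being entered.
chronological = [rate for pair in zip(fall, spring) for rate in pair]
changes = term_changes(chronological)
into_spring = changes[0::2]  # fall -> spring changes
into_fall = changes[1::2]    # spring -> fall changes
```

With these made-up rates every change into a spring term is negative and every change into a fall term is positive, which is the sawtooth shape the second test is designed to detect even when the raw fall and spring ranges overlap.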
All 14 institutions have a characteristic sawtooth pattern in pass rates when inspected visually, with the majority having higher pass rates in the fall than the spring. The sawtooth pattern is relatively common for a number of reasons. For example, at Inst. Z, next steps would identify that a high number of students who take CS1 during an off semester (spring, in this case) are either re-taking the course because they did not pass previously or taking it because they did not enter the institution intending to major in computer science. Indeed, discussion with Inst. Z confirmed that the spring population for CS1 includes non-majors, whereas the fall population does not.
These observations should guide future interventions, which need to take into account the different populations in the course in spring semesters. Inst. Z might further benefit from asking whether or not this is a CS1 course that in fact requires previous coding experience. A third analysis would look at who is teaching in fall/spring.
As a final example, we do further analysis of Inst. Z as referenced in Section 4.2. In this case study, we focus on Q1 and Q2. Examining overall pass, withdrawal, and failure rates, shown in Figure 2, we observe three trends: 1) the overall pass rate is going down over time; 2) there is a strong sawtooth pattern in which fall terms have significantly higher pass rates; and 3) the overall decrease in pass rate is largely driven by increases in withdrawal rather than fail rate, though the proportionality over time is shifting. The follow-up question (Q2) of whether or not all intersectional groups are following similar trajectories is illuminated by visualizing withdrawal rates by both race and gender, shown in Figures 3 and 4. These graphs show that the increase in withdrawal rate is unevenly experienced across intersectional identities.
Inst. Z is a Hispanic-Serving Institution (HSI). Notice in Figures 3 and 4 the spike in withdrawal rate for Latinx students in Spring '21. Not only were these students withdrawing at higher rates than other students, they were 26% of the class in Spring '21, when Latino men withdrew at a rate of 0.44 and Latina women withdrew at a rate of 0.62. The intersectional subgroups that made up more than a quarter of the course's population withdrew at alarmingly high rates. While we can make overall hypotheses about why pass rates were so low in Spring '21 by looking at general trends in Figure 2, we can't understand the full picture until we understand that Latinx students, and Latina women in particular, were affected differently that term than students of other intersectional identities.
This analysis shows that PFW rates should be carefully tracked for Latinx students to see if Spring '21 was an anomaly or if further intervention is warranted. Inst. Z further suggested that the rates in these groups may be influenced by COVID policies and procedures. In addition to taking the steps suggested in Section 4.2, these hypotheses could be investigated by conducting focus groups with students of different intersectional identities who have taken or attempted to take CS1 at Inst. Z.

CONCLUSIONS
This work shows that gathering intersectional, longitudinal data is not only possible but also essential to understanding the overall health of an educational program. This is particularly important in computing given that prior work has shown that different groups of students have vastly different experiences in our classrooms. We show that institutional research offices have this data and are willing to provide it given an appropriately formatted request. We used our experience gathering this data from 52 different institutions to develop guiding questions and actionable next steps. Finally, we demonstrate analyses based on these questions that an individual institution can perform to gain insight into its programs. The three case studies highlight that while aggregated views of data are sometimes helpful, inspecting the data with an intersectional lens is essential both to measuring the efficacy of interventions and to identifying pain points that may need to be addressed. We welcome additional schools into our data collection program; participation is not linked to grant funding.

Figure 2: Pass, withdrawal, and fail rates over time at Inst. Z

Figure 3: Withdrawal rates at Inst. Z by race for women

Figure 4: Withdrawal rates at Inst. Z by race for men

Table 1: Description of aggregate counts in the dataset for enrollment in CS 1

Next Steps (Q2): Conduct focus groups with students in areas of concern. Why are these specific students failing or withdrawing at higher rates, in their opinion? Is this a trend across terms, for one term, or for one type of term (e.g., Falls)? Are there differences by instructor?
Q3: Do course outcomes for different terms at your institution follow a predictable pattern (such as a consistent discrepancy between fall and spring cohorts)? (See Section 4.2)
Next Steps: Same as Q1. Analyze student population changes term-to-term, taking care to see when non-majors or major-discoverers are taking the course. Are the differences based on the instructor?
Q4: Did your institution launch any interventions? What happened during these terms and those following? (See Section 4.1)
Next Steps: Are there numeric changes for any intersectional groups following these interventions? Do the changes persist over multiple terms? Check both the course with the intervention (e.g., CS 1) and the course(s) following (e.g., CS 2).

Table 2: Withdrawal rates at Inst. X by race for women

Table 4: Pass rates and percent change between semesters for Inst. Y and Inst. Z, which exhibit sawtooth patterns

Example guiding questions include "is this a sustained problem or a one-off occurrence?", "is this an issue that is only associated with fall or spring term or is it associated with both terms?", and "is one intersectional group having a different experience in the classroom than the others?" An affirmative answer to the last question would indicate that the environment itself may be hostile to one or more subgroups of students and that Inst. X should conduct student focus groups.