Beyond the Boolean: How Programmers Ask About, Use, and Discuss Gender

Categorization via gender is omnipresent throughout society, and thus also in computing; users are often asked to disclose their gender identity before they can use software or web services. Despite this, no research has explored how software developers approach requesting gender disclosure from users. To understand how developers think about gender in software, we present an interview study with 15 software developers recruited from the freelancing platform Upwork and from Twitter. We also collected and categorized 917 threads containing gender-related keywords from programming-related sub-forums on the social media service Reddit, and further analyzed the 16 posts that discussed approaches to gender disclosure. We found that while some developers understand inclusive gender options, programmers rarely consider when gender data is necessary or how they request gender disclosure from users. Our findings have implications for programmers, software engineering educators, and the broader community concerned with inclusivity.


INTRODUCTION
Online services and other software programs often ask users to identify their gender on sign-up. While this disclosure may be necessary for certain purposes, like providing medical care or helping people find community based on identity, gender is also collected for unclear or inessential reasons, such as advertising [13]. This presents privacy concerns for anyone who does not wish to share their gender with others online, such as women who face harassment in video games [22]. Unnecessary disclosure may also expose transgender, non-binary, or other gender non-conforming people to erasure and other trauma.
Despite its universality in society, gender is not a single, uniform concept. While many cultures treat gender as a binary determined entirely by the two most prevalent sexual phenotypes, the American Psychological Association more accurately describes gender identity as "a person's internal sense of being male, female or something else" [5].
Gender identity is distinct from, although theorized to be closely related to, both "gender expression" (the way one presents or performs one's gender to society) and one's gender assigned at birth, which is usually determined by physical characteristics [6]. Increased understanding and acceptance of people with marginalized gender identities has led to changes throughout society, such as an increased focus on bathrooms as one of the final realms of public gender segregation in the United States [53]. Online spaces have also introduced more inclusive gender fields; Facebook, for example, now provides a free-form gender entry field rather than a fixed list [70].
Even with these promising changes, trans and non-binary individuals still face significant challenges in navigating computers and software, such as digital forms that do not let them correctly gender themselves or opt out of disclosure [85]. Even when users can choose a gender that matches their identity, they may face erasure in the way that gender data is used. For example, a non-binary associate found that their employee profile at their workplace was accompanied by a silhouette that did not match their gender presentation, because they had selected a gender that better matched their identity (see Figure 1). As they were not fully open about their non-binary identity with coworkers, this unanticipated use of gender data also caused them privacy harm. Prior work has also found that advertisers connected to sites with "inclusive" gender options often still put users into binary boxes for advertising purposes [12,13].
Fig. 1. An example of a bad application of user gender data (panels: the "Male" icon; the "Female" or "Diverse" icon). The website allows users to select one of three gender options ("male", "female", or "diverse"), which are used to determine the default profile picture for the website. Gender and profile picture should be uncoupled.
In order to contribute to the growing body of literature on how gender is treated in technological spaces, and to better understand why developers write "shitty code" [18] with respect to gender, we present a first-of-its-kind study on developers and gender. Specifically, in this paper we aim to answer the following research questions:
• How do developers ask users for gender?
• How do developers use user gender data?
• What advice do developers receive about using gender data?
To answer these research questions, we conducted 15 open-ended, recorded interviews with software developers, asking each about gender and gender disclosure. We also collected and categorized 917 Reddit posts with a total of 11,021 comments. While distinct, these methods complement one another by providing alternative perspectives on the topic.
Through our analysis, we found that while most developers we interviewed were receptive to inclusive gender forms, the code snippets in our Reddit dataset overwhelmingly used binary gender disclosure forms with only "male" or "female" options. We also found very little discussion on Reddit of proper uses or treatment of user gender data. Combined with our interviewees' apparent lack of previous exposure to considerations around how they should ask about or use gender data, this strongly suggests that developers have almost no exposure to inclusive practices with respect to gender. We recommend further research to collect a broader range of developer perspectives and confirm our initial findings. We also suggest that software engineering and computer science educators consider highlighting research-backed, inclusive gender practices in courses to ensure that developers are exposed to best practices before they enter the workforce.
The rest of the paper proceeds as follows: Section 2 discusses necessary background information and other related work; Section 3 describes in greater detail our research methodology for conducting both the interview study and our analysis of Reddit data; Sections 4 and 5 describe the results and key findings from our analysis of Reddit data and interviews respectively; Section 6 discusses the broader implications of our findings and potential future work; and Section 7 concludes.
Manuscript submitted to ACM

BACKGROUND AND RELATED WORK
In this section, we discuss essential background research that informs our methodology. We start with a positionality statement to help readers understand our perspective and motivations for completing our research. We then discuss research from the field of queer studies, which grounds our methodology. Finally, we discuss prior work in HCI that has analyzed gender, as well as previous work examining the practices of developers.

Positionality Statement
We acknowledge that our identities as researchers inform and shape the direction of our research [59]. This paper draws on queer concepts of gender and discusses the impact of developers' treatment of gender on the experiences of transgender, non-binary, and other gender non-conforming people. The authors are cisgender members and allies of the LGBTQ+ community. The authors worked closely with trans and non-binary colleagues and friends, who gave feedback on our methodology and supported the project but opted not to participate as authors in favor of their own research projects. While we acknowledge that deeper insights can often be generated by those directly impacted by the identities being studied, we also acknowledge that engaging with somewhat toxic attitudes towards one's own identity can be tiring and traumatizing. We therefore welcome criticism of our endeavor, and hope to advance research that directly affects our marginalized friends and colleagues from our privileged position as enthusiastic allies. While we draw heavily on research from the field of queer studies, our goal is not to advance the field theoretically, but rather to assess developers' knowledge, attitudes, and practices with regard to gender. We hope our research will positively influence how developers engage with gender so that queer people's marginalization in computing decreases.

Gender, Sex, and Identity
In the modern societies of North America and Europe, the dominant paradigm has held gender to be a strict binary determined by physical sexual characteristics.¹ This "cisgenderism" [4] contrasts with modern clinical and scholarly approaches, which treat not only gender but also sex as social constructs [24,50]. Not only is the conflation of sex with gender wrong, but researchers have identified a number of different ways sex anatomy can develop (intersex traits) that can place individuals beyond the traditional male or female sex boxes [48,71]. Sex is therefore not an immutable property inherent to the universe, but rather a set of socially determined categories, usually male or female, based on physical and genetic characteristics [23,50,92,93].
Gender, by contrast, is a term which can be used to refer to several distinct but interrelated ideas:
• Gender Identity: A person's inherent sense of being a man, woman, some combination of man or woman, or some other gender entirely [5,6].
• Gender Presentation: The way in which one 'does' gender through both their physical appearance and the way they act [23,50,92,93].
• Gender Role: The way in which people of a particular gender are expected to act by society [23,50].
One may or may not present one's gender in a way that makes others view them as a particular sex [92]. These categories are highly contextual² and may vary over time for a single individual [52]. Like many other attributes, gender is not an isolated variable, and its expression is influenced by other identities, such as disability [19]. Within this framework, transgender people are those who transition from one gender to another [38,89]. This includes non-binary trans people, i.e., trans people who do not identify with either the male or the female gender. There are also people who do not identify with the label trans but do identify as non-binary [63,73].
¹ This statement does not indicate that being transgender, non-binary, or otherwise not male or female is a new concept; non-binary conceptions of gender identity have existed across time and many different cultures [50].
² For a simple example: one may publicly identify as the gender with which they were assigned at birth, but privately identify with a different gender identity.
Despite scholarly recognition and increased public conversation about these concepts, trans and particularly non-binary gender identities are not well understood in many modern societies [33,65]. Trans and non-binary people still face both overt and subtle discrimination from friends and family [31,32], others in public [56], and even medical providers [74,84]. They also face larger structural barriers in attempting to live their gender [82], like a lack of governmental recognition of their gender identity [28] or, in the world of research, the inability to change names on past research papers [91]. Like many forms of discrimination, these issues are not uniform and are exacerbated by overlapping forms of oppression like racism and poverty, as Crenshaw first described with the concept of intersectionality [8,25-27,61].
In this paper, we focus our analysis of development practices on a broad category of discrimination against transgender and gender non-conforming people: erasure. Erasure is the systematic practice of ignoring or minimizing the existence of transgender and non-binary people [69]. Erasure can have deep impacts on quality of life, as it contributes to the discrimination described above in myriad ways, such as through the denial of medical care [10]. Even in seemingly mundane cases, like a customer feedback form that only includes male or female gender designations and so excludes non-binary identities [29], such erasure adds to the many microaggressions that trans and non-binary people face [68] and that contribute to their increased rates of health problems [8]. Addressing erasure in software, as we do by analyzing developers' behaviour and perceptions of gender, is therefore extremely important.

Classification in Social Computing
In the HCI community, and especially in social computing, research on classification [16] has long contributed to the scientific discourse: its origins, potential impacts, and potential harms to society have been discussed both in the context of historical developments [14,16] and in the context of qualitative coding, which likewise assigns categories and labels to data [34]. Notably, Bivens et al. [13] examine how gender is treated on social media platforms from the perspectives of both a user and an advertiser. Using a walk-through approach, the authors elucidate how various design decisions integrate gender categorization into the platforms. They ultimately find that, even among platforms which do not directly collect user gender data or which include free-form gender fields, a binary conception of gender is "baked in." They also convincingly argue that social media platforms' central role in the advertising industry gives them large influence over categorization practices throughout the entire computing ecosystem. Although we explore the problem from a different perspective, our work is motivated by the same problems as Bivens et al. [13], and our analysis is informed by their results. Informed by prior work on classification and its impacts, our work considers the perspective of developers, who, whether or not they are aware of it, perform classification work when implementing gender in software; we therefore contribute to and advance this conversation in CSCW.

Gender and Computing
Historically, work in the field of human-computer interaction (HCI) surrounding gender has focused on traditional binary differences [17,20,90], such as how gender differences affect people's use of problem-solving software [11]. In line with the expanding definitions of gender accepted in both academia and public life, there is now an increasing body of research in HCI that looks specifically at the experiences of transgender, non-binary, and gender non-conforming people with technology. Prior work has looked at the ways that technology is used to create community for transgender and non-binary individuals and the barriers they face in doing so [41,75,76]; the problems caused by the application of artificial intelligence to gender [51], such as in facial recognition [78,80] and automatic gender recognition [44]; and the unique experience of gender transition and disclosure online [37-40,43], among other areas [42,49,88]. While we do not directly investigate the problems that non-binary, transgender, and other gender non-conforming people face, such work is essential to understanding our project, as it motivates our research. By understanding the approach that developers take towards gender, we hope to contribute to the reduction of such barriers to non-binary and transgender people's use of technology.
There has also been some research that looks specifically at web forms. Most notably, Scheuerman et al. [77] evaluate various gendered web forms with 350 non-binary participants, finding that forms which only offer male/female options make participants uncomfortable. The paper, while not specifically studying developers as we do, recommends that developers consider alternatives to asking about gender, such as asking about pronouns instead, or offer inclusive gender options if gender is indeed necessary [77]. We build upon this research by using the paper's insights to evaluate developers' use of gender. We were also motivated by the experience of the queer researcher Dr. Katta Spiel, who has discussed their experience with technological infrastructure and gendered forms in several previous publications [85,87]. Their recommendations on survey construction and gender, written with two other researchers, also informed the construction of our screening survey form [86].

Developers and Reddit
One final area of related work is prior research on the behaviour of developers, including studies that have used Reddit in particular. Due to their essential role in creating software, programmers and other developers have long been studied in computing research [72,81]. Researchers have studied both the practices of developers, like where they log [30] and how they test applications [60], and their perspectives on specialized topics, like privacy [36], security [2], and usability [46]. To the best of our knowledge, we are the first to look specifically at developers' practices and perspectives with respect to gender.
Researchers have specifically analyzed the social media website Reddit as a community of practice for developers [54,55]. Reddit is a news aggregation and discussion forum that was founded in 2005. One of its key features is its sub-forums, called "subreddits," which are almost all community-created and moderated. Users can post either links or "discussion" posts to a subreddit, and each post is accompanied by a comment section in which others can reply. Both posts and comments include voting buttons that let users "upvote" or "downvote" them: an upvote increases a post's or comment's score, a downvote decreases it, and scores affect the order in which posts and comments are displayed to others. Reddit thus presents a rich source of both quantitative and qualitative data about users' perspectives [67]. There are a number of communities centered around learning or discussing programming, with over 3.5 million users subscribing to the "programming" subreddit³ alone [1,45]. While there are other websites, like StackOverflow⁴, that are more directly about programming, Reddit's discussion-oriented format makes it a rich repository of qualitative perspectives on programming. Researchers have used Reddit [58] and StackOverflow data [62,94] to gain insights into how developers think about specific topics, such as privacy [58] and security [62,94]. We contribute to this larger body of work by exploring how developers discuss gender on Reddit.

METHODOLOGY
In this section, we explain our methodology, beginning with our collection and analysis of programming sub-forums on Reddit before describing our interview process and analysis. While described and analyzed separately, these methods complement one another. Participants in an interview study may, intentionally or not, misrepresent their true opinions or practices, whereas Reddit is a pseudonymous space in which users may feel more comfortable expressing their true opinions. Reddit also reflects a subset of the public conversation and practice around gender, which our interviews did not assess. Conversely, interviewing developers allowed us to directly interrogate issues that were very infrequently discussed in our Reddit data. We believe both studies provide valuable insight.

Reddit Study
The first part of our study consists of a qualitative analysis of posts on programming subreddits: 917 posts and 11,021 comments, collected on June 22nd, 2021 using Reddit search seeded with terms related to gender. Analyzing posts on Reddit gives us insight into how programmers speak to one another about gender data outside of the formal environment of an interview. It also lets us evaluate more quantitatively the various ways that developers use gender data in applications.

Data Collection.
The first step in the data collection process was to select subreddits for analysis. Our inclusion criteria were broad: first, subreddits had to be focused on programming or user experience/user interface design; second, they could not be focused on a specific programming language, as our focus is on general software development practices rather than any particular language. Additionally, we felt that subreddits focused on a particular language may be less likely to include discussion of gender. From the subreddits that met these criteria, we selected 11 covering a broad range of topics, from web development (/r/webdev and /r/web_design) to game development (/r/gamedev). The full list of subreddits used, and the number of posts collected from each, is in Table 1.
In the remainder of the paper, relative size comparisons between subreddits are based on the number of subscribers listed in this table. After selecting subreddits to investigate, we created a list of search terms that we felt would produce germane results. This included general terms related to gender, like "gender" and "sex", as well as terms
Reddit posts were collected using the Reddit search API⁵, seeded with relevant search terms and limited to one programming- or design-related subreddit at a time. While using the Reddit search API may exclude some user- or moderator-deleted posts that are preserved in other archives, this method of data collection has the benefit of more accurately mimicking how an average user would use the website. The content, metadata, and comments for each post were then collected and downloaded on June 22nd, 2021. Using this method, 917 posts and 11,021 comments were collected.
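The collection procedure (one subreddit-restricted search per term) can be sketched as follows. This is a hypothetical Python sketch, not the script used in this study: `search` stands in for a call to the Reddit search API, posts are modelled as plain dicts, and merging duplicates by post ID is our assumption about how overlapping results from different search terms would be combined.

```python
from typing import Callable, Dict, Iterable, List

# A post is modelled as a plain dict; only its "id" key is relied on here.
Post = Dict[str, object]
# Hypothetical stand-in for a subreddit-restricted Reddit search call.
SearchFn = Callable[[str, str], List[Post]]


def collect_posts(subreddits: Iterable[str], terms: Iterable[str],
                  search: SearchFn) -> Dict[str, Post]:
    """Run one search per (subreddit, term) pair and merge the results.

    Posts matched by several search terms are kept once, keyed by their ID
    (an assumed merge step, not one specified in the text).
    """
    term_list = list(terms)  # materialized so it can be reused per subreddit
    posts: Dict[str, Post] = {}
    for subreddit in subreddits:
        for term in term_list:
            for post in search(subreddit, term):
                # setdefault keeps the first copy of a post seen under any term
                posts.setdefault(str(post["id"]), post)
    return posts
```

Passing the search function in as a parameter keeps the sketch testable offline; a real implementation would wrap an API client and handle rate limits, which this sketch omits.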

Analysis.
Prior to further analysis, it was necessary to evaluate posts, as many were not germane to gender. The collected threads were placed into three categories:
• Relevant Discussion: Threads which explicitly give or request advice on gender disclosure or the appropriate way to use/implement gender data.
• Using Gender: Threads which were not about gender disclosure but included code or other descriptions of using gender in software; we further analyzed such threads for the representation of gender used in the program (e.g., binary, binary with an "other" option, free entry, etc.).
• Irrelevant Discussion: Threads which were not about user gender data at all, such as those which only used a particular term in passing; this category includes discussions of gender identity in fictional media like video games.
These categories allow us to investigate all three of our research questions. Posts in the "relevant discussion" category reflect the advice developers receive on using gender data, while the "using gender" category gives insight into how developers integrate gender into software. Categorization was completed by the primary author in consultation with the other authors.
After categorization, we calculated summary statistics for the Reddit posts in each group, including the median "upvote ratio" (the proportion of votes on a post that are upvotes), the median number of comments per post, and the median number of upvotes. These vote figures are not exact, as Reddit does not provide the number of downvotes or exact upvote ratios for posts, in order to prevent vote manipulation. We also conducted an in-depth qualitative analysis of the posts that discussed, requested, or gave advice on proper practices for handling gender. The complete results of our analysis are in Section 4.
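The per-category summary statistics described above amount to grouping threads by category and taking medians. The Python sketch below assumes each thread is a dict with hypothetical keys (`category`, `upvote_ratio`, `num_comments`, `upvotes`); because Reddit's vote figures are approximate, any medians computed this way inherit that imprecision.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, Iterable, List, Mapping


def summarize(threads: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Compute per-category medians for categorized Reddit threads.

    Each thread is assumed to carry the (hypothetical) keys "category",
    "upvote_ratio", "num_comments" and "upvotes".
    """
    groups: Dict[str, List[Mapping[str, Any]]] = defaultdict(list)
    for thread in threads:
        groups[thread["category"]].append(thread)
    return {
        category: {
            "n": len(items),
            "median_upvote_ratio": median(t["upvote_ratio"] for t in items),
            "median_comments": median(t["num_comments"] for t in items),
            "median_upvotes": median(t["upvotes"] for t in items),
        }
        for category, items in groups.items()
    }
```

For a category with an even number of threads, `statistics.median` averages the two middle values, so a median may be fractional even when the underlying counts are integers.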

Limitations.
Our Reddit procedure has some limitations. First, as mentioned above, our sample is intentionally biased and cannot be said to reflect the general rate or nature of all conversations about user gender data on programming subreddits; this bias is necessary because posts about gender data are likely uncommon. Second, because we used Reddit search to collect data, our results may not be exactly replicable; we plan to preserve our data for 10 years and will provide it to interested researchers.⁶ Third, the level of anonymity afforded by Reddit may affect the opinions users express or the manner in which they express them. Finally, while we have attempted to clearly define our analysis categories, there is inherent subjectivity that cannot be avoided. Despite these limitations, we hope to provide valuable initial insight into discussion of gender data among developers.
Elijah Bouma-Sims and Yasemin Acar

Interview Study
In addition to our Reddit study, between June and August of 2021 we conducted a semi-structured interview study with 15 developers, most of whom had worked with gender in software. These interviews allowed us to ask developers directly about user gender data and gain more insight into their thinking around gender. The interviews typically lasted around 20 minutes and were conducted in an open-ended fashion. As much as possible, the interviewer avoided prompting interviewees, instead allowing them to speak on topics in their own words.
Participant Recruitment.
Our eligibility criteria were broad in order to capture a wide spectrum of experience, from recent graduates of engineering schools to those who have worked in software development for years. Interviewees were required to be at least 18 years old and comfortable conducting the interview in English, and to have at least a year of professional programming experience or a degree in a relevant field, such as computer science.
We established eligibility via a pre-screener.
Participants were recruited in two main ways. First, we posted an advertisement on the freelancing platform Upwork.⁷ This advertisement included short screening questions based on our eligibility criteria to prevent spam or ineligible participants; all respondents except one were eligible. Using Upwork's tool for directly inviting freelancers suggested by the platform, we also sent the advertisement to several freelancers in the United States in order to diversify the geography of the interview pool⁸ and increase the number of participants. Second, we recruited interviewees using an advertisement on the social media platform Twitter, posted as an image by one of the authors on their public Twitter account. The account had at least 200 followers⁹ at the time the tweet was posted, and the tweet received over 7,000 impressions during the study, according to Twitter's analytics. Ultimately, six participants were recruited through Upwork and eight through Twitter. One interviewee was recruited in a distinct manner: to ensure that we got the perspective of a developer who had worked with gender and marginalized groups, we reached out directly to a programmer who had previously worked on a website for transgender and gender non-conforming people. This participant was asked the same questions as other participants, but their unique insights are discussed separately when appropriate.
We interviewed until we reached theoretical saturation; that is, no new ideas were discussed in the interviews [21].
All interviews were conducted by the same author-a man in his 20s.

Interviewing Participants.
Once participants had read and digitally signed an informed consent form, they completed a pre-interview survey about their demographic and professional background, which let us describe the diversity of our sample and focus the interviews themselves on developers' qualitative experiences. The complete survey can be found in Appendix A. In the interviews, we were interested in our participants' professional experiences with gender forms, requests, and coding in software development, and we developed our semi-structured interview guide to reflect this. We discussed our interview guide with a non-binary developer who is also actively engaged in LGBTQ+ community building. Finally, we conducted a pilot interview with another researcher to confirm both the length of the interview and that the questions were appropriate. The pilot interview is not included in the results. The full interview guide can be found in Appendix B.
Our interview guide consists of three parts: the pre-interview procedure, the interview questions, and example gender options. The pre-interview procedure included instructions on how to introduce oneself, how to give an overview of the interview procedure, and other reminders. Although a single author conducted all interviews, having a defined pre-interview procedure helped ensure that interviews were performed consistently. Once the interview began, we moved to the second part of the interview guide, which includes a list of questions. The exact phrasing of questions varied slightly between interviews, and the interviewer occasionally asked follow-up questions for clarification. Our initial questions covered general and gender-aware programming experience. We then discussed how participants would hypothetically handle gender data, including how they would ask about gender, whether they consider gender private, and whether they have any concerns about using gender data.
After all other questions were answered, we displayed the example gender disclosure options contained in the third part of the interview guide and discussed each with the interviewees to get their perspective on whether each option is appropriate and when it may be useful. We first showed them the least inclusive gender disclosure prompt, which included only "male," "female," and "prefer not to answer" options (the "binary" option). We then showed a disclosure prompt which included the same options as the previous example as well as a third option of "diverse," as is used in government forms in Germany [35] (the "ternary" option). Next, we showed them our most inclusive option, which was developed based on the recommendations of Scheuerman et al. [77] as well as other works discussed in Section 2. This example identified the data being asked for as "gender identity" rather than "gender," as in the previous prompts, and included the options "man," "woman," "non-binary," and "self-identify" with a text box. Lastly, we showed them the current ISO/IEC 5218¹⁰ standard for storing human sex: 0 = not known; 1 = male; 2 = female; 9 = not applicable.¹¹ See Appendix B for the full texts and options of all variations. We concluded the interview by asking if participants had anything else they wanted to share or discuss. After stopping the recording, we also asked if they had any questions for the interviewer before ending the call.
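To make the differences between these prompts concrete, the Python sketch below encodes them as data. The ISO/IEC 5218 codes are those listed above; the option tuples, their ordering, and the `GenderIdentityAnswer` record are hypothetical illustrations, not the exact forms from Appendix B. Note that ISO/IEC 5218 defines no code for an answer such as "non-binary", so storing the inclusive prompt's responses in that scheme would be lossy.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class ISO5218Sex(IntEnum):
    """The four codes of ISO/IEC 5218, as listed in the text."""
    NOT_KNOWN = 0
    MALE = 1
    FEMALE = 2
    NOT_APPLICABLE = 9


# Options shown to interviewees (hypothetical encoding; ordering illustrative).
BINARY: Tuple[str, ...] = ("male", "female", "prefer not to answer")
TERNARY: Tuple[str, ...] = ("male", "female", "diverse", "prefer not to answer")
INCLUSIVE: Tuple[str, ...] = ("man", "woman", "non-binary", "self-identify")


@dataclass(frozen=True)
class GenderIdentityAnswer:
    """A hypothetical record for the inclusive "gender identity" prompt."""
    option: str
    self_described: Optional[str] = None  # text box, used only for "self-identify"

    def __post_init__(self) -> None:
        if self.option not in INCLUSIVE:
            raise ValueError(f"unknown option: {self.option!r}")
        if self.option == "self-identify" and not self.self_described:
            raise ValueError("self-identify requires the text box to be filled")
        if self.option != "self-identify" and self.self_described is not None:
            raise ValueError("free text is only collected for self-identify")
```

Keeping the user's own words in `self_described`, rather than mapping them onto a fixed code, is the design choice that avoids the lossiness of the four-value ISO scheme.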

Ethics & Data Protection.
Prior to the interview study, participants consented to participation as well as audio recording via our consent form, in which we listed our data collection, storage, and use practices. We informed them about their right to withdraw from the study at any time without any repercussions or loss of benefits, as well as their option to skip any interview question. We also agreed to quote interview participants only in a non-identifiable manner. While this may limit the quotes or descriptions we can report, it also allowed interviewees to speak freely about their experiences. We use quotes throughout Section 5 to enhance our analysis. Finally, we answered participants' questions about the procedure and purpose of the study, and debriefed them after the interviews.
All interviews were conducted using the Zoom¹² meeting service and recorded via Zoom's built-in recording feature. While the interviewer had video enabled for all interviews, participants were free to enable or disable video as desired; visual aspects of interviews were not analyzed. Audio-only recordings were stored in a secure, VPN-accessible institutional cloud and transcribed by a GDPR-compliant service. Transcripts were pseudonymized and also stored in the secure cloud, both for storage and for collaborative coding. Participant data required for payments was stored separately and used only for payments. Participants were paid US $30, because we expected interviews to last up to 30 minutes and wanted to pay US $60/hour, the average rate for Upwork developers. In practice, the interviews were shorter (mean = 18.0 minutes, SD = 3.24 minutes). The study was approved by our institution's ethical review board and data protection office.

Interview Data Analysis.
Once all interviews were complete, we used deductive and inductive coding for our data analysis [21,66]. We began with an a priori code list that corresponded to our interview and research questions.
Following this, the first author developed a codebook by inductively going through the interviews, continuously checking in and discussing with the team, and operationalizing the codebook by formally defining the codes. Once the codebook was stable, the first author coded all transcripts in coordination with the research team. The codebook, including definitions, can be found in Appendix C. Throughout and following the coding process, we met to discuss emerging concepts, themes, and relationships, which allowed us to explore how developers relate to gender when programming. In Section 5, we discuss the results of coding as well as other significant observations we made.

Limitations.
Like any interview study, our participants' responses are subject to biases, including recall, self-report, and social desirability bias. In addition, we recruited developers only from Twitter and Upwork. Demographically, our sample skews young and towards men, which is sadly reflective of the general state of gender representation among developers.¹³ While we do not claim a representative sample, we did try to mitigate recruitment biases by (a) recruiting both within and outside the United States; (b) recruiting with and without use of our own social networks; and (c) recruiting from within and outside the "progressive, gender-inclusive bubble" that the authors mostly live in. Since our small qualitative sample does not support generalization, we did not test for differences based on demographics or recruitment channels. Especially in contrast to the Reddit data, the presence of an interviewer may have affected the interview results: interviewees may have been less willing to express a preference for noninclusive gender options when speaking with a researcher than people posting on an anonymous forum (i.e., social desirability bias). Additionally, developers who agree to participate in a study explicitly about gender practices may be more inclined towards inclusive practices. Indeed, one participant shared after the interview session that they responded to our Upwork ad because of their interest in accessibility and inclusivity. While these biases may have influenced participants' stance towards gender, participants did report noninclusive practices, and we believe that our interview results give meaningful insights into how developers program gender.

REDDIT RESULTS
In this section, we present the results of our analysis of gender-related discussion on Reddit. We start by describing our dataset in broad terms before discussing specific aspects of the data.
Manuscript submitted to ACM Post Year

Overview
The vast majority of posts in our dataset were irrelevant to our research questions (see Table 3), and very few threads discussed appropriate practices towards user gender. Of the 917 total threads, only 16 (1.7%) either offered or requested advice on how to use gender data in software. The median upvote ratio of those posts was 0.57, which is much lower than the overall median. A further 244 posts discussed topics other than gender data but did include code or other descriptions of using gender data in a program.
As shown in Figure 2, most posts are from within the last five years: 58% (533) of posts in the dataset are from 2017 or after, and only 5.8% (53) are from 2011 or before. This result is in line with the exponential growth in Reddit posts over time reported by Medvedev et al. [67]. It may also reflect an increase in conversations about gender over time, but we do not evaluate this hypothesis. Regardless, it indicates that our dataset consists mostly of recent content rather than older posts that may not reflect the current state of discourse.
Before discussing the two main categories of posts in detail, we briefly touch on the "irrelevant" category. We did not systematically evaluate irrelevant posts; however, during categorization, we noticed that much of the content in this category was about gender diversity and inclusion in the field of software development.
While analyzing such conversations is beyond the scope of our paper, the large number of such threads may be a reflection of the increased focus on gender diversity in contemporary life and particularly in technical fields like software development [47].

Direct Discussion of Gender Data
As shown in the previous subsection, we found very little content that discussed appropriate practices with regard to user gender data. Of the 16 posts on the topic, half were links to external websites discussing the use of gender data ("link posts"). The remaining eight posts were self-contained discussions about gender data ("discussion posts"). Most posts in both subcategories spawned very little conversation: two received no comments, eleven received between 1 and 10 comments, and three received 20 comments or more. Twelve (75%) of the posts were made after 2017, while only one (6.3%) was from before 2011, roughly mirroring the overall dataset's bias towards more recent posts.
The post that spawned the most discussion was a link post on the /r/programming subreddit. It focused on Google Cloud Vision's choice to no longer return gendered labels like "man" or "woman," and it had over 480 comments. It also had the highest score of the subset of relevant posts, with 130 upvotes (0.72 upvote ratio). The post consisted of a link to Google's AI ethics principles, including discussion of avoiding gender bias in programming AI [3]. The post title copied language directly from Google and seemed neutral towards the Google guidelines. In contrast, the vast majority of comments opposed removing gendered labels from the API, with the most highly voted comments suggesting that "lunatics" or "inexperienced college grads" had taken over Google. While some comments expressed positive sentiment towards removing binary conceptions of gender from the API, almost all of them had negative scores or were marked as "controversial" by Reddit, indicating that despite having a positive score, they received many downvotes.
Only two other link posts had a positive score. One, from the /r/coding subreddit with a 0.63 upvote ratio, suggested that developers use functions to easily program custom pronouns based on user gender, and recommended singular "they/them" pronouns as the default option. Comment feedback was largely positive. A few comments expressed light criticism, stating that the blog post was trivial, and users also pointed out that the proposed function only worked for English. This post only lightly touches on conceptions of gender, which may have helped it avoid the harsh reaction directed at other threads.
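The pronoun helper described in that post can be sketched roughly as follows. This is a hypothetical Python illustration, not the post's code: `Pronouns`, `pronouns_for`, and `reply_prompt` are invented names, and, unlike the post, the sketch stores the user's stated pronouns directly instead of deriving them from a gender field. Singular "they" is the fallback, as the post recommended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pronouns:
    subjective: str  # e.g. "she", "he", "they"
    objective: str   # e.g. "her", "him", "them"
    possessive: str  # e.g. "her", "his", "their"


# Hypothetical default: singular "they" when a user has not stated pronouns.
THEY = Pronouns("they", "them", "their")


def pronouns_for(user_pronouns: Optional[Pronouns]) -> Pronouns:
    """Return the user's stated pronouns, falling back to singular 'they'."""
    return user_pronouns if user_pronouns is not None else THEY


def reply_prompt(name: str, user_pronouns: Optional[Pronouns]) -> str:
    """Build a (hypothetical) English notification string using the object pronoun."""
    p = pronouns_for(user_pronouns)
    return f"{name} liked your post. Reply to {p.objective}?"


if __name__ == "__main__":
    print(reply_prompt("Sam", None))
    print(reply_prompt("Ana", Pronouns("she", "her", "her")))
```

As commenters noted about the original, this only covers English; it also sidesteps verb agreement (singular "they" takes "are"), which a real implementation would need to handle.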
The other positively received link post, from the /r/userexperience subreddit, led to an article on creating more inclusive gender options and had a 0.70 upvote ratio. The advice in the article generally mirrors that of Scheuerman et al. [77], recommending, among other things, that programmers avoid asking for gender when possible, give a good reason for asking about gender when it is necessary, and include inclusive options in all gender forms. It may be undercut, however, by the only reply, which stated that the "gender diaspora" has gotten "crazy." All of the other link posts have an upvote ratio of 0.50 or below, indicating that such posts are not well-liked by the programming communities on Reddit. The least well-received post, with an upvote ratio of 0.22, was a link to an abstract discussion of the construction of gender for artificial intelligences, which received no comments. Among the posts that did receive comments, the criticism seemed to mirror that under the post on Google Cloud Vision: dismissal of the topic as unworthy of discussion.

Table 5. Types of gender fields in the Reddit dataset.
Representation of Gender | Frequency
Unknown | 149
Binary | 80
Binary with third option | 9
Facebook/Instagram | 5
Discussion posts in the dataset were largely confined to the smaller subreddits in our corpus, including /r/userexperience, /r/UI_Design, and /r/web_design. All had upvote ratios of 0.50 or above, and all except one received comments. Discussion posts covered a wide variety of topics, from how to tailor user experience to particular gendered audiences to whether the term "sex" or "gender" was most appropriate for a particular form. While the variety of posts and the small sample size make it difficult to generalize, the advice was largely inclusive. For example, the most highly voted response to a discussion post asking about best practices for requesting user gender gave advice similar to the best practices described by Scheuerman et al. [77]. The most significant weakness of the advice users received is that, as one might expect from a casual internet forum, it was unsourced and relied heavily on personal experience.
Largely, then, Reddit users looking for advice on gender seem to receive inclusive advice when they request it through discussion posts. On the other hand, link posts about gender, particularly on the larger subreddits in our corpus, seem to receive dismissive or, at worst, extremely negative reactions. One reason for this disparity may be the different topics of the subreddits. For example, the /r/UI_Design and /r/userexperience subreddits focus on the human aspects of software development, in contrast to the broad technical focus of larger subreddits like /r/programming or /r/compsci. While we do not have any data on the actual audiences of these communities, we believe that this topical difference may have fostered communities that are better able to talk about gender.
The most concrete takeaway from these data, however, is that very little discussion of best practices towards gender occurs in Reddit programming communities. While our inability to find posts on handling gender data is not proof that these conversations are absent from Reddit altogether, they do not seem to be occurring on the larger subreddits in our corpus.

Other Uses of Gender Data
While our main aim in analyzing Reddit was to observe the advice that programmers give one another, we were also able to use our dataset to look at some of the ways that programmers use gender. As our sample is intentionally biased towards particular keywords, we cannot assume that the statistics reflect the broader nature of all programs, or even of all code posted on Reddit. However, our analysis contributes to the larger view of how programmers use and view gender.
Elijah Bouma-Sims and Yasemin Acar
Table 4 shows the use cases of gender data that occurred more than five times in our dataset. The vast majority of posts (142) did not specify a use for gender data. Of the posts that did specify a purpose for collecting gender, the most common use cases were audience demographics (35), such as for advertising or developing a business plan, and artificial intelligence or machine learning (19), such as classifying photos. As the posts used to collect this data were not explicitly about gender, we cannot necessarily infer anything from the large number of threads that do not explain why they collect gender data. It is notable, however, that the most common specified use case, demographic analysis, is not directly necessary for an application to function.
The other data point collected when categorizing posts was the representation of gender used in the application, such as binary or binary with a third option (see Table 5). As with the uses of gender data, most posts (149) did not contain enough information to determine the representation of gender. Some of these "unknown" posts did include the type of variable used to store gender information, with 22 specifying gender as a "string" variable and 16 as a "char" variable; however, the variable type alone does not reveal how exactly the program implements gender.
Overwhelmingly, the threads that include information on gender "type" featured binary conceptions of gender, either by storing gender as a "boolean" type or by only including "male" or "female" options in their code. A small number of posts (nine) included an "other" option along with the "male" and "female" options. We also saw some posts (five) that used gender values imported from Instagram or Facebook. Ultimately, then, most of the Reddit posts featuring programs that use gender data, and that include enough detail to identify an approach to gender, use noninclusive, binary options. These data, particularly in light of the hostility observed in direct discussions of inclusive approaches to gender information, suggest that the dominant conception of gender held by developers on these subreddits is essentially binary. That is to say, they likely view gender as consisting of two opposite, universal categories: "male" and "female."
One important caveat to this finding is that not all of the posts are real-world examples. Many posts explicitly mentioned that code excerpts were from school projects.¹⁴ Additionally, the large share of posts from the subreddit /r/learnprogramming in the dataset means that even non-school projects may still be learning exercises. These data are nonetheless valuable, as such posts reflect the way that people learn to use gender data in programs. Further, early in a programmer's education, school projects might be the most appropriate place to broach the subject of gender.
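Because school exercises may be a good place to raise the topic, a classroom example might contrast the binary representations seen on Reddit with an optional self-description. The sketch below is hypothetical Python written for illustration, not taken from any post; the class and function names are our own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryUser:
    """Pattern common in the Reddit code: gender as a boolean, so only two states exist."""
    name: str
    is_male: bool


@dataclass
class SelfDescribedUser:
    """Alternative: an optional free-text self-description; None means not disclosed."""
    name: str
    gender_identity: Optional[str] = None


BINARY_VALUES = {"male", "female"}


def fits_binary_field(answer: str) -> bool:
    """True only if the answer could be stored in a male/female-only field."""
    return answer.strip().lower() in BINARY_VALUES


def normalize_self_description(answer: Optional[str]) -> Optional[str]:
    """Keep any self-description as written; blank input counts as not disclosed."""
    if answer is None:
        return None
    cleaned = answer.strip()
    return cleaned or None


if __name__ == "__main__":
    for answer in ["Female", "non-binary", "  "]:
        print(repr(answer), fits_binary_field(answer), normalize_self_description(answer))
```

A boolean field cannot even record that a user chose not to answer, whereas the optional string distinguishes "not disclosed" (None) from any self-described value.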
In summary, we observed very little direct discussion about gender in our Reddit dataset. In the few posts where users requested advice on how to treat gender data, the advice was generally inclusive. Unfortunately, general discussion of inclusive concepts of gender was met with hostility. Moreover, binary conceptions of gender were the most common ones observed in practice on Reddit. With these findings in mind, we now turn to the results of our interview study.

INTERVIEW RESULTS
In this section, we present the results of our interview study. We start by discussing the results of coding before qualitatively discussing other significant findings. When relevant, we specifically highlight the views of the participant who had previously worked on a social media community for transgender, non-binary, and other gender non-conforming people. While this participant's experience is likely not typical, we highlight it to elevate their unique perspective on developing for marginalized communities. We refer to this interviewee as the "expert participant." At points, we compare our findings to those from the previous section for illustrative purposes. Due to the different methodologies, however, we do not draw definite conclusions about how participants may differ from users in our Reddit sample.

¹⁴ Several posts from different users appeared to reference the exact same homework exercise on programming a BMI calculator.

Interviewee Demographics
Using the methods described in the previous section, we recruited 15 participants. Eight were from the United States, two from Germany, two from Pakistan, and three from other countries. Participants ranged from 19 to 39 years old (median = 23). Twelve were men, two were women, and one participant chose the option to self-describe but did not fill in the text box. Six were white or of European descent, four were South Asian, two were Hispanic or Latino/a/x, one was Middle Eastern, and one was Black or of African descent. Participants' educational backgrounds ranged from a high school diploma to a master's degree, with six participants reporting a bachelor's degree, three reporting a master's degree, two reporting some time at college, and two reporting a high school diploma. Of those with degrees, seven reported having a degree related to computing, such as computer science or engineering, while one reported having a degree in the social sciences. Finally, participants reported one to ten years of development experience (median = four years).

Coding Results
Our interviewees were roughly evenly divided in terms of development background. Four participants (27%) identified themselves as primarily frontend developers, five (33%) described themselves as backend developers, and five (33%) described themselves as working on both backend and frontend development and were coded as fullstack developers. One participant did not give a clear answer to the question. Almost all interviewees had worked on projects involving personal data, and most had worked with gender data: 14 participants (93%) reported experience with user personal data, and 11 participants (73%) reported previous experience with gender data.
As Table 6 shows, interviewees mentioned many of the same applications of gender data that we observed in the Reddit dataset. Participants brought up uses of gender data throughout the interviews, but most commonly after they were asked about their experience using gender data in programming applications. Demographic analysis was the use mentioned most often by our interviewees (in six interviews), and it was also the most common specified use in our Reddit dataset.
While dating was frequently mentioned (in four interviews) as a use for gender information, none of the interviewees had worked on a dating application before. This could indicate a conflation of gender with sexuality, but we find it more probable that it expresses the assumption that gender is an essential part of dating. Interviewees also discussed using gender in other ways where it is likely not essential: legal/financial purposes and AI/ML came up in multiple interviews. In the case of legal/financial purposes, one interviewee mentioned that he thought gender was required by regulation: "if you sign up for contracts with some companies, most times you need gender data for verification. The project I worked on before was for a ... telecommunications company... and they sure collected gender data when someone signed up there." We were not able to verify which regulation he was referencing; his statement suggests that he believed gender data was required because it was collected, and there may not have been an actual legal requirement.
Table 7 shows the different ways interviewees stated they would treat gender data. In contrast to Reddit, the largest group of interviewees (seven) expressed that gender should be treated as a spectrum, with many suggesting that users should be given the opportunity to freely enter any gender, even if other options were given. For example, one participant described how they might collect gender data: "I guess the easiest way to do it would just to be having [sic] some string input field, like 'what gender do you identify as?' And then maybe to write... something like 'you don't have to answer this.' Make it an optional field kind of thing, just giving them the option to place whatever they feel in that field like as opposed to giving them a drop down menu or some sort of binary selection." Although the quoted participant used hedging language like "I guess" or "maybe," they seemed to have a strong grasp of users' needs. Only one interviewee expressed that gender should be a strict binary, justifying their view by saying that "some things we need in male and some female." Notably, two interviewees were unclear or unsure how they should treat gender data. This reflects a general uncertainty that most interviewees showed towards our questions about gender: interviewees often required clarification or time to think before they could discuss what choices users should be given.

Table 6. Uses of gender discussed in interviews; frequency represents the number of interviews in which the use case was brought up.
Use of Gender Code | Frequency
Audience/Client Demographics | 6
Dating | 4
Healthcare | 3
Diversity Protection | 3
Legal/Financial Purposes | 3
No Specific Purpose | 3
AI/ML | 2
3rd Party Telemetry/Advertising | 1

Table 7. Opinions towards treatment of gender discussed in interviews; frequency represents the number of interviewees who were coded with a particular code.
Treatment of Gender Code | Frequency
Spectrum | 7
Binary with Other | 3
Dependent on Audience | 2
Binary | 1
These codes do not necessarily reflect how participants themselves conceptualize gender. The interview questions focused on developer practice rather than on each individual's concept of gender. A participant who indicated that users should be given the option to self-describe their gender alongside binary choices may still passively hold a noninclusive understanding of gender. Additionally, as participants were self-reporting behavior, their actual practice may vary. For example, one participant referred to the ternary gender disclosure field as adding a little "wokeness," in contrast to the binary gender disclosure field. This suggests that, despite reacting positively to the more inclusive gender options, he may have felt that additional gender options were more of a political statement than a way to ensure that more users are included.
The expert participant discussed at length the approach to gender taken by their social media community. Rather than requiring users to report their gender identity, the community allowed individuals to report their pronouns. The site also provided the option for users to tag posts with descriptors like "transwomen"¹⁵ so that they could find others who shared their identity on the website. This approach is in line with the best practices discussed in Scheuerman et al. [77]. It maximizes users' ability to use the website without unnecessary or uncomfortable disclosure, while still enabling people to find communities that share their identity.
One aspect of developer views towards gender data that we were unable to evaluate with Reddit was developers' thoughts on privacy. While gender is not private for most people, gender privacy can be important to marginalized communities, and we wanted to see if developers considered such cases. Although this was not initially part of the interview procedure, the interviewer prompted participants to think about the privacy of data as a range from the most private things, like one's tax ID number or bank account password, to completely public things, like a username on a forum. We did not invoke academic models of privacy, as we were more interested in developers' immediate perceptions than in deeper evaluation.
Even more so than with the previous questions on gender data, developers had to spend time considering their answers. The largest group (seven) settled on gender information being equivalent to other personal data without special consideration. Four interviewees felt that gender information was entirely public, while two felt that it was extremely private. For example, one of the participants who felt that gender information was extremely private stated: "I absolutely think gender data is quite private... if you ask me on a scale of one to five, I rate it five, where five is the one I think is the most private data, and I feel people would be more, in terms of the questions that I'd ask, I think gender was the most sensitive one. And that was the reason I had not made it mandatory. People can answer, or they may skip it. It's up to them. But, yes, I completely do believe that gender data is probably one of the most sensitive or private data." Finally, one interviewee felt that the privacy of gender information depended on the audience, specifically mentioning that certain marginalized communities may feel that gender information is extremely private while others might not. Six interviewees also discussed opt-in permissions for gender information, allowing users to choose whether to share a data point or not. Overall, then, while interviewees understood that personal data requires protection, they did not show knowledge of the unique privacy concerns that may surround gender data.
The expert participant discussed their community's general approach to privacy in detail. The community collected almost no private information about users; profile pictures and usernames were the only potentially directly identifying information shared by users. Additionally, users could create "private" accounts that were only visible to other users of the platform, which protected them from public indexing of their posts or profiles. Although the participant did not mention it, this likely also helped protect users from the data collection present on other websites, which can lead to involuntary gendering by advertisers [12].
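A data model along the lines the expert participant described might look like the following. This is a hypothetical Python sketch: the participant did not describe their implementation, all names are assumptions, and we interpret "visible only to other users of the platform" as visible only to logged-in viewers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Profile:
    username: str
    pronouns: Optional[str] = None  # free text such as "she/her"; there is no gender field
    private: bool = False           # private accounts are hidden from logged-out visitors


@dataclass
class Post:
    author: Profile
    text: str
    tags: Set[str] = field(default_factory=set)  # user-chosen identity tags


def visible_to(post: Post, viewer: Optional[Profile]) -> bool:
    """Posts by private accounts are shown only to logged-in users (viewer is not None)."""
    return not post.author.private or viewer is not None


def find_by_tag(posts: List[Post], tag: str, viewer: Optional[Profile]) -> List[Post]:
    """Community discovery through user-generated tags, respecting account privacy."""
    return [p for p in posts if tag in p.tags and visible_to(p, viewer)]
```

Because the model has no gender field, there is nothing for the platform to disclose or infer; identity-based discovery happens only through tags users choose to add.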
The final coded results from our interviews concern developers' reactions to the example gender disclosure forms and the ISO/IEC 5218 standard; an overview of the results can be seen in Figure 3. Developers reacted most positively to the most inclusive gender selection but did not necessarily have substantive critiques of the other options.
Developers had trouble understanding the second displayed option, which added "diverse" to the binary options. This included the developers from Germany, who might be expected to be most familiar with this option. The ISO/IEC 5218 standard was uniformly received poorly, and only one interviewee had seen the standard before.
Several interviewees correctly pointed out that the standard was for sex rather than gender. Notably, two interviewees, including the developer who had previously worked with marginalized communities, mentioned that while our most inclusive gender option was the best of the three gender disclosure fields, it could be improved further by replacing it with a text box. Two interviewees reacted strongly against the most inclusive field, one stating that identifying with multiple terms is not typical and another stating that non-binary people might not want to disclose their gender. While this latter concern is valid, it is a reason not to collect gender data at all rather than a reason to use less inclusive forms.

DISCUSSION
Here we discuss the major takeaways of this paper, make recommendations to educators and developers, and suggest future research directions to improve the handling of gender in development.

Comparison of Reddit and Interview Study Results
Returning to our framing research questions, our two studies provide the following results.
1. How do developers ask users for gender? The plurality of our interviewees preferred treating gender as a spectrum, and the most inclusive gender disclosure forms presented in interviews received the most positive reactions. In contrast, the majority of analyzed code snippets in our Reddit dataset relied on a binary conception of gender. Based on the totality of our evidence, as well as prior research, we find that programmer practice¹⁶ still relies on binary disclosure forms. Our interviews may indicate increasing acceptance of more inclusive disclosure forms, but this result may also be an artifact of the law of small numbers, recruitment bias, or other biases in our study techniques (see the "Limitations" portion of Subsection 3.2).

2. How do developers use user gender data? The two studies speak mostly to the purposes for which gender data is initially collected. Audience demographic analysis was both the most common use case found in our Reddit dataset and the most common use mentioned by interviewees. Artificial intelligence and machine learning uses of gender were the next most common use cases found on Reddit, followed by health or fitness applications. Interviewees more often mentioned dating applications, with only two interviewees mentioning artificial intelligence and machine learning as uses for gender data. These results suggest that gender data is used in a variety of situations, including several where it may not be essential to the primary function of an application. Moreover, any utility gained by collecting gender to understand audience demographics, for example, is eroded when an overly narrow, inaccurate disclosure form is used.
3. What advice do developers receive about using gender data? This research question was primarily addressed using our Reddit data. From the (admittedly small) amount of discussion of gender disclosure and data use that we observed, users who requested advice received inclusive recommendations. Beyond the specific context of asking for advice, however, inclusive notions of gender seem to be viewed with hostility on Reddit. Overall, then, the treatment of gender data seems to be a topic that programmers in the communities we observed largely do not discuss. This conclusion is consistent with the apparent uncertainty many of our interviewees showed when answering some of our questions about gender.
Taken together, while there are positive signs of more inclusive practices, mainly in our interviews, the public conversation and practice found on Reddit suggest that more work is necessary to improve how developers approach gender in software. The binary conception of gender seems to be dominant, at least on Reddit. This finding aligns with the experiences of non-binary individuals with gender representation in software and with prior research on the topic [77,85,87]. The common use of gender data for audience demographics, seen in practice on Reddit and mentioned commonly by interviewees, suggests that developers view gender information as an important data point for categorizing users. This practice is almost certainly influenced by the demands of online monetization, with gender-related code on Reddit including explicit references to advertising services. Prior literature [12,13] has identified how advertising considerations have influenced the design of gender categorization in social media services. Demographic classification along the lines of gender collapses the nuance inherent in one's identity, especially when only two categories are employed.

Towards Better Gender Programming Practices
How might developers handle gender data better? In many cases, the answer may be to avoid encoding gender data altogether. Particularly in contexts where gender data is non-essential to the functioning of a system, this approach would help to avoid the pitfalls associated with gender categorization. For example, while behavioral advertising has its own harms [15], it makes far more sense to target ads based on purchasing habits than on gender data, as the former avoids the assumptions and stereotypes intrinsic to the latter. For social media platforms, it makes the most sense to collect only pronouns [77]. As described by our expert participant, this need not prevent the formation of community based on gender identity via user-generated content. In the domain of computer vision, Scheuerman et al. recommend that image labeling systems "embrace gender ambiguity" and label gender-neutral features like "beard, makeup, dress" rather than attempting to force all humans into a false gender binary [78].
Importantly, the commitment to removing gender must be more than skin-deep. Removing users' option to disclose their gender identity while still categorizing them into a binary on the back end serves only to obscure how gender is "baked" into a system [13]. Additionally, gender prediction is likely to exclude non-binary individuals and generally risks misgendering people [44,78,80].
It is not necessary, however, to exclude gender from computing altogether. While there is no single, all-encompassing approach to gender in computing, prior work on gender inclusivity in HCI research provides some guidance. As described in Section 2, gender is not a single concept but rather many overlapping ideas that vary with context and time [52]. Even a well-designed gender disclosure form¹⁷ is necessarily limited to its context. Developers should consider what definition of gender they are using and what measure best corresponds to this definition [52,79]. Users should be given enough information to understand what is being asked of them and how it will be used [52,79].
Developers should not take a single measure out of context and extrapolate about other aspects of gender. For example, the pronouns one uses should not be used to make assumptions about one's gender identity, as is done on Facebook [13,79].
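One way to keep these measures separate in code is to store pronouns and gender identity as independent, optional fields and to ensure that no code path fills one in from the other. The Python sketch below assumes a hypothetical `Profile` record and helper functions; it is not modeled on Facebook's or any other platform's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Profile:
    """Hypothetical user profile; both fields are optional and independent."""
    display_name: str
    pronouns: Optional[str] = None          # e.g. "she/her", "they/them", "xe/xem"
    gender_identity: Optional[str] = None   # set only if the user disclosed it

def refer_to(profile: Profile) -> str:
    """Text used when mentioning the user; falls back to their name, not a guess."""
    if profile.pronouns:
        return f"{profile.display_name} ({profile.pronouns})"
    return profile.display_name

def gender_for_reporting(profile: Profile) -> Optional[str]:
    """Return gender only if disclosed; deliberately never derived from pronouns."""
    return profile.gender_identity

alex = Profile(display_name="Alex", pronouns="she/they")
```

Here `gender_for_reporting(alex)` returns `None` because Alex has not disclosed a gender identity; the stored pronouns are used only for addressing Alex.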
Similarly, if inclusive options are given, they should not be purely aesthetic. That is to say, it is not appropriate to collect non-binary gender data but then simply exclude those people from analysis or normal treatment for not fitting within the binary paradigm. Returning to the example of online advertising and social media, Facebook's choice to allow free-form gender disclosure is good, but it is undermined by the practice of essentially ignoring non-binary individuals in ad targeting [13].
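In analysis code, this failure often takes the form of a filter or two-bucket tally that silently drops every response outside the binary. A minimal Python sketch of the alternative, assuming responses are stored as free-text strings with `None` meaning the user chose not to disclose, counts every category that actually occurs and reports non-disclosure explicitly instead of discarding it. The function names and the `NOT_DISCLOSED` label are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

NOT_DISCLOSED = "Not disclosed"

def binary_only_tally(responses: Iterable[Optional[str]]) -> Dict[str, int]:
    """The pattern to avoid: anything other than 'man' or 'woman' vanishes."""
    return dict(Counter(r for r in responses if r in ("man", "woman")))

def inclusive_tally(responses: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count every disclosed value as written (whitespace trimmed), plus non-disclosure."""
    counts: Counter = Counter()
    for r in responses:
        key = r.strip() if r and r.strip() else NOT_DISCLOSED
        counts[key] += 1
    return dict(counts)

responses = ["woman", "man", "non-binary", None, "woman", "agender"]
```

With this sample, the binary tally accounts for only three of six responses, while the inclusive tally accounts for all six. Whether free-text values should later be grouped is itself a context-dependent decision; the point of the sketch is only that no respondent disappears from the denominator.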
This approach to gender is more complicated than declaring bool gender and calling it a day. We freely confess that an inclusive approach will require more effort than any reductionist approach. Further, adoption will likely require those with power to advocate for those who are ignored by the binary gender paradigm. These hurdles do not excuse inaction, however, as all users deserve systems that do not deny their existence. To quote Spiel et al.: "What we cannot do is simplify [gender] or say that non-binary inclusion is just an additional checkbox, or even that a one size fits all solution exists for a population whose very existence denies the idea of simple fixes or classifications. There is no easy, single answer here, but that the work is hard is not a reason to avoid" [87].
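To make the contrast concrete, the Python sketch below sets the one-line `bool` field beside one possible richer representation. The `GenderResponse` type and its `Disclosure` variants are our illustrative assumption rather than a recommended standard. Consistent with the quotation, no fixed list of options is complete, which is why the sketch keeps a self-described variant and an explicit "declined" state.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

@dataclass
class ReductionistUser:
    name: str
    gender: bool  # which value means what? Only two are possible, and "no answer" is not one

class Disclosure(Enum):
    LISTED = "listed"          # picked one of the context-specific options
    SELF_DESCRIBED = "self"    # typed their own description
    DECLINED = "declined"      # chose not to answer

@dataclass(frozen=True)
class GenderResponse:
    kind: Disclosure
    value: Optional[str] = None  # must be None exactly when kind is DECLINED

    def __post_init__(self) -> None:
        # Keep the record internally consistent.
        if (self.kind is Disclosure.DECLINED) != (self.value is None):
            raise ValueError("value must be None if and only if the user declined")

@dataclass
class InclusiveUser:
    name: str
    # Defaults to "declined", so nothing is ever recorded on the user's behalf.
    gender: GenderResponse = field(
        default_factory=lambda: GenderResponse(Disclosure.DECLINED)
    )

users = [
    InclusiveUser("Ada", GenderResponse(Disclosure.LISTED, "woman")),
    InclusiveUser("Kai", GenderResponse(Disclosure.SELF_DESCRIBED, "genderqueer")),
    InclusiveUser("Sam"),
]
```

Neither type settles which options a given context calls for; the sketch only shows that the data model need not rule out answers in advance.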
In addition to our recommendations for developers, we also suggest that change is necessary in software engineering education. While individuals may expand their understanding of inclusive practices through private study, course designers should consider explicitly discussing inclusive practices for gender data as part of their curricula. We understand that course time is limited, but by covering this material, educators could help ensure that their students enter the workforce with a nuanced understanding of gender. At the very least, educators should avoid using exclusively binary gender forms in coursework or examples. On Reddit, we observed several examples of code from coursework employing exclusively binary gender disclosure forms. Such practices normalize cisgenderism and perpetuate the bad designs that motivated our research.
Finally, it is worth considering the role of large technology companies in determining how gender data is handled. Beyond being influential through their sheer prominence, these companies matter because the high degree of interdependence in modern software means that decisions made about gender data on one platform may have an outsized impact on other systems. For example, Bivens et al. [13] identified how the central role of social media services in online advertising may give them the ability to shape how advertisers view gender. In line with this conclusion, we identified five cases in our Reddit dataset in which code snippets used gender data pulled from Facebook or Instagram. While this finding in no way absolves smaller companies and individual developers of the responsibility to handle gender appropriately, it does demonstrate how pushing more inclusive practices at the largest software industry players may result in a cascade of change. Systemic change is, of course, easier discussed than done, but we must fight for it if we are to make computing accessible for all.

Future Work
Before concluding, we would like to outline some potential directions for future research on gender and software development. While we feel our findings stand on their own, we encourage work that examines the same or similar questions on a larger scale, for example through a widely distributed survey. Another option would be to systematically analyze practice through publicly available repositories of code. Looking at the open source community (e.g., using World of Code [64]) may also make it possible to analyze how such decisions are made through the associated discussions [9]. Commercially available software might be analyzed through static analysis. Such research would offer a wider perspective on how developers handle gender data. These methods may also be used to explore secondary uses of gender data. While both our interview and Reddit studies suggest that audience demographics are the primary purpose for gender data in software, this is almost certainly not the end of the story. Without further study, we cannot directly comment on how data may be reinterpreted into a binary paradigm despite initially being collected in more inclusive ways.
Lastly, it is important to investigate the best methods for increasing inclusive gender practices among developers. We have made some general suggestions about software engineering education in the preceding subsection, but deliberate research is necessary. One direction would be to create educational interventions that teach about issues of gender inclusivity, as has been done with other issues of diversity, equity, and inclusion [57]. Another fruitful line of research may be to develop tools that assist developers by finding gender inclusivity issues, just as technology is used to identify other accessibility issues [83]. Importantly, this research must also be paired with advocacy and more explicit discussion of concepts of gender in computing. As long as the binary paradigm remains unchallenged, development practices will not change.
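As a toy illustration of the kind of check such a tool might perform, the Python sketch below uses the standard-library `ast` module to flag gender-like names typed as `bool`, both in annotated fields and in function parameters. The name heuristic (`GENDERISH`) and the rule set are our own assumptions; a real tool would need far more care to limit both false positives (any identifier merely containing "sex" matches) and false negatives (binary encodings that are not booleans).

```python
import ast
import re
from typing import List, Tuple

# Hypothetical, deliberately crude heuristic for identifiers that encode gender.
GENDERISH = re.compile(r"(gender|sex|is_male|is_female)", re.IGNORECASE)

def _is_bool(annotation: ast.AST) -> bool:
    return isinstance(annotation, ast.Name) and annotation.id == "bool"

def find_binary_gender(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line, message) pairs for gender-like names typed as bool."""
    findings: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # Annotated fields or variables, e.g. `gender: bool`
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            if GENDERISH.search(node.target.id) and _is_bool(node.annotation):
                findings.append((node.lineno, f"'{node.target.id}' stores gender as a bool"))
        # Function parameters, e.g. `def create_user(name: str, is_male: bool)`
        elif isinstance(node, ast.arg) and node.annotation is not None:
            if GENDERISH.search(node.arg) and _is_bool(node.annotation):
                findings.append((node.lineno, f"parameter '{node.arg}' stores gender as a bool"))
    return sorted(findings)

sample = '''
class User:
    name: str
    gender: bool

def create_user(name: str, is_male: bool) -> None:
    pass
'''
for line, message in find_binary_gender(sample):
    print(line, message)
```

Run against `sample`, the sketch flags the `gender: bool` field on line 4 and the `is_male: bool` parameter on line 6.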

CONCLUSION
In this paper, we have presented a mixed-methods study investigating how developers request gender disclosure and use gender data. Through our analysis of both interviews and Reddit posts, we found that developers rarely discuss or receive advice on inclusive practices with regard to gender information. We also found that much of the code posted to Reddit uses a strict binary conception of gender. These failures contribute to the systematic, technological erasure of non-binary and other gender non-conforming individuals. Ultimately, we hope that our findings serve as a platform not only for increased scrutiny of how developers treat gender but also for deeper discussion of concepts of gender. Such work is essential to improving the lives of transgender, non-binary, and other gender non-conforming people.

Fig. 3. Interviewee reactions to example gender disclosure forms and the ISO/IEC 5218 standard; frequency represents the number of interviewees who reacted in a particular way.
… they focus on non-user facing aspects of an application
Frontend: Developer indicated that they focus on user-facing aspects of an application or web page
Fullstack: Developer indicated that they do both frontend and backend development
… they had previously implemented applications that used gender data
No: Developer indicated that they had not previously implemented applications that used gender data

Use of Gender
Audience or Client Demographics: Gender used to get insight about the makeup of their clients or application audience
Diversity Protection: Gender used to ensure users from diverse backgrounds receive equitable treatment
Dating: Gender used to coordinate preferences on dating applications or websites
AI/ML: Gender used for artificial intelligence or machine learning
Healthcare: Gender used for healthcare purposes, like intake or prescriptions
3rd Party Telemetry/Advertising: Gender used to enable advertising functionality in applications or websites through a third party service like Google
Legal or Financial Purposes: Gender used to enable legal or financial services, like banking or insurance
No Specific Purpose: Gender collected without a specific purpose for its use

Experience with Personal Data
Yes: Developer indicated that they had previously implemented applications that used a user's personal data
No: Developer indicated that they had not previously implemented applications that used a user's personal data

Table 1. Subreddits used for data collection, with number of subscribers on the date of data collection, number of posts collected, and age of subreddit as reported by Reddit.

Table 2. Search terms used for data collection, with number of posts collected.

Table 3. Overview of post statistics in the Reddit dataset.
Fig. 2. Number of posts from each year in the Reddit dataset.

Table 4. Use cases of gender in the Reddit dataset.