ABSTRACT
Software development remains a predominantly male activity, despite coordinated efforts from research, industry, and policy makers. This gender imbalance is most visible in social programming, on platforms such as Stack Overflow.
To better understand the reasons behind this disparity, and to offer support for (corrective) decision making, we and others have been engaged in large-scale empirical studies of activity on these online platforms, in which gender is one of the variables of interest. However, since gender is not explicitly recorded, it is typically inferred by automatic "gender guessers", based on cues derived from an individual's online presence, such as their name and profile picture. Compared with the self-reporting used in earlier studies, gender guessers scale better, but their accuracy depends on the quantity and quality of the data available in one's online profile.
In this paper we evaluate the applicability of different gender guessing approaches on several datasets derived from Stack Overflow. Our results suggest that approaches that combine different data sources perform best.