DOI: 10.1145/2901739.2901777
research-article

Recognizing gender of Stack Overflow users

Published: 14 May 2016

ABSTRACT

Software development remains a predominantly male activity, despite coordinated efforts from research, industry, and policy makers. This gender imbalance is most visible in social programming, on platforms such as Stack Overflow.

To better understand the reasons behind this disparity, and offer support for (corrective) decision making, we and others have been engaged in large-scale empirical studies of activity in these online platforms, in which gender is one of the variables of interest. However, since gender is not explicitly recorded, it is typically inferred by automatic "gender guessers", based on cues derived from an individual's online presence, such as their name and profile picture. As opposed to self-reporting, used in earlier studies, gender guessers scale better, but their accuracy depends on the quantity and quality of data available in one's online profile.

In this paper we evaluate the applicability of different gender guessing approaches on several datasets derived from Stack Overflow. Our results suggest that the approaches combining different data sources perform the best.


Published in

MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
May 2016, 544 pages
ISBN: 9781450341868
DOI: 10.1145/2901739

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
