Facebook Tells Me Your Gender: An Exploratory Study of Gender Prediction for Turkish Facebook Users

Published: 26 May 2021

Abstract

Online Social Networks (OSNs) are very popular platforms for social interaction. Data posted publicly on OSNs pose various threats to the individual privacy of OSN users. Adversaries can try to predict private attribute values, such as gender, as well as links/connections. Quantifying an adversary’s capacity to infer the gender of an OSN user is an important first step towards privacy protection. Numerous studies have addressed the problem of predicting the gender of an author/user, especially in the context of the English language. By contrast, studies in this field are quite limited for the Turkish language, particularly in the domain of OSNs. Previous studies on gender prediction for Turkish OSN users have mostly relied on the content of tweets and Facebook comments. In this article, we propose using various features, not just user comments, for the gender prediction problem on the Facebook OSN. Unlike existing studies, we exploit features extracted from the profile, wall content, and network structure, as well as the wall interactions of the user. Our study therefore differs from existing work in the breadth of the features considered, the machine learning and deep learning methods applied, and the size of the OSN dataset used in the experimental evaluation. Our results indicate that basic profile information provides better results; moreover, using this information together with wall interactions improves prediction quality. The best accuracy we measured was 0.982, obtained by combining profile data and wall interactions of Turkish OSN users. In the wall interactions model, we introduced 34 different features that provide better results than the existing content-based studies for Turkish.
