Abstract
Online Social Networks (OSNs) are very popular platforms for social interaction. Data posted publicly over OSNs pose various threats to the privacy of individual OSN users. Adversaries can try to predict private attribute values, such as gender, as well as links/connections. Quantifying an adversary’s capacity to infer the gender of an OSN user is an important first step towards privacy protection. Numerous studies have addressed the problem of predicting the gender of an author/user, especially in the context of the English language. In contrast, studies in this field are quite limited for the Turkish language, and specifically in the domain of OSNs. Previous studies on gender prediction for Turkish OSN users have mostly relied on the content of tweets and Facebook comments. In this article, we propose using various features, not just user comments, for the gender prediction problem over the Facebook OSN. Unlike existing studies, we exploit features extracted from the profile, wall content, and network structure, as well as the wall interactions of the user. Our study thus differs from existing work in the breadth of the features considered, the machine learning and deep learning methods applied, and the size of the OSN dataset used in the experimental evaluation. Our results indicate that basic profile information provides better results; moreover, using this information together with wall interactions improves prediction quality. The best accuracy we measured was 0.982, obtained by combining profile data and wall interactions of Turkish OSN users. In the wall interactions model, we introduced 34 different features that provide better results than the existing content-based studies for Turkish.
Facebook Tells Me Your Gender: An Exploratory Study of Gender Prediction for Turkish Facebook Users