research-article

Measuring International Online Human Values with Word Embeddings

Published: 22 December 2021

Abstract

As the Internet grows in number of users and in the diversity of its services, it becomes more influential in people’s lives. It has the potential to construct or modify the opinions, mental perceptions, and values of individuals. What is created and published online reflects people’s values and beliefs. As a global platform, the Internet is a rich source of information for researching the online culture of many different countries. In this work, we develop a methodology for measuring data from textual online sources using word embedding models, to create a country-based online human values index that captures cultural traits and values worldwide. We apply our methodology to a dataset of 1.7 billion tweets, whose locations we identify among 59 countries. We create a list of 22 Online Values Inquiries (OVI), each one capturing different questions from the World Values Survey, related to topics such as religion, science, and abortion. We observe that our methodology is indeed capable of capturing human values online for different countries and different topics. We also show that some online values are highly correlated (up to c = 0.69, p < 0.05) with the corresponding offline values, especially religion-related ones. Our method is generic, and we believe it is useful for social science specialists, such as demographers and sociologists, who can use their domain knowledge and expertise to create their own Online Values Inquiries, allowing them to analyze human values in the online environment.
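The abstract does not spell out how an Online Values Inquiry is scored, so the following is only a minimal sketch of one plausible reading: train one embedding model per country, score a topic word by how much closer it sits to one pole word than to an opposing one, and then correlate the per-country scores with survey values. The function names (`cosine`, `inquiry_score`, `pearson`), the pole words, the two-dimensional toy vectors, and the "offline" numbers are all hypothetical and chosen purely for illustration; they are not the authors' formulation or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def inquiry_score(embeddings, topic, positive, negative):
    """Hypothetical inquiry score: similarity of `topic` to the
    `positive` pole minus its similarity to the `negative` pole."""
    t = embeddings[topic]
    return cosine(t, embeddings[positive]) - cosine(t, embeddings[negative])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy stand-ins for per-country embedding models (assumption: in practice
# these vectors would come from models trained on each country's tweets).
poles = {"important": [1.0, 0.0], "unimportant": [0.0, 1.0]}
toy_models = {
    "A": {"religion": [1.0, 0.0], **poles},
    "B": {"religion": [1.0, 1.0], **poles},
    "C": {"religion": [0.0, 1.0], **poles},
}

countries = ["A", "B", "C"]
online = [
    inquiry_score(toy_models[c], "religion", "important", "unimportant")
    for c in countries
]

# Made-up survey values, used only to exercise the correlation step.
offline = [0.9, 0.5, 0.2]
r = pearson(online, offline)
print(online, round(r, 3))
```

With real data, each entry of `online` would be one country's score for one inquiry, and the correlation would be computed across the countries for which a matching survey item exists.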

Published in

ACM Transactions on the Web, Volume 16, Issue 2
May 2022
148 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/3506669

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 22 December 2021
• Accepted: 1 November 2021
• Revised: 1 October 2021
• Received: 1 November 2020

          Qualifiers

          • research-article
          • Refereed
