Abstract
As the Internet grows in number of users and in diversity of services, it becomes more influential in people's lives. It has the potential to shape or modify the opinions, mental perceptions, and values of individuals. What is created and published online reflects people's values and beliefs. As a global platform, the Internet is a rich source of information for studying the online culture of many different countries. In this work, we develop a methodology that uses word embedding models to measure values expressed in online textual sources and to create a country-based online human values index that captures cultural traits and values worldwide. We apply the methodology to a dataset of 1.7 billion tweets, whose locations we identify among 59 countries. We create a list of 22 Online Values Inquiries (OVI), each capturing different questions from the World Values Survey and relating to values on topics such as religion, science, and abortion. We observe that our methodology is indeed capable of capturing human values online for different countries and different topics. We also show that some online values are highly correlated (up to c = 0.69, p < 0.05) with the corresponding offline values, especially religion-related ones. Our method is generic, and we believe it is useful for social science specialists, such as demographers and sociologists, who can use their domain knowledge and expertise to create their own Online Values Inquiries and analyze human values in the online environment.
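The abstract does not spell out how an Online Values Inquiry is scored, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each country has its own word embedding, and that an inquiry compares how close a topic word sits to a set of "positive" words versus a set of "negative" words. The function names (`cosine`, `mean_similarity`, `ovi_score`, `pearson`), the word lists, and the 2-D toy vectors are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(vectors, target, words):
    """Average cosine similarity between the target word and each word in `words`."""
    return sum(cosine(vectors[target], vectors[w]) for w in words) / len(words)


def ovi_score(vectors, target, positive, negative):
    """Hypothetical inquiry score: positive-pole affinity minus negative-pole affinity."""
    return (mean_similarity(vectors, target, positive)
            - mean_similarity(vectors, target, negative))


def pearson(xs, ys):
    """Pearson correlation, e.g. between online scores and offline survey values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy 2-D "embeddings" for two made-up countries (illustrative values only).
toy_country_a = {
    "religion": [1.0, 0.0],
    "important": [0.9, 0.1],
    "unimportant": [0.1, 0.9],
}
toy_country_b = {
    "religion": [1.0, 0.0],
    "important": [0.2, 0.8],
    "unimportant": [0.8, 0.2],
}

score_a = ovi_score(toy_country_a, "religion", ["important"], ["unimportant"])
score_b = ovi_score(toy_country_b, "religion", ["important"], ["unimportant"])
print(score_a, score_b)
```

In this toy setup, a positive score means the topic word lies closer to the positive pole in that country's embedding, and a negative score means the opposite. Per-country scores of this kind could then be compared with the matching survey answers using `pearson`.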
Measuring International Online Human Values with Word Embeddings