Abstract
Despite the importance and pervasiveness of Wikipedia as one of the largest platforms for open knowledge, surprisingly little is known about how people navigate its content when seeking information. To bridge this gap, we present the first systematic large-scale analysis of how readers browse Wikipedia. Using billions of page requests from Wikipedia’s server logs, we measure how readers reach articles, how they transition between articles, and how these patterns combine into more complex navigation paths. We find that navigation behavior is characterized by highly diverse structures. Although most navigation paths are shallow, comprising a single pageload, there is much variety, and the depth and shape of paths vary systematically with topic, device type, and time of day. We show that Wikipedia navigation paths commonly mesh with external pages as part of a larger online ecosystem, and we describe how naturally occurring navigation paths are distinct from targeted navigation in lab-based settings. Our results further suggest that navigation is abandoned when readers reach low-quality pages. Taken together, these insights contribute to a more systematic understanding of readers’ information needs and allow for improving their experience on Wikipedia and the Web in general.
A Large-Scale Characterization of How Readers Browse Wikipedia