Abstract
When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, and online music playlists. Understanding the factors that drive the production of these trails can help, for example, to improve underlying network structures, predict user clicks, or enhance recommendations. In this work, we present HypTrails, a method for comparing a set of hypotheses about human trails on the Web, where each hypothesis represents beliefs about transitions between states. Our method uses Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to calculate the evidence (marginal likelihood) of the data under each of them. To elicit Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method, and to compare the relative plausibility of hypotheses, we employ Bayes factors. We demonstrate the general mechanics and applicability of HypTrails through experiments with (i) synthetic trails, for which we control the mechanisms that produced them, and (ii) empirical trails from different domains, including Web site navigation, business reviews, and online music listening. Our work expands the repertoire of methods available for studying human trails.
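The abstract compresses three computational steps: turning a hypothesis into Dirichlet priors, computing the evidence of the observed transitions under those priors, and comparing hypotheses with Bayes factors. The sketch below illustrates these steps under stated assumptions. States are integer indices; the function names (`transition_counts`, `elicit_prior`, `log_evidence`, `log_bayes_factor`) and the concentration parameter `kappa` are hypothetical. `elicit_prior` is a simplified, row-wise, fractional-chip stand-in for the paper's adapted trial roulette method, not its exact distribution rule. The evidence is the standard Dirichlet-multinomial marginal likelihood of a first-order Markov chain with an independent Dirichlet prior on each row.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def transition_counts(trails: Sequence[Sequence[int]], n_states: int) -> Matrix:
    """Count first-order transitions i -> j over all trails."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def elicit_prior(hypothesis: Matrix, kappa: float) -> Matrix:
    """Hypothetical, simplified trial-roulette elicitation (an assumption,
    not the paper's exact rule): each row receives kappa * n_states
    fractional chips, spread in proportion to the hypothesis beliefs and
    added on top of a uniform Dirichlet(1) baseline."""
    n = len(hypothesis)
    alphas = []
    for row in hypothesis:
        total = sum(row)
        if total > 0:
            chips = [kappa * n * belief / total for belief in row]
        else:
            chips = [0.0] * n  # no belief expressed: keep the uniform prior
        alphas.append([1.0 + c for c in chips])
    return alphas


def log_evidence(counts: Matrix, alphas: Matrix) -> float:
    """Log marginal likelihood of the transitions, with an independent
    Dirichlet prior on each row of the transition matrix:
    sum_i [lgamma(sum_j a_ij) - lgamma(sum_j (n_ij + a_ij))
           + sum_j (lgamma(n_ij + a_ij) - lgamma(a_ij))]."""
    total = 0.0
    for n_row, a_row in zip(counts, alphas):
        total += math.lgamma(sum(a_row)) - math.lgamma(sum(a_row) + sum(n_row))
        for n_ij, a_ij in zip(n_row, a_row):
            total += math.lgamma(n_ij + a_ij) - math.lgamma(a_ij)
    return total


def log_bayes_factor(counts: Matrix, h1: Matrix, h2: Matrix, kappa: float) -> float:
    """Log Bayes factor of hypothesis h1 over h2 at the same concentration."""
    return (log_evidence(counts, elicit_prior(h1, kappa))
            - log_evidence(counts, elicit_prior(h2, kappa)))


if __name__ == "__main__":
    # Toy data: users cycle through three states 0 -> 1 -> 2 -> 0.
    trails = [[0, 1, 2, 0, 1, 2, 0], [1, 2, 0, 1]]
    counts = transition_counts(trails, 3)
    cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # belief: users follow the cycle
    uniform = [[1, 1, 1]] * 3                  # belief: any next state equally likely
    for kappa in (1, 5, 10):
        print(kappa, round(log_bayes_factor(counts, cycle, uniform, kappa), 2))
```

A positive log Bayes factor favors the first hypothesis. Evaluating several values of `kappa`, as the demo does, is one way to check whether the preference between hypotheses holds as stronger belief is placed in them.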
A Bayesian Method for Comparing Hypotheses About Human Trails
Recommendations
HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web
WWW '15: Proceedings of the 24th International Conference on World Wide Web. When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music play lists. Understanding the factors that ...
SparkTrails: A MapReduce Implementation of HypTrails for Comparing Hypotheses About Human Trails
WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web. HypTrails is a Bayesian approach for comparing different hypotheses about human trails on the Web. While a standard implementation exists, it exposes performance issues when working with large-scale data. In this paper, we propose a distributed ...
Objective Bayesian Two Sample Hypothesis Testing for Online Controlled Experiments
WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web. As A/B testing gains wider adoption in the industry, more people begin to realize the limitations of the traditional frequentist null hypothesis statistical testing (NHST). The large number of search results for the query "Bayesian A/B testing" shows ...