
A Bayesian Method for Comparing Hypotheses About Human Trails

Published: 23 June 2017

Abstract

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, and online music playlists. Understanding the factors that drive the production of these trails can be useful, for example, for improving underlying network structures, predicting user clicks, or enhancing recommendations. In this work, we present a method called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our method uses Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to calculate the evidence of the data under them. To elicit Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method, and to compare the relative plausibility of hypotheses, we employ Bayes factors. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains, including Web site navigation, business reviews, and online music listening. Our work expands the repertoire of methods available for studying human trails.
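The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a first-order Markov chain over integer-coded states, uses the standard closed-form Dirichlet-multinomial evidence, and replaces the trial roulette elicitation with a simplified stand-in that adds `k` pseudo-counts per row in proportion to the hypothesis on top of a flat prior of 1. The function names and the parameter `k` are hypothetical choices made for this illustration.

```python
from math import lgamma


def count_transitions(trails, n_states):
    """Count observed transitions n[i][j] from state i to state j."""
    counts = [[0] * n_states for _ in range(n_states)]
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def elicit_prior(beliefs, k):
    """Turn a belief matrix into Dirichlet pseudo-counts.

    Assumption: a simplified stand-in for the (trial) roulette method.
    Each row receives a flat prior of 1 plus k pseudo-counts split in
    proportion to the hypothesis' beliefs for that row.
    """
    prior = []
    for row in beliefs:
        total = sum(row)
        if total > 0:
            prior.append([1.0 + k * b / total for b in row])
        else:
            # No belief stated for this row: fall back to the flat prior.
            prior.append([1.0] * len(row))
    return prior


def log_evidence(counts, prior):
    """Log marginal likelihood of the transition counts under the prior.

    Each row of the transition matrix has an independent Dirichlet prior,
    so the evidence is a product of Dirichlet-multinomial terms.
    """
    total = 0.0
    for n_row, a_row in zip(counts, prior):
        total += lgamma(sum(a_row)) - sum(lgamma(a) for a in a_row)
        total += sum(lgamma(n + a) for n, a in zip(n_row, a_row))
        total -= lgamma(sum(n_row) + sum(a_row))
    return total


def log_bayes_factor(counts, prior_a, prior_b):
    """Log Bayes factor of hypothesis A over B (positive favors A)."""
    return log_evidence(counts, prior_a) - log_evidence(counts, prior_b)


# Toy data: one trail that cycles through states 0 -> 1 -> 2 -> 0.
trails = [[0, 1, 2, 0, 1, 2, 0]]
counts = count_transitions(trails, n_states=3)

cycle_hyp = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]    # "users follow the cycle"
uniform_hyp = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # "users move at random"

log_bf = log_bayes_factor(
    counts, elicit_prior(cycle_hyp, k=10), elicit_prior(uniform_hyp, k=10)
)
print("log Bayes factor (cycle vs. uniform):", log_bf)
```

On this toy trail the log Bayes factor is positive, meaning the data are more plausible under the cycle hypothesis than under the uniform one. Working in log space with `lgamma` avoids overflow when transition counts are large.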
