Emergence of consensus and shared vocabularies in collaborative tagging systems

Published: 24 September 2009

Abstract

This article uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users.

First, we study the formation of stable distributions in tagging systems, seen as an implicit form of “consensus” reached by the users of the system around the tags that best describe a resource. We show that final tag frequencies for most resources converge to power law distributions, and we propose an empirical method to examine the dynamics of the convergence process, based on the Kullback-Leibler divergence measure. The convergence analysis is performed both for the most frequently used tags at the top of tag distributions and for the so-called long tail.
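As an illustration of the convergence idea, the following minimal sketch tracks the Kullback-Leibler divergence between the tag distribution of a resource after its first n tag assignments and its final distribution. It is one plausible reading of the approach, not the article's exact procedure: the function names, the checkpoint `step`, the additive smoothing constant `eps`, and the direction of the divergence (current relative to final) are all assumptions made for this example.

```python
import math
from collections import Counter

def tag_distribution(tags, vocabulary, eps=1e-9):
    """Relative tag frequencies over a fixed vocabulary.

    A small eps (an assumption of this sketch) is added to every count so
    that no probability is zero and the logarithm below is always defined.
    """
    counts = Counter(tags)
    total = sum(counts[t] for t in vocabulary) + eps * len(vocabulary)
    return [(counts[t] + eps) / total for t in vocabulary]

def kl_divergence(p, q):
    """D(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def convergence_trace(tag_stream, step):
    """Divergence of the distribution at each checkpoint from the final one.

    tag_stream is the chronological list of tags applied to one resource.
    Returns a list of (number_of_assignments, divergence) pairs.
    """
    vocabulary = sorted(set(tag_stream))
    final = tag_distribution(tag_stream, vocabulary)
    trace = []
    for n in range(step, len(tag_stream) + 1, step):
        current = tag_distribution(tag_stream[:n], vocabulary)
        trace.append((n, kl_divergence(current, final)))
    return trace

# Hypothetical tag stream for a single bookmarked resource.
stream = ["python", "programming", "python", "tutorial",
          "python", "programming", "code", "python"]
for n, divergence in convergence_trace(stream, step=4):
    print(n, round(divergence, 4))
```

A trace that falls toward zero and stays there suggests that the relative tag frequencies have stabilized. Restricting `vocabulary` to the top-ranked tags, or to the remaining ones, would give separate traces for the head and the long tail.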

Second, we study the information structures that emerge from collaborative tagging, namely tag correlation (or folksonomy) graphs. We show how community-based network techniques can be used to extract simple tag vocabularies from the tag correlation graphs by partitioning them into subsets of related tags. We also show, for a specialized domain, that shared vocabularies produced by collaborative tagging are richer than the vocabularies that can be extracted from large-scale query logs provided by a major search engine.
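To make the partitioning step concrete, here is a small sketch that builds a tag co-occurrence graph and splits it into groups of related tags by greedily merging communities while modularity increases. The abstract does not say which community detection algorithm or which correlation weight is used, so both choices here (greedy modularity agglomeration, raw co-occurrence counts) are assumptions, as are all function names and the sample data.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(posts):
    """Edge weight = number of posts in which two tags appear together."""
    weights = defaultdict(int)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return dict(weights)

def modularity(weights, community_of):
    """Q = sum over communities of (internal / m) - (degree / 2m)^2."""
    m = sum(weights.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        degree[community_of[a]] += w
        degree[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def greedy_partition(weights):
    """Start with one community per tag; repeatedly apply the merge that
    raises modularity the most, and stop when no merge raises it."""
    nodes = sorted({n for edge in weights for n in edge})
    community_of = {n: n for n in nodes}
    best_q = modularity(weights, community_of)
    while True:
        labels = sorted(set(community_of.values()))
        best_merge = None
        for c1, c2 in combinations(labels, 2):
            trial = {n: (c1 if c == c2 else c)
                     for n, c in community_of.items()}
            q = modularity(weights, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            break
        community_of = best_merge
    groups = defaultdict(set)
    for n, c in community_of.items():
        groups[c].add(n)
    return sorted(sorted(g) for g in groups.values()), best_q

# Hypothetical posts: each inner list is the tag set of one bookmark.
posts = [
    ["python", "code", "tutorial"], ["python", "code"], ["code", "tutorial"],
    ["recipe", "food", "cooking"], ["food", "cooking"], ["recipe", "cooking"],
    ["python", "food"],
]
vocabularies, q = greedy_partition(cooccurrence_graph(posts))
print(vocabularies, round(q, 4))
```

This brute-force version recomputes modularity for every candidate merge, which is only practical for small graphs. A real folksonomy graph would call for an incremental algorithm or a dedicated network analysis library.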

Although the empirical analysis presented in this article is based on a set of tagging data obtained from del.icio.us, the methods developed are general, and the conclusions should be applicable across other websites that employ tagging.
