Abstract
This article uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users.
First, we study the formation of stable distributions in tagging systems, seen as an implicit form of “consensus” reached by the users of the system around the tags that best describe a resource. We show that final tag frequencies for most resources converge to power law distributions, and we propose an empirical method to examine the dynamics of the convergence process, based on the Kullback-Leibler divergence measure. The convergence analysis is performed both for the most frequently used tags at the top of tag distributions and for the so-called long tail.
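As an illustration of how a Kullback-Leibler-based convergence check might look, the following minimal sketch compares the tag distribution of a resource after each batch of tagging events against its final distribution. The function names, the batch size `step`, and the small smoothing constant are assumptions made for this example; the article's own procedure may differ in its details (for instance, in which pairs of time points it compares).

```python
import math
from collections import Counter


def tag_distribution(tags, vocabulary, smoothing=1e-9):
    """Relative tag frequencies over a fixed vocabulary.

    The smoothing constant is an assumption of this sketch; it keeps every
    probability strictly positive so the divergence stays finite.
    """
    counts = Counter(tags)
    total = sum(counts[t] + smoothing for t in vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def convergence_trace(tag_events, step):
    """Divergence of each partial distribution from the final one.

    tag_events is the chronological list of tags applied to one resource;
    a trace that shrinks towards zero suggests the distribution is settling.
    """
    vocabulary = sorted(set(tag_events))
    final = tag_distribution(tag_events, vocabulary)
    trace = []
    for end in range(step, len(tag_events) + 1, step):
        partial = tag_distribution(tag_events[:end], vocabulary)
        trace.append(kl_divergence(partial, final))
    return trace


# Hypothetical tagging history for a single bookmarked resource.
events = ["design", "web", "design", "css", "design", "web",
          "design", "web", "design", "css", "design", "web"]
trace = convergence_trace(events, step=4)
print(trace)
```

Restricting `vocabulary` to the most frequent tags, or to the remaining low-frequency tags, would give separate traces for the head and the long tail of the distribution.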
Second, we study the information structures that emerge from collaborative tagging, namely tag correlation (or folksonomy) graphs. We show how community-based network techniques can be used to extract simple tag vocabularies from the tag correlation graphs by partitioning them into subsets of related tags. We also show, for a specialized domain, that shared vocabularies produced by collaborative tagging are richer than the vocabularies that can be extracted from large-scale query logs provided by a major search engine.
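To make the idea of partitioning a tag graph concrete, the sketch below builds a weighted tag co-occurrence graph from a handful of hypothetical bookmarks and scores a candidate partition with the Newman-Girvan modularity measure. Using raw co-occurrence counts as edge weights is an assumption of this example, not necessarily the correlation measure used in the article, and the sketch only evaluates a given partition rather than searching for the best one.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(posts):
    """Undirected weighted graph: edge weight = number of posts sharing both tags."""
    weights = defaultdict(float)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1.0
    return dict(weights)


def modularity(weights, community_of):
    """Newman-Girvan modularity Q of a partition of a weighted graph.

    Q = sum over communities c of (w_in_c / m) - (d_c / (2 m))^2, where m is
    the total edge weight, w_in_c the weight of edges inside c, and d_c the
    summed weighted degree of the nodes in c.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    internal = defaultdict(float)
    degree_sum = defaultdict(float)
    for (a, b), w in weights.items():
        degree_sum[community_of[a]] += w
        degree_sum[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(
        internal[c] / total - (degree_sum[c] / (2 * total)) ** 2
        for c in degree_sum
    )


# Hypothetical bookmarks, each represented by its set of tags.
posts = [
    ["python", "code"],
    ["python", "code", "programming"],
    ["code", "programming"],
    ["recipe", "food"],
    ["food", "cooking", "recipe"],
    ["cooking", "food"],
]
graph = cooccurrence_graph(posts)

# Two candidate partitions: topical clusters versus one undivided vocabulary.
topical = {"python": 0, "code": 0, "programming": 0,
           "recipe": 1, "food": 1, "cooking": 1}
single = {tag: 0 for tag in topical}

q_topical = modularity(graph, topical)
q_single = modularity(graph, single)
print(q_topical, q_single)
```

A community detection algorithm would search over partitions to increase this score; here the topical split scores higher than the undivided vocabulary, which is the property such algorithms exploit.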
Although the empirical analysis presented in this article is based on a set of tagging data obtained from del.icio.us, the methods developed are general, and the conclusions should be applicable across other websites that employ tagging.
Index Terms
Emergence of consensus and shared vocabularies in collaborative tagging systems