Abstract
The rapidly increasing amount of user-generated content in social tagging systems provides a huge source of information. Yet performing effective search in these systems is very challenging, especially when we seek the most appropriate items that match a potentially ambiguous query. Collaborative filtering-based personalization is appealing in this context, as it limits the search to a small network of participants with similar preferences. Offline personalization, which consists of maintaining, for every user, a network of similar participants based on their tagging behaviors, is effective for queries that are close to the querying user’s tagging profile, but it performs poorly when the queries reflect emerging interests and have little correlation with that profile.
We present P2TK2, the first protocol to personalize query processing in social tagging systems online. P2TK2 is completely decentralized, a design choice that stems from the observation that evolving social tagging systems naturally resemble P2P systems in which users are both producers and consumers. This design exploits the power of the crowd and prevents any central authority from controlling personal information. P2TK2 is gossip-based and probabilistic. It dynamically associates each user with social acquaintances sharing similar tagging behaviors. Appropriate users for answering a query are discovered at query time with the help of these social acquaintances, according to the hybrid interest of the querying user, which takes into account both her tagging behavior and her query. Results are iteratively refined and returned to the querying user. We evaluate P2TK2 on CiteULike and Delicious traces involving up to 50,000 users. We highlight the advantages of online personalization compared to offline personalization, as well as its efficiency, scalability, and inherent ability to cope with user departure and interest evolution in P2P systems.
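As a rough illustration of the hybrid interest described above, the following Python sketch ranks candidate users by blending two similarities: one between the candidate's tags and the querying user's tagging profile, and one between the candidate's tags and the query itself. The abstract does not specify how P2TK2 computes or combines these quantities, so the cosine measure, the linear blend, the weight `alpha`, and all function names here are assumptions made for illustration only, not the protocol's actual definitions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight vectors (dicts)."""
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(profile, query_tags, candidate, alpha=0.5):
    """Blend profile similarity and query similarity (assumed linear mix).

    alpha = 1.0 uses only the tagging profile (offline-style behavior);
    alpha = 0.0 uses only the query.
    """
    query = {tag: 1.0 for tag in query_tags}
    return (alpha * cosine(profile, candidate)
            + (1.0 - alpha) * cosine(query, candidate))


def rank_candidates(profile, query_tags, candidates, alpha=0.5, k=2):
    """Return the names of the k candidates with the highest hybrid score."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: hybrid_score(profile, query_tags, item[1], alpha),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical data: a user who mostly tags programming content
# issues a query about an emerging interest ("cooking").
profile = {"python": 3.0, "gossip": 1.0}
candidates = {
    "ann": {"python": 2.0, "gossip": 1.0},
    "bob": {"cooking": 4.0},
    "cy": {"cooking": 1.0, "python": 1.0},
}
profile_only = rank_candidates(profile, ["cooking"], candidates, alpha=1.0, k=1)
query_only = rank_candidates(profile, ["cooking"], candidates, alpha=0.0, k=1)
```

With the profile alone, the closest candidate is the one whose tags resemble the user's past behavior; with the query alone, it is the one whose tags match the emerging interest. An intermediate `alpha` trades off the two, which is the intuition behind selecting users at query time rather than relying only on a precomputed neighborhood.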
Index Terms
Personalizing Top-k Processing Online in a Peer-to-Peer Social Tagging Network