Personalizing Top-k Processing Online in a Peer-to-Peer Social Tagging Network

Published: 01 July 2014

Abstract

The rapidly increasing amount of user-generated content in social tagging systems provides a huge source of information. Yet performing effective search in these systems is very challenging, especially when we seek the most appropriate items that match a potentially ambiguous query. Collaborative filtering-based personalization is appealing in this context, as it limits the search to a small network of participants with similar preferences. Offline personalization, which maintains for every user a network of similar participants based on their tagging behaviors, is effective for queries that are close to the querying user’s tagging profile but performs poorly when queries reflect emerging interests and have little correlation with that profile.

We present P2TK2, the first protocol to personalize query processing in social tagging systems online. P2TK2 is completely decentralized, a design choice that stems from the observation that evolving social tagging systems naturally resemble P2P systems in which users are both producers and consumers. This design exploits the power of the crowd and prevents any central authority from controlling personal information. P2TK2 is gossip-based and probabilistic. It dynamically associates each user with social acquaintances who share similar tagging behaviors. Appropriate users for answering a query are discovered at query time with the help of these social acquaintances, according to the hybrid interest of the querying user, which takes into account both her tagging behavior and her query. Results are iteratively refined and returned to the querying user. We evaluate P2TK2 on CiteULike and Delicious traces involving up to 50,000 users. We highlight the advantages of online personalization over offline personalization, as well as the efficiency and scalability of P2TK2 and its inherent ability to cope with user departure and interest evolution in P2P systems.
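
The notion of a hybrid interest, which weighs a candidate peer against both the querying user's tagging profile and the query itself, can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula, so everything below is an assumption made for illustration: tag profiles are modeled as sparse tag-count dictionaries, similarity is plain cosine similarity, and the two signals are blended linearly with a hypothetical weight `alpha`. The function names `cosine`, `hybrid_score`, and `rank_peers` are likewise hypothetical and are not taken from P2TK2.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors (dicts)."""
    dot = sum(w * b.get(tag, 0) for tag, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    # An empty profile has no direction, so treat its similarity as zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(my_profile, query_tags, peer_profile, alpha=0.5):
    """Blend profile similarity and query relevance.

    alpha is a hypothetical weight (assumption, not from the paper):
    alpha=1.0 uses only the tagging profile, alpha=0.0 only the query.
    """
    query_vec = {tag: 1 for tag in query_tags}
    profile_part = cosine(my_profile, peer_profile)
    query_part = cosine(query_vec, peer_profile)
    return alpha * profile_part + (1 - alpha) * query_part


def rank_peers(my_profile, query_tags, peers, alpha=0.5):
    """Return peer names ordered from most to least relevant."""
    return sorted(
        peers,
        key=lambda name: hybrid_score(my_profile, query_tags, peers[name], alpha),
        reverse=True,
    )
```

With `alpha` close to 1 the ranking favors peers whose tagging history resembles the querying user's, which corresponds to the offline setting described above. Lowering `alpha` shifts weight toward peers whose profiles match the query, which is the case where a query reflects an emerging interest.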
