DOI: 10.1145/3394486.3403370
research-article

SimClusters: Community-Based Representations for Heterogeneous Recommendations at Twitter

Published: 20 August 2020

ABSTRACT

Personalized recommendation products at Twitter target a multitude of heterogeneous items: Tweets, Events, Topics, Hashtags, and users. These targets vary in cardinality (which affects the scale of the problem) and in "shelf life" (which constrains the latency of generating recommendations). Although Twitter has built a variety of recommendation systems over the past decade, the broader problem was mostly tackled piecemeal. In this paper, we present SimClusters, a general-purpose representation layer based on overlapping communities, in which users as well as heterogeneous content are represented as sparse, interpretable vectors to support a multitude of recommendation tasks. We propose a novel algorithm for community discovery based on Metropolis-Hastings sampling, which is both more accurate and significantly faster than off-the-shelf alternatives. SimClusters scales to networks with billions of users and has been effective across a variety of deployed applications at Twitter.
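To make the idea of sparse, interpretable community vectors concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each user is already assigned weights over a few communities, that an item's vector can be formed by summing the vectors of users who engaged with it, and that cosine similarity is used for matching. The community names, user names, weights, and the helper functions `aggregate` and `cosine` are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sparse user vectors: community id -> affinity weight.
# Only non-zero communities are stored, which keeps vectors interpretable.
user_vectors = {
    "alice": {"c_ml": 0.9, "c_nba": 0.1},
    "bob": {"c_ml": 0.7},
    "carol": {"c_nba": 0.8, "c_cooking": 0.2},
}


def aggregate(engaged_users, vectors):
    """Sum the sparse vectors of the users who engaged with an item.

    This aggregation rule is an assumption made for illustration.
    """
    item = defaultdict(float)
    for user in engaged_users:
        for community, weight in vectors[user].items():
            item[community] += weight
    return dict(item)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Any item type (a Tweet, a Hashtag, ...) lands in the same community space.
tweet_vector = aggregate(["alice", "bob"], user_vectors)
hashtag_vector = aggregate(["carol"], user_vectors)

# Rank users against the tweet by similarity in community space.
scores = {u: cosine(v, tweet_vector) for u, v in user_vectors.items()}
```

Because users and items share one community space in this sketch, the same similarity function can score a user against a Tweet, a Hashtag, or another user, which is one way to read the "general-purpose representation layer" described above.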

Supplemental Material

3394486.3403370.mp4

A brief explainer video with slides for the paper "SimClusters: Community-Based Representations for Heterogeneous Recommendations at Twitter".

    • Published in

      KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      August 2020
      3664 pages
      ISBN:9781450379984
      DOI:10.1145/3394486

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
