Abstract
Collaborative Filtering (CF) recommendation algorithms are a popular solution to the information overload problem, aiding users in the item selection process. Relevant research has long focused on refining and improving these models to produce better (more effective) recommendations, and has converged on a methodology to predict their effectiveness on target datasets by evaluating them on random samples of the latter. However, predicting the efficiency of the solutions—especially with regard to their time- and resource-hungry training phase, whose requirements dwarf those of the prediction/recommendation phase—has received little to no attention in the literature. This article addresses this gap for a number of representative and highly popular CF models, including algorithms based on matrix factorization, k-nearest neighbors, co-clustering, and slope one schemes. To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time and memory usage of their training phase. Our contributions further include an adaptive sampling strategy, to address the tradeoff between resource usage costs and prediction accuracy, and a framework that quantifies both the efficiency and effectiveness of CF. Finally, a systematic experimental evaluation demonstrates that our method outperforms state-of-the-art regression schemes by a considerable margin, with an overhead that is a small fraction of the overall requirements of CF training.
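The idea of predicting training time from a complexity equation and input characteristics can be illustrated with a minimal sketch. It is not the authors' method: it assumes a single cost term for SGD-based matrix factorization (epochs × ratings × factors), fits one hardware-dependent constant from timings on small random samples, and extrapolates to the full dataset. All function names and all timing values below are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mf_sgd_cost(n_ratings: int, n_factors: int, n_epochs: int) -> float:
    """Assumed cost term for SGD matrix factorization training:
    every epoch visits every rating and updates n_factors values."""
    return float(n_epochs * n_ratings * n_factors)


def fit_constant(costs: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares fit of c in time ~ c * cost (a line through the origin)."""
    numerator = sum(c * t for c, t in zip(costs, times))
    denominator = sum(c * c for c in costs)
    return numerator / denominator


def predict_time(c: float, n_ratings: int, n_factors: int, n_epochs: int) -> float:
    """Extrapolate training time to a dataset of the given size."""
    return c * mf_sgd_cost(n_ratings, n_factors, n_epochs)


# Hypothetical (sample size in ratings, measured training seconds) pairs.
# These numbers are made up for illustration; they are not from the article.
sample_runs = [(10_000, 0.5), (20_000, 1.0), (40_000, 2.0)]
N_FACTORS, N_EPOCHS = 50, 20

costs = [mf_sgd_cost(n, N_FACTORS, N_EPOCHS) for n, _ in sample_runs]
times = [t for _, t in sample_runs]

c = fit_constant(costs, times)
predicted = predict_time(c, n_ratings=1_000_000, n_factors=N_FACTORS, n_epochs=N_EPOCHS)
print(f"fitted constant: {c:.3e} s per unit of cost")
print(f"predicted training time on the full dataset: {predicted:.1f} s")
```

A sketch like this also hints at the tradeoff the abstract mentions: larger samples give a better estimate of the constant but cost more to train on, which is the kind of tension an adaptive sampling strategy would need to balance.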
White Box: On the Prediction of Collaborative Filtering Recommendation Systems’ Performance