ABSTRACT
Offline evaluations are the most common evaluation method for research paper recommender systems. However, despite some voiced criticism, the appropriateness of offline evaluations has not been discussed thoroughly. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations, and found that the results of the two often contradict each other. We discuss this finding in detail and conclude that, in many settings, offline evaluations may be inappropriate for evaluating research paper recommender systems.
Index Terms
A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation



