DOI: 10.1145/2532508.2532511
research-article

A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation

Online: 12 October 2013

ABSTRACT

Offline evaluations are the most common evaluation method for research paper recommender systems. However, no thorough discussion of the appropriateness of offline evaluations has taken place, despite some voiced criticism. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations. We found that the results of offline and online evaluations often contradict each other. We discuss this finding in detail and conclude that, in many settings, offline evaluations may be inappropriate for evaluating research paper recommender systems.

