DOI: 10.1145/2623330.2623654 (KDD Conference Proceedings, research article)

Methods for ordinal peer grading

ABSTRACT

Massive Open Online Courses (MOOCs) have the potential to revolutionize higher education with their wide outreach and accessibility, but they require instructors to come up with scalable alternatives to traditional student evaluation. Peer grading -- having students assess each other -- is a promising approach to tackling the problem of evaluation at scale, since the number of "graders" naturally scales with the number of students. However, students are not trained in grading, which means that one cannot expect the same level of grading skill as in traditional settings. Drawing on broad evidence that ordinal feedback is easier to provide and more reliable than cardinal feedback [5, 38, 29, 9], it is therefore desirable to allow peer graders to make ordinal statements (e.g. "project X is better than project Y") rather than require them to make cardinal statements (e.g. "project X is a B-"). Thus, in this paper we study the problem of automatically inferring student grades from ordinal peer feedback, as opposed to existing methods that require cardinal peer feedback. We formulate the ordinal peer grading problem as a type of rank aggregation problem, and explore several probabilistic models under which to estimate student grades and grader reliability. We study the applicability of these methods using peer grading data collected from a real class --- with instructor and TA grades as a baseline --- and demonstrate the efficacy of ordinal feedback techniques in comparison to existing cardinal peer grading methods. Finally, we compare these peer-grading techniques to traditional evaluation techniques.
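To make the rank-aggregation formulation concrete, the following sketch shows one of the classic probabilistic models the abstract alludes to -- the Bradley-Terry model [8] -- fit to pairwise peer judgments by the standard minorization-maximization (Zermelo) iteration. This is an illustrative baseline under assumed toy data, not the paper's actual estimator (which also models grader reliability); all names and the sample judgments are hypothetical.

```python
# Illustrative sketch: estimate latent quality scores from ordinal
# ("X is better than Y") peer judgments with a Bradley-Terry model,
# fit via the minorization-maximization (Zermelo) iteration.
# NOT the paper's estimator; grader reliability is not modeled here.
from collections import defaultdict

def bradley_terry(comparisons, n_iters=200):
    """comparisons: list of (winner, loser) pairs.
    Returns a dict mapping each item to a score (normalized to sum to 1)."""
    items = set()
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # comparisons per unordered pair
    for winner, loser in comparisons:
        items.update((winner, loser))
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
    scores = {i: 1.0 for i in items}
    for _ in range(n_iters):
        new = {}
        for i in items:
            # MM update: w_i <- W_i / sum_j n_ij / (w_i + w_j)
            denom = sum(
                pair_counts[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            new[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(new.values())
        scores = {i: v / total for i, v in new.items()}
    return scores

# Hypothetical peer judgments: ("A", "B") means a grader ranked A above B.
data = [("A", "B"), ("A", "B"), ("B", "A"),
        ("A", "C"), ("A", "C"), ("B", "C"), ("C", "B")]
scores = bradley_terry(data)
ranking = sorted(scores, key=scores.get, reverse=True)  # best first
```

The resulting scores induce a full ordering of the submissions, which an instructor could then map onto a grade scale; the paper's contribution is in doing this while also estimating how reliable each grader is.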

References

  1. N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5):23:1--23:27, Nov. 2008.
  2. K. J. Arrow. Social Choice and Individual Values. Yale University Press, 2nd edition, Sept. 1970.
  3. J. A. Aslam and M. Montague. Models for metasearch. In SIGIR, pages 276--284, 2001.
  4. Y. Bachrach, T. Graepel, T. Minka, and J. Guiver. How to grade a test without knowing the answers: A Bayesian graphical model for adaptive crowdsourcing and aptitude testing. In ICML, 2012.
  5. W. Barnett. The modern theory of consumer behavior: Ordinal or cardinal? The Quarterly Journal of Austrian Economics, 6(1):41--65, 2003.
  6. M. Bashir, J. Anderton, J. Wu, P. B. Golbus, V. Pavlu, and J. A. Aslam. A document rating system for preference judgements. In SIGIR, pages 909--912, 2013.
  7. L. Bouzidi and A. Jaillet. Can online peer assessment be trusted? Educational Technology & Society, 12(4):257--268, 2009.
  8. R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324--345, 1952.
  9. B. Carterette, P. N. Bennett, D. M. Chickering, and S. T. Dumais. Here or there: Preference judgments for relevance. In ECIR, pages 16--27, 2008.
  10. C.-C. Chang, K.-H. Tseng, P.-N. Chou, and Y.-H. Chen. Reliability and validity of web-based portfolio peer assessment: A case study for a senior high school's students taking a computer course. Comput. Educ., 57(1):1306--1316, Aug. 2011.
  11. X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In WSDM, pages 193--202, 2013.
  12. J. Diez, O. Luaces, A. Alonso-Betanzos, A. Troncoso, and A. Bahamonde. Peer assessment in MOOCs using preference learning via matrix factorization, 2013.
  13. C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW, pages 613--622, 2001.
  14. S. Freeman and J. W. Parks. How accurate is peer grading? CBE-Life Sciences Education, 9(4):482--488, 2010.
  15. J. Guiver and E. Snelson. Bayesian inference for Plackett-Luce ranking models. In ICML, pages 377--384, 2009.
  16. J. Haber. http://degreeoffreedom.org/between-two-worlds-moocs-and-assessment.
  17. J. Haber. http://degreeoffreedom.org/mooc-assignments-screwing/, Oct. 2013.
  18. R. Herbrich, T. Minka, and T. Graepel. TrueSkill: A Bayesian skill rating system. In NIPS, pages 569--576, 2007.
  19. P. G. Ipeirotis and P. K. Paritosh. Managing crowdsourced human computation: A tutorial. In WWW, pages 287--288, 2011.
  20. M. Kendall. Rank Correlation Methods. Griffin, London, 1948.
  21. C. Kenyon-Mathieu and W. Schudy. How to rank with few errors. In STOC, pages 95--103, 2007.
  22. C. Kulkarni, K. Wei, H. Le, D. Chia, K. Papadopoulos, J. Cheng, D. Koller, and S. Klemmer. Peer and self assessment in massive online classes. ACM Trans. CHI, 20(6):33:1--33:31, Dec. 2013.
  23. G. Lebanon and J. D. Lafferty. Cranking: Combining rankings using conditional probability models on permutations. In ICML, pages 363--370, 2002.
  24. T.-Y. Liu. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3(3):225--331, Mar. 2009.
  25. T. Lu and C. Boutilier. Learning Mallows models with pairwise preferences. In ICML, pages 145--152, June 2011.
  26. T. Lu and C. E. Boutilier. The unavailable candidate model: A decision-theoretic view of social choice. In EC, pages 263--274, 2010.
  27. R. D. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.
  28. C. L. Mallows. Non-null ranking models. Biometrika, 44(1/2):114--130, 1957.
  29. G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63(2):81--97, March 1956.
  30. M. Mostert and J. D. Snowball. Where angels fear to tread: Online peer assessment in a large first-year class. Assessment & Evaluation in Higher Education, 38(6):674--686, 2013.
  31. S. Niu, Y. Lan, J. Guo, and X. Cheng. Stochastic rank aggregation. CoRR, abs/1309.6852, 2013.
  32. C. Piech, J. Huang, Z. Chen, C. Do, A. Ng, and D. Koller. Tuned models of peer assessment in MOOCs. In EDM, 2013.
  33. R. L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society, Series C (Applied Statistics), 24(2):193--202, 1975.
  34. T. Qin, X. Geng, and T.-Y. Liu. A new probabilistic model for rank aggregation. In NIPS, pages 1948--1956, 2010.
  35. V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. JMLR, 11:1297--1322, Aug. 2010.
  36. J. Rees. http://www.insidehighered.com/views/2013/03/05/essays-flaws-peer-grading-moocs.
  37. N. Shah, J. Bradley, A. Parekh, M. Wainwright, and K. Ramchandran. A case for ordinal peer-evaluation in MOOCs, 2013.
  38. N. Stewart, G. D. A. Brown, and N. Chater. Absolute identification by relative judgment. Psychological Review, 112:881--911, 2005.
  39. L. L. Thurstone. The method of paired comparisons for social values. Journal of Abnormal and Social Psychology, 27:384--400, 1927.
  40. M. N. Volkovs and R. S. Zemel. A flexible generative model for preference aggregation. In WWW, pages 479--488, 2012.

Supplemental Material

p1037-sidebyside.mp4
