Abstract
Measuring theoretical concepts, so-called constructs, is a central challenge of Player Experience research. Building on recent work in HCI and psychology, we conducted a systematic literature review to study the transparency of measurement reporting. We accessed the ACM Digital Library to analyze all 48 full papers published at CHI PLAY 2020; of those, 24 papers used self-report measurements and were included in the full review. We specifically assessed whether researchers reported What, How, and Why they measured. We found that researchers matched their measures to the construct under study, and that administrative details, such as the number of points on a Likert-type scale, were frequently reported. However, definitions of the constructs to be measured and justifications for selecting a particular scale were sparse. Lack of transparency in these areas threatens the validity of individual studies, and further compromises the building of theories and the accumulation of research knowledge in meta-analytic work. This work is limited to assessing the current transparency of measurement reporting at CHI PLAY 2020; however, we argue that this constitutes a fair foundation for assessing potential pitfalls. To address these pitfalls, we propose a prescriptive model of a measurement selection process, which aids researchers in systematically defining their constructs, specifying operationalizations, and justifying why these measures were chosen. Future research employing this model should contribute to more transparency in measurement reporting. The research was funded through internal resources. All materials are available at https://osf.io/4xz2v/.