Abstract
Colonoscopy, the visual inspection of the large bowel using an endoscope, offers protection against colorectal cancer by allowing for the detection and removal of pre-cancerous polyps. The literature on polyp detection shows widely varying miss rates among clinicians, with reported averages of around 22%--27%. While recent work has considered the use of AI support systems for polyp detection, how to visualise these systems and integrate them into clinical practice remains an open question. In this work, we explore the design of visual markers for use in an AI support system for colonoscopy. Supported by the gastroenterologists in our team, we designed seven unique visual markers and rendered them on real-life patient video footage. Through an online survey targeting relevant clinical staff (N = 36), we evaluated these designs and obtained initial insights into how clinical staff envision AI integrating into their daily work environment. Our results provide concrete recommendations for the future deployment of AI support systems in continuous, adaptive scenarios.
Designing Visual Markers for Continuous Artificial Intelligence Support: A Colonoscopy Case Study