Designing Visual Markers for Continuous Artificial Intelligence Support: A Colonoscopy Case Study

Published: 30 December 2020

Abstract

Colonoscopy, the visual inspection of the large bowel using an endoscope, offers protection against colorectal cancer by allowing for the detection and removal of pre-cancerous polyps. The literature on polyp detection shows widely varying miss rates among clinicians, with reported averages of roughly 22% to 27%. While recent work has considered the use of AI support systems for polyp detection, how to visualise these systems and integrate them into clinical practice remains an open question. In this work, we explore the design of visual markers for use in an AI support system for colonoscopy. Supported by the gastroenterologists in our team, we designed seven unique visual markers and rendered them on real-life patient video footage. Through an online survey targeting relevant clinical staff (N = 36), we evaluated these designs and obtained initial insights into how clinical staff envision AI integrating into their daily work environment. Our results provide concrete recommendations for the future deployment of AI support systems in continuous, adaptive scenarios.
