Abstract

Despite the promises of data-driven artificial intelligence (AI), little is known about how we can bridge the gulf between traditional physician-driven diagnosis and a plausible future of medicine automated by AI. Specifically, how can we usefully involve AI in physicians' diagnostic workflow, given that most AI is still nascent and error-prone (e.g., in digital pathology)? To explore this question, we first propose a series of collaborative techniques to engage human pathologists with AI given AI's capabilities and limitations. Based on these techniques, we prototype Impetus, a tool in which an AI takes varying degrees of initiative to provide various forms of assistance to a pathologist in detecting tumors from histological slides. We summarize observations and lessons learned from a study with eight pathologists and discuss recommendations for future work on human-centered medical AI systems.
Index Terms
Lessons Learned from Designing an AI-Enabled Diagnosis Tool for Pathologists





