"It Is a Moving Process": Understanding the Evolution of Explainability Needs of Clinicians in Pulmonary Medicine



ABSTRACT
Clinicians increasingly pay attention to Artificial Intelligence (AI) to improve the quality and timeliness of their services. There are converging opinions on the need for Explainable AI (XAI) in healthcare. However, prior work considers explanations as stationary entities with no account for the temporal dynamics of patient care. In this work, we involve 16 Idiopathic Pulmonary Fibrosis (IPF) clinicians from a European university medical centre and investigate their evolving uses and purposes for explainability throughout patient care. By applying a patient journey map for IPF, we elucidate clinicians' informational needs, how human agency and patient-specific conditions can influence the interaction with XAI systems, and the content, delivery, and relevance of explanations over time. We discuss implications for integrating XAI in clinical contexts and, more broadly, how explainability is defined and evaluated. Furthermore, we reflect on the role of medical education in addressing epistemic challenges related to AI literacy.

INTRODUCTION
Healthcare providers increasingly pay attention to Artificial Intelligence (AI) to potentially expedite clinical decision-making and administer tailored care to patients. Oftentimes, AI applications in healthcare come in the form of Clinical Decision Support Systems (CDSSs) [66]. Despite the promising potential of AI-powered CDSSs, the adoption of these systems is crippled by issues of instability [4,157] and a lack of transparency [136,161]. Unlike medical doctors, such systems do not have the same authority in the eyes of patients [40,161].
As a means to counteract such limitations, Explainable AI (XAI) - by joining efforts from machine learning, human-computer interaction (HCI), and neighbouring fields - has been designing ways to enable AI-based CDSSs to provide explanations for their outputs [15,69,71] and promote interpretability [33], reliance [74], and contestability [5,176]. Despite the ever-increasing body of work within this sphere, research often adopts a positivist, algorithm-centric mindset. It seeks the ideal explainer applicable to a breadth of scenarios without pursuing an understanding of who the recipients of AI explanations are and what they need. In contrast, the HCI community has recently advocated for expanding the notion of explainability by adopting a human-centred viewpoint on it [45,46,107]. For instance, prior work investigated medical doctors' information and explainability needs in the early adoption stages [30], around certain tasks [159], about health-specific datasets [140], and in terms of subjective preferences [53] and visualisation modalities [39].
Despite the value of such contributions, existing works still sympathise with the positivist mindset typical of algorithmic research. The pursuit of generalisable insights [39,53] and experimental studies happening at, and focusing on, a single point in time [30] are the norm. Hence, we have a limited understanding of how explanations for AI-based CDSSs are (1) used in practice, (2) for what purpose (i.e., the why), and (3) how those might change over time in environments as dynamic as healthcare. In particular, in the case of AI-generated treatment suggestions, explanations should help clinicians determine how a patient's trajectory - which might (d)evolve unexpectedly - and past treatment decisions influence such a suggestion [144]. Accounting for the temporal dynamics of user needs is crucial in high-stakes and high-pressure domains like healthcare.
Inspired by categorisations of user needs, and their evolution, in Information Science literature [83,116], in this work we seek an understanding of the temporal dynamics of clinicians' explainability needs throughout patient care. We ground our work in the use of CDSSs in pulmonary medicine and, more specifically, in how care for Idiopathic Pulmonary Fibrosis (IPF) is provided at Erasmus MC (Erasmus Medical Center, Rotterdam, the Netherlands: https://www.erasmusmc.nl/en/), a large European university medical centre. Given the unknown causes and life-threatening nature of IPF, we echo prior work (subsection 3.2) and embrace a broader definition of explainability - encompassing documentation about models, datasets, and processes - to investigate the evolution of clinicians' uses and purposes for explainability. We ask the following questions: (RQ1) What information might clinicians seek in explanations about AI systems?; (RQ2) When would clinicians engage with explanations about AI systems?; and (RQ3) To what extent are the properties of explanations from XAI literature aligned with clinicians' purposes?
To answer such questions, we conducted our study in two phases and used the patient trajectory for IPF as scaffolding for situating clinicians' explainability needs over time. Because patient trajectories are often tight-knit with a country's healthcare system, we combine the national care pathway with the "Patient Community Journey Mapping" design method [88] (hereafter referred to as journey map) to outline such a trajectory. First, through an exploratory study (section 4), we (1) validated such a trajectory and (2) gained a preliminary understanding of the study context by engaging a multi-disciplinary pool of participants with backgrounds in computer science, design, and pulmonary medicine. Using the journey map for IPF together with explanation exemplars (Figure 3), we conducted semi-structured interviews with 12 clinicians, with diverse levels of expertise and specialisations, employed in treating and researching IPF at the Erasmus MC. We enquired about their medical workflows, pain points, and uses and purposes for explainable AI-based CDSSs. By treating explanations as a means, rather than an end (e.g., as in [53]), we investigate the interactions between clinicians and explanations for AI-based CDSSs in pulmonary medicine, the depth of the information being sought, and how these needs evolve throughout a patient's trajectory.
Our results show several tensions around clinicians' explainability needs (Table 3). Given any particular activity, e.g., creating a treatment plan, clinicians seek diverse and mutable explanations - both in content and modality - to cope with the dynamics of patient conditions and the unpredictability of IPF. General explanations (e.g., the patient cohort used to build the system) were preferred to estimate whether a system could be a good fit for them. However, as a consequence of their patient-centric commitment, our participants acknowledged the relevance of patient-specific explanations. Furthermore, clinicians view explanations as tools that support responsible clinical decision-making processes happening on both an individual and équipe level. Finally, epistemic and autonomy challenges were raised in relation to clinicians' capacities to understand and interact with AI systems. As research in XAI progresses and propagates to clinical contexts, we echo prior work [46,179] around the need for tighter collaboration between its algorithmic and human-centred spheres. To that aim, this work contributes:
• An understanding of how clinicians' explainability needs evolve in relation to the dynamics and uncertainties of IPF. We situate such needs across a patient journey map for IPF.
• Design implications for XAI research in clinical contexts. Taking a longitudinal perspective and tempering common pitfalls of current XAI research [57,179] are crucial to framing the role of explanations in practice.
• Insights into clinicians' usage of explanations. Explanations for AI systems may constitute tools that clinicians leverage - individually or jointly - throughout their workflows, whereas current evaluation criteria and processes are not equipped for that.
• Suggestions for medical education to address epistemic barriers related to AI literacy [112]. Promoting critical thinking about AI-based CDSSs requires close collaborations between healthcare and computer science professionals.
and Hematology. Intertwining patient care and medical research enables Erasmus MC to offer specialised treatments to complex patient cases. At Erasmus MC, healthcare is approached and delivered in a patient-centric way, focusing on and respecting individual "patients' personal preferences, desires, and values" to provide high-quality care [37]. To that aim, mutual information exchange between medical doctors and patients is crucial [25]. Ideally, both parties in such interaction should be willing to equally commit to a safe and communicative space to disclose information. In practice, adherence to patient-centric approaches heavily rests on the shoulders and experience of doctors in creating such a space, formulating the right questions, and reducing uncertainty [31,149,152]. Ultimately, a patient-centric commitment enables doctors - individually or jointly (e.g., in MDOs) - to better inform possible deviations from the established care path depending on patients' needs.

Idiopathic Pulmonary Fibrosis
Idiopathic Pulmonary Fibrosis (IPF) is a chronic and progressive lung disease causing permanent scarring and breathing difficulties [54]. It is estimated that about 5 million people are affected by IPF globally [120], and it is most common in people in their 70s [54]. After the diagnosis, the average lifespan of patients is between 3 and 5 years [54]. Due to its unknown causes - hence idiopathic - IPF is complex to diagnose and progresses unpredictably [105]. Symptoms include aching muscles, clubbing, severe fatigue, and weight loss in addition to shortness of breath. Diagnostic procedures for IPF combine high-resolution computerised tomography (HRCT) scans, chest X-rays, and blood and lung function tests. Several treatments are available to help patients cope with the disease. First, patients are encouraged to adopt healthy lifestyles (e.g., stop smoking or exercise regularly). Prescriptions could include antifibrotic medications - nintedanib or pirfenidone - and oxygen therapy. Finally, for more severe cases, the existing options cover lung transplants and palliative care (in the late stages of IPF). Cooperation between clinicians and patients, and ongoing research efforts, are equally fundamental to providing care to patients affected by IPF. Overall, the intersection of clinical research and patient-centric care makes this a unique context to deeply study the opportunities, challenges, and temporal dynamics around medical AI and XAI.

RELATED WORK
We present prior work in Explainable AI from both algorithmic and human-centred viewpoints on the field. Given our focus on idiopathic pulmonary fibrosis, we discuss prior work and applications of (X)AI in (pulmonary) medicine. Finally, we highlight the lack of attention to the temporal evolution of explanations for healthcare.

Algorithmic XAI
The notion of explainability dates back to research on expert systems [26] and has been reinvigorated by the recent advances of sub-symbolic AI approaches like deep learning, which favour performance (e.g., accuracy) over model transparency. Given the proliferation of these systems in disparate domains (e.g., healthcare [161] and finance [101]), explanations could offer a variety of stakeholders the means to interpret, evaluate, or contest [5] the output of AI systems. Nowadays, explainability remains largely algorithm-centred and focuses on describing the outputs of AI systems (i.e., interpretability). Under this interpretation, a plethora of explainers have been proposed [15,69,71]. Prior work covers local (sample-level) or global (class-level) explainers, either in post-hoc (i.e., without altering underlying AI models) or self-explaining (i.e., embedded within underlying AI models) [35,178] fashions. Concretely, common XAI solutions can reflect the importance of individual input features (i.e., feature attribution) [137,142], select influential [72,97] or prototypical [34,126] instances from the training dataset, describe how much a data instance has to change for the model output to change (i.e., counterfactuals) [64,170], generate human-like concepts [14,61], or provide rule-based explanations [70,138]. To cope with the heterogeneity of XAI methods, and to keep track of algorithmic advancements in XAI, prior work has distilled several evaluation properties for explanations [7,33,100,122,147]. Such properties cover both model- or system-specific aspects (e.g., fidelity, stability, or uncertainty) as well as human factors (e.g., comprehensibility, actionability, or coherence).
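To make two of the explanation types named above concrete, the following is a minimal, self-contained sketch - not drawn from the explainers cited here - of a global feature-attribution explanation and a local counterfactual. The dataset, model, and feature indices are synthetic placeholders used purely for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Placeholder data and model; a real CDSS would use clinical features and a validated model.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Global (class-level) explanation: how much each input feature matters for the model overall.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: importance={score:.3f}")

# Local (sample-level) counterfactual: interpolate from one instance towards the nearest
# training instance of the opposite class and report the first point where the model's
# prediction flips, i.e., "how much the input has to change for the output to change".
x = X[0]
target = 1 - model.predict(X[:1])[0]
candidates = X[y == target]
nearest = candidates[np.argmin(np.linalg.norm(candidates - x, axis=1))]
for alpha in np.linspace(0.0, 1.0, 21):
    x_cf = (1 - alpha) * x + alpha * nearest
    if model.predict(x_cf.reshape(1, -1))[0] == target:
        print(f"prediction flips at alpha={alpha:.2f}; feature deltas: {x_cf - x}")
        break
else:
    print("no counterfactual found along this path")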

Explainability in HCI
The HCI community argues for and investigates a broader definition of explainability, one that focuses on the recipients of explanations [65,103,125] and views AI systems as socially-situated agents [45][46][47]. To do so, HCI researchers often tie together works and theories from cognitive psychology [110,111], social sciences [122], design [175], philosophy [27], and - seldom - algorithmic AI [2]. A growing research strand within HCI is that of Human-centred Explainable AI (HCXAI). Research within this sphere aims to gain an understanding of who the recipients of explanations are [46]. It rests on prior works around framing "XAI stakeholders" [15,103,125,135,158] and incorporates reflexive practices from design [46] and prior discussions around users and contextfulness of explanations [33,122,147]. Furthermore, in contrast with algorithmic XAI research, HCXAI posits a pluralist definition of explanations [51] as different social groups might interpret technological artefacts differently (i.e., interpretive flexibility [19]).
By adopting this lens, a number of prior works connect to the fabric of HCXAI in investigating the technical affordances and end-users of XAI systems [47]. Works targeting developers and practitioners include documentation tools like Model Cards [124] and Datasheets for Datasets [60] as well as an XAI question bank covering prototypical questions around explainability [107]. Works targeting lay users, instead, include data-centric explanations [9] and empirical studies around the relative importance of evaluation properties of explanations [108]. Finally, Langer et al. [103] and Subramonyam et al. [153] propose frameworks to aid interdisciplinary research and communication around AI systems.
In conjunction with such works, others focused on understanding the XAI needs of end-users. Kim et al. [95] enquired about the end-users of a real bird identification app and surfaced needs related to improving human-AI collaboration. Similarly, Cai et al. [29] investigated the needs of pathologists around AI-based diagnostic tools and, later, Cai et al. [30] compiled pathologists' information needs (e.g., capabilities and limitations) during the onboarding phases of prospective AI systems. Rostamzadeh et al. [140], instead, adapted Datasheets for Datasets [60] for the documentation of healthcare datasets. Finally, Tonekaboni et al. [159] unveiled clinicians' interest in explanations that justify clinical decision-making.

CDSSs and XAI in Pulmonary Medicine
The application of XAI within healthcare is largely tied to Clinical Decision Support Systems (CDSSs) as the need for explanations is exacerbated by the criticality of medical doctors' decisions, issues of accountability [146], and the proliferation of sub-symbolic AI approaches [43,118,141,166,171]. CDSSs could "provide clinicians, staff, patients, or other individuals with knowledge and person-specific information, intelligently filtered, or presented at appropriate times, to enhance health and health care" [129]. Despite the practical benefits, existing AI-based CDSSs (e.g., Merative) have displayed high false positive rates in real-world settings [161]. Specific to pulmonary medicine, prior work focused on the adoption of AI [89,93], dedicated support systems [43,166], diagnostic models [181], and studies comparing CDSSs' performance against pulmonologists' [85,151,160]. However, to the best of our knowledge, their wide adoption in pulmonary medicine has not happened yet.
Similarly, while guidelines for implementing XAI in healthcare have been discussed (e.g., [114,117]), existing surveys [132,143] show that the application of XAI in pulmonary medicine is only sporadically explored. Das et al. [39] highlighted the potential benefits for pulmonologists of using an XAI system to assist in the diagnostic interpretation of pulmonary function tests. Diprose et al. [41], instead, probed physicians with a hypothetical ML-based risk calculator for pulmonary embolism paired with several explainers [12,63,113,137]. Finally, Evans et al. [53] investigated possible challenges for pathologists in adopting existing explainers [94,109,142].
Related to healthcare, prior works in HCI have approached the problem similarly (subsection 3.2), with longitudinal perspectives being few and far between. For instance, Jardine et al. [84] enquired about end-users' perceptions of internet-delivered therapy over 8 weeks, uncovering diverse preferences, uses, and long-term support strategies. Jo et al. [86] and Blair et al. [21], instead, focused on supporting clinicians when planning and delivering longitudinal health interventions, respectively. Such works exemplify the need for longitudinal perspectives when studying clinical settings. It is indeed common for patient conditions and treatments to require clinical progression before actions can be taken - both by clinicians and by researchers striving to support clinical workflows.

Research Gap
Little research has engaged in understanding end-users' explainability needs [30,159], preferences [39], and perceptions [11,41,53] in clinical settings despite their importance [10]. Furthermore, because such factors are often captured in a single moment in time (e.g., diagnosis [30]), we still lack an understanding of how end-users' explainability needs might evolve over time. Indeed, the temporal dynamics of user needs have been investigated in other fields, e.g., Information Science [83,116]. We argue this to be a crucial facet of research around AI-based CDSSs: high-pressure situations, uncertain patient trajectories, and doctors' experience can impact the adoption and integration of AI-based CDSSs [40] and the design of explanations they might provide. Specifically for pulmonary medicine and IPF, prior surveys show that research around AI-based CDSSs [89,150] and explainability [10,132,143] is relatively scarce.
In this paper, we investigate the temporal dynamics of clinicians' explainability needs within pulmonary medicine. We particularly ground our work in the use of CDSSs for providing care for IPF at Erasmus MC (section 2). Unlike prior works that focus on individual points in time [30,39,41,53,159], we situate such needs throughout patient care for IPF. When doing so, we do not seek to find clinicians' definite preferences for certain explanations (e.g., as in [39,53]) but rather to gain a nuanced understanding of how, and why, their uses and purposes for explanations evolve over time. Finally, we relate our results to the literature on Explainable AI and revisit the relevance of the evaluation properties of explanations (subsection 3.1) within pulmonary medicine.

EXPLORATORY STUDY
To answer our research questions, and inspired by [173], we conduct our enquiry into IPF clinicians' explainability needs in two steps, namely an exploratory study and contextual interviews. Here, we describe the exploratory study - a multi-disciplinary co-creation session - conducted to inform the structure and instruments to be used in the contextual interviews with IPF clinicians (section 5).
The exploratory study aimed at informing the design of the main study (section 5), and particularly:
• the interview protocol, by refining our questions to clinicians (e.g., vocabulary)

Table 1: Co-creation participants, their details, and background. Years of Experience refers to the years a participant has spent in that role or has had that title.
To structure the exploratory study, we relied on prior literature around the effects that clinicians' past experiences have on assistive tools [30,40] and how directly testing with existing XAI methods leads to understanding users' preferences rather than their needs [39,53] (prior work [173] has operated similarly when investigating XAI in healthcare). As a confidence check, we opted to query our participants directly to confirm (or set aside) such premises in our specific study context.

Instruments
Here we describe the instruments used in our exploratory study and tested as prospective prompts for the main study (section 5). These are shared as supplementary material.
Patient Journey. Patient journey mapping [32,119] is a design method for incorporating patient experiences in healthcare design while providing a bird's eye view of such experiences. In our work, we adopt patient community journey mapping [88], a data-driven extension aimed at alleviating the labour-intensive nature of traditional patient journey mapping. We first collected from a US-based platform a large set (ca. 140k) of experiences that IPF patients voluntarily shared. Then, we applied topic modelling and manually checked the validity and reasonableness of the topics. Finally, we aligned our topics with healthcare practice by combining them with the care path for IPF used at the Erasmus MC.

XAI Prototype. To help participants reflect on and envision AI-based CDSSs, we prepared an XAI system prototype that allows pulmonologists to inspect patients' experiences at a finer granularity (Figure 2). The design of the prototype was based on the authors' prior knowledge of IPF and existing literature [29,67,173] but tailored to textual data, i.e., patient experiences. We picked a diverse subset of patients' experiences and associated topics (Figure 2a), and explanations around those topics (Figure 2b), to be displayed in the prototype. The topics aligned with the journey map to reduce friction and cognitive load on participants when moving away from the journey map. Mindful of the time constraints clinicians face in practice, the explanations we generated (using [154]) consisted of salient excerpts from patients' experiences (Figure 2b).
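Because the exact platform and topic-modelling pipeline used for building the journey map are not detailed here, the following is a minimal sketch, under assumed preprocessing choices, of the kind of step described above: fitting a topic model to free-text patient experiences and printing the top words per topic so they can be manually checked and aligned with the phases of the IPF care path. The example corpus and parameters are placeholders.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; the study used ca. 140k patient-shared experiences.
experiences = [
    "shortness of breath after walking upstairs, waiting for a referral",
    "started pirfenidone and discussed the side effects with the nurse",
    "lung function test and HRCT scan scheduled before the next consultation",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(experiences)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(counts)

# Print the top words of each topic for manual validation by domain experts.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")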

Method of the Exploratory Study
We opted for a participatory approach to include diverse perspectives and foster a fruitful discussion around the use of XAI within pulmonary medicine. We organised a 1.5-hour-long co-creation session that engaged a multidisciplinary team with expertise in IPF, design, and XAI. Participants were recruited through the professional networks of the authors. Table 1 summarises their details.
Structure. The co-creation session started with a short introduction to the research project and its goals. Then, participants engaged in discussing the journey map in a think-aloud fashion. Questions covered data collection, rationales around topics, and how those related to clinical practice. Afterwards, following initial familiarisation, participants engaged in using the XAI system prototype (Figure 2) and the explanations it included. Participants were tasked with creating short profiles of patients based on the information displayed through the system prototype. The session closed with an open discussion about the perceived usefulness and understandability of the two instruments.
Analysis. The session was recorded with participants' consent and analysed by the first and second authors. Participants' comments were mainly clustered in relation to the journey map and the XAI prototype to decide which instrument to use in the main study. Additional comments were coded inductively and served to inform the interview protocol.

Outcomes
Together with a rough outline for the interview protocol, the main, concrete outcome of the exploratory study (and prompt for the main study) is a validated patient journey map (simplified in Figure 4). The journey map begins with patients experiencing the first symptoms and consulting additional (e.g., online) resources. Around the same time, consultations with general practitioners take place. After being referred to a hospital, patients go through physical examinations and tests (e.g., lung function) with lead practitioners and specialised nurses. The results are then discussed by a cohort of medical doctors in MDOs to reach a diagnosis. Then, patients and doctors discuss the definition of a treatment plan. Patients receive continuous support in recurring consultations and treatment revisions. Lastly, patients might opt for a better quality of life and decide on hospice or palliative care.
Overall, clinicians engaged in the co-creation session found the journey map to be comprehensive and aligned with their experience. They, however, pointed out that the journey map represents "the ideal situation" as the timeline can be blurrier. Conversely, participants engaged with the XAI prototype only on a surface level, barely interacting with it: "We still need to level it and test it. But it gives a crude impression of what AI is and what it could do." (CC-P1). Motivated by the general agreement on the phases and actions portrayed in the journey map by clinicians (CC-P1 - CC-P4), we settled on it as the contextual prompt for our main study.

METHOD FOR THE MAIN STUDY
Recruitment
Interviews were held by leveraging the authors' professional networks to seek out participants with diverse medical roles, years of experience, and familiarity with AI. Given the tight and busy schedules of our participants, we used a combination of purposive and convenience sampling for our study. Concretely, recruitment was carried out through email and in person during selected unit-wide meetings at the Erasmus MC in which the authors were given authorisation to partake. We conditioned further outreach on participants' potential to provide rich insights around XAI in pulmonary and IPF care. Data collection stopped when additional interviews failed to contribute relevant, new information. Overall, we spoke with 12 clinicians whose details are summarised in Table 2.

Conducting Interviews
Interviews were scheduled from May to July 2023, lasted on average 35 minutes depending on clinicians' availabilities, and were recorded using videoconferencing software. Respondents were sent an informed consent form beforehand. Interviews started with an off-the-record introduction about the goals and outline of the interview. After that, with the participants' consent, we started the recording. We prepared an interview guide to provide a flexible structure for the conversations. Initially, respondents were asked "grand tour questions" [106] about their medical role and familiarity with AI and IPF. Depending on the latter, participants were then shown the journey map (section 4) as an initial prompt to establish meaningful communication [44] and to discuss their practice and knowledge. Thereafter, participants had access to the journey map as a reference for sharing their experiences. The interviews then proceeded to discuss the challenges they currently face in practice, e.g., creating treatment plans. Once a common vocabulary was established, we started shifting the attention to AI in pulmonary medicine. We enquired about their perceptions of AI, what role they see it taking, and how it could affect the scenarios disclosed thus far. Finally, we delved deep into clinicians' needs and uses of explanations in medical workflows, in the context of AI systems, and how these two domains compare or (mis)align. Given the breadth of the potential insights around explanations, we were guided by the framework from Xu et al. [174] and probed participants on what they would like to be explained (e.g., data features), when (e.g., disease diagnosis), and how (e.g., numerically) that should be explained. In this last segment, we relied on explanation exemplars (Figure 3; subsubsection 5.2.1) to surface insights specific to pulmonary medicine.

Exemplar Explanations.
To help participants reflect on the kind of explanations they might look for, we hand-crafted a selection of exemplar explanations drawn from algorithmic Explainable AI literature. We used the exemplars in a "what if?" fashion only after querying clinicians about their envisioned use for explainability. Furthermore, rather than preparing a large pool of exemplars, we selected a variety of visualisation modalities to elicit rich and contextualised responses instead of inquiring about specific preferences for existing XAI methods. Inspired by Kim et al. [95] and Vilone and Longo [168], we prepared 6 exemplar explanations (Figure 3) based on existing XAI methods and adhering to real tools and visualisations customary to pulmonologists [39], e.g., pulmonary function tests. Out of the 6 exemplars, 4 are single-modality (Figures 3a - 3d): numerical (e.g., [137]), rule-based (e.g., [138]), textual (e.g., [14]), and visual (e.g., [142]). The remaining two combine visual and textual elements (Figure 3e), and rule-based, visual, and textual elements (Figure 3f), respectively.
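The exemplars used in the study were hand-crafted by the authors. Purely as an illustration of how a rule-based exemplar grounded in measures familiar to pulmonologists might be produced, the sketch below fits a shallow decision tree on synthetic data and renders its decision rules as text; the feature names (FVC, DLCO, age), labels, and resulting thresholds are hypothetical.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data; columns loosely stand for FVC (% predicted), DLCO (% predicted), and age.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(80, 15, 300),
    rng.normal(55, 20, 300),
    rng.normal(68, 8, 300),
])
# Synthetic label loosely mimicking a "likely fibrotic pattern" flag.
y = ((X[:, 0] < 75) & (X[:, 1] < 50)).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Render the tree as if-then rules, i.e., a rule-based explanation exemplar.
print(export_text(tree, feature_names=["FVC_pct", "DLCO_pct", "age"]))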

Data
Processing. Interviews were conducted in English (by the first author) and in Dutch (by the second author) according to participants' preferences. Dutch-spoken interviews were manually transcribed and later translated into English (using DeepL Translate: https://www.deepl.com/translator). English-spoken interviews, instead, were automatically transcribed (using Microsoft Teams: https://www.microsoft.com/microsoft-teams/group-chat-software).

Analysis
We analyse participants' (anonymised) responses through codebook thematic analysis (TA) [96,121,139]. This form of TA, situated between reflexive [23,36] and coding reliability [22,68,87] approaches to TA, provides scaffolding to answer our research questions in an integrative manner [24,36]. From an epistemological perspective, we adopt a contextualist account [77,81] and consider responses to be valid knowledge within pulmonary medicine (RQ1, RQ2). From an ontological perspective, instead, we embrace a critical realist account [56,65] in the attempt to expose latent information about the evaluation of XAI within pulmonary medicine (RQ3). We adopt as a reference point the criteria for evaluating explanations - both model- and human-centred ones - surveyed by Liao et al. [108]. Practically, we engaged in a combination of deductive and inductive coding to identify initial central concepts - based on literature and interview structure - and then build meaning around those and emergent concepts. The first author took the lead in the data analysis, first familiarising themself with the data (by reading transcripts and creating preliminary descriptive memos [62]), and then coding the data. The second and third authors contributed with partial coding, review of the codes, and definition of themes - ultimately mitigating individual positionalities. Coding was conducted using Atlas.ti, while groups and themes were delineated and refined through in-person meetings between the authors. We identified 270 codes, organised into 25 clusters, further refined into 6 groups, and finally distilled into 3 themes.

Authors' Positionality and Perspective
To provide more clarity to readers, we disclose how the authors' perspectives and assumptions shaped the analysis. The authors (all based in the Netherlands) work in diverse fields. Authors 1 (Italian male), 2 (Dutch male), 4 (French female), and 6 (Chinese male) research in computer science. Author 3 (South Korean female) researches in design and healthcare. Author 5 (Dutch female) researches in pulmonary medicine. Despite some unfamiliarity with the study context, our interest in exploring the intersection between healthcare, AI, and XAI led to the willingness to deeply investigate a single, relatively less explored context as a means to gain focused insights. The construction of this paper was mostly shaped by author 1's views on XAI and reflections with the co-authors. We acknowledge that, due to our background and occupation, we approach the domain from a position of privilege. However, despite that and the introduction of external theories in our study [88,108], we commit to giving up the belief that our prior knowledge is superior to that of the involved clinicians (Krogh and Koskinen [99]) and to a careful and contextual interpretation of clinicians' responses around their experiences, perception of XAI, and patient-specific examples that were brought up throughout the interviews.

RESPONSES FROM CLINICIANS
We now discuss the themes resulting from our interviews with clinicians, organised in relation to our research questions: information needs (RQ1), moments and conditions in which doctors seek explanations (RQ2), and the alignment between properties of goodness of explanations and clinicians' purposes (RQ3). We relate our results to the journey map for IPF in Figure 4 and summarise key insights in Table 3.

Theme 1: "With paracetamol, you don't know exactly how it works" -About Explanatory Depth
The "unnatural" feeling (P11) of communicating decisions to colleagues (P9, P12) and patients (P3, P10) without explaining motivated them to look for explanations with AI-based CDSSs too.Particularly, participants expressed the need for both general system explanations about the afordances of such systems (particularly around validation) and local, patient-specifc explanations that would highlight patient-specifc factors.Regardless, multi-modal visualisation modalities seemed to align better with clinicians' needs.
"I think we need some explanation, it won't be sufcient to only say, well, "it's pneumonia".[...] We are used to explaining how we got to a certain answer [...] it would feel very unnatural to only get one diagnosis without any explanation."(P11).In light of this, almost all participants (excl.P10 and P12) denoted how having access to lower-granularity explanations would provide more credibility to clinical decision-making which otherwise would feel "unnatural" (P11).Particularly when communicating with patients, local explanations could refect information around risk factors for IPF (P9), test results (P3), side efects of treatments (P7), and historic trends in similar patients (P2)."I wouldn't just tell people that I do it this way because the AI says so [...] We sometimes deal with very rare things, and then I come across something again, and then I think 'why did we do this?' [...] It's also man's nature [to ask] why, so I think that's also what is needed."(P3) 6.1.3Visualising Explanations.Overall, participants (P1, P2, P3, P4, P5, P7, P10, P11) gravitated towards multi-modal explanations (Figures 3e and 3f).These were perceived as instruments allowing for clinicians' discretion, and enabling them to quickly glance over explanations and dive deeper if needed (e.g., in the presence of a rare mutation (P3)) in their day-to-day practice.
Single-modality explanations (Figures 3a - 3d), instead, yielded mixed reactions and were perceived as highly situational instruments, dependent on the nature of the question being asked. Numerical and rule-based explanations were considered customary for clinicians - "[...] that's kind of the way how I think, [how] many doctors would also apply their train of thought." (P5) - as they resemble lab tests and reasoning processes, respectively. Often, participants related this information to their training in universities and hospitals (P3, P4, P5, P7, P12). On the other hand, textual explanations were considered passable (P1, P10) but "disappointing" (P5) if not related to medical literature. Participants also found this delivery modality to be useful in communicating the regulations or guidelines an AI system might follow or refer to (P5, P7, P8, P11). For disciplines like radiology and oncology, where imaging techniques are more common, visual explanations would instead allow clinicians to visually identify elements they recognise (P9) and formulate a preliminary understanding (P1, P2, P4, P10, P11).
While these comments emphasise the nature of the information our participants seek from AI explanations, we explore their relation to medical workflows and tasks in subsection 6.2.

Theme 2: "So that you can see the clinical progression" -About Explanation Dynamics
Participants' needs over what explanations should include are tightly connected to their medical workflows. Here, we organise such comments and rationales around the patient journey map (Figure 4) to highlight how explanations translate to medical practice. In parallel, two major considerations emerged: first, the desire to safeguard human agency; second, the role of explanations in learning how and when an AI-based CDSS should be used.

Translating Explanations to Medical Practice. While discussing explainability, participants focused on selected phases of the journey map (Figure 4), namely Diagnosis (Phase 2), Treatment (Phase 3), and Living with IPF (Phase 4). Phase 1 (Pre-diagnosis), instead, was not discussed in depth as it mainly occurs outside Erasmus MC and access to information about patients is difficult or absent.
Phase 2: Diagnosis - Patients who reach this phase of the journey usually arrive at the hospital after being referred by a general practitioner. After some preliminary checks, e.g., pulmonary function, the first consultations are scheduled. In this setting, participants saw explanations and AI as support tools for diagnosing IPF, acting as a second set of eyes: first comparing and contrasting their judgement with that of the AI, and diving into the explanation in case of disagreement (P7, P9, P11). "Well, you can have something that was already decided without AI and whether there is a match between those two [the AI's and clinician's decision]. [...] But if it doesn't match, then you can look at the explanation as to why that is. Then I would look at the explanation [to see] if something else comes out that I overlooked." (P7) Critically, P6 commented on explanations possibly playing a bigger part within MDOs - their gold standard for diagnosing IPF: "If it's [the explanation] integrated also with the MDO then it's just part of the prediction. Then you're not going to say I say this, the AI says this. Then you also use that explanation with it [the prediction] in the MDO." (P6)

Phase 3: Treatment - After IPF has been diagnosed, doctors and patients engage in the definition of a treatment plan. Participants commented that for AI-generated suggestions to be meaningful, patient-specific characteristics (e.g., mutational stages of a tumour (P7), or kidney function (P3)) and treatment side-effects should be clarified in the explanation (P5, P7, P10, P11). "[...] even going as far as proposing treatments - that is in oncology and especially [for] me in pulmonary oncology [...] pinpointing specifics based on mutational stages of the tumour and some other biological processes" (P5) Further, participants juxtaposed the thoughtfulness that is required for some decisions (P1, P2) to the transactional nature of most AI-based CDSSs - ask, and be told. By referring to his occupation as a lung transplant surgeon, P2 highlighted the unfitness and lack of care and understanding similar systems might exhibit in certain scenarios. Carefully crafted AI explanations could better assist clinicians in assuring credible clinical decision-making processes.
"Then you also do justice for this type of care product [lungs], to avoid missing people, or transplanting them too early or unnecessarily.[...] Lung transplantation is a big grey area it's really a small group of patients."(P2) Phase 4: Living with IPF -In this last phase, treatment is already underway and recurring consultations ensure that patients react correctly to it, that the (possible) side efects of medications are bearable, or that treatments are correctly revisited.Here, participants (P6, P10, P11) independently focused on the temporal dimension and the importance of adopting a longitudinal view around explanations by, for instance, having AI-generated suggestions explained through trends (P2, P9, P10).P2 exemplifed this by referring to the moment following a lung transplant: "Initially, lung function rises, and then it stabilizes.But it can also be a rejection, then there's a drop in lung function.Then we perform a number of steps: a CT scan, a bronchoscopy where we culture for bacteria and virus, and where we take morsels of tissue for diagnostics, and send those to the pathologist and they see, for example, an A1-B0 reaction, an A1 you can also have if there's a viral infection at play.At that point we wait to see: if the cultures are negative, we can still decide to give a rejection treatment."(P2) Although P2 is an outlier in our participant pool, this comment symbolises the breadth of contextual information that explanations for AI-based CDSSs should, in their view, relate to.
Participants (P5, P10, P12), instead, commented on re-assessing treatment plans, highlighting both the value of explanations that relate to temporal dynamics and their possible initial absence due to the lack of clinical progression (P12).
"You then put the cures in an interval, so that you can look at the side efects.Those are linked to the lab, if you see that there is an increase or decrease in lab values then we know that maybe we should make an adjustment."(P10).

Safeguarding Human Agency.
While participants recognised the performance of CDSSs to be crucial towards their adoption (P6, P10), they also stressed the need for a human controller throughout patient care given the high diversity of IPF patients they attend to (P3, P5, P7). Nonetheless, our participants envisioned diverse uses for those systems - as additional data points (akin to lab test results (P11)), as an additional pair of eyes (P4, P12), or as artefacts meant to give suggestions (P5) - but always with the idea that "[the AI] has to add something in practice" (P8). Such viewpoints are not surprising. Our participants - often facing complex patients' needs - rightfully consider their training and interactions with colleagues foundational to their modus operandi, whereas AI could introduce unwarranted roadblocks.
Despite these interpretations, participants expressed the possibility for AI explanations to be integrated with existing workflows (e.g., in MDOs) and become aids towards more "substantive discussions" (P2), information sharing (P6, P10), and diagnostics (P1).
"I think you should always be able to discuss it [the explanation] with a colleague."(P10).Concretely, explanations could provide the means to better evaluate AI-generated suggestions related to, e.g., adjust treatment plans (P8); or ignore them altogether (P10) based on patient examinations.
On this last point, several participants (P3, P7, P8, P10, P11) stressed that having a human controller present throughout patient care does not necessarily signify attempting to become better collaborators with the AI itself (P12). Instead, critical thinking needs to be exercised. To that aim, P1 directly challenged the responsibilities clinicians might have in the future as AI systems get more and more prevalent: "Do we see our own role as [some] sort of interpreters of the information and having a good conversation with the patient? Or do we see that we do still have a role to see if the AI is still correct with our own ideas?"

Explain to Learn. The final facet of explanations that surfaced during our interviews relates to explanations serving as learning tools on how to use an AI system. Provided that an AI system has been appropriately validated and some guarantees are given beforehand (subsection 6.1), our participants discussed more concrete positions on learning to use AI systems in practice and testing whether they hold up to the initial expectations.
Initially, doctors might look at explanations more frequently (P7, P8, P12) as a way "to go into a little depth" (P1) into what an AI-based CDSS might be doing and get accustomed to it. During this probation period for the AI, doctors can formulate a mental model of how that system might operate and come to an understanding of what that system could do for them concretely.
"So [in] a complex system where you have a lot of patients [...] it's really nice that at least in the beginning when you start using it you understand what exactly counts because everybody has in their head an algorithm how you aggregate all those patients characteristic to a product, treatment A or B." (P8) Additionally, this can be combined with a prospective validation of the system (P8) in which a system is tested against a large backlog of historical data (e.g., CT scans) and compared with the suggestions and rationales of radiologists.
Only after clinicians learn the capabilities and shortcomings of an AI system might they start taking acceptance into account (P8, P10). However, participants hinted at difficulties around the acceptance of an AI system by referencing how prolonged collaborations help them get a sense of who the more knowledgeable others are when in need of a second opinion (P1, P3, P11). In this vein, the practical viewpoints of our participants highlighted the conceivable decay in the utility of explanations in the case of an AI-based CDSS that displays behaviour consistently aligned with their own judgement (P3, P4, P9, P11, P12). "At some point, you trust someone's knowledge and ability when you consult someone who you know is very knowledgeable about something. That is of course more difficult in such a large automated system." (P1). Concurrently, more experienced participants acknowledged the consequences of AI explanations and AI-generated advice for inexperienced clinicians. These might either be "enlightening" and foster learning, or be detrimental and provide convincing motivations for what they do to the point they "do not know any different" (P2).

Theme 3: "Then it doesn't have as much value to me" -About the Goodness of Explanations
Concerning properties of goodness of explanations from XAI literature (subsection 5.3), participants naturally focused on interactivity as a means to personalise explanations to their needs while retaining agency (subsection 6.2). On top of that, participants underscored the necessity for those explanations to display actionable insights that help them chart the next course of action.

Interactivity for Personalised Explanations.
Participants viewed interactivity of explanations as a key property for them to flexibly query explanations and retain agency (subsection 6.2). They repeatedly underscored their interest in explanations that could convey clear and concise information. Explanations that are too extensive could lead to high cognitive load [6,179], or be completely disregarded: "Then it doesn't have as much value to me." (P8). Some participants framed the compactness [100,147] (i.e., the amount of detail) of explanations as an upstream design choice dependent on the concrete task or application: "If it's too detailed then of course people aren't going to look anymore. It's really per-application how detailed it should be." (P6). In addition to this, some participants mentioned the long-term effects of explanations, e.g., building end-users' trust, and the potential benefits brought by detailed justifications for AI systems: "That may be a lot of reading but that's what's going to help build trust eventually" (P3 on Figure 3c). In attempting to strike such a balance, participants highlighted the epistemic barrier that AI explanations might create. Our participants desired explanations to be slim and free from technical jargon so as to not hinder their comprehension [33] of the AI's affordances. P1 - echoed by P6, P8, and P11 - stated that "some degree of knowledge would be necessary, but you don't need to exactly know how the system works in the background". Similarly, P5 expanded on this by pointing to the need to assess "the critical steps" an AI takes towards a decision and communicate those to clinicians. Given such accounts, visual modalities of explanations (subsection 6.1) play a fundamental role in the way information is conveyed to clinicians. For instance, rule-based (Figure 3b) and textual (Figure 3c) explanations should be of "manageable size" (P3, P7) for clinicians to be willing to engage with them. Oftentimes, multi-modal explanations seemed more beneficial for our participants: "Text I don't like. Visual is too little. 'It should speak', mixed is best." (P3).
In this sense, some participants saw the interactivity [18,147] of explanations as a plausible solution to their concerns about the comprehensibility and accessibility of explanations, allowing them to further query the AI and its explanations (P5, P9). "I think it's important that it's visual at a glance but that if you want more information you can zoom in for more information [...] so that you can still ask questions." (P9)

Obtaining Actionable Insights. Participants also expressed the need for explanations to provide actionable insights that help them chart the next course of action, e.g., coming to a diagnosis, or escalating the discussion to an MDO. Explanations were perceived as companions to their own decisions (subsection 6.2), particularly as preliminary checks while waiting for more educated judgements and rationales from colleagues or lab tests. Thus, for explanations to be actionable, they should refer to information that relates to doctors' practice (subsection 6.1). While including percentages or probability values within explanations might communicate the (un)certainty of an AI system's answer, these were perceived as de-contextualised and unclear - if not useless.
"You never know for sure.Suppose within a certain patient category the system doesn't work 90% but 70%, and the output is yes or no.Something comes out [it], but you don't know which group that falls into.It's hard to look at even with [the] probability of whether that advice is right or wrong."(P8).
In view of this, respondents instead longed for explanations that would be coherent [122,147] with external sources of knowledge: prior patients (P8), expertise shared within the group, and medical literature (P3, P5). P3 considered this to be a much-needed basis for comparison given the unpredictability of IPF, the state-of-the-practice at Erasmus MC, and the credibility required in clinical decision-making.
"References, what it [the advice] is based on, from the scientifc literature, to see what the basis of the advice is.We sometimes deal with very rare things, and then I come across something again, and then I think 'why did we do this?'.If there are references there, then I understand why we did that."(P3) Related to this, several participants (P1, P2, P4, P5, P7, P12) seemed aware of the possible aversion toward suggestions and explanations from AI stemming from how they have been operating in the past.Despite their propensity for research and the frequent need to re-assess their decisions, they refected on how they might judge more harshly disagreeing information (explanation or not), and by whom it is given."We are terribly opinionated of course.Often I had a colleague ask me 'What would you do?'.I then say 'I would do that', and then they ask 'Why would you do that?'.But then they go back and do it their own way anyway.So, we are stubborn after all."(P2)

DISCUSSION
Explainability is critical for the integration of AI-based CDSSs in pulmonary medicine. By interviewing clinicians working on IPF, we identified several tensions around clinicians' explainability needs. While general, non-technical explanations were preferred, their patient-centric commitment called for patient-specific explanations that could enable them to maintain agency over clinical decision-making. Furthermore, clinicians might face challenges in engaging with and understanding explanations. Results from the interviews are summarised in Table 3. We now discuss the implications of our results for future research.

Integrating (X)AI in Medical Workflows
Erasmus MC is a university medical centre where research and clinical workflows are intertwined to provide high-quality care. Because of their commitment to patient-centric care, our respondents were more welcoming of technological advances and cutting-edge treatments found in pulmonary medicine literature. Despite the presence of healthcare protocols at the national level and within Erasmus MC, our respondents were not afraid to stray from such predefined "ideal" pathways if beneficial to patients. In this sense, our respondents often referred to their 1-on-1 or group (e.g., in MDOs) exchanges with colleagues as the benchmark for sharing and contrasting their perspectives.

Theme 1: "With paracetamol, you don't know exactly how it works" - About Explanatory Depth (subsection 6.1)
1) General explanations about system affordances are preferred, if free from technical jargon. Confirms [30].
2) Uncertainties around patients and disease motivate the need for local explanations that surface patient-specific factors. Confirms [173].
3) Clinicians gravitate towards multi-modal explanations that cover multiple information sources.

Theme 2: "So that you can see the clinical progression" - About Explanation Dynamics (subsection 6.2)
1) High variability in IPF pushes clinicians to seek diverse explanations throughout patient care, even when re-examining patients.
2) Explanations should support group dynamics within the medical équipe and promote clinicians' agency over AI-based CDSSs.
3) Explanations' relevance can diminish in time as doctors learn when and how they can use a CDSS.

Theme 3: "Then it doesn't have as much value to me" - About the Goodness of Explanations (subsection 6.3)
1) Compactness of the explanations affects willingness to engage with them and the comprehension process around the capabilities of a CDSS.
2) Interactivity can modulate the details included in explanations, allow for follow-up queries, and contribute to clinicians' sense of agency.
3) Actionability of the explanations relates to charting the next course of action, not to altering the output of the system.

Table 3: Summary of the themes and insights obtained by interviewing IPF clinicians.
Their patient-centric commitment appears, however, to be in tension with issues related to the working environment: slow-moving (albeit trustworthy) administrative processes, shortage of staff, and hurdles in securing funds. Integrating explanations could spur further roadblocks in such an environment. Indeed, regardless of the presence of explanations, participants often underscored the idea of letting the AI generate its output and then enabling them to take ownership of disregarding that suggestion or testing it firsthand with patients (P5). Prior work has also highlighted similar behaviour in non-expert end-users of AI systems who, while valuing interpretability, prioritized accuracy [127].
Taking a Longitudinal Perspective. Prior research in HCI has proposed and investigated a plethora of tools aimed at supporting healthcare professionals in their activities [11,29,30]. Despite their valuable insights, those works often report on snapshots in time and do not account for the temporal dimension of supportive tools for clinical decision-making. While our work only provides qualitative pointers toward how clinicians' needs around XAI might evolve over time, we believe future research around developing and testing explainable CDSSs should adopt a longitudinal angle (e.g., [133]). Designing and conducting longitudinal studies is resource-intensive. However, they could provide a broader perspective on, and grounding of, users' explainability needs around CDSSs, in addition to the specific preferences - enquired about in [39,41,53] - for existing explainability methods. For instance, for diseases as uncertain as IPF and according to our respondents, it would be very easy for a hypothetical explainable system to be incorrect. Upstream system validation, either from a technical ("training", "validation", and "test" approach) or clinical (controlling for patient cohort) standpoint, would only provide partial reassurance. Directly testing a CDSS in practice with a range of real patients (e.g., as in [21,84,86]) would instead supply clinicians with enough information to determine the usefulness of such a CDSS. We stress that we do not argue for fully automating some of clinicians' activities, but rather resonate with our respondents in conceiving of CDSSs as recommenders over which clinicians maintain full agency (e.g., through reject options [75]) - both during and after testing a CDSS. Finally, on a general note, journey maps can provide informative scaffolding for enquiring about the temporal dynamics of user needs. For instance, they have been proven useful in retail to understand customers' behaviour, feelings, and attitudes [180]. However, journey maps are context-specific, potentially challenging to create (or dependent on data availability; subsection 4.1), and warrant attentive validation.
7.1.2 Avoiding Shiny Objects. It is clear that nowadays advances in AI happen at breakneck speed. The same cannot be said for healthcare, and for good reasons. Even at Erasmus MC, new discoveries and tools from medical research do not immediately alter, or disrupt, existing practices. Reproducibility and clarity of evidence are foundational for clinical adoption. As Topol [161] said (later echoed by Antoniadi et al. [10]), AI-based tools in medicine are "high on promise and relatively low on data and proof". While the promise of better-performing AI systems sounds enticing, we argue it is important not to fall prey to the Fear of a Better Option when evaluating prospective CDSSs. We concur with our participants in viewing human agency and alignment with clinical practice as more important than accuracy-based metrics on datasets which, despite being purposed for similar tasks (e.g., classifying nodule malignancy (P5)), might include a sample of patients with (very) different demographics. This also holds for explanations, as the majority of XAI research regularly focuses on a small subset of criteria when evaluating explanations, or devises ad-hoc benchmarks that obfuscate potential pitfalls of the explainers being proposed [57].
We suggest that future researchers investigating explainable systems for healthcare first gain a clear understanding of the needs and requirements of end-users - something that disciplines like software engineering [80] have been advocating for decades - and then seek a balance between such requirements and the technical prowess (i.e., raw performance) of prospective systems. Prior HCI works [29,144,163] shed light on some of these aspects. Indeed, we acknowledge this to be a perilous but worthwhile path to follow given the unstable nature of the current generation of AI systems and their diverse and contextualised interpretations [19].

Explanations are Part of Conversations
Our study showed the importance for clinicians of accessing explanations that translate to clinical practice, which they regularly referred to when discussing their expectations for useful CDSSs. Our participants, particularly, saw explanations as a means to provide credibility to clinical decision-making. Despite the partial overlap in results with prior work [41,53,159], our participants also viewed explanations as support tools within medical workflows. That is, something clinicians could bring up in individual discussions with colleagues and larger multidisciplinary meetings as additional data, evidence, or doubt on which to deliberate.

Informational and Transactional Needs. The participants, unsurprisingly, saw CDSSs as additional tools at their disposal, capable of surfacing relevant information that either corroborates their viewpoint or is novel and insightful. The information-seeking process of our participants resembles the one outlined by Sivaraman et al. [144]. If a CDSS' recommendation, and associated explanation, are aligned with clinicians' judgement, they would treat it as evidence and move on with it, similarly to how they interpret lab test results. Conversely, if misalignment is present, they would ignore the machine recommendation if under pressure. Instead, if partial alignment is present, clinicians might postpone the final decision and seek the opinion of a more knowledgeable colleague (e.g., with more years of experience, or a different specialisation). While the final goal might remain the same (e.g., making a diagnosis), in the latter case the nature of the information-seeking process shifts from informational [83] - clinicians intend to satisfy their information needs - to transactional [83] - clinicians intend to locate a different source of information to satisfy their information needs [52].
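As a schematic summary of this information-seeking pattern (a description of observed behaviour, not a proposal to automate it), the short Python sketch below encodes the three cases our participants described; the function and its labels are purely illustrative and our own.

```python
# Schematic sketch (hypothetical, heavily simplified) of the information-seeking
# pattern described above: alignment with the CDSS output determines whether
# clinicians treat it as evidence, ignore it, or consult a colleague.
def next_step(alignment: str, time_pressure: bool) -> str:
    if alignment == "aligned":
        return "treat recommendation as corroborating evidence and proceed"
    if alignment == "misaligned" and time_pressure:
        return "disregard recommendation and rely on own judgement"
    # Partial alignment: the need becomes transactional rather than informational.
    return "postpone decision and consult a more knowledgeable colleague"


print(next_step("partial", time_pressure=False))
```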
Future research could further investigate this phenomenon and connect with ongoing efforts around explanations that provide both evidence and criticisms for a machine recommendation [28,123] or that can be selected based on users' input and goals [102]. Our participants indirectly underscored this aspect when discussing the data used to build AI systems. If a particular patient is under-represented in a cohort, an AI-based CDSS might exhibit, e.g., popularity bias [1,148], skewing its recommendations and generating incongruency with the patient-centric commitment of our participants. Furthermore, while our participants displayed reluctance to blindly trust AI-based CDSSs, prior work uncovered issues of anchoring bias [162] in medical settings [11] related to when both AI suggestions and explanations are served (e.g., before clinicians' decision-making process). While we did not use a prototype system, it is conceivable that, depending on the nature of the AI-based CDSS (e.g., proactive monitoring or reactive diagnosis support), different delivery strategies, and their timing, should be further investigated.
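One lightweight way to surface such under-representation to clinicians could be a cohort-support check shown alongside each recommendation. The sketch below is a minimal illustration in Python; the attribute names, threshold, and toy cohort are assumptions for the example, not data from our study.

```python
# Minimal sketch (hypothetical names, threshold, and data) of a pre-recommendation
# check: flag when a patient's profile is rare in the development cohort, so
# clinicians can weigh the suggestion (and its explanation) accordingly.
import pandas as pd


def cohort_support(training_cohort: pd.DataFrame, patient: dict, min_fraction: float = 0.05) -> dict:
    """Fraction of the training cohort matching the patient's key attributes."""
    mask = pd.Series(True, index=training_cohort.index)
    for attribute, value in patient.items():
        mask &= training_cohort[attribute] == value
    fraction = float(mask.mean())
    return {"fraction": fraction, "under_represented": fraction < min_fraction}


# Toy development cohort (illustrative values only).
cohort = pd.DataFrame({
    "sex": ["M"] * 80 + ["F"] * 20,
    "age_band": ["65+"] * 70 + ["<50"] * 30,
})
print(cohort_support(cohort, {"sex": "F", "age_band": "65+"}))
# -> {'fraction': 0.0, 'under_represented': True}
```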

Opportunities for the Design of Explanations.
Compared to the extensive work on AI-supported clinical decision-making (e.g., [29,73,92]), the design of explanations for clinical scenarios has received little attention. Oftentimes, prior works make use of explainers that are readily available [39] which, however, are not necessarily aligned with clinicians' needs. In this context, we echo prior studies around HCXAI (subsection 3.2) in relaxing the predominant techno-centric view on XAI and extending the definition of "explanation" beyond AI system output to include users' needs and purposes for explanations.
Besides the broad need for explanations connected to clinical practice, our results show that the temporal dimension of users' needs and purposes largely affects how explanations are used, if at all. Throughout patient care, clinicians drift from general explanations about AI-based CDSSs (sought during the early adoption phase, confirming [30]) toward local, patient-specific explanations that benefit their practice more directly (subsection 6.1). For the former, several artefacts already exist in the form of documentation for the underlying models [124] and the training datasets [60]. However, those works target a different audience (i.e., developers) and not clinicians. Future research in this area could focus on tailoring such artefacts for clinicians, similar to how Rostamzadeh et al. [140] adapted 'Datasheets for Datasets' [60] for healthcare or how Anik and Bunt [9] explain training data to end-users.
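As a thought experiment of what such a clinician-facing artefact could contain, the sketch below defines a minimal documentation record in Python. The fields and example values are our assumptions, loosely inspired by model cards [124] and datasheets [60]; they are not an established standard.

```python
# Illustrative sketch (assumed fields and values, not a standard) of
# clinician-facing documentation for an AI-based CDSS, phrased around
# questions our participants raised (intended use, cohort, limitations).
from dataclasses import dataclass, field


@dataclass
class ClinicianFacingCard:
    intended_use: str                                 # what clinical task the CDSS supports
    cohort_summary: str                               # demographics of the development cohort
    known_limitations: list[str] = field(default_factory=list)
    validation_setting: str = ""                      # e.g., retrospective vs. prospective testing
    comorbidities_considered: list[str] = field(default_factory=list)


# Illustrative values only.
card = ClinicianFacingCard(
    intended_use="Support (not replace) IPF diagnosis from HRCT and lab results",
    cohort_summary="Single European centre; few patients under 50",
    known_limitations=["No data on co-occurring lung cancer"],
    validation_setting="Retrospective validation only",
    comorbidities_considered=["pulmonary hypertension", "emphysema"],
)
print(card.intended_use)
```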
Instead, for local explanations, while our participants gravitated towards multi-modal explanations, several factors condition their use (subsection 6.3). Naively, multi-modal explanations could be achieved as a combination of existing XAI methods that is later rigorously tested [42] with clinicians to ensure its ecological validity. On a deeper level, we advise future research to be directed towards interactive explanations [125,174] and, particularly, selective and mutable [18] explanations. Selective explanations would enable clinicians to decide when to interact with them, change visualisation modality, and tweak the granularity of the information (subsubsection 6.3.1). This would allow clinicians to quickly glance over an explanation and, if necessary, expand it to view additional details, e.g., medical literature, and pose more questions. Selectivity directly relates to clinicians' perspectives on the visualisation modalities (subsection 6.1) and their desire to maintain agency (subsection 6.2). For instance, our participants desire explanations grounded in medical literature (e.g., when creating a treatment plan). Mutable explanations expand these ideas to encompass testing hypotheses and comparing different circumstances. Clinicians could use these explanations to inquire about how a patient would react to treatments by tweaking and inspecting the explanation. Furthermore, in case multiple AI models are implemented within the same CDSS, mutable explanations would allow clinicians to reconfigure the system and obtain a variety of suggestions and explanations. In turn, mutable explanations support clinicians in navigating diverse sources of information (subsection 6.1), e.g., patient data and medical literature. Finally, we stress the importance of adopting a longitudinal view to both complement existing work in the area and better inform how selectivity and mutability should be designed, implemented, and evaluated (subsection 6.3) given clinicians' specific needs.
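To illustrate how selectivity and mutability could be exposed to clinicians, the sketch below wraps a toy model behind an interactive explanation object offering a compact summary, an on-demand detailed view, and a "what-if" query. The API, the toy model, and the feature names are hypothetical and serve only to concretise the two properties discussed above.

```python
# Minimal sketch (hypothetical API) of "selective" and "mutable" explanations:
# clinicians choose the granularity on demand and can tweak patient features
# to see how the explanation (and suggestion) would change.
from typing import Callable


class InteractiveExplanation:
    def __init__(self, predict: Callable[[dict], dict], patient: dict):
        self.predict = predict          # any model wrapper returning a label and factors
        self.patient = dict(patient)

    def summary(self) -> str:
        """Selective: a compact, glanceable view."""
        out = self.predict(self.patient)
        return f"{out['label']} (top factor: {out['factors'][0]})"

    def details(self) -> dict:
        """Selective: expanded view with all factors, e.g., linked to literature."""
        return self.predict(self.patient)

    def what_if(self, **changes) -> dict:
        """Mutable: re-run the explanation under hypothetical patient changes."""
        return self.predict({**self.patient, **changes})


# Toy model standing in for a real CDSS (assumption, for illustration only).
def toy_model(p: dict) -> dict:
    label = "Disease A" if p["fibrosis_extent"] > 0.3 else "Disease B"
    return {"label": label, "factors": ["fibrosis_extent", "age", "smoking_history"]}


expl = InteractiveExplanation(toy_model, {"fibrosis_extent": 0.4, "age": 67, "smoking_history": "former"})
print(expl.summary())
print(expl.what_if(fibrosis_extent=0.2)["label"])
```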

AI Literacy: an Absentee in Medical Curricula?
Our results surfaced a diverse spectrum of explanations, both in terms of depth of the information (subsection 6.1) and interaction moments throughout patient care (subsection 6.2). Despite the desire for explanations, participants often mentioned that such information should not be too technical, as there is no need, at all times, to know how a CDSS might work in the background. However, regardless of the inclusion of technical information or jargon, this raises the question of whether clinicians (even beyond pulmonary medicine) possess the background knowledge to properly interpret explanations of AI systems. While specific institutions and domains (e.g., radiology) may provide some level of AI training during residencies, prior work in other disciplines [104,112,172], to the best of our knowledge, has yet to address AI literacy in healthcare staff and students beyond self-reported measurement scales [90]. Oftentimes, there is no direct connection between medical knowledge and the decisions, or decision-making process, of an AI-based CDSS. For instance, causality is a crucial factor in medicine for an effective, efficient, and satisfactory clinical decision-making process [79]. However, despite ongoing efforts from researchers in causal ML and XAI [20,69], it remains an elusive concept within the current generation of AI systems.
We concur with some of our participants on the impending need for broader educational support around AI literacy [112], seemingly missing from several medical curricula. We note that the need to include AI literacy was explicitly voiced (P1), and asked of us (P2), by participants with more medical experience. Their concerns related to the over-reliance that younger clinicians could manifest when using CDSSs during training. They worried about inexperienced clinicians leaning too much on those tools rather than learning from more experienced colleagues, potentially reaching a point where they do not know any different. We do not argue for a radical shift towards a technical imprint in medical curricula but emphasise the need to introduce basic notions of AI early - in addition to what is provided via residency training - so that clinicians are better equipped to critically evaluate the outputs and explanations of AI systems. Close interdisciplinary collaborations between healthcare and computer science professionals (in spirit, similar to [153]) could assist such an endeavour. AI systems, in general, do not hold any communicative intent [16] and, as such systems (technically) advance, so does the risk of plausible-looking decisions, recommendations, and explanations. As Bender and Koller [16] argue, systems (e.g., large language models) built purely on form do not have a way to produce meaning. In this sense, prior HCI work has investigated the potential effects of biases and misunderstandings of AI's capabilities [11,38,55]. As AI-based support systems manifest within healthcare, developing education around AI for healthcare can be beneficial for clinicians and patients alike. However, special consideration should be taken, as that could come at the cost of longer medical studies or compromises with existing courses and training activities.

Limitations
Our study has several limitations, but future work could bring triangulation to our results [130]. First, we limited our enquiry to Idiopathic Pulmonary Fibrosis. While this helped in contextualising participants' responses, the themes we constructed might not apply to other diseases within pulmonary medicine. For instance, clinicians specialised in lung cancer are familiar with IPF because the two diseases can co-occur. However, they can follow different care and treatment paths. Second, we conducted the study in a European university medical centre. Our results may not be transferable to other countries in the Global North with potentially different healthcare systems. Additional inconsistencies could arise when attempting to replicate a similar study in the Global South, as prior work discussed differences in healthcare systems [8] and inequalities exacerbated by the use of AI [128,177]. Finally, while we found grounding for the XAI exemplars in the literature (subsubsection 5.2.1) and showed them only after participants disclosed their explainability needs, it is conceivable that those prompts generated anchoring bias in our participants. Similarly, while we found the journey map to align well with our participants, the participatory nature of our preliminary study could have exacerbated power dynamics among the participants and affected how they perceived, and agreed on, the journey map.

CONCLUSION
In this work, we involved 16 clinicians from a European university medical centre working on Idiopathic Pulmonary Fibrosis to enquire about their uses and purposes for Explainable AI and how these evolve throughout patient care. First, with the help of 4 clinicians, we outlined a patient journey map for IPF to provide scaffolding for our research. Then, we conducted 12 semi-structured interviews to outline how clinicians' uses and purposes for XAI evolve throughout patient care. We showed that several tensions arise in relation to their needs around explainability for AI-based CDSSs. Clinicians seek diverse explanations - both in content and modality - to cope with patients' dynamics and the uncertainties of IPF. While general explanations of the affordances of AI-based CDSSs are valued in early adoption phases, local explanations - especially multi-modal ones - are anticipated throughout patient care to surface patient-specific features. By adopting properties of goodness of explanations as an interpretative lens, we corroborate ongoing efforts by the HCI community around extending the scope of XAI beyond AI system outputs and how it is evaluated. Particularly, we found compactness, interactivity, and actionability of explanations to be key drivers for clinicians. However, our results also highlight the diminishing relevance of explanations as clinicians learn when and how to use such systems. We concluded by reflecting on the lack of longitudinal perspectives in researching XAI for CDSSs, implications for the design of explanations in clinical settings, and the role of medical education in further promoting AI literacy.

Figure 1: Salient moments during care for Idiopathic Pulmonary Fibrosis. Clinicians seek support from AI-based Clinical Decision Support Systems (CDSSs) provided an explanation (in boxes) is present. Due to the dynamic uses and purposes of clinicians, different explanations (both in content and visualisation modality) are expected throughout patient care.

Figure 2: Screenshots of the XAI system prototype used in the preliminary study. Here we show the information about Mark, a

Exemplars shown in Figure 3 (recoverable panel captions): numerical explanations referring to lab test results; rule-based explanations; textual explanations reporting the rationale for a diagnosis; visual explanations based on a chest X-ray; a multi-modal exemplar combining visual and textual elements; and a mixed exemplar combining rules, visual, and textual elements.

Figure 3: Explanation exemplars we prepared for the interviews with clinicians working on IPF. Inspired by [95,168]. The contents of the explanations are plausible but fictional.

"
That very much depends on what question you ask.Is it what percentage of fbrosis does a patient have?Then of course you want a picture like this [points at Figure 3d].If you ask what diagnosis your patient should give, then the factors and rules are more helpful [points at Figure 3b].If you want to decide on treatment, then you also want to see a decision tree like that [points at Figure 3f]."(P9)


Table 2: Interview participants' details and background. Years of Experience refers to the years they have spent in that role or have had that title. VLD: vascular lung diseases; ILD: interstitial lung diseases; OPD: obstructive pulmonary diseases.
6.1.1 General System Explanations. First and foremost, participants highlighted the need for general explanations that would surface 1) the capabilities and limitations of AI-based CDSSs, as well as 2) details about the patient cohort data used during development. Our participants saw AI-based CDSSs primarily as enhancing tools for which they do not need to know the underlying technicalities. P4 exemplified this need by equating AI systems, and their explanations, with paracetamol, i.e., a medication about which not everything is known but which is used because of its benefits.
In this sense, participants expressed uneasiness around technical jargon, which felt irrelevant in practice (P1, P3, P7). The only outlier was P6 who, given prior experience with AI algorithms, wanted explanations to cover concrete implementation details such as the models used.

Figure 4: Salient phases of the IPF journey map (the complete journey map is shared as supplementary material). For each phase, we highlight our results and indicate the corresponding section. Additionally, we specify what content our participants found relevant at each phase of the journey map. Initially, clinicians seek general explanations that cover, e.g., the demographics of the patients whose data was used to build the CDSS. Then, clinicians desire explanations that cover patient-specific factors. For instance, if an AI-based CDSS is meant to support diagnosing IPF, its explanations should cover lab test results and possible comorbidities. Overall, clinicians see explanations as means to safeguard agency over clinical decision-making and learn to use AI-based CDSSs. Specifically, clinicians found interactivity (e.g., being able to ask further questions based on the explanation) and actionability (i.e., explanations that contribute to their clinical practice) to be key properties of explanations.

6.1.2 Local, Patient-specific Explanations. In relation to their patient-centric commitment (P1), participants acknowledged the value of local explanations to surface patient-specific factors that, in turn, would help them provide high-quality care.