"I'd be watching him contour till 10 o'clock at night": Understanding Tensions between Teaching Methods and Learning Needs in Healthcare Apprenticeship

Apprenticeship is the predominant method for transferring specialized medical skills, yet the dynamics between faculty and residents, including methods of feedback exchange, are under-explored. We specifically investigate contouring: outlining tumors in preparation for radiotherapy, a critical skill that, when performed poorly, severely degrades patient survival. Interviews and design-thinking workshops (N = four faculty; six residents) revealed a misalignment between teaching methods and residents, who desired timely, relevant, and diverse feedback. We further discuss the reasons for this misalignment: learning content and strategies that overlap with clinical work in order to ease tensions between clinical and teaching duties, and a lack of support for exchanging cognitive processes. A follow-up survey study (N = 67 practitioners from 31 countries), which contained annotation and sketching tasks, provided diverse perspectives on effective feedback elements. Lastly, we present sociotechnical implications for supporting faculty's teaching duties and learners' cognitive models, such as systematically leveraging senior learners to provide case-based guidance and supporting a two-way flow of cognitive information via in-situ video snippets.


INTRODUCTION
Apprenticeship models of training have been key in transmitting specialized knowledge and skills from experts to novices, particularly in critical domains that involve complex cognitive processes and require high-quality task completion, such as healthcare. Apprenticeship refers to learning through direct observation and supervision by experts until the apprentice is proficient enough to accomplish the task independently. While traditional apprenticeship involves learning a physical and tangible activity, many specialized practices contain less visible, yet cognitively complex tasks. Cognitive apprenticeship [14] is a model that aims to make internal cognitive models more visible by following six principles of learning: modeling, coaching, scaffolding, articulation, reflection, and exploration. Healthcare is a domain that contains many high-stakes, specialized, and cognitively complex tasks, for which cognitive apprenticeship is particularly suited as the predominant training model. Contouring in radiation oncology residency programs is a unique case: despite relying on apprenticeship methods of teaching, it is prone to detrimental mistakes, which can stem from the scarce availability of expert faculty and subpar training methods. Contouring is a high-stakes task that refers to the identification of the tumor and organs at risk during the radiation treatment planning process. Poor radiation planning is widespread and has detrimental consequences for patient well-being. Over-contoured plans deliver excess toxicity to nearby healthy organs, while under-contoured plans deliver insufficient radiation to tumorous cells, increasing the risk of disease recurrence. Clinical trials reveal that protocol violations, which can occur in up to a staggering 81% of radiation plans [39], can decrease patient survival by 22% [94]. Because radiation oncology faculty hold a dual role of clinician and teacher, the clinician role takes absolute priority over teaching duties when availability is limited, potentially contributing to a subpar apprenticeship model of training.
As such, it is imperative to understand the existing mechanisms of contouring education and examine the dynamics of feedback exchange between faculty and residents in the apprenticeship model of residency programs. This paper explores the dynamics between faculty and residents in healthcare apprenticeship, and especially the methods of feedback exchange in the transfer of contouring skills in radiation oncology. Interviews with four faculty and six residents identified existing training strategies and revealed residents' perceptions of them, such as 1-on-1 contouring watch-alongs (i.e., a faculty member contours an entire case and thinks aloud about their process while the resident watches), which residents found tedious and only marginally beneficial. Instead, learners emphasized the importance of timely, targeted, and diverse feedback, as revealed by two design-thinking workshops in which participants designed their ideal contouring feedback interfaces. The resulting designs later shaped the content of a reflective-style survey study that aimed to assess the effectiveness of granular elements of these interfaces among a diverse population of 67 practitioners from 31 countries. We discuss three sociotechnical findings arising from our studies that have implications not just for contouring education, but also for broader healthcare apprenticeship models:
(1) we note that the faculty's dual role of clinician and teacher leads to learning content and strategies that are not fully aligned with the learners' skill level, but aim to satisfy clinical duties at the same time.
(2) we report on how healthcare apprenticeship aligns more closely with a traditional model, and lacks effective support for the articulation, reflection, and exploration principles of a cognitive apprenticeship model.
(3) we propose practical sociotechnical solutions that aim to mitigate points (1) and (2), such as leveraging peer resident resources, aggregating variability, and promoting deliberation.
The findings from this paper contribute to a multi-faceted understanding of healthcare residency programs through the lens of the cognitive apprenticeship model, and further offer key sociotechnical considerations for introducing computer-supported training tools in healthcare.

BACKGROUND AND RELATED WORK
This section describes cognitive apprenticeship and the Human-Computer Interaction (HCI) systems that support this model of training, the contouring process, and the importance of user interface design in supporting healthcare training.

Curricular and Technological Support for Facilitating Cognitive Apprenticeship
While traditional apprenticeship is an effective instructional model for transferring physical skills through on-site supervision by an expert, cognitive apprenticeship [12,13] focuses on developing stronger mental models and metacognitive skills, especially in tasks that are not fully observable [6,43]. In other words, cognitive apprenticeship extends its precursor by making the tacit knowledge of experts explicit [81] through six principles, as defined in Table 1: modeling, coaching, and scaffolding (which comprise the traditional model), followed by articulation, reflection, and exploration [13]. The additional Articulation and Reflection steps aim to highlight the expert's model of problem-solving, and also encourage learners to gain control of their own problem-solving strategies. The last step (i.e., Exploration) fosters learner autonomy, not just in terms of problem-solving, but also problem-setting.

Table 1: The six principles of cognitive apprenticeship, formulated and defined by Collins et al. [12]. The first three principles comprise the traditional model of apprenticeship.

Modeling
Expert performs specialized task and externalizes internal processes and activities, while learner observes.

Coaching
Learner performs specialized task, while expert observes and offers feedback, including hints and reminders.

Scaffolding
Expert diagnoses learner's skill level and task difficulty, and adjusts time and content of feedback accordingly.

Articulation
Learner articulates their knowledge, reasoning, and internal processes, while expert assesses learner's understanding.

Reflection
Learner compares their problem-solving processes with a cognitive model of expertise involving processes of expert or peer learners.

Exploration
Expert encourages learner to pursue and solve new problems independently by setting relevant learning goals.

Many educational programs offer heuristic strategies and logistical support to implement cognitive apprenticeship in different learning tasks, such as reading [60,61], writing [71], multimedia design [52], high school science [68], college math [74,75], doctoral research methods [21], and healthcare [6,69]. A primary principle of these methods is to guide learners to think through and solve problems similarly to how an expert approaches them. For instance, Scardamalia et al. [71] construct a sophisticated set of procedural heuristics based on novices' "knowledge-telling" vs. experts' "knowledge-transforming" [70]: while novice writers tend to immediately produce text by writing down ideas sequentially, experts spend time not only on writing, but also on planning and revising a cohesive story. Healthcare research has also explored and implemented cognitive apprenticeship strategies in different contexts, such as a psychiatric nursing college [45], a trauma life support course in a medical school [18], and junior radiology residency curricula [88]. Given the need for teaching specialized medical skills in high-stakes clinical domains, more research is needed to capture the intricacies of different fields and contribute to a holistic understanding of cognitive apprenticeship in healthcare. This work sheds light on the dynamics of the existing apprenticeship model of training (in the case of contouring in radiation oncology) and reveals a lack of support for developing learners' internal cognitive models.
In addition to instructional programs, the HCI and Educational Computing literature has introduced computer-supported tools to support apprenticeship [83,84,89]. To scale apprenticeship among crowdworkers, Suzuki et al. [84] introduced Atelier, which matched less experienced workers (i.e., mentees) with more skilled ones (i.e., mentors) and facilitated micro-internships in which mentees completed real-world tasks and received feedback from their assigned mentors. Cognitive Apprenticeship Web-based Argumentation (CAWA) [89] aimed to facilitate cognitive apprenticeship in large classroom settings by providing individualized assistance in articulating, reflecting on, and exploring skills related to argumentation, an important component of STEM education. Yin et al. [83] developed a system that addresses an important limitation of apprenticeship in endodontic surgery: assessing the practice outcome (in a virtual reality simulation) and providing formative and individualized feedback. In healthcare, given physician experts' dual role of clinician and teacher, patient care takes absolute priority over teaching [66]. As such, computer-supported tools that provide adaptive and timely feedback can enhance the overall cognitive apprenticeship and lead to better medical training and patient outcomes. Through design-thinking workshops and reflective-style survey studies, this work explores effective feedback elements of computerized support that can ease the pedagogical duties of faculty while enhancing the learning experience of residents.

Contouring: Background and Learning Resources in Residency Programs
Radiation oncologists perform contouring by repeatedly drawing 2D contours on relevant image slices to encompass the 3D volume of the tumorous tissues, using desktop-based software such as MIM and Eclipse. While the final contours on CT scans influence dose calculation, different types of images and planes can inform decision-making: for instance, physicians use MRI images when treating brain cancer, because brain structures appear more distinctly in these scans than in CT images. Oncologists can also consult different orientations of the same set of images to understand the anatomy of structures.
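To make the slice-by-slice process concrete, the Python sketch below treats a contoured structure as a stack of filled 2D pixel masks, one per axial slice, and approximates its 3D volume as the voxel count times the voxel volume. This is an illustrative assumption, not how MIM or Eclipse represent structures (clinical systems typically store per-slice polygons, for example in DICOM RT Structure Sets); the `Structure` class, the `square` helper, and the spacing values are hypothetical.

```python
from dataclasses import dataclass, field

Pixel = tuple[int, int]  # (row, column) on one image slice


@dataclass
class Structure:
    """Hypothetical contoured structure: one filled 2D mask per axial slice."""

    name: str
    pixel_spacing_mm: tuple[float, float]  # in-plane (row, column) spacing
    slice_thickness_mm: float
    slices: dict[int, set[Pixel]] = field(default_factory=dict)

    def draw(self, slice_index: int, pixels: set[Pixel]) -> None:
        """Add the pixels enclosed by a 2D contour drawn on one slice."""
        self.slices.setdefault(slice_index, set()).update(pixels)

    def voxel_count(self) -> int:
        return sum(len(mask) for mask in self.slices.values())

    def volume_cc(self) -> float:
        """Approximate 3D volume in cubic centimetres (1 cc = 1000 mm^3)."""
        row_mm, col_mm = self.pixel_spacing_mm
        voxel_mm3 = row_mm * col_mm * self.slice_thickness_mm
        return self.voxel_count() * voxel_mm3 / 1000.0


def square(top: int, left: int, size: int) -> set[Pixel]:
    """Filled square region, standing in for a hand-drawn 2D contour."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}


# Three consecutive axial slices, each with a 10 x 10 pixel contour.
gtv = Structure("GTV", pixel_spacing_mm=(1.0, 1.0), slice_thickness_mm=2.0)
for z in (10, 11, 12):
    gtv.draw(z, square(50, 50, 10))

print(gtv.voxel_count(), round(gtv.volume_cc(), 3))  # 300 0.6
```

Keeping one mask per slice mirrors how the volume is built up in practice: each slice is edited independently, and the 3D structure only exists as the union of those 2D decisions.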
Contouring is considered the weakest link in radiation oncology treatment [58] due to substantial variability in providers' contours [26] and mistakes that have detrimental consequences for patient safety and survival. Radiation plans that deviate from protocol specifications substantially decrease survival compared to compliant radiation plans: for instance, two clinical trials in head-and-neck cancer revealed 20% and 22% decreases in survival due to protocol violations [63,94]. In addition, clinical trials reveal sobering insights into the high frequency of poor contouring: a study on anal cancer found that 81% of radiation plans had "incorrect contours" [39], and 70% of contours on brain cancer cases were "unacceptable" [22].
While auxiliary educational resources (e.g., atlases) and emerging virtual reality tools [8,9] can improve contouring skills, direct learning from the attending faculty remains the main method of training in residency programs. Medical reference aids (e.g., atlases and books [30,47]) can mitigate the existing variability and improve contour agreement [15]. In practice, however, sub-optimal methods of development, delivery, and access hinder the potential benefits of these resources [31][32][33]. One strategy to improve access to contouring guidelines is web-based 3D atlases: as an example, eContour [77] is a browser-based atlas that can improve contouring accuracy and anatomy knowledge [27], and has further demonstrated higher usability and learnability [62]. Recent works explored cross-device and on-demand feedback strategies, such as reporting the percentage of overlap with expert contours and providing step-by-step guidance on regions of interest [98,99]. Despite these medical reference aids, one-to-one supervision from the faculty (in an apprenticeship model [73]) remains the main method of training in contouring education, as in many other residency programs (e.g., psychiatry, surgery, and radiology) [23]. Residency programs in radiation oncology assign residents to one expert faculty member (a.k.a. attending physician) at a time, and residents learn contouring practices by observing the faculty's general workflow and re-creating their processes. This work aims to improve contouring education by examining the dynamics of feedback exchange between radiation oncology faculty and residents, and by offering practical sociotechnical solutions.
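The overlap metric used in [98,99] is not specified here. As one plausible reading, the sketch below computes the Dice similarity coefficient, a widely used measure of contour agreement, together with the share of the expert volume that the resident also covered. Voxels are represented as `(slice, row, column)` tuples, and the helper names are hypothetical.

```python
Voxel = tuple[int, int, int]  # (slice, row, column)


def box(slices: range, rows: range, cols: range) -> set[Voxel]:
    """Filled rectangular volume, used as a stand-in for a contoured structure."""
    return {(z, r, c) for z in slices for r in rows for c in cols}


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|); 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def expert_coverage_pct(resident: set[Voxel], expert: set[Voxel]) -> float:
    """Percentage of the expert's voxels that the resident also included."""
    if not expert:
        raise ValueError("expert structure is empty")
    return 100.0 * len(resident & expert) / len(expert)


expert = box(range(3), range(10), range(10))       # 300 voxels
resident = box(range(3), range(10), range(2, 12))  # same size, shifted 2 columns

print(dice(resident, expert))                 # 0.8
print(expert_coverage_pct(resident, expert))  # 80.0
```

The two numbers answer different questions: Dice drops for both over- and under-contouring, whereas the coverage percentage only penalizes missed expert voxels, so a resident who over-contours generously can still score 100% coverage.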

Impact of User Interface Design on Decision-making and Training in Healthcare
Much healthcare-focused HCI research has investigated improving tools and interfaces used by individual clinicians, while many CSCW papers in medical domains have outlined problems and opportunities for designing interfaces that foster collaboration in clinical teams, with some recent works exploring Human-AI interaction in diagnostic settings. This section provides a brief overview of the relevant HCI and CSCW research, and situates this work (and healthcare training more broadly) in the existing literature.
Starting with the work of Grudin in the late 80s [35], a considerable part of the HCI and CSCW literature has focused on understanding why applications built for collaboration in the workplace fail to achieve their goals. Grudin attributed this failure to a lack of contextual research [92], and Ehn and Kyng [19] advocated for better understanding of stakeholders by "working beside them a long time in order to develop a new system that is owned by the workers". Building on this research, Markus and Connolly [55] argued that the adoption of tools used in multi-user workplace settings heavily depends on the interdependence in the payoffs of different users. To understand these tensions in healthcare, where multiple stakeholders need to engage in decision-making and agree on terms that lead to life-or-death outcomes, Schaekermann et al. [72] more recently studied the factors that lead to experts' disagreements and their justifications, and showed that the presentation of data is key to effective decision-making. This holds for medical time series data [72], but even more so when data have a higher degree of interpretability, such as in medical image-based comparison [7,95]. Specifically, Cai et al. [7] outlined how image retrieval tools for medical decision-making need to facilitate interaction between clinicians and AI aids, in such a way that clinical teams can trust and effectively use them.
While these HCI and CSCW works (among others) are key to advancing the effective development of interfaces for the practice of medicine, a large part of the clinical experience involves training medical students and residents. Learning how to use these systems and interfaces is an important part of the learning experience, but very often, the same tools that are good at delivering care have not been designed to support training and effective decision-making for trainees. As laid out by Markus and Connolly [55], to make an interface successful, we need to look at the interdependence in the payoffs of the different users, and one of the users in this case is a trainee (e.g., a resident) who learns from the expert (e.g., an attending faculty member). While research specifically on healthcare training is lacking, prior works in other educational settings showed the benefits of careful interface design for learners: for example, recent work showed how particular user interface add-ons can alleviate confusion and enable learners to better understand the expert content communicated to them [97], and how referencing material that the learner previously engaged with increases satisfaction and results in more effective learning [100].
This paper takes a user-centered design approach that aims to surface similar paradigms in the context of healthcare training, specifically for the case of image-based comparison and radiation oncology. After carefully examining the context and defining the existing interrelationships between residents and faculty, this work first explores effective feedback design elements, and then offers practical sociotechnical guidelines for improving contouring education and, more broadly, healthcare apprenticeship.

METHODS
This study followed a two-step user-centered design protocol. Through the official residency mailing list of the Department of Radiation Medicine at UC San Diego Health, a large research and teaching hospital in the Western USA, we invited all residents and faculty to participate in our study. In the first step, four faculty and six residents (Table 2) participated in interviews to demonstrate the main contouring processes and methods of feedback exchange in residency programs. The same faculty and residents also took part in two separate design-thinking workshops that aimed to empower the physicians to reflect on existing training breakdowns by producing design mock-ups for contouring feedback interfaces. Holding separate workshops for faculty and residents aimed to foster authentic impressions, minimizing the risk of conflict avoidance [85] due to hierarchical power differences between faculty and residents [48]. The second step involved collecting diverse and granular feedback on the produced mock-ups via a survey study distributed among radiation oncologists globally (including both residents and faculty). The Institutional Review Board (IRB) approved this study protocol.

Participants
We recruited participants through the official mailing list of the radiation oncology residency program at UC San Diego Health, one of the largest research and teaching hospitals in the United States. This is one of the largest radiation oncology programs in academic settings, consisting of 12 active faculty and nine residents at the time of the study. To increase participation, one of our collaborators (an attending faculty member in this program) distributed the recruitment call. Four faculty (33.3% acceptance rate) and six residents (66.7%) accepted our invitation to participate in the study.
The residency program at this hospital follows an apprenticeship model of training, in which residents learn contouring and engage in real-world clinical tasks under 1-on-1 supervision from their attending faculty. The residents also rotate among different faculty who specialize in particular disease sites, such as head/neck and prostate. These rotations can last between 6 weeks and 3 months. Beyond this general structure, the underlying pedagogical and feedback exchange methods are flexible and implemented ad hoc by the faculty. This paper aims to uncover these methods through interviews from the perspectives of faculty and residents.
Besides atlases and guidelines, this residency program lacks specialized learning tools for supporting contouring education.The common tools used by the residents are the same software used for clinical purposes, including MIM and Eclipse as displayed in Figure 2.While these tools provide a plethora of features to assist clinicians in contouring and navigating through medical images, they do not facilitate training and feedback exchange.We especially targeted this gap via the design-thinking workshops in order to explore computer-supported learning interfaces for contouring.

Faculty Interviews. Four radiation oncology faculty participated in one-hour interviews that comprised two steps:
(1) The faculty demonstrated a short contouring session using their preferred software and medical case. They also expressed their thought processes out loud, such as how they set up contouring sessions, which images they used, and where on the screen they looked. The researchers interrupted only when participants had not spoken for a while, and took notes of key events and explanations. This step familiarized the researchers with the general procedures involved in contouring.
(2) Semi-structured interviews started by asking clarifying questions about the researchers' observations in the think-aloud step. Then, the researchers asked questions that aimed to reveal the faculty's workflow when training residents. Topics included feedback exchange strategies with junior and senior residents, and the frequency of training opportunities.
The two workshops aimed to guide participants in designing "ideal contouring feedback interfaces".

Faculty Workshop. The same four faculty participated in a two-hour remote design-thinking workshop that aimed to create their ideal contouring feedback interfaces. Design-thinking workshops provide a human-centered framework for problem solving [40] and foster the exploration of needs and ideas for particular stakeholders [51]. Due to the collaborative nature of design-thinking methodology, these workshops are commonly conducted in person, yet circumstances such as pandemics and geographically distant participants can call for remote accommodation.
Inspired by the Wallet project [17], our remote workshop contained five phases (see Figure 1). The two faculty pairs first gained empathy for their partner's contouring practices, and then defined their needs around the teaching and learning of contouring skills. After understanding these needs, each pair collaboratively generated solution ideas (by sketching designs in Google Docs) and created digital prototypes (using LucidChart). These two tools proved sub-optimal for the remote workshop, as the participants found Google Docs unreliable in terms of formatting and were unfamiliar with LucidChart's features. Lastly, each pair presented their design to the entire group and received feedback. Overall, two interface mock-ups emerged from this session.

Resident Interviews.
Following the faculty workshop, six radiation oncology residents participated in remote, one-hour interviews consisting of three steps:
(1) The residents first filled out a brief survey on their background information (e.g., age and prior medical school), primary contouring tools, and training strategies (e.g., educational resources and feedback mechanisms).
(2) The residents then contoured a case of their choice without narration while the researchers recorded the session. Afterwards, the participants watched these recordings and explained their thought processes around their contouring decisions and points of confusion. Thinking aloud retrospectively aimed to lessen learners' cognitive load [90], since it can be challenging to simultaneously perform contouring tasks and verbalize thoughts, especially for early residents.
(3) The final 15 minutes prompted residents' impressions of the feedback interface mock-ups created during the faculty workshop. The researchers presented and described both prototypes at once, because showing alternative design solutions can elicit stronger and more authentic criticism [86]. The residents then evaluated the two interfaces by describing their desired and undesired features.

Resident Workshop.
The design-thinking workshop with residents followed the same procedure as the faculty workshop: the researchers introduced the same objective (i.e., designing an ideal contouring feedback interface) and facilitated similar steps (displayed in Figure 1). Due to the logistical challenges faced in the first workshop [96] (i.e., formatting issues and tool unfamiliarity, as described in section 3.2.2), this workshop used Google Slides for both note-taking and prototyping.

Survey Study.
With the goal of enhancing feedback diversity and granularity in the user-centered design protocol of this study, we distributed a survey to collect impressions of the created interface mock-ups. To further elicit reflective user feedback, engaging participants beyond surface-level "look and feel" concerns [79], the survey guided participants through a mix of Likert-scale questionnaires and in-depth annotation and sketching tasks [87]. The survey was designed using Jotform because of its multi-modal features beyond simple text-based questionnaires, and was later deployed among the 2,500 most active global users of eContour [77], a popular contouring atlas. The survey contained four sections (full questions can be found in Appendix B):
(1) Background information: The survey started with a demographics section to collect basic background information from survey takers, including age, gender, profession, place of residence, and years of contouring experience.
(2) Perceived usability and learnability: The second part of the questionnaire first presented the interfaces with short descriptions, and then incorporated four Likert-type questions to gauge the usability and learnability of each interface, two central pieces in the successful design and deployment of learning technology. Usability refers to users' evaluation of the usefulness and completeness of interface functions, while learnability refers to the extent to which respondents preferred the mock-ups for their learning processes. Inspired by surveys on usability [5] and learnability [46], the following questions were incorporated:
• "I think that I would use this interface frequently." (Usability)
• "I found the various functions in this interface well integrated." (Usability)
• "With this interface, I would be more interested to learn the topics." (Learnability)
• "With this interface, I would learn to identify the main and important issues of the topic." (Learnability)
While the original usability and learnability questionnaires contained more questions, incorporating only four statements aimed to reduce the load on survey takers, potentially improving retention and leaving more time for the other parts of the survey.
(3) Liked and disliked features: To granularly assess perceptions of interface features in each mock-up, the third part of the survey incorporated a brush tool for annotating directly on the interface designs. Two colors were provided: green for "liked" areas and red for "disliked" regions. Each interface further included an open-ended text box for additional justification of the selected regions.
(4) Interface design from scratch: The last section provided space for survey takers to sketch their own "ideal contouring feedback interface", using drawing tools such as free-form pencil, eraser, shapes, and color selector.

Data Analysis
This section describes the methods used to analyze the qualitative and quantitative data sources.

Workshops.
To analyze the mock-ups created in the design-thinking workshops, we leveraged two techniques. First, we followed Tohidi et al.'s "quick and dirty" [87] method of analyzing interface designs, in which we laid out all sketches on a large table and re-arranged and grouped the designs based on common patterns. Second, we leveraged the final step of the workshops, in which the faculty and resident pairs elaborated on their designs, to draw out the underlying reasoning behind the incorporated feedback mechanisms.

Survey.
We examined the survey responses using quantitative and qualitative methods, specifically by running statistical analyses of the Likert-scale questionnaire, creating heatmaps of the annotated regions, and mapping similarities and differences of features across the sketches. The Likert-scale portion of the survey was analyzed using the Friedman test [76], which is appropriate for ordinal, within-subject data, across the six interface mock-ups for each usability and learnability question, followed by pairwise Wilcoxon tests [93]. Annotations of liked and disliked regions were filled and overlaid across all responses to create an aggregated depiction of the liked and disliked components of each interface mock-up, with green displaying majority-liked, red indicating majority-disliked, and yellow shades pointing to neutral regions; that is, the darker the green, the more positive the evaluation, and the darker the red, the more negative the overall assessment. To granularly examine the final sketches, we identified which features they shared with the original six interface mock-ups; this served as a metric for the functionalities that physicians found most helpful in learning contouring skills (and hence included in their respective sketches).
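For readers unfamiliar with the test, the sketch below computes the Friedman chi-square statistic from a respondents-by-interfaces matrix of Likert ratings: each respondent's ratings are ranked (ties receive average ranks), rank sums are compared across interfaces, and a tie correction is applied. It is a pure-Python illustration, not the authors' analysis code, and the example ratings are invented. In practice, `scipy.stats.friedmanchisquare` returns the same tie-corrected statistic together with a p-value (chi-square with k-1 degrees of freedom), and `scipy.stats.wilcoxon` performs the paired signed-rank test used for follow-up pairwise comparisons.

```python
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks within one respondent's ratings; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(ratings: list[list[float]]) -> float:
    """Tie-corrected Friedman chi-square; ratings[i][j] = respondent i's rating of interface j."""
    n, k = len(ratings), len(ratings[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in ratings:
        if len(row) != k:
            raise ValueError("every respondent must rate every interface")
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
        # Each group of t tied ratings contributes t^3 - t to the correction.
        tie_term += sum(t**3 - t for t in Counter(row).values())
    sum_sq = sum(r * r for r in rank_sums)
    q = 12.0 * sum_sq / (n * k * (k + 1)) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all ratings are tied within every respondent")
    return q / correction


# Invented ratings: 3 respondents x 3 interfaces, no ties.
print(friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # 6.0
# With ties, the correction inflates the raw statistic (3.0 -> 4.0).
print(friedman_statistic([[5, 5, 1], [4, 4, 2]]))  # 4.0
```

Ranking within each respondent is what makes the test suitable for within-subject Likert data: it discards each person's overall leniency and compares only how they ordered the interfaces.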

RESULTS
This section presents results on contouring feedback mechanisms and the overall apprenticeship-based residency training, drawn from interviews, the mock-up interfaces designed by the faculty and residents (Figure 3), and reflective-style survey responses. We refer to the participants as F1-F4 for faculty and R1-R6 for residents, as described in Table 2.

Three main methods of feedback exchange in residency programs
The faculty and resident interviews described three main training strategies within the apprenticeship-based model of residency programs, and further unveiled their associated benefits and challenges: 1) assigning clinical cases to residents and later providing contour solutions with additional text-based feedback, 2) contouring sessions in which faculty contour while residents watch, and 3) ad-hoc support from senior residents. Most commonly, the faculty explained that they assigned their own clinical cases to residents, and after residents completed these tasks, the faculty re-contoured the same cases as new structures and sent them along as a source of feedback. F3 explained the benefits of a visual comparison of both contours for residents: "they get feedback in terms of looking at what I did versus what they did. [...] I think just over time you sort of develop a skill for looking at these differences and doing the proper windowing" (F3). He also later emailed his residents to explain the differences, but only if he "did any major changes" (F3). While F4 provided similar visual and textual feedback, he emphasized the importance of targeted explanations that reference specific regions in the body: "I give specific feedback and since I'm giving them the new structure, even if we're not in person, they can see it. I will say, for example, I deleted the most inferior slice. I don't think that the tumor goes that far. I think that's a vessel." (F4)
The other two methods of feedback exchange facilitated synchronous faculty-resident and resident-resident interactions. One method involved the resident watching their faculty contour an entire case in a 1-on-1 setting and talk through their strategies, aligned with the modeling principle of cognitive apprenticeship (Table 1). Reflecting back on her residency, F2 found this process time-consuming, tiresome, and only marginally beneficial: "As a resident, it's a very tedious and painful process to sit there with your attending and watch them as they adjust pixel by pixel what they want covered and what they don't want covered. And whether or not it's clinically significant, it is up for debate. I used to have an attending that would make me sit with him at the [...]" (F2).
Lastly, the faculty mentioned that new residents can seek ad-hoc help from more experienced residents, who were more readily available. F1, a new faculty member, recalled his early experience as a resident and pointed out the benefits of receiving targeted help (in a back-and-forth exchange) from more experienced residents: "The residents all sit in one room. So there's usually two to six residents in the room at any given time. Mostly early on, but less later on, I would grab more senior residents, scroll through images and maybe ask them to help me through one axial variation. Because usually if you're doing one, then it's going to be somewhat similar, meaning once you figure it out for one plane, you can follow it down" (F1).
R3 also suggested that early assistance can enhance contouring efficiency, especially for new residents: during the contouring phase of the interview, R3 struggled to locate the tumor and later (in the think-aloud phase) mentioned that "it's so much easier if you could just ask someone, because I spent too much time trying to find the tumor that might take anyone else like a minute" (R3).

Residents favored the visual and descriptive faculty feedback mock-ups
The first faculty pair envisioned an ideal contouring feedback interface that aggregates contours on a single case and visually maps regions according to the percentage of contours that encompass them. As shown in Figure 3a, blue regions are covered by 20-40% of contours, while red regions fall within 80-100% of contours. F4 noted the important role of feedback diversity in contouring education: "it would help residents realize how much variation there is, especially since they only get to work with a handful of attendings" (F4). The left panel provides further adjustments to the visual representation: different types of interface users (e.g., board-certified users and second-year residents) can contribute to the distribution map, and contours from specific individuals can be overlaid on the image.
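The distribution map in Figure 3a can be approximated by counting, for each voxel, the share of submitted contours that include it. The sketch below is a minimal illustration, not the workshop participants' implementation: it assumes each contour is available as a set of (slice, row, col) voxel coordinates, and the 20-point bands between the two endpoints named above (20-40% and 80-100%) are an assumption. Filtering by user type (e.g., only board-certified users) would amount to passing a subset of the contours.

```python
from collections import Counter

# Hypothetical representation: a contour is the set of (slice, row, col)
# voxels it encloses.
Voxel = tuple[int, int, int]


def agreement_map(contours: list[set[Voxel]]) -> dict[Voxel, float]:
    """Fraction of contours that include each voxel.

    Voxels that no contour includes are omitted (i.e., 0%).
    """
    counts = Counter(v for contour in contours for v in contour)
    n = len(contours)
    return {v: c / n for v, c in counts.items()}


def band(fraction: float, width: float = 0.2) -> tuple[int, int]:
    """Percentage band a fraction falls into, e.g. 0.3 -> (20, 40).

    A fraction of exactly 1.0 is kept in the top band (80, 100).
    """
    step = round(width * 100)
    pct = round(fraction * 100)
    lower = min(pct // step * step, 100 - step)
    return (lower, lower + step)


contours = [
    {(0, 1, 1), (0, 1, 2)},
    {(0, 1, 1)},
    {(0, 1, 1), (0, 2, 2)},
    {(0, 1, 1), (0, 1, 2)},
    {(0, 1, 1)},
]
for voxel, share in sorted(agreement_map(contours).items()):
    print(voxel, band(share))
```

Each band would then be mapped to a color (blue for the lower bands, red for the top band) when rendering the overlay.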
The second faculty pair produced two components in their interface. Figure 3b-left displays a visual comparison between the user's contour and the consensus expert contour, highlighting the clinical significance of under- and over-contoured areas: excluding or including red regions is more problematic than yellow areas. Figure 3b-right provides a text-based description of regions of conflict and their potential long-term impact. It also ranks the user against others with similar levels of experience (i.e., PGY-2, second-year residents, in this case). F3 pointed out two unique benefits of this ranking feature, namely "drawing on the competitiveness among radiation oncologists or to give you an idea of where you are compared to the other trainees on the same level" (F3).
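A comparison like Figure 3b-left can be sketched with set operations on the same kind of voxel representation: under-contoured voxels are in the consensus but not in the learner's contour, and over-contoured voxels are the reverse. In the sketch below, the Dice overlap score, the per-voxel weights standing in for the red/yellow significance shading, and the percentile-based peer ranking are all illustrative assumptions rather than details specified by the faculty pair.

```python
from dataclasses import dataclass

# Hypothetical representation: (slice, row, col) voxel coordinates.
Voxel = tuple[int, int, int]


@dataclass
class Comparison:
    under: set[Voxel]  # in the expert consensus but missed by the learner
    over: set[Voxel]   # drawn by the learner but outside the consensus
    dice: float        # overlap score in [0, 1]; 1.0 means identical contours


def compare(user: set[Voxel], consensus: set[Voxel]) -> Comparison:
    overlap = len(user & consensus)
    total = len(user) + len(consensus)
    dice = 2 * overlap / total if total else 1.0
    return Comparison(under=consensus - user, over=user - consensus, dice=dice)


def weighted_error(result: Comparison, weight: dict[Voxel, float],
                   default: float = 1.0) -> float:
    """Sum of discrepant voxels, where clinically critical voxels (the "red"
    regions) carry a larger, assumed weight than the default."""
    return sum(weight.get(v, default) for v in result.under | result.over)


def percentile_rank(score: float, peer_scores: list[float]) -> float:
    """Percentage of same-level peers (e.g., other PGY-2 residents) whose
    score is at or below this one."""
    if not peer_scores:
        return 100.0
    return 100.0 * sum(s <= score for s in peer_scores) / len(peer_scores)
```

With this weighting, missing one critical voxel (weight 3.0) and adding one ordinary voxel yields a weighted error of 4.0, so the text panel could prioritize the critical omission in its explanation.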
Resident interviews revealed that they generally favored both faculty designs, yet weighed the benefits differently depending on their experience level. More experienced residents identified the main appeal of Figure 3a as access to a diverse set of perspectives on their contours, especially since they learn from only a limited number of faculty: "Typically, the way that residency is structured, you're working one on one with an attending, and so part of it is learning their tendencies, because there's not always one exact right answer. I think that this distribution map is actually a really good idea, because there are those different tendencies and there's not just one right answer, you can see sort of how likely people are to include other structures." (R4) Most residents strongly favored Figure 3b, mainly due to its emphasis on explaining contouring differences visually and textually. R4 commented on the shadings for under- and over-contoured regions: "it is not all about where exactly my contours differ from my attending, but like, why does it matter? Is it an important difference or not?" (R4). In addition, R2 preferred the text explanations on the right side: "telling me anatomically, I didn't include the RP lymph nodes or I extended to another part, that is helpful" (R2). However, some residents doubted the accuracy of the predicted long-term impacts, such as R5 (a third-year resident), who was skeptical about the last statement of the interface: "if I just saw this, I would be a little skeptical in terms of, where did that come from, how did you decide it is 4% more long term toxicity, as opposed to 8% or 10%" (R5). Lastly, while residents generally found both designs helpful, they highlighted that each design might satisfy different needs. R3, who had just started her residency, desired more descriptive feedback: "[Figure 3a] would probably be more useful to someone that's a little bit more advanced in their training versus for me right now, the other one is better, because it gives more information. I just need to know, how I should have done it" (R3).

Less experienced residents designed feedback mock-ups to support contouring sessions
The three pairs of residents designed four contouring feedback interfaces (see Figures 3c-3f). The first pair envisioned a cross-device system that decouples contouring and feedback. This system places a help button in the top-right corner of the contouring session (Figure 3d-left, which shows a work set-up using a large monitor).
When uncertainty arises during a contouring session, residents can press the help button to activate feedback on a different device (Figure 3d-right). This feedback interface retrieves the most similar cases from a medical image database and sorts them by similarity to the current case. Two sources populate this image database: cases from the resident's attending and general atlases. One member of the pair later elaborated on the significance of highlighting cases from the user's own faculty: "as a resident, you are really only trying to impress your attending" (R5).
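One way to realize the help button's retrieval step is nearest-neighbor search over case descriptors. The sketch assumes each case in the database already has a numeric feature vector (how such vectors would be computed from the medical images is left open) and uses cosine similarity; the optional bonus for the attending's own cases is a hypothetical knob reflecting R5's remark, not part of the mock-up.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    source: str            # "attending" or "atlas", the two sources described above
    features: list[float]  # assumed precomputed image descriptor


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_cases(query: list[float], database: list[Case], k: int = 5,
                  attending_bonus: float = 0.0) -> list[tuple[str, float]]:
    """The k most similar cases as (case_id, score), most similar first.

    attending_bonus (hypothetical) nudges the resident's own attending's
    cases upward in the ranking.
    """
    scored = []
    for case in database:
        score = cosine(query, case.features)
        if case.source == "attending":
            score += attending_bonus
        scored.append((case.case_id, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

A linear scan is enough for a departmental database; a larger shared atlas would likely call for an approximate nearest-neighbor index instead.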
The second pair of residents designed an interface that leverages video to asynchronously capture more context around residents' questions and experts' answers. This system contains a database of faculty- and resident-created videos. When users face uncertainty during contouring, they can record their session: residents can scroll through slices, point to particular regions, and narrate their question. Experts can later go through these video questions and provide answers in either text or video format (populated under Experts' Videos on the left sidebar in Figure 3f). R2, a member of this pair, justified the video recording feature by emphasizing the benefits of real-time feedback: "While you are contouring a case, all these questions come up, like should I make this adjustment here? should I pull it back anatomically from this structure here? You don't always remember every single question once you are going through it with your attending or you might not have enough time." (R2)
The third resident pair created two feedback designs: one interface provides tools that support contouring sessions, and the other compares the learner's contour to their attending faculty's contour visually and textually. Figure 3e presents a collection of tools (on the left sidebar) that support residents during contouring: Stats tracks progress and provides hints, Guidelines links to relevant external resources, Similar Cases presents example prior cases, Submit sends the final case for review or radiation planning, and Share downloads a de-identified GIF of the case that captures contours on multiple slices. The pair's idea of a de-identified GIF originated from their struggles with software dependency: "this is just a way to show someone something quickly, so they wouldn't have to be in the hospital and logged into the system" (R3). The second design (Figure 3c) appears after residents submit their contours for feedback: it displays the learner's and faculty's contours side by side and provides a description of the differences.
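The asynchronous question-and-answer flow in Figure 3f can be described with a small data model. The sketch below is hypothetical throughout: the class and field names, the placeholder video URIs, and the rule that a question counts as open until any answer arrives are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    author: str
    kind: str      # "text" or "video", the two answer formats described above
    content: str   # the text itself, or a URI to the recorded video


@dataclass
class VideoQuestion:
    question_id: str
    case_id: str
    resident: str
    video_uri: str           # recording of scrolling, pointing, and narration
    slices: tuple[int, int]  # first and last slice shown in the recording
    answers: list[Answer] = field(default_factory=list)

    def answer(self, author: str, kind: str, content: str) -> None:
        if kind not in ("text", "video"):
            raise ValueError(f"unsupported answer kind: {kind}")
        self.answers.append(Answer(author, kind, content))


def open_questions(queue: list[VideoQuestion]) -> list[VideoQuestion]:
    """Questions that no expert has answered yet, in queue order."""
    return [q for q in queue if not q.answers]


def expert_videos(queue: list[VideoQuestion]) -> list[str]:
    """Video answers, which could populate an Experts' Videos sidebar."""
    return [a.content for q in queue for a in q.answers if a.kind == "video"]
```

Storing the slice range with each question lets a reviewer jump straight to the part of the case the resident was asking about.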
Overall, the four resident mock-ups by and large emphasized the importance of targeted, in-session support, which can differ from the interfaces designed by the experienced faculty that prioritize aggregated, post-hoc feedback. For instance, Figure 3d (which includes an atlas of similar cases to consult during contouring) and Figure 3f (which facilitates rich multimedia support for capturing and resolving confusions) aim to address contouring breakdowns, especially those that arise during contouring sessions. The participating faculty, on the other hand, envisioned interfaces that provide feedback post-hoc, once the learner submits their contour for review. These mock-ups especially involved aggregated feedback that captures a wide range of contours, such as the holistic visual representation in Figure 3a, and the expert consensus contour and overall prediction of long-term toxicity in Figure 3b.

Comparison features with similar cases and expert contours can benefit feedback interfaces
While the interviews helped contextualize the mechanisms of contouring apprenticeship and the design-thinking workshops revealed concrete mechanisms to improve feedback exchange, the survey results further shed light on key components of an ideal contouring feedback interface. The survey respondents came from highly diverse backgrounds in terms of gender, profession, years of experience, and especially geographical location. Because the interviews and design-thinking workshops indicated that differences in experience between residents and faculty can affect perceptions of feedback interfaces, this section considers expertise as a potential factor of analysis.

Demographics
[...] Dominican Republic (1; 2%). The rest of the respondents were from Africa (6; 10%), South America (5; 8%), and Australia (1; 2%).
Perceived Usability and Learnability - We first built an ordinal logistic regression model [34] to investigate the effect of two potential independent variables: interface (the focus of the Likert-type questions) and expertise, given that the prior workshops pointed to potential differences between how expert faculty and novice residents envision the features of an ideal contouring feedback interface. We converted the contouring experience field of the survey questionnaire into three ordered categories based on the common training model in medical schools: category 1 for up to 5 years of experience (i.e., the average length of residency programs), category 2 for 5-10 years of experience to represent pre-tenured faculty, and category 3 for tenured faculty with more than 10 years of contouring experience. Categories 1, 2, and 3 comprised 31 (46.3%), 17 (25.4%), and 19 (28.3%) respondents, respectively. Results show that while interface is a predicting factor (b = 0.0984, p < 0.025), expertise does not significantly impact perceived usability and learnability (p = 0.898). Given the significant effect of interface, we then examined how the choice of interface impacted each usability and learnability question. As shown in Figure 4, the Likert-scale questionnaire on the original six interfaces revealed that all interfaces exhibited high levels of usability and learnability. Friedman tests showed a significant effect of interface on usability (Q1: p < 0.05). Appendix A displays the pairwise Wilcoxon tests, calculated per question. The results point to interface 5 (I5), which contains the resource panel on the left side as shown in Figure 3e, as exhibiting the highest levels of usability, given the distributions observed in the figures as well as the significant pairwise differences with the other interfaces. The same interface was also rated highly in terms of facilitating learning of contouring skills, as displayed in Figure 4.
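For readers less familiar with the analysis, the sketch below shows the expertise binning described above and the Friedman statistic used to test the effect of interface. Two assumptions are made: experience of exactly 5 or 10 years falls into the lower category, and no correction for tied ranks is applied (library implementations such as `scipy.stats.friedmanchisquare` do apply one, so their values can differ when ratings tie, as Likert ratings often do). The ordinal logistic regression and the pairwise Wilcoxon tests are not reproduced here.

```python
def expertise_category(years: float) -> int:
    """Ordinal expertise level; treatment of exactly 5 or 10 years is assumed."""
    if years <= 5:
        return 1   # up to 5 years: roughly the length of residency
    if years <= 10:
        return 2   # 5-10 years: pre-tenured faculty
    return 3       # more than 10 years: tenured faculty


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(ratings: list[list[float]]) -> float:
    """Friedman chi-square without tie correction.

    ratings[r][j] is respondent r's rating of interface j; every respondent
    must rate every interface (a complete block design).
    """
    n, k = len(ratings), len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; it reaches its maximum, n(k - 1), when every respondent orders the interfaces identically.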
In contrast, interface 4 (I4; most similar cases on a separate device, shown in Figure 3d) trended towards the lowest perceptions of both usability and learnability.
Heatmaps of Liked and Disliked Regions - Filling and overlaying the annotated liked (green) and disliked (red) regions from all responses enabled a granular assessment of the mock-up features. As shown in Figure 5, the participants strongly preferred the text-based explanations in interfaces 2 and 3, while the multi-device functionality of interface 4 appears to have received a more neutral reaction: one respondent justified disliking this functionality as "separate windows [being] uncomfortable", yet positively rated responses mentioned that it is "interesting to have this [feature]". The participants also favored aggregating and displaying contours (top left of interface 1) and found comparison with less experienced residents, shown in the middle of the left panel, marginally beneficial. Yet, the respondents negatively rated the feature for displaying individual contours (shown in the bottom-left section of interface 1).
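The filling-and-overlaying step behind Figure 5 can be sketched as per-pixel counting over a grid the size of the mock-up image. The sketch assumes each annotation is an axis-aligned rectangle labeled "liked" or "disliked"; respondents' actual annotations may have been free-form shapes, which would require a polygon fill instead. It produces a single net score per pixel; keeping separate green and red layers, as in the figure, would only require two grids.

```python
# Assumed annotation shape: (x0, y0, x1, y1) with half-open pixel bounds.
Rect = tuple[int, int, int, int]


def overlay(annotations: list[tuple[str, Rect]], width: int,
            height: int) -> list[list[int]]:
    """Net preference per pixel: +1 for each "liked" region covering it,
    -1 for each "disliked" region. Regions are clipped to the grid."""
    grid = [[0] * width for _ in range(height)]
    for label, (x0, y0, x1, y1) in annotations:
        delta = {"liked": 1, "disliked": -1}[label]  # other labels raise KeyError
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                grid[y][x] += delta
    return grid
```

Positive cells would be shaded green and negative cells red, with intensity proportional to the magnitude.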
Interface Design from Scratch - In total, nine respondents completed the sketching task of the survey; their sketches incorporated many design elements from the six presented mock-ups, granularly assessing the benefits of particular functionalities in a contouring feedback interface. As displayed in Figure 6, many sketches pointed to the potential of accessing learning resources during contouring sessions and in situ within the main contouring window, such as guidelines and expert videos in S8 and contouring pearls (i.e., information about case-specific imaging and anatomy) in S9. S4 further developed this principle and envisioned comic-style pop-up hints that spatially reference particular regions on the contouring window and can be toggled on or off. Direct comparison with similar cases was another common theme in many of the sketches (e.g., S2, S5, and S7).
To further analyze the granular components of these sketches, we examined their common features with the original six design mock-ups. All nine sketches incorporated one contouring window as the central piece of the design, similar to I1, I2, I5, and I6. In addition, S1, S2, S6, S8, and S9 include the resource bar feature observed in I5. Notably, S2 specifically includes similar cases, resonating with elements from I4, I5, and I6. S4 and S5 also highlight similarities with the on/off interactive button and the select/deselect panel for contour overlays, respectively, drawing parallels with features of I1 and I2. The case description emerges as a focal point in S6, mirroring its prominence in I3, I5, and I6. The scorecard, identified as a representative component of I2, features in S6 and S8 as well. Lastly, S9 introduces the expert contour as a distinctive feature, heavily influenced by concepts from I1, I2, I3, and I6.

DISCUSSION
This section frames the findings around feedback-exchange tensions in residency programs (Sec. 5.1), the dual role of faculty (Sec. 5.2), and cognitive apprenticeship (Sec. 5.3). Specifically, we present how faculty's feedback methods are not aligned with learners' needs, and later discuss how this misalignment stems from training strategies and content that aim to address clinical duties in addition to teaching, as well as from a lack of support for learners to examine and share their cognitive processes. Sec. 5.4 describes sociotechnical strategies to improve learning of highly specialized and critical tasks.

Tensions between teaching methods of faculty and learning needs of residents
The empirical findings of this work shed light on interrelationships between faculty and residents in the apprenticeship model of residency training, and on how the existing mechanisms of feedback exchange do not align with residents' needs, especially in terms of timeliness, relevance, and diversity of training methods. The asynchronous methods of training introduce significant delays in feedback exchange, which leads to inadequate in-time support and can degrade overall learning of critical and high-stakes medical tasks. As mentioned in the results section, a common feedback exchange strategy is for faculty to assign their own clinical cases to residents as practice opportunities and later re-contour the entire case and send it back, so that residents can learn by comparing their contour with the expert's. However, this method does not account for confusions and questions that arise while contouring patient cases, as evidenced by the interface mock-ups that residents generated during the workshops (e.g., on-demand support features in Figure 3e). Many components of the survey results further showcase the benefit of in-time support, such as the high usability and learnability scores of the interface design with the left-panel support, as well as the interactive mentoring functionality, incorporated in one sketch, that provides hints during contouring sessions (i.e., S4 in Figure 6). Seeking feedback from peer residents is another training method, yet the ad-hoc nature of this support mechanism can limit its benefits for learners, since adequate support might not be in place when help is needed, for example when no senior resident with experience relevant to the case at hand is available.
The interactions between faculty and residents are limited in supporting unique and granular learning needs. For instance, comparing contours of the entire case with the solution provided by the faculty might not directly address gaps in contouring knowledge and skills, since it remains up to the residents to interpret differences as essential concepts or subjective tendencies. The faculty further shared that they send notes via email, mainly to convey specific and critical learning points. While these explanations can help clarify some confusions, the barrier to providing detailed and targeted feedback on a disjoint, text-based medium (i.e., email content that needs to map to specific segments of particular images, in a case accessible only through separate contouring tools) can introduce additional burden for the faculty and discourage granular feedback. In the training method in which faculty think aloud about their processes as they place contours on images, these processes can differ from the learning needs of less-experienced residents. While the resident can contribute to this training strategy and ask for clarifications, these questions might differ from the confusions they face when contouring themselves. As such, learning needs can remain unaddressed and only be uncovered when residents engage with contouring tasks in depth and receive relevant feedback.
Similar to many training programs for specialized healthcare procedures, trainees are matched with only a limited number of expert physicians, which can hinder the diversity of feedback, especially in complex tasks that involve a certain degree of subjectivity. While access to senior residents can improve the variety of feedback, the participants indicated that impressing the attending faculty is a main goal of residency. As further unveiled via the workshop (i.e., Figure 3a) and survey (e.g., the I1 heatmap in Figure 5), learners valued exposure to different contouring tendencies and diverse perspectives. In addition, facilitating access to expert physicians from varying backgrounds and experiences can substantially improve equity in healthcare: prior research shows that a "quality gap" exists in cancer treatment, in which medical institutions in rural locations (with lower patient volumes) provide substandard treatment compared to their urban counterparts with higher patient volumes [1,50].

Content and strategies of training that blend pedagogical and clinical duties
The existing misalignment between the provided methods of teaching and the desired styles of learning can stem from the dual role of clinician and teacher held by attending faculty at medical institutions, as also reported in prior work [66]. These expert physicians are not only expected to provide a quality educational experience to their assigned residents, but also to attain a high level of clinical throughput by contouring patient cases, which can be particularly critical and time-sensitive. Consequently, when expert availability is limited, clinical duties take priority over teaching. Our findings reveal how the constraint of performing both clinical and pedagogical responsibilities specifically manifests in faculty tailoring the content and strategies of feedback exchange so that they also progress through clinical tasks. As evident from the training mechanisms laid out in Sec. 4.1, using the faculty's own clinical cases as educational content, while convenient, might not address the learning needs of residents, given that the difficulty, size, and type of a case might not align with the residents' expertise level. Research suggests that educational content that deviates from a medium difficulty level for learners negatively impacts learning performance [56]. In addition, the feedback strategy of re-contouring the entire case after residents submit it, while a necessary component of clinical duties, can lack the granularity and depth of feedback that residents need. Contouring watch-alongs can also help satisfy both responsibilities: the faculty can spend time completing clinical tasks while the resident watches along and benefits marginally from the periphery. As learners elaborated, while thinking aloud about contouring decisions can be helpful, adjusting the contours pixel by pixel (on cases that might contain hundreds of slices) takes a long time, time that could be spent on more targeted and specialized practice content.

Moving from traditional apprenticeships towards cognitive apprenticeship
The findings of this work revealed elements of a residency program that more closely resembles a traditional apprenticeship (i.e., the first half of a cognitive apprenticeship), in which the three principles of modeling, coaching, and scaffolding are moderately supported. As reported in the results, 1-on-1 contouring watch-alongs, in which experts perform contouring and externalize their internal processes, center around the first principle (modeling). The second step of the cognitive apprenticeship model (coaching) is partly fulfilled by interacting with peer residents: some participants benefited from working through a small subsection of a case while the senior resident evaluated their thought process and provided guidance. However, this strategy can be unstructured and ad hoc, meaning support might not always be available, or the senior resident's expertise might not match the learner's needs. The case exchange and re-contouring method, while mainly a form of experiential learning (i.e., "learning by doing") [42], shares core elements with the scaffolding principle of cognitive apprenticeship, in which the faculty adjust the depth and modality of feedback according to the skill level of the residents.
As noted in the results section, the faculty decide either to provide only the contours or to also add text-based justifications via email when the residents can benefit from the additional hints. Given the asynchronous (i.e., delayed) and contextually limited (i.e., textual vs. richer video formats) nature of the feedback, this training mechanism might not adequately gauge the expertise level of learners in order to provide relevant help.
The existing training strategies lack support for the second part of cognitive apprenticeship (Table 1), in which learners focus on developing and solidifying their cognitive processes. Given the existing curricular infrastructure, residents have limited opportunity to engage in articulation and reflection, principles that aim to delve deeper into the cognitive processes of learners and enable comparison with experts' models. These steps require investing significant time and resources, an investment that might not directly contribute to clinical throughput, further highlighting the constraint of the dual role of clinician and teacher (as elaborated in Sec. 5.2). Lastly, the exploration phase promotes fading, not only in problem-solving but also in problem-setting, in which learners apply their newly learned skills to seek and tackle other problems that align with their learning goals. This can be difficult, especially in healthcare, for two reasons: first, patient cases involve a high degree of sensitivity and privacy, which can pose a barrier to access. Second, it can be challenging to gauge the complexity of patient cases and, specifically, what learning goals they cover. As such, residents might need additional guidance in selecting new contouring cases, a type of support that is lacking in current training methods. For a complex, critical, and cognitively demanding task like contouring in radiation oncology, as well as clinical workflows in many other healthcare domains, it is imperative that all six principles of cognitive apprenticeship be adequately supported to yield improved learning and, consequently, quality clinical outcomes.
The features designed in the interface mock-ups can especially complement the existing training model by supporting the last three principles of cognitive apprenticeship. For instance, video-enabled feedback exchange (interface 3f) can be an effective strategy for communicating the internal processes of residents (i.e., articulation) and for facilitating reflection by enabling learners to compare experts' cognitive models (via the proposed Experts' Videos feature) with their own processes. The Similar Cases feature, introduced in mock-ups 3d and 3e, can also benefit exploration, in which residents explore cases relevant to their current learning task. This approach, however, might require further scaffolding to align these clinical cases with the underlying principles that residents need to improve their contouring knowledge and skills.

Sociotechnical methods of bridging dual faculty roles and facilitating cognitive apprenticeship
While fundamental solutions for current residency programs might suggest completely decoupling the clinical and teaching roles of expert physicians hired at academic institutions, we recognize that such changes require significant restructuring of existing societal and monetary models; as such, we offer more attainable curricular and technological strategies to mitigate the shortcomings of healthcare training. This section presents sociotechnical solutions that can not only address constraints of faculty's dual role (to a reasonable extent), but also support all principles of cognitive apprenticeship. Many of these approaches directly apply to other healthcare domains that incorporate similar apprenticeship models of training.
Leveraging peer resident resources to lower teaching duties and enrich learning - As discovered in the interview sessions, residents seek guidance from their more experienced peers in a back-and-forth exchange, where the pair can work through a subset of the case together. While especially beneficial for early residents, this ad-hoc and informal help-seeking involves additional overhead and uncertainty (e.g., finding senior residents with the relevant level of expertise when needed) and can further deter residents from pursuing these resources.
Contouring education can especially benefit from systematically leveraging the knowledge and skill set of senior residents to train a larger number of novice residents. When structured according to case difficulty and expertise level, senior residents can be valuable training resources, as they can engage with learners in deeper and longer sessions and hence lower the teaching responsibilities of faculty. Many prior HCI works on crowdsourcing explored leveraging the wisdom of expert workers and matching their expertise to the needs of novices by providing concrete learning tasks with representative descriptions, measuring the extent of expert knowledge, and defining reasonable incentives [37,67,84]. While residency programs operate at smaller scales than these systems, our findings point to how similar principles can help streamline this process: 1) contouring cases for early residents can be defined by their attending faculty, who better understand the complexity of tasks and the required skills, 2) senior residents who have specialized in these particular cases can be matched for extended co-contouring sessions that engage learners in deeper cognitive processes, and 3) these expert residents can later be compensated with academic credits or monetary incentives.
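The three-step process above can be illustrated with a greedy matching sketch. Everything in it is hypothetical: the per-topic expertise counts, the difficulty scale set by the attending, the eligibility threshold (`min_per_level`), and crediting seniors by case difficulty are assumptions chosen to show the structure, not a validated assignment policy. A program with many residents might prefer an optimal assignment method over this single greedy pass.

```python
from dataclasses import dataclass


@dataclass
class Senior:
    name: str
    expertise: dict[str, int]  # hypothetical: topic (e.g., disease site) -> cases completed
    capacity: int              # co-contouring sessions the senior can take on
    credits: int = 0           # step 3: academic credit earned so far


@dataclass
class TrainingCase:
    case_id: str
    topic: str
    difficulty: int            # step 1: set by the attending, e.g. 1 (easy) to 3 (hard)


def match(cases: list[TrainingCase], seniors: list[Senior],
          min_per_level: int = 5) -> dict[str, str]:
    """Step 2, greedily: harder cases first, each assigned to the most
    experienced available senior with at least min_per_level * difficulty
    completed cases on that topic. Unmatched cases stay with the faculty."""
    assignments: dict[str, str] = {}
    for case in sorted(cases, key=lambda c: c.difficulty, reverse=True):
        needed = min_per_level * case.difficulty
        eligible = [s for s in seniors
                    if s.capacity > 0 and s.expertise.get(case.topic, 0) >= needed]
        if not eligible:
            continue
        best = max(eligible, key=lambda s: s.expertise.get(case.topic, 0))
        best.capacity -= 1
        best.credits += case.difficulty
        assignments[case.case_id] = best.name
    return assignments
```

Processing harder cases first reserves the scarcest expertise for the cases that need it most, at the cost of sometimes leaving easy cases unmatched.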
Facilitating convenient capture of video snippets to share cognitive processes - As shown during the design-thinking workshops and survey, embedded video recording can help residents capture questions and uncertainties that arise during contouring sessions, and facilitate targeted post-hoc review and learning from the faculty.
Video-assisted feedback is an effective method of feedback exchange in healthcare, as it significantly improves clinical skills [57] and in some cases benefits learners on par with direct expert feedback [64,65]. In surgery, video summaries, especially if developed via human leaning models [25], can benefit resident training by showing alternative ways of performing gestures and enabling residents to trace their mistakes [3]. Creating reusable video snippets not only lowers the burden on faculty time in the long term, but also provides a necessary space for residents' review and self-reflection [24]. These video snippets can especially benefit residency programs, since reviewing and responding via short videos (especially if embedded in the main contouring tools) avoids adding significant overhead to the current workflow of attending faculty, who are responsible for many clinical and teaching tasks.
In addition, a convenient video-capture system can facilitate the articulation principle of the cognitive apprenticeship model, in which learners express their cognitive processes in depth and in context, and further compare their problem-solving skills with experts'. It is particularly beneficial to record these processes in-session since, under the existing training strategies, residents forget many important contouring details and confusions, or there might not be sufficient time during review sessions with the faculty (results presented in Sec. 4.3).
Aggregating variability to capture unique tendencies and yield deliberation - The interviews and interface mock-ups (e.g., the design in Figure 3a and its positively rated heatmap annotation shown in Figure 5) pointed out the nuanced differences in experts' contours; residents raised concerns about the lack of exposure to different contouring tendencies and emphasized learning from diverse styles.
Capturing and presenting other experts' unique contouring tendencies can complement residency programs that facilitate apprenticeship with only a few faculty. Despite existing guidelines (e.g., [49] and [78]), physicians can interpret images differently [44] and hence introduce contouring variations. As shown in the results, one main source of variation stems from clinicians' differing judgments about including or excluding certain regions around the tumor. Expert disagreements appear in many clinical decisions, such as identifying abnormal spikes in brain signals [4] and eye assessment in referral diagnoses [91]. Capturing and presenting contouring disagreements can further encourage deliberation and enhance learning, especially by promoting the reflection principle of the cognitive apprenticeship model. Group deliberation refers to sense-making of the collected uncertainty [29,72] by leveraging dissenting positions to generate necessary information that can otherwise be lost in consensus-reaching procedures (e.g., majority voting) [80]. In-depth discussions of different contouring tendencies can give learners more opportunities to compare their internal cognitive model of expertise with those of the faculty and peer residents, further aligning with the cognitive apprenticeship model.
An important sociotechnical consideration in collecting variability, according to Ackerman's list of challenges for computer supported cooperative work [2], is critical mass, and specifically in healthcare, the scarcity of highly skilled physicians. Critical mass is the idea that a certain threshold of participants is required for the success of a social movement [59], and it can affect the perceived usefulness and acceptability of sociotechnical systems [20,36,55]. Attracting radiation oncologists to contribute to a diverse collection of contours might face challenges due to the lack of critical mass, especially given the already small number of physicians in this field. Careful design of cooperative contouring systems that incorporate elements of the Technology Acceptance Model [16] can enhance user adoption and address critical mass: for instance, since perceived critical mass (e.g., through personal interactions) can improve system acceptability [53], feedback solutions that leverage a collection of contours can start by advertising predominantly to major medical hubs, such as medical schools and oncology clinics.
Providing in-situ and anchored resources to enhance asynchronous faculty feedback. As noted in the interview sessions, a prominent method of feedback exchange is providing solution contours, with additional text-based comments sent separately via email. However, this method can pose learning challenges given the disconnect between the context of contouring (in the clinical tools) and the context of feedback (in email).
The disjoint modalities (the contour solution and the textual feedback) can hinder establishing common ground and increase the joint communicative effort required of interlocutors [10,11]. Prior systems that facilitate visual/spatial referencing produced higher-quality comments [28], lowered confusion [97], and increased satisfaction [54,100]. Leveraging the unique characteristics of medical images by spatially anchoring faculty's comments to specific image slices is especially beneficial in contouring residency, where asynchronous feedback is likely to remain a prominent training method given the limited availability of expert faculty with dual roles of clinician and teacher. An example solution appears in one of the sketches from the survey study (S4 in Figure 6), in which hints are displayed on top of medical images with arrows pointing to specific regions.
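To make spatial anchoring more concrete, the sketch below shows one possible data model in which each faculty comment is stored together with the image series, the slice index, and an in-plane position, so that a viewer could overlay only the comments belonging to the slice a resident is currently inspecting, similar to the arrow hints in sketch S4. The class names, fields, and series identifiers are hypothetical assumptions for illustration; they do not describe any existing contouring tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AnchoredComment:
    """A feedback comment pinned to one location on one image slice (hypothetical)."""
    series_id: str              # image series the comment refers to, e.g. a planning CT
    slice_index: int            # slice the comment is anchored to
    position: Tuple[int, int]   # (row, col) pixel that the arrow points at
    author: str
    text: str


@dataclass
class FeedbackLayer:
    """Indexes comments by (series, slice) so a viewer can overlay them while scrolling."""
    _by_slice: Dict[Tuple[str, int], List[AnchoredComment]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add(self, comment: AnchoredComment) -> None:
        self._by_slice[(comment.series_id, comment.slice_index)].append(comment)

    def on_slice(self, series_id: str, slice_index: int) -> List[AnchoredComment]:
        # .get() avoids creating empty entries for slices without feedback.
        return list(self._by_slice.get((series_id, slice_index), []))

    def slices_with_feedback(self, series_id: str) -> List[int]:
        # Lets the viewer jump directly to slices that carry faculty comments.
        return sorted(
            s for (sid, s), items in self._by_slice.items() if sid == series_id and items
        )


if __name__ == "__main__":
    layer = FeedbackLayer()
    layer.add(AnchoredComment("CT1", 42, (130, 210), "F1", "Include these nodes."))
    layer.add(AnchoredComment("CT1", 40, (120, 200), "F1", "Too far posteriorly."))
    print(layer.slices_with_feedback("CT1"))
    for c in layer.on_slice("CT1", 42):
        print(c.position, c.text)
```

Keying comments by slice keeps lookup cheap as the resident scrolls through a scan, which is the interaction pattern in which anchored hints would be displayed.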
Feedback type and presentation can be adjusted to reflect the differing goals of novice and experienced residents. As discussed by Ackerman [2], this is an important consideration for the feasibility of computer supported cooperative systems. Prior research has demonstrated how members of organizations can have differing or (sometimes) conflicting goals, which can stem from differences in knowledge, meanings, and histories [38,41,82]. In contouring education, targeted and anchored feedback can especially help new residents, who might struggle with region detection and fundamental contouring procedures, whereas experienced learners might benefit more from holistic and diverse feedback. Healthcare training tools should account for learners' varying goals and experience levels to provide effective feedback and avoid disrupting the learning workflow.

LIMITATIONS AND FUTURE WORK
Despite the novel insights that this work extracts and discusses, it has several limitations. All 10 participants in the interviews and workshops were from the same medical school and might have developed similar perceptions of contouring feedback techniques. While the survey study specifically addressed this limitation by engaging clinicians globally, recruiting a larger and more diverse set of radiation oncologists for in-depth participation could further enhance the external validity of our findings. Future survey studies could also instruct respondents to evaluate interfaces from a particular perspective (i.e., novice vs. expert), given that level of expertise might affect perceptions of learning tools. In addition, this paper examined faculty and residents in separate interview and workshop studies. While this allowed us to capture authentic perspectives from both stakeholders given the existing hierarchical power dynamics in residency programs [48], more interactive studies, such as synchronous tutoring simulations, text analysis of email exchanges, and contouring observations, might provide deeper insight into the content and techniques of contouring education.

CONCLUSION
How is healthcare apprenticeship facilitated to transfer highly specialized and critical medical skills, and what are the implications of faculty's dual role of clinician and teacher for mechanisms of feedback exchange? To answer these questions, we examined the inter-dynamics between expert faculty and novice residents in the case of contouring: the high-stakes task of identifying tumors in radiotherapy treatment. Following interviews and design-thinking workshops with faculty (N = four) and residents (N = six), our results revealed tensions between the teaching content and strategies that the faculty provide and the timely, relevant, and diverse support that residents need to learn the skills. We describe how this tension arises from the overlapping clinical and pedagogical responsibilities of the faculty and the lack of support for capturing and sharing learners' internal cognitive models. The follow-up survey with practitioners from 31 countries (N = 67) provided diverse perspectives on effective feedback elements of training tools in healthcare.
To resolve the current obstacles, we presented practical sociotechnical solutions that can improve the existing training model in residency programs, including leveraging peer residents to lower the teaching duties of faculty, facilitating convenient capture of video snippets to share internal cognitive processes of learning, and aggregating contour variability to support group deliberation. We believe that understanding the dynamics of apprenticeship training in healthcare is key to improving the quality of training and patient outcomes, and future work can especially build on the inherent organizational issues uncovered and discussed in this paper.

A PAIRWISE WILCOXON TESTS FOR LIKERT-SCALE QUESTIONS IN THE SURVEY
This section presents the pairwise Wilcoxon tests for the four Likert-scale questions in the survey:
• Question 1: "I think that I would use this interface frequently." (see Table A1)
• Question 2: "I found the various functions in this interface well integrated." (see Table A2)
• Question 3: "With this interface, I would be more interested to learn the topics." (see Table A3)
• Question 4: "With this interface, I would learn to identify the main and important issues of the topic." (see Table A4)
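The appendix reports pairwise Wilcoxon tests but does not state the exact test variant, the multiple-comparison correction, or the software used. As a hedged illustration only, the sketch below assumes a paired design (every respondent rated every mock-up), so it applies the Wilcoxon signed-rank test with zero differences dropped, a tie-corrected normal approximation, and a Bonferroni adjustment across all interface pairs. The function names are hypothetical, and the ratings in the example are made-up toy values, not survey data.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: List[int], y: List[int]) -> Tuple[float, float]:
    """Two-sided signed-rank test on paired ratings; returns (W+, p-value).

    Uses the normal approximation with a tie correction and drops zero differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no informative pairs
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Tie correction: subtract sum(t^3 - t) / 48 over groups of tied |d| values.
    counts: Dict[int, int] = {}
    for d in diffs:
        counts[abs(d)] = counts.get(abs(d), 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / 48
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p


def pairwise_wilcoxon(ratings: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Bonferroni-adjusted p-values for every pair of interfaces rated by the same people."""
    pairs = list(combinations(sorted(ratings), 2))
    m = len(pairs)
    return {
        (a, b): min(1.0, wilcoxon_signed_rank(ratings[a], ratings[b])[1] * m)
        for a, b in pairs
    }


if __name__ == "__main__":
    # Toy 5-point Likert ratings from five hypothetical respondents.
    toy = {"I1": [5, 4, 5, 3, 4], "I2": [3, 3, 4, 3, 2], "I3": [4, 4, 4, 4, 4]}
    for pair, p_adj in pairwise_wilcoxon(toy).items():
        print(pair, round(p_adj, 3))
```

With small samples like these, the normal approximation is rough; recent SciPy versions provide `scipy.stats.wilcoxon`, which can compute exact p-values (via its `method` argument) and offers options for zero differences and continuity correction.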

B SURVEY INSTRUCTIONS
B.1 Step 1 of 3: Demographics
Please provide the following background information. This helps us contextualize your responses later in the survey.
• How old are you?
• What is your gender?
• What is your affiliated industry/academic institution?
• What is your job title?
• How long have you been contouring?

B.2 Step 2 of 3 (a): Demographics
Here, you will see six contouring feedback interfaces. Each image contains a description on the right side of the image. Please familiarize yourself with these designs before moving on to the next questions.

B.3 Step 2 of 3 (b): Perceived Usability and Learnability
Please answer the following prompts by navigating the drop-down menu on each interface.
• I think that I would use this interface frequently.
• I found the various functions in this interface well integrated.
• With this interface, I would be more interested to learn the topics.
• With this interface, I would learn to identify the main and important issues of the topic.

B.4 Step 2 of 3 (c): Interface Annotations
In this section, please evaluate specific components of the 6 interfaces above. For each design:
• Use the pencil tool to specify what parts of the interface you like and dislike. You can draw around the components of your choice with the colours green (for regions that you like) and red (for regions that you dislike).
• Explain your reasoning for the liked and disliked regions underneath the images.The left column is for liked regions and the right column is for disliked regions.

B.5 Step 3 of 3: Sketching
By this final stage of the survey, you have seen six feedback interfaces. It is now your turn! Use the space provided to design YOUR ideal contouring feedback interface. Don't worry about creating a professional-looking design! A quick sketch/drawing that illustrates the essential elements of your interface would be sufficient. You can even choose to insert text boxes in place of complex drawing components.

Figure 1: The five goals and nine steps of the design thinking workshops with the radiation oncology faculty and residents. The two workshops aimed to guide participants in designing "ideal contouring feedback interfaces".

3.3.1 Interviews. The faculty and residents' interviews contributed to understanding feedback exchange mechanisms in the apprenticeship model of residency training. To examine the semi-structured interviews of the faculty and residents, including residents' perceptions of the mock-ups designed by the faculty, the first author open-coded the transcribed interviews and identified the main topics. Iterative discussions among the team merged these initial codes into preliminary, and then final, themes.
(a) A faculty member's contouring session on a cancer case using the contouring software MIM. F4 used a three-image set-up showing different orientations of the same set of medical scans. (b) A resident's contouring session (using Eclipse, another contouring software) on a patient with lung cancer. R4 contoured and viewed a single-image set-up that fused two types of images.

Figure 2: Two examples of the anonymized interview sessions with a faculty member (left) and a resident (right). Both contouring tools contain a main contouring canvas and a number of delineation tools (e.g., brush and eraser) on the side.
(a) Aggregated presentation of contours across all physicians. Multiple user categories and contours from specific individuals can be selected (left). The contouring regions are then color-coded based on the distribution of areas covered. Video-based interface that enables asking questions by recording snippets of contouring sessions, and later receiving feedback by viewing expert videos and consulting similar prior cases.

Figure 3: The generated contouring feedback interfaces in all workshop sessions.

Figure 4: Divergent charts presenting survey answers to the Likert-scale questions on usability (Q1 & Q2) and learnability (Q3 & Q4) of the six mock-ups. Overall differences in distribution are significant; Tables A1-A4 in Appendix A present the pairwise tests. I1-I6 refer to the six mock-ups designed during the workshops, shown in Figure 5 with added heatmaps collected from the same survey.

Figure 5: Heatmap annotations for liked and disliked mock-up regions.The heatmap annotations aimed to directly visualize survey takers' preferences for design elements in the six mock-ups.

Figure 6: Nine free-form sketches collected from the survey which shared many elements with the six interface mock-ups.

Table 2: Background details on the four faculty and six residents who participated in this study.

[Mock-up feedback panel with views "Expert Consensus Contours" and "Your Contour": "Satisfactory! Compared to experts' contours, your contour had 62% agreement, did not include the retropharyngeal lymph nodes, extended too far posteriorly, and resulted in 4-7% more long-term toxicity after radiation. You scored in the 75th percentile of PGY2."]
as the most representative country (8; 13%). The second largest population came from Asia (11; 18%), with India as the most representative country (3; 5%), and from North America, with the same population size (11; 18%), including the United States (10; 16%) and