Brief, Just-in-Time Teaching Tips to Support Computer Science Tutors

As enrollments in computing-related programs continue to rise, computer science departments are increasingly relying on teaching assistants (TAs) to provide additional educational support to students, such as one-on-one tutoring or office hours. Tutoring is more effective with highly trained tutors, but most TAs receive little to no training in pedagogical skills. How might we provide support to TAs working with students one-on-one, especially in online settings? We propose a just-in-time intervention that shows a tutor actionable teaching tips and relevant information right before they begin an online tutoring session with a student. We conducted a crossover experiment (n = 46) where participants engaged in two tutoring roleplays for an introductory computer science programming task and found that participants demonstrated effective instructional strategies for much longer periods of time after receiving the intervention. We discuss the implications of these findings for both educators looking to support tutors and researchers seeking to build technology for tutors.


INTRODUCTION
As interest in computer science and programming continues to grow, enrollments in computing-related majors are increasing year after year [1]. Many students struggle with the introductory CS curriculum [9,25], so departments often hire graduate and undergraduate students as teaching assistants (TAs) to provide additional educational support [16]. Despite the critical role that TAs play in students' education, they typically receive little to no training in pedagogy [12,26]. Moreover, the sheer number of TAs required to support large class sizes means that many of them will likely be novice tutors. This is unfortunate because prior work suggests that tutors with more teaching expertise can be more effective [19].
If we could help TAs become more effective teachers, we might be able to improve the educational outcomes and experiences of introductory computer science students. However, additional pedagogical training for TAs would likely be expensive or time-consuming. Is there a way to help TAs become better tutors that is cheap and quick, yet still effective?
We propose a "just-in-time" intervention that suggests a short list of teaching tips and information immediately before teaching to help TAs employ more effective pedagogical strategies, of the type often employed by expert teachers.If such an intervention is effective, it could be scaled to support many TAs with little cost.
To explore this possibility, we consider the task of remotely supporting a student who is debugging their code, a common task in TA-led office hours for introductory programming classes. We chose an online setting because the pandemic created a need for high-quality remote teaching, and even in-person courses have increasingly adopted hybrid formats, with TAs offering online office hours.
In the treatment condition, we showed participants a short informational screen with (1) pedagogical advice on how to approach the tutoring session and (2) information about the student's status in the course (specifically, whether they had attended lecture). We conducted a lab experiment to study whether this informational screen could significantly change a subject's pedagogical practices during a subsequent tutoring session. In our experiment, 46 graduate and undergraduate students engaged in a tutoring roleplay, with a researcher acting as the student and the participant as the tutor.
We investigate two research questions:
RQ1: Does providing pedagogical advice and information about the student just before a tutoring session increase the tutor's use of the recommended techniques, compared to no support?
RQ2: Does this information lead the tutor to act on the recommendations more in the next tutoring session, even without further prompting?
Overall, we found strong evidence that showing helpful information to tutors right before a tutoring session increases their use of effective teaching techniques during that session. We also found evidence, though less conclusive, that they continue to use these techniques in a subsequent session. We discuss the implications of our findings for computer science educators and suggest promising future research directions.

RELATED WORK
CS departments rely heavily on graduate and undergraduate TAs to provide support to a large student body. However, simply scaling the number of TAs does not necessarily scale the availability of quality teaching. In their dataset of recorded tutoring sessions between undergraduate tutors and introductory CS students, Krause-Levy et al. found that students often do not receive high-quality instruction [10]. For example, in 17% of student-tutor interactions, the tutor did not ask the student a single question [10], despite the importance of asking questions to support student learning [28]. The tutors in their sample were also fairly experienced, having previously tutored an average of over five times for the CS department. This suggests that experience alone may not be enough; there is a need to support tutors in teaching more effectively.
One way to improve TAs' teaching skills is through training, but lack of training is a frequent problem among TAs, both inside and outside of computer science [12,22,26]. While TA training is offered by many departments and schools, such training, when it exists, often focuses on school-wide or department-wide policies and procedures rather than pedagogical strategies [17,26]. Educators have been exploring more pedagogy-focused approaches to TA training, such as multi-day workshops [5], courses aimed at TAs [23], and online training modules [29]. While recent work has aimed to create more flexible curricula for TA training [18], many of these solutions still require a large investment from the department, staff, and faculty.
Online and hybrid tutoring settings provide an opportunity to scale teaching [7,27], but they also present their own challenges [21]. Prior work emphasizes the importance of training tutors to be effective online tutors [6,8], but there is little work on how to make that training most impactful.
Recent work has explored more personalized, technology-based methods for tutor training, such as GPTeach, a scalable way for TAs to practice their teaching with AI-simulated students without the time pressure of a real teaching scenario [14], and M-Powering Teachers, a system that uses natural language processing to provide automated formative feedback on teaching [4]. To our knowledge, however, there has been minimal research on just-in-time support for teaching in computer science, which is the focus of the present work.

METHODOLOGY
We created and evaluated a just-in-time intervention to inform tutors about effective teaching strategies.

Procedure
We conducted an experimental study (n = 46) in which participants were asked to play the role of the tutor in two online one-on-one tutoring roleplays. Participants were university students (undergraduate and graduate) recruited from the CS department student mailing list. Each study session took approximately an hour, and participants were compensated with a $25 gift card for their time. Informed consent was obtained at the start of the study, before the roleplays began.
In the tutoring roleplays, a researcher played the role of a student in an introductory CS class working on the classic FizzBuzz problem. Before the study began, participants were given time to review the problem description and a sample solution. For consistency, the first author played the student in both sessions, but each session featured one of two different versions of the student's code to ensure that the two sessions would not unfold identically. The order of these student code versions was randomized to avoid ordering effects, and the roleplays were video recorded for later analysis.
Additionally, immediately before each session, participants were shown either the treatment intervention, which contained teaching tips and information about the student's lecture attendance, or the control, which instead had logistical tips for online meetings. The order in which the conditions were assigned was randomized for each participant, and the researcher playing the student was blinded to the condition.
After a participant completed both roleplay sessions, we conducted a recorded semi-structured interview to ask how useful the information they read was and to better understand what they were thinking throughout the tutoring sessions.
Finally, we deployed a survey at three points throughout the study (before the first session, after the first session, and after the second session) to collect self-reported measures such as interest in teaching and confidence.
Tutoring Roleplay.
By having the first author play the role of the student in both sessions of the study, we ensured a consistent experience across participants, so that differences in teaching behavior could be attributed to the intervention. To make it feel more like the students were actually different, we gave them different names ("John" and "Tom"), and one student used a virtual background while the other did not.

Choice of Problem.
We chose Python, a typical intro CS programming language, and FizzBuzz as the task: "Write a program that prints the numbers from 1 to 100. But for multiples of three, print "Fizz" instead of the number, and for multiples of five, print "Buzz". For numbers which are multiples of both three and five, print "FizzBuzz". The number 100 is specified using a constant MAX_NUMBER." We chose FizzBuzz because it only uses basic programming concepts (for loops, if-else), which lowered the barrier to entry for participants. It is also a tricky problem for novice programmers, which made it easier for us to write realistic buggy programs.
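For reference, a minimal correct solution, consistent with the problem statement above, looks like the following (this is our own sketch; participants reviewed a sample solution of this kind before the study):

    # Reference FizzBuzz solution. MAX_NUMBER comes from the problem
    # statement, which specifies the upper bound as a named constant.
    MAX_NUMBER = 100

    for i in range(1, MAX_NUMBER + 1):
        if i % 3 == 0 and i % 5 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)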

Student Code.
Since each participant engaged in two tutoring sessions, we could not use the exact same student code for both. If we did, the participant would know exactly where the bugs were and would either act on that information or pretend not to know, both of which are undesirable. We therefore prepared two different versions of the student code and tried to make them equivalent in the number and style of bugs (see Code A and Code B in Figure 1). The order in which they were used was randomized.
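To illustrate the style of bug involved (the actual stimuli appear in Figure 1), a hypothetical attempt in the spirit of Code A, written by a student who does not know about elif, might use independent if statements, producing duplicate output:

    # Hypothetical illustration only; see Figure 1 for the code actually
    # used in the study. Without elif, the branches are not mutually
    # exclusive: 3 prints both "Fizz" and "3", and 15 prints "Fizz",
    # "Buzz", and "FizzBuzz".
    MAX_NUMBER = 100

    for i in range(1, MAX_NUMBER + 1):
        if i % 3 == 0:
            print("Fizz")
        if i % 5 == 0:
            print("Buzz")
        if i % 3 == 0 and i % 5 == 0:
            print("FizzBuzz")
        else:
            print(i)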
After collecting the data, we did not find a significant difference in session duration (in seconds) between Code A (mean = 808, SD = 229) and Code B (mean = 831, SD = 145), t(90.0) = −1.466, p = 0.146, which suggests that the two buggy programs were fairly similar.
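The fractional degrees of freedom reported above are consistent with Welch's unequal-variance t-test. A sketch of this comparison (with simulated data standing in for the real per-session durations, which we do not reproduce here):

    import numpy as np
    from scipy import stats

    # Simulated stand-ins for the 46 per-session durations (seconds)
    # observed for each code version; not the real data.
    rng = np.random.default_rng(0)
    code_a = rng.normal(loc=808, scale=229, size=46)
    code_b = rng.normal(loc=831, scale=145, size=46)

    # Welch's t-test does not assume equal variances and yields
    # fractional degrees of freedom like the t(90.0) reported above.
    t_stat, p_value = stats.ttest_ind(code_a, code_b, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")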

Survey.
Participants filled out a survey at multiple points throughout the study, which asked attitude-related questions such as, "How interested are you in teaching or tutoring?" They answered these questions before the first session, after the first session, and after the second session. After reviewing the collected data, we found that the roleplays did not appear to affect self-reported attitudes, so we omit the survey data from our analysis.
The survey was also where we deployed our intervention (Figure 2). Before starting each session, the survey displayed either the control or treatment text to the participant (see Section 3.2). To ensure that the researcher playing the student was blinded to condition, the survey automatically assigned participants to either the control-first or treatment-first group and displayed the appropriate intervention without any input from the researcher.
Interview.
After both tutoring sessions were complete, we conducted semi-structured interviews with the participants. Our interview questions included the following:
• What information were you given before Session 1 (or 2)?
• To what extent did you use that information? Did you also think about it in Session 2?
• What information did you find most useful?
We recorded the interviews and used Zoom's automated transcription to generate transcripts.

Intervention
The order in which the conditions were assigned was randomized: 22 participants received the treatment first, and the remaining 24 subjects received the control first.
In the treatment condition, the participant read a short list of teaching tips before the tutoring session, covering asking open-ended questions, checking for student understanding, and letting the student do most of the talking (e.g., "<NAME> should be talking most of the time to learn the most!"), along with information about whether the student had attended lecture (see Figure 2 for the full text). For these tips, we selected effective teacher talk moves [15] that are easy to understand and execute and that novice tutors may not think to do [10].
In the control condition, the participant read the following information before the tutoring session: Your next student is <NAME>! <NAME> is enrolled in an introductory CS programming class and needs help on this week's assignment.

Tutoring recommendations:
-Keep your camera on so that both of you can see each other face-to-face.
-Make sure that you're unmuted so he can hear you.
-Camera and microphone controls are located at the bottom left of the Zoom interface.
-Wait for <NAME> to connect to audio and turn on his camera before starting to talk.

EVALUATION
To answer the research questions posed in Section 1, we employed a mixed-methods approach to analyze the recorded tutoring sessions and interviews. We describe our evaluation methods here.

Transcript Coding
To understand to what extent tutors used effective teaching moves in the tutoring sessions, we recorded the sessions and coded the transcripts. These coded transcripts then served as the basis for our quantitative analysis.
Coding proceeded in two phases: an inductive (open) coding phase and a deductive (closed) coding phase. In the first phase, the second author went through several transcripts from the pilot studies to identify patterns in participants' teaching moves and created codes for them. They then discussed these codes with the first author to refine them. After several rounds of discussion, the two authors developed a codebook for the events of interest.
Because the pedagogical tips in our treatment intervention center on asking questions, guiding conversation, and encouraging student talk, we used our codes to identify productive talk moves [15] encouraged by our intervention, specialized for an introductory programming task:
• Prompts the student to fix code on their own
• Asks the student to explain their code
• Checks for student understanding
• Asks the student to explain what their code does for a given input
• Asks the student to figure out test cases
• Asks the student to explain the output
We coded the transcripts to identify what we call productive teaching events: sections of dialogue corresponding to the tutor exhibiting a specific talk move, together with the resulting student-tutor interactions from that move.
An event begins with an initial prompt by the tutor (e.g., "Could you walk me through your approach?") and can contain several follow-up interactions between the tutor and student resulting from that prompt (e.g., the student's entire explanation of their code). The event ends as soon as the student-tutor interactions no longer correspond to the event's talk move, at which point another event begins. To simplify analysis, we enforce that exactly one event is occurring at any given time in a tutoring session.
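To make this event structure concrete, the sketch below (our illustration, not the study's analysis code; the code labels are hypothetical) represents a coded session as a list of events and checks the invariant that exactly one event is active at any moment:

    from dataclasses import dataclass

    @dataclass
    class Event:
        start: float  # seconds from session start
        end: float    # seconds from session start
        code: str     # talk-move label, e.g. "asks_explain_code"

    def check_tiling(events, session_length):
        # Events must cover the whole session with no gaps or overlaps.
        assert events[0].start == 0.0
        assert events[-1].end == session_length
        for prev, cur in zip(events, events[1:]):
            assert prev.end == cur.start  # one event ends where the next begins

    session = [
        Event(0.0, 42.5, "other"),
        Event(42.5, 118.0, "asks_explain_code"),
        Event(118.0, 300.0, "other"),
    ]
    check_tiling(session, 300.0)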
In the second phase of coding, the second author coded all of the transcripts using the codebook. To increase reliability, we brought in the fourth author to also code all the transcripts. To calibrate their coding process, the two coders compared their responses for two participants (four sessions); otherwise, they coded independently. Both coders were blinded to condition while coding to reduce bias. Our coders achieved 80.6% agreement, and we use the second author's coded transcripts in our analysis.

Measures
To assess whether the treatment intervention impacted participants' teaching behaviors, we separated our event codes into different categories and measured the duration of productive teaching events, as well as the tutor talk time.

Total Productive Teaching Event Duration.
For each tutoring session, we sum the duration (in seconds) of all the productive teaching events within the session to measure how long the tutor and student spent in quality tutor-student interactions. We expect that our treatment intervention will lead to longer productive teaching events overall.

Proportion of Time Spent in Productive Teaching Events.
We also normalize the duration of productive teaching events by dividing by the total session length. This gives us an idea of how much of each session was spent on quality tutor-student interactions. We expect that our treatment intervention will lead to a higher proportion of time spent in productive teaching events.
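A sketch of how both measures can be computed from a coded session (event tuples and code labels are hypothetical, following the sketch in Section 4.1):

    # Talk-move labels we count as productive; names are illustrative.
    PRODUCTIVE_CODES = {
        "prompts_fix_own", "asks_explain_code", "checks_understanding",
        "asks_trace_input", "asks_test_cases", "asks_explain_output",
    }

    def productive_duration(events):
        # events: list of (start_sec, end_sec, code) tuples.
        return sum(end - start for start, end, code in events
                   if code in PRODUCTIVE_CODES)

    def productive_proportion(events, session_length):
        # Fraction of the session spent in productive teaching events.
        return productive_duration(events) / session_length

    events = [(0.0, 42.5, "other"), (42.5, 118.0, "asks_explain_code")]
    print(productive_proportion(events, 300.0))  # ~0.252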

Proportion of Tutor Talk Time.
A growing body of evidence shows that students learn more when they are given opportunities to talk and are positioned as thinkers [20,24]. Student talk also gives the tutor information about the student's level of understanding [2].
We measure the ratio of tutor talk time to student talk time in each session using the speaker labels provided in the automated transcription of our recordings. For example, a tutor talk time ratio of 0.55 means that the participant (tutor) spoke 55% of the time, while the researcher (student) spoke 45% of the time. We expect our treatment intervention to reduce tutor talk time.
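A sketch of this computation from speaker-labeled transcript segments (the segment format is an assumption; automated transcripts provide per-utterance speaker labels and timestamps from which durations can be derived):

    def tutor_talk_ratio(segments):
        # segments: list of (speaker, duration_sec) pairs.
        tutor = sum(d for spk, d in segments if spk == "tutor")
        student = sum(d for spk, d in segments if spk == "student")
        return tutor / (tutor + student)

    # Example: the tutor spoke 55% of the total speaking time.
    print(tutor_talk_ratio([("tutor", 110.0), ("student", 90.0)]))  # 0.55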

Interview Coding
To assess how participants felt about our intervention, the second author coded the interview transcripts using a closed coding approach, specifically looking for (1) which pieces of information in the treatment intervention participants found most useful, (2) whether any pieces of information were not helpful, and (3) to what extent those in the treatment-first group reported using the information from the treatment intervention during the second (control) session.

RESULTS
Our main finding is that, after reading the treatment information screen, participants significantly increased their total duration spent in productive teaching events, to about 1.4 times the duration they spent on such strategies in the control condition. Participants took an average of 28.5 seconds (SD = 14.9) to read the treatment intervention. This suggests that a very brief intervention was sufficient to significantly increase participants' use of certain effective pedagogical strategies during tutoring. We now provide more details on our analysis and results.

Table 1: The total time a participant spent in productive teaching events is averaged and displayed in seconds as "Productive Teaching (PT)." Group 1 is of particular interest for interpretation, as that group received the control first and then the treatment information sheet with pedagogical recommendations.

Table 2: The results of the two-way mixed-effects ANOVA. The eta-squared term (η²) indicates the effect size. P-values less than 0.01 are marked with double asterisks (**) and p-values less than 0.05 with a single asterisk (*).
To understand the impact of the intervention on participants' (1) total time spent in productive teaching events, (2) proportion of time spent in such events, and (3) tutor talk time, we conducted a two-way mixed-effects ANOVA on each of these three outcomes (Table 2). A mixed-effects ANOVA allows us to make both between-subjects and within-subjects comparisons and accounts for the fact that each participant received both treatment and control. In Table 1, we also report the means and standard deviations of each dependent variable, grouped by configuration.
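As an illustration of this analysis (a sketch, not our actual analysis script; column names and values are made up), a two-way mixed-effects ANOVA of this design can be run with the pingouin library, with session as the within-subjects factor and condition-order group as the between-subjects factor:

    import pandas as pd
    import pingouin as pg

    # Hypothetical long-format data: one row per participant per session.
    df = pd.DataFrame({
        "participant": [1, 1, 2, 2, 3, 3, 4, 4],
        "session":     ["first", "second"] * 4,
        "group":       ["control_first"] * 4 + ["treatment_first"] * 4,
        "pt_duration": [420.0, 610.0, 390.0, 580.0,
                        650.0, 640.0, 600.0, 615.0],
    })

    # Within factor: session; between factor: group. The output table
    # includes F statistics, p-values, and effect sizes.
    aov = pg.mixed_anova(data=df, dv="pt_duration", within="session",
                         subject="participant", between="group")
    print(aov)

    # Post-hoc pairwise comparisons (with effect sizes) can be obtained
    # via pg.pairwise_tests with the same factors.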

Teaching Behavior Positively Changed
Overall, we found good evidence that our information sheet shown just prior to tutoring increases the duration of productive teaching events and increases the proportion of time spent on productive teaching events. Our ANOVA model (Table 2) reports a significant main effect of the condition on both the duration and proportion of time that tutors spend engaging in productive teaching events.
Within the group that received the control first and the intervention second, subjects increased the duration of productive teaching events with a Cohen's d effect size of d = 0.72, a medium to large effect. This change was statistically significant (p < 0.001, computed using post-hoc Tukey tests). We found similar within-subjects impacts on the proportion of time spent in productive teaching events (d = 0.66, p = 0.014), suggesting that subjects were likely substituting some prior teaching behaviors for more effective approaches.
We saw a similar but not significant trend in pairwise comparisons across subjects (between those who got the control first and those who got the treatment first), perhaps due to our limited sample size (duration difference, Cohen's d = 0.46, p = 0.112, post-hoc Tukey test).
We found similarly strong evidence that our treatment information sheet decreases the proportion of the session the subject talks ("Proportion Tutor Talked") relative to the "student." Our ANOVA model (Table 2) reports a significant main effect of condition on the proportion of tutor talk time. We found that within the group that received the control first and the treatment second, subjects decreased their talk time significantly, with a large effect size (Cohen's d = −0.92, p = 0.002, post-hoc Tukey test).

Table 3: Counts of which pieces of information participants found most useful (out of 46). Note that participants could mention multiple pieces, so the totals do not sum to 46.
Tip about asking open-ended questions: 18
Tip about checking student understanding: 8
Tip about letting the student talk more: 17
Info about the student not attending lecture: 10

Understanding Effect Transfer
In RQ2, we asked whether our intervention could lead the tutor to continue acting on the information in a subsequent tutoring session. Our results here are inconclusive. As shown in Table 1, subjects who got the treatment first maintained similar productive teaching event durations, proportions, and talk time in their second session (which was a control condition). However, our ANOVA did not return a significant interaction effect (between session and condition) that would indicate a sustained transfer effect. From our interviews, though, we found that 16 of the 22 people in the treatment-first group self-reported transfer: they said that they were thinking about the information they received before the first session while teaching during the second session.

Feedback from Participants
In our post-study interview, we asked participants which pieces of information from the information sheet they found most useful (Table 3). "Ask open-ended questions" and "Let the student talk more" were seen as the most useful.
While some participants expressed that knowing whether the student had attended lecture helped them figure out how to approach the tutoring session, three participants explicitly called out that piece of information as unhelpful and preferred to infer the student's knowledge by interacting with them.

DISCUSSION
Our findings suggest that receiving the treatment helped tutors spend more time engaging the student with effective talk moves and gave the student more space to talk. We now discuss the limitations of our study, implications for educators, and future work.

Limitations
Our controlled experiment does not account for external or contextual factors such as time constraints, tutor motivations, or teaching anxiety. It also does not account for different kinds of students. In real-world settings such as universities and online tutoring platforms, there is often pressure to finish sessions quickly [10,13], and not all students are as talkative as the one in our roleplay. Furthermore, students do not always appreciate the teaching techniques that are best for their learning [11], which can make it difficult for tutors to employ effective teaching moves in practice.
Lastly, our experiment only tested the effects of our treatment intervention in the one or two sessions immediately after exposure. It remains to be seen whether tutors continue to incorporate what they learned in future interactions with students, and whether further "doses" of the intervention would be necessary for retention. Also, because our study did not involve real students, we cannot establish whether increased use of effective teaching moves leads to improved downstream student outcomes.

Implications for Educators
In our work, we demonstrate that a quick, just-in-time intervention deployed right before an online tutoring session can significantly increase a tutor's use of effective teaching moves. Although our particular deployment was not at scale, it was designed to be short and scalable. We believe that this kind of tutoring support can be useful for practicing educators. As enrollments in computer science programs continue to grow [1], the need for skilled TAs grows with them. While our intervention was built for an online tutoring setting, we expect that providing lightweight teaching tips to TAs when they need them will improve their teaching and support their growth as teachers. For example, an instructor could send out an email to remind TAs about teaching moves that they could be practicing.

Future Work
There are many interesting directions for future research. One key question is whether a just-in-time pedagogical tip will improve a teaching assistant's use of the suggested strategies in standard office hours. Promising prior work has shown that providing automated feedback to instructors on their use of various pedagogical strategies (including their talk time) can impact their future use of those strategies in later instructional sessions [3,4].
We also suspect that just-in-time pedagogical tips might have a greater impact on tutors with less teaching experience.We explored this hypothesis during our analysis and found some evidence that it may be true, but lacked the statistical power to make a strong claim.We plan to continue investigating this in our future work.
Another interesting question is whether feedback or direct recommendations are more effective. Providing feedback rather than direct recommendations may better respect the autonomy of the instructor, but it may be challenging for novices to act on if it is unclear how to remedy poor instruction. Another important question is how the gains achieved through just-in-time suggestions compare to the pedagogical practices exhibited by expert tutors, both to help contextualize the gains and to determine whether further support is needed. There are also many open issues around what recommendations to make, how frequently, and in what contexts. An interesting possibility would be to combine feedback with specific suggestions, thereby creating a customized mini-curriculum to support tutor development.
Another option would be to provide actionable suggestions in real time, during a tutoring session, such as through a large language model. During our early pilot studies, we explored both just-before and real-time intervention designs and observed that real-time support during a video call was helpful to some participants but made others feel judged or distracted. As a result, we chose not to explore it in this study, but it might be useful for chat-based TA support.
It also remains to be seen how this approach compares to typical TA training methods, and to what extent it could be deployed alongside existing training methods as a supplementary tool.

CONCLUSION
Highly skilled teachers can be more effective at supporting students one-on-one, but many teaching assistants in introductory computer science classes have limited pedagogical training. Prior work suggests that when providing direct support, many such teaching assistants do not frequently use the pedagogical practices expected to be most effective for student learning [10]. In this work, we studied whether brief, just-in-time teaching recommendations could significantly change a participant's teaching behavior in a remote simulated programming tutoring session. This intervention, which took less than 30 seconds on average to read, significantly increased participants' use of effective teaching strategies, as coded from the video recordings, with an effect size of d = 0.72. While many open questions remain, including how such an intervention would translate to standard practice, it shows significant promise in helping minimally trained teaching assistants become more effective teachers.

Figure 1: The two versions of buggy student code used in the roleplay. The student who wrote Code A (above) has a major gap in understanding and does not know about elif, while the student who wrote Code B (below) has a number of smaller misunderstandings.

Figure 2: A screenshot of the info screen shown in the treatment intervention. This was embedded as part of the survey that participants filled out over the course of the study.