CS1 with a Side of AI: Teaching Software Verification for Secure Code in the Era of Generative AI

As AI-generated code promises to become an increasingly relied-upon tool for software developers, there is a temptation to call for significant changes to early computer science curricula. A move from syntax-focused topics in CS1 toward abstraction and high-level application design seems motivated by the new large language models (LLMs) recently made available. In this position paper, however, we advocate for an approach more informed by the AI itself: teaching early CS learners not only how to use the tools but also how to better understand them. Novice programmers leveraging AI code generation without a proper understanding of syntax or logic can create "black box" code with significant security vulnerabilities. We outline methods for integrating basic AI knowledge and traditional software verification steps into CS1 alongside LLMs, which will better prepare students for software development in professional settings.


INTRODUCTION
The recent rise of large language models (LLMs) and their capabilities for programming tasks promises a significant change for software development. Users with no programming experience can leverage AI code generation to piece together software, and experienced developers can outsource subtasks to such tools, which are already available in their development environments.
In the education setting, these tools are creating new challenges for assessing learning. Traditional assessments centered on syntax and basic programming practices are easily solved by current LLM services [1], and students and professionals alike are already leveraging them. As CS educators, we can choose to either ignore these tools or embrace them. Ignoring them may involve significant extra effort, such as disabling LLMs built into IDEs or creating substantial driver code for assignments in an effort to prevent students from easily obtaining the solution from an LLM. Further, these approaches may not fully prepare students for enterprise-level software development, where they will likely use these tools on a regular basis.
Embracing LLMs in CS education means either incorporating them into a lesson (e.g., intentionally teaching them by explicitly having students interact with them for credit) or coaching prompt engineering (e.g., teaching the skills students need to interact with LLMs on their own). The former will be notoriously difficult, as an LLM changes as it learns, which has been demonstrated over the last year. In fact, the accuracy of the same LLM service on benchmark tests for code generation decreased by over 41% within three months of evaluation [3]. Teaching students about generative AI as a new tool, how to interface with it, and how to verify its output will be a more "future-proof" approach to learning programming, and it introduces important concepts in software verification early on.
In this position paper we argue for the careful and deliberate integration of AI-driven code generation tools into CS education, in an effort both to fully prepare students and to avoid overreliance on the technology. From the beginning, CS learners need not only to use AI code generation, but also to understand (1) what it can and cannot do, (2) how to prompt it properly, and (3) how to evaluate its responses critically. The aim of this paper is to recommend specific new pedagogical approaches that are responsive to students' programming experience, particularly at the introductory level. We offer recommendations on these three issues, along with further discussion of the opportunities and challenges of LLMs in CS education informed by a better understanding of their fundamental capabilities. We further demonstrate how these topics can be put into practice via small learning activities integrated into a CS1 course schedule.

BACKGROUND & CONTEXT
Over the last year, wider access to conversational AIs and generative LLMs has increased interest and participation in the technology. Previously available on a smaller scale and only via APIs and programming frameworks, these models have become more publicly accessible through their own websites and plugins, increasing use across a variety of applications. Models such as OpenAI's GPT-3.5 and GPT-4, Google's Bard, and many others have hundreds of billions of trainable parameters and are trained on substantial amounts of data, enabling them to generate coherent content across topics.
As these services become more prevalent, AI code generation has continued to grow, enabling powerful new tools for software developers and a host of AI-based coding assistants. A recent survey [21] of 500 U.S.-based enterprise developers demonstrated the scale of the impact of these LLMs, finding that 92% of developers surveyed were already using AI coding tools both at home and at work in mid-2023. This included tools such as Copilot, which are already built into their development environments. This highlights an ongoing problem in the disparity between typical CS coursework and the programming skills expected on the job [5]. We advocate for incorporating LLMs early in CS coursework in order to better prepare students, noting that ignoring a widely used tool may not prepare them for internships and early experiences.
Within the education community, research on LLMs for teaching and learning has been prolific over the last year, demonstrating both new opportunities and new challenges for students and teachers [1, 7-10, 14, 15, 17]. In CS1 in particular, some recent research investigates the efficacy of the models for assisting with teaching tasks, including creating programming exercises [9, 17] as well as incorporating LLMs as a customized tutoring system [2]. LLMs are capable of generating code explanations as well, in varying depths of detail, helping novice programmers understand not only their own code but also the code generated for them [9, 16].
In terms of teaching with LLMs, techniques for effective interaction have focused on strategies for prompt engineering, a skill requiring some abstraction of the problem in order to direct the LLM to generate a correct response [7, 8]. These works identify concerns for novice programmers prompting models that could fail, contain inherent biases, or return overly complex code. In this work we aim to avoid assumptions about using LLMs in CS1, especially the assumption that prompting a model a certain way will always return a specific type of result or desirable output. As these models learn and services change over time, the use of LLMs in education will need to focus on teaching broad skills and AI understanding rather than teaching to the tools themselves.

POSITION
Our position is as follows: as AI-generated code becomes the norm for professional developers, CS education needs to shift to incorporate these new tools. However, redirecting the focus of CS1 courses to more abstract concepts using LLMs can lead to less secure code generation and to novice programmers dealing mainly with black boxes. Instead, students need to know not only how to interface with the tools but also how to critically verify the code generated for them.
We take this position with two main motivations. First, the LLMs of today are not the LLMs of tomorrow: unlike the IDEs and tools traditionally taught for programming, LLMs evolve rapidly and unpredictably as they learn. AI-based coding assistants are made available by large companies which are not incentivized to open-source the code or training datasets for these models, and which will control access to the technology, for example via subscriptions. In addition, LLMs are subject to model drift, a phenomenon where a learning model's performance on a task can decrease over time due to the difference between the data it was trained on and the data it encounters under "real world" use. This changes their performance on almost a daily basis; a recent study demonstrated that both GPT-3.5 and GPT-4 changed significantly over a three-month period in early 2023, showing over a 30% decrease in accuracy on an example task identifying prime numbers [3].
Our second motivation is concern for the security implications of LLMs in programming practice. In addition to being inconsistent, LLMs can be incorrect in their data, calculations, and assumptions, and a disclaimer to this effect is included on all current services hosting these models. Learning models capable of drift can also be corrupted: drifted intentionally by malicious users for the purpose of degrading the model's performance or its output to other users. In the case of code generation, this can mean a model may insert malicious code into a response generated for an innocuous task [22]. If taken as a black box by a novice programmer, this code (or even a well-written program with inherent vulnerabilities) poses serious security concerns in an enterprise environment [23].
For these reasons, we argue that computer science students need to learn how LLMs work at a high level, rather than simply how to interact with them. Steering introductory CS coursework toward program decomposition and abstraction is attractive as a way to recruit and retain students in the field, enabling them to encounter higher-level concepts earlier on. However, we argue for educating CS1 students on the basics of AI (in order to better understand the underlying technology) and a "lite" version of software verification (in order to understand the code that is returned by it). This does not need to drastically change CS1 as it is typically taught, and it endeavors to instill secure programming habits while teaching with AI-assisted software development. We advocate for small activities to be incorporated into CS1 coursework in particular, in an effort to introduce these concepts in the earliest programming courses.

NEW PEDAGOGICAL APPROACHES
LLMs are already significantly impacting programming in both professional and academic settings. However, developers leveraging AI-generated code need to fully examine it, not only to ensure that it meets their requirements but also to prevent it from becoming a black box with unknown vulnerabilities. For novice programmers such as CS1 students, we argue LLMs need to be a tool that is not only taught but also somewhat understood. That is, students need to know (1) what LLMs can and cannot do, (2) how to prompt LLMs properly for code generation, and (3) how to evaluate code responses. Further, security considerations around AI-generated code should be introduced early on. This constitutes a significant change in the CS1 focus; however, the topics can feasibly be integrated without requiring advanced programming knowledge. We identify approaches for each of these three skills as follows, and discuss guidelines for more secure code generation with AI.

Using LLMs at the CS1 Level
While there are many tools a novice programmer will need to learn over the course of their education, including development environments and version control systems, most of these are not tools they will have encountered outside of their coding journey. These new programmers will, however, have heard of (and potentially used) large language models for other tasks, from writing essays to planning their meals.
Whether or not they have experience with LLMs, students will need to know what these models can and cannot do in terms of code generation. The following sections discuss what is needed as input to the LLMs and how to assess their output. An advanced understanding of artificial intelligence is not necessary to interact with these services; however, a basic understanding can be helpful in setting expectations.
First, new programmers need to know that the results they receive can be incorrect; LLMs are not infallible. This can be difficult for new users who are accustomed to highly accurate search tools. For programming tasks in particular, these LLMs were trained on code reflecting all manner of practices: they were introduced to both good and bad code. General users of LLM services have no access or insight into these training approaches or data, nor into how well the model performs on various tasks. We must assume there will be mistakes in the syntax, format, and/or logic of the generated code, and that responses to the "same" request can vary over time as the model updates [3]. This is in stark contrast to how we handle most modern interactions with computers.
Next, new users should know that while there are many LLM services, each has been uniquely trained, evaluated, and made available. How a model has been trained, and whether or not it has real-time access to the Internet or various data sources, will affect how accurate some services are in comparison to others on a specific task. While some services such as Google's Bard are currently freely available, other popular LLMs and services such as OpenAI's GPT-4 and GitHub Copilot now require a subscription. Various plugins to these services enable extended capabilities.

Teaching with LLMs.
For novice programmers who have not encountered an LLM, AI-assisted programming may appear daunting. For these students, an "ice breaker" lesson requiring them to interact with the AI service in a low-stakes way may be useful, such as asking it for a meal recommendation based upon their dietary preferences. If there is a group project in the course, team members can be encouraged to provide their availability and have the LLM assist with setting up a meeting schedule that keeps to project deadlines. From there, students can leverage these AI models as a customized tutor by creating a chat focused on basic CS questions, enabling them to have their early programming questions addressed 24/7.
Beyond syntax, many CS1 courses teach the fundamentals of computer science, including the concept of algorithms. The basics of problem decomposition, asking students to break a task down into a sequence of subtasks, enable them to practice computational thinking. This step-by-step identification of subtasks is important for engineering effective prompts to the AI. Within this unit, students could also be introduced to the basics of LLM code generation as a tool for many of these concepts: identifying subtasks from a larger or higher-level task, and completing some of the small tasks they have already identified, such as writing a print statement; a small sketch of such a decomposition follows.
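As a minimal illustration (the task and all names here are our own hypothetical choices, not drawn from any particular course's materials), a beginner-level task such as "greet the user by name" decomposes into three subtasks, each small enough to complete by hand or to hand to an LLM:

import java.util.Scanner;

// Hypothetical decomposition of "greet the user by name":
//   subtask 1: prompt the user for their name
//   subtask 2: read the name from standard input
//   subtask 3: print a greeting containing the name
public class Greeter {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("What is your name? ");    // subtask 1
        String name = in.nextLine();                 // subtask 2
        System.out.println("Hello, " + name + "!");  // subtask 3
    }
}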
Teaching students an understanding of LLMs will help them as these services rapidly change in the coming years. Unlike search services, which have not changed drastically in user interaction over the last decade, AI services will be different one year from now in their capabilities, interactions, and availability.

A Sample CS1 Semester with LLMs.
To lend further credence to our proposed approach to teaching CS1 with LLMs, Table 1 outlines a sample semester schedule for the course, accompanied by four brief activities focused on LLMs for programming. These in-class mini activities can be as brief as 10 to 15 minutes, or could alternatively be self-guided online tutorials for students to complete on their own.
The four activities themselves are outlined in Table 2. The introductory and prompt engineering activities (activities 1 and 2) are as previously described, meant to introduce novice users to the basics of the LLM paradigm and to integrate basic programming concepts. For example, in activity 2, students may learn how to prompt the LLM to generate a basic method in Java which reflects the content of that week (weeks 7 and 8 in our sample). Activity 3 introduces the verification topic, with students beginning to evaluate generated code and check for errors beyond syntax. Activity 4 enables student practice with LLMs as a tool, in order for them to understand when and how to use it.
The following section proposes a novel game for students, to be used as Activity 4 (Practice). To further demonstrate the efficacy of our proposed content, we revisit this activity throughout the remainder of this paper as additional details for software verification are recommended.

Sample Course Activity: Safe Bet or Monkey's Paw? (Game).
In this game, students form pairs or groups and attempt to anticipate whether an AI can generate code correctly, given a brief description. A series of predefined descriptions is provided, but access to an LLM is not; students in this exercise begin to "think like a computer." Innocuous prompts such as "write a program in Python that prints prime numbers" can be included in the list. Students should independently consider whether or not the prompt is thorough enough to be considered a "safe bet" to provide to an AI for an expected or optimal response, or a "monkey's paw": a more literal interpretation of the request which may come with unintended consequences. In this example case, a resulting program (if run) might print prime numbers indefinitely. Students can then discuss in groups and justify their choices before sending the prompt to an AI. At the CS1 level, learners would not be assessing the code meticulously in their early experiences with an AI code generator, and this exercise can therefore help instill some caution toward the code it provides.
Additional examples of starting prompts for this activity are outlined in Table 3. Each is labeled as either "safe" or not, along with a brief description of potential pitfalls.
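To illustrate the "monkey's paw" outcome concretely, the following sketch is one plausible literal reading of the prime-number prompt above (our own hypothetical example, rendered in Java, the sample course's language, rather than actual LLM output): with no bound stated in the prompt, the program simply never stops printing.

// Hypothetical "monkey's paw" interpretation: the prompt asks for a program
// that prints prime numbers but states no upper bound, so this version
// (intentionally) runs forever. Not actual LLM output.
public class PrimePrinter {
    public static void main(String[] args) {
        for (long n = 2; ; n++) {          // no termination condition
            if (isPrime(n)) {
                System.out.println(n);
            }
        }
    }

    // Trial division: sufficient for a CS1-level illustration.
    static boolean isPrime(long n) {
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }
}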

Prompting LLMs for Programming at the CS1 Level
With a fundamental understanding of the technology behind LLMs, users next need to understand how to interface properly with, or prompt, the model. Prompting is like having a conversation, not writing the perfect search term: it unfolds over time, not in one-shot episodes. As the conversation continues, the AI retains the context of the conversation, which helps to guide its responses. As LLMs evolve over the coming months and years, prompt engineering will change as well. Even a well-structured prompt will produce different results across services today, which further motivates teaching students how best to interact with AI rather than an exact, model-specific approach.

Teaching Prompt Engineering.
For early programming students in CS1, effective prompt engineering for code generation means aiming not only for a correct response, but also for a response from which they will be able to learn. An overcomplicated solution to their request will not teach them the intended concepts, which emphasizes the importance of communicating needs to the AI. An important reminder to students is to converse over a series of prompts: the goal is not to perfect a single prompt.
At a high level, students engineering prompts need to provide the following basic information:
• A description of the task at a high level (including the programming language, input and output, and relevant variable and method names)
• A description of the context at a high level (use within a larger piece of software, use of specific data structures, access to external information)
• Examples of preferred results or output of the code to be generated
• Details to discuss with the AI: level of commenting/documentation, readability
In the final point, users can request that the AI respond at a given level; for example, CS1 programmers may provide additional context that they are new programmers and would like the resulting code to be as simple and easy to follow as possible.
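As a hypothetical example of this structure in practice (the task and method name are our own invention), a CS1 student might write: "I am a beginning programming student. In Java, write a method named averageGrades that takes an array of double values and returns their average as a double. It will be used inside a small gradebook program. For example, averageGrades on the values 80.0, 90.0, and 100.0 should return 90.0. Please keep the code as simple as possible and add a comment on each line." This single prompt names the language, the input and output, the surrounding context, an example of expected behavior, and the desired level of documentation.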
As students progress in their understanding of CS topics, so should their use of their programming tools. In courses beyond CS1, prompt engineering should be a continuing topic, building upon this recommended foundation to incorporate more advanced concepts. If specifically requested, most AI code generators can control code complexity, use specific algorithms for a task, include test cases for evaluating the code, handle errors, call upon specific libraries or APIs, and generally produce more robust code.

Sample Course Activity: Safe Bet or Monkey's Paw? (Game).
Furthering the game outlined in Section 4.1.3, students can leverage new concepts in prompt engineering to attempt to correct the previously misleading directions. Working in teams, students recommend solutions to the problems with the previously encountered prompts. Team members can take the role of the AI and consider whether or not the prompt is now detailed thoroughly enough to be considered a "safe bet." If problems are found, teams can converse, as they would with an LLM, to further refine the details of the task. The goal is to develop the conversation around the desired solution, rather than to perfect the initial request. In this way, students learn to clarify tasks and context at a high level, occasionally providing examples of preferred behavior.
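For example, a team revisiting the prime-number prompt from the first round might refine it to something like "Write a complete program in Python that prints all prime numbers between 2 and 100, one per line, and then stops" (our own hypothetical refinement): by bounding the output and stating the expected format, the team turns a likely monkey's paw into a reasonable safe bet.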

Evaluating LLM Responses at the CS1 Level
When introducing an evaluation method for LLMs in CS1, one of our goals is to build awareness of code reviews and formal methods early in the CS curriculum, where they can be built upon in subsequent courses. Formal methods are often not introduced until advanced theory courses, which may not be required or applied in a practical setting. Introducing a lite version of verification early in the CS curriculum could help these tools be used more consistently when students enter the workforce.
There have been discussions on the importance of incorporating software verification into CS1 for decades [6, 11, 20]. With the advent of code generation by LLMs, however, the need to incorporate formal verification is even greater. Beginning students are now able to generate code for complex tasks that could contain errors, solve the wrong problem, or carry potential security risks. It is imperative that early coders learn to scrutinize code from various sources, including their own and AI-generated code.

Teaching Introductory Software Verification for LLMs.
The method that we recommend is a combination of a code review and a "lite" version of software verification. We recommend an approach to code reviews similar to that outlined by Hundhausen et al. [12], where students performed code reviews on code that was submitted during a previous course assignment. Our method is similar; however, we suggest fewer steps during the code review, as we are also doing "lite" software verification, and instead of reviewing their peers' code, students will be reviewing code generated by an LLM. First, students should perform a short code review using the following steps, simplified from [12]. A goal of this short code review is to help students get in the habit of reviewing code and content that they find in sources such as LLMs and the Internet in general. They may also be able to determine that the code generated by the LLM is insufficient for the assignment before entering the software verification steps listed below.
• Structure and design: Does the code follow the specified structure, such as the specified function tasks, function name, program name, etc.?
• Variables and constants: Do variables have useful names? Are constants used when appropriate? Are variables of the right scope (local or global)?
• Errors: Does the code compile? Does it issue warnings?
• Does the code solve the assignment as specified?
For software verification, we recommend a modified version of the steps for proofs of program correctness provided originally by Gerhart [11], omitting the formal proofs for CS1 instruction. The goal here is twofold: we want students to think critically about code and problem solving, and we want to incorporate software verification into the CS curriculum from the very beginning. This will assist students when they take more advanced courses in verification: they will benefit from prior exposure to using assertion statements, enabling them to focus on learning how to prove those assertions to verify their code.
For students to understand our combined approach, they must first learn what code generally looks like, how to read unfamiliar code, and the basics of assertion statements. When developing assertions, students must think about what needs to be true at various stages of execution for the code to be correct. A crucial point is that assertions must be based on the initial problem statement, not only on the AI-generated code; otherwise the code could be logically correct, but for the wrong problem.
The following are our suggested verification steps (a concrete sketch follows the list):
(1) Attach "assertion" comments to the program, where each assertion states the property that we need to verify in that code segment. At minimum, assertions should appear at the program entrance, at the program exit, in every loop (the loop invariant), and before every function. We recommend these assertions be represented in the form of pre- and postcondition comments when applicable. Students should request that the LLM generate these comments and then review and change them as needed.
(2) (An optional step for advanced students or CS2.) Break the program up into sections such that all code belongs to a section and there are no nested assertions within a section. For each section, ask the LLM to generate a verification condition. This condition is important for formally proving correctness: it shows that if the assertions within a section are true, then the assertion at the end of the section is also true.
(3) Add print statements to trace useful variable values throughout program execution.
(4) Review the outputs of all print statements on multiple inputs (when applicable) to assess correctness, while referring to the comments and assignment specifications.
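As a minimal sketch of steps (1) and (3) (our own hypothetical example; the task, class name, and assertion wording are inventions for illustration, not course materials), an annotated CS1-level program might look like the following:

import java.util.Scanner;

// Problem statement: read n numbers from the user and print their sum.
public class SumInputs {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        System.out.print("How many numbers? ");
        int n = in.nextInt();
        // ASSERT (entrance / precondition): n >= 0.

        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // ASSERT (loop invariant): sum equals the total of the
            // first i values entered so far.
            double next = in.nextDouble();
            sum += next;
            // Trace print (step 3): watch the running total evolve.
            System.out.println("[trace] i=" + i + ", next=" + next + ", sum=" + sum);
        }

        // ASSERT (exit / postcondition): sum equals the total of all
        // n values entered, matching the problem statement above.
        System.out.println("Sum: " + sum);
    }
}

Note that the assertions are phrased against the problem statement ("the values entered"), not against the generated code itself, in keeping with the caution above.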

Considering Security in AI-generated Code
CS1 students and novice programmers using generative AI for code can end up in a black-box situation, where they do not fully understand the code generated for them. In an academic setting this may be permissible while learning; however, it becomes a security concern in professional settings if vulnerabilities in their software are not recognized [18, 23]. This problem parallels the general use of AI learning models: treating these models as black boxes with unknown explanations can be problematic. For applications leveraging these models, a lack of understanding of how the model arrived at its response can have larger implications.
A recent study showed that developers who were provided with an AI assistant were more apt to believe that the code they produced was more secure than the code they wrote without one [19]. This reinforces our position that education on an LLM's capabilities is critical in CS curricula. The study also showed that developers who had less trust in the LLM and focused more on prompt engineering had a higher likelihood of producing more secure code. This provides additional support for the view that focusing on how to prompt LLMs and how to evaluate the code they generate is crucial for future developers.

CHALLENGES
Communication Outside of the CS Community
General misunderstanding of LLMs outside of the AI community can lead to confusion about the potential of the technology, in the form of either underestimation or overreliance. Understandably, there is a significant divide in the education community in terms of embracing or banning such tools in the classroom [14]. In an effort to maintain CS programs in their traditional form, some educators aim to prevent students from using AI code-generating tools. Likewise, we see overreliance on these new tools in the form of graded educational assignments that aim to have students leverage the LLM in a specific way, expecting a specific or consistent output over the course of the semester or year.
Along the same lines of communication, recent studies have examined the effect of tone on users' perception of LLMs, finding that GPT-based chat appears authoritative in its responses [24]. This can be particularly problematic for novice users who are learning a new topic or skill, and care should be taken not to take the output of the LLM at face value.

Ethics & Access
It is important to acknowledge that the use of LLMs for code generation comes with an inherent risk of misuse in the academic setting and beyond. Students can easily circumvent traditional assessments, and AI detection tools for identifying the use of such platforms for academic dishonesty have struggled to keep pace [4, 13]. In the professional setting, this question of authorship translates to a question of ownership, inviting additional concerns as to the copyright of the code.
Of additional concern is the potential divide in access to these technologies and/or their quality, as companies decide on payment tiers and availability for their services. For example, in mid-2023, GPT-4 carried a $20/month premium, whereas GPT-3.5 was freely available but was not updated beyond September 2021, experienced frequent lag and downtime, and produced results of demonstrably lower quality than GPT-4. These differences between two versions of the "same" LLM service demonstrate a growing disparity in the toolsets of students in academic settings as well. Those in underserved communities may be especially impacted, affecting their experience as introductory programmers in comparison with their peers.

CONCLUSIONS
As large language models improve, AI-generated code is becoming more common in software development. Early exposure to these new tools will not only help ready novice programmers for professional programming but also encourage more secure code. Taking this position, we recommend approaches for teaching early programmers in CS1 not only how to use AI code generators, but also some insight into their operation. We outline approaches for teaching these students (1) what LLMs can and cannot do for them in code generation tasks, (2) how to best prompt these models, and (3) how to evaluate the code responses they receive. In this way, novice programmers can utilize the tools they will need throughout their careers in a more secure and efficient manner.

Table 1: Sample integration of 10-15 minute LLM activities into a CS1 course schedule.

Table 2: Outline of the four proposed in-class activities (10-15 minutes each).

Table 3: Example programming problems for the sample activity, "Safe Bet or Monkey's Paw?" game. [*Note that for prompts labeled "safe", there are no guarantees.]

Sample Programming Problem Description | Safe Bet?* | Justification
In 100 lines of code, write a Java program to print the days of the week. | No | If interpreted to mean the program must be 100 lines of code, this will generate significant extraneous code.
Write a code fragment with a loop. | No | May generate an infinite loop.
Write a code fragment which loops from 1 to 5 and prints the current time. | Yes | N/A
Write a basic program which adds two numbers together. | No | Ambiguous as to the source of the numbers as well as the programming language (e.g., may be interpreted as "write in the BASIC programming language..").
Write a program in Java which takes in two numbers from the user and adds them together. | Yes | N/A

After completing the activity in Section 4.3.2, students can test the prompts they developed and improved throughout iterations of the game on one or more LLMs. Noting the differences between the responses from each LLM will demonstrate the variability across services and response styles. Comparing the difference between the initial prompt/description and their engineered, improved prompts will demonstrate the value of understanding LLMs as a tool for software development and how best to leverage them.