The Promise and Peril of ChatGPT in Higher Education: Opportunities, Challenges, and Design Implications

A growing number of students in higher education are using ChatGPT for various educational purposes, ranging from seeking information to writing essays. Although many universities have officially banned the use of ChatGPT because of its potential harm and unintended consequences, it is still important to uncover how students leverage ChatGPT for learning, what challenges emerge, and how we can make better use of ChatGPT in higher education. Thus, we conducted focus group workshops and a series of participatory design sessions with thirty students who actively interacted with ChatGPT for one semester at a university and with five other stakeholders (e.g., professors, AI experts). Based on these sessions, this paper identifies real opportunities and challenges of utilizing and designing ChatGPT for higher education.


INTRODUCTION
Advances in large language models (LLMs) that leverage artificial intelligence (AI) and natural language processing (NLP) technologies are rapidly changing our daily lives. Because they produce language and dialogue similar to humans in terms of fluency and naturalness, people use and receive help from LLMs when doing various tasks, ranging from mundane (e.g., composing emails [10]) to creative (e.g., writing a novel [32,49]). LLM-driven chatbots have also been used for specialized tasks, such as calling and checking in on socially isolated people [37] as a public health intervention [90]. The influence of LLMs is expanding on all fronts, including our homes, workspaces, and public institutions, such as in health and education, which are areas regarded as the most human.
Notably, the sudden advent of ChatGPT is now shaking up the ivory tower and the long-standing foundations of education (e.g., education system, philosophy, policy). People are shocked by the continuous stream of news articles and reports with headlines like "ChatGPT Passes Wharton School's MBA exam [107,108] and US Medical Licensing Exam" [20,46]. Due to its potential and capability, ChatGPT attracted 100 million users just two months after launch. This set a new record for the first time in 20 years, surpassing the advent of the internet [60]. At schools, students have started leveraging ChatGPT in all types of school work, including searching for information and ideas, and writing academic reports and papers.
However, this use of ChatGPT has had an unprecedented impact on academia and universities. While students are fascinated with its power and accessibility, universities are scrambling to address its unintended consequences and scalability. Students have taken advantage of ChatGPT to cheat on examinations and ghostwrite their papers, but many universities and their personnel have not developed school or class policies. This delay has ignited more severe conflicts between students and teachers and has perpetuated an inequitable environment among students who have (ethically or unethically) leveraged the tool to their advantage and those who have not. This undermines fairness, compromises ethics, and seriously violates the right to education. Concerningly, such issues continue because the use of ChatGPT is not easily detectable. Recently, a professor at Texas A&M University falsely accused students of using ChatGPT and cheating on final essays, and the fuss went viral on Reddit [3,45].
It is difficult to grasp and solve such complex problems caused by the use of ChatGPT despite existing, extensive research into conversational agents (CAs) and their practical applications [11,12,19,23,38,52]. Research has investigated how to improve CAs for education by integrating them with intelligent tutoring systems (ITSs), and countless CAs, often called "AI tutors," have been developed to promote students' learning experience. However, these greatly differ from ChatGPT for two reasons. First, the CAs were developed for specific goals by researchers with great domain knowledge in education to teach particular subjects (e.g., mathematics, science, language) [23] or topics (e.g., explaining knowledge as a museum curator) [19]. Thus, many of them have been recognized in both academia and practice [19,23] for showing, through meta-analysis, that AI tutors in CA form can outperform human tutors [28]. However, ChatGPT was not purposely designed for teaching specific subjects or conveying particular knowledge, and it is actively being used in a much broader context. Second, prior AI tutors were elaborately developed by utilizing systemic learning methods (e.g., hints, questions, and feedback) and mixed approaches (e.g., rule-based or hybrid-based) to prevent system errors [23]. However, ChatGPT has not been designed this way: it neither actively utilizes well-structured learning methods nor passes through humans' hands and interventions (e.g., rule-based methods). Thus, the sequential prediction model of ChatGPT, which generates content based on statistics and relationships among tremendously large data, is fundamentally different from the AI tutors that were developed for specific educational contexts via humans' elaborate and intentional designs.
Because ChatGPT was designed for a broader context of use but is being applied for the specific purpose of learning, it is creating a number of serious problems in universities. Nonetheless, it is impossible to turn back the clock to before ChatGPT was released or to curtail students' pervasive use of it, so we need to pinpoint how and why students are using ChatGPT in higher education, what challenges and needs they have, and what facilitates its responsible use in education. In fact, whenever emerging and disruptive technologies (e.g., Amazon Alexa [99], virtual [36] or mixed reality [30], smart glasses [28]) enter the world or are about to be adopted for educational purposes, the human-computer interaction (HCI) community has played an indispensable role in scrutinizing how students perceive and use such technology to mitigate harmful use and increase its benefits. In line with such prior works, we aim to uncover the real use of ChatGPT for learning and its implications within higher education.
To do so, we primarily conducted focus group (FG) workshops and participatory design (PD) sessions with students who actively used ChatGPT for educational purposes during a semester at a university located on the East Coast of the United States. Additionally, because the use of ChatGPT also impacts other stakeholders (e.g., professors, AI experts), we invited them to our interviews and participatory workshop sessions. By synthesizing students' views and experiences with those of other stakeholders, we identified the following findings:
• Students found ChatGPT usable (e.g., efficient) and expressed high satisfaction with its user experience (e.g., personalized learning) and with its opportunities in terms of scalability.
• Students found it challenging to use ChatGPT for educational purposes due to algorithmic problems (e.g., hallucination, algorithmic bias), human and social problems (e.g., decreased learning and academic integrity), and other usability problems (e.g., difficult prompt engineering).
• Based on participatory design sessions, we present functions to 1) address hallucination, 2) improve usability, 3) provide less biased (i.e., more diverse) perspectives, and 4) foster social interactions.
Our contributions to the HCI field are as follows.
• We discovered student-centered opportunities and challenges of using ChatGPT in higher education by providing opportunities for students to actively engage with ChatGPT in a real university setting.
• We present macroscopic and practical design ideas that could mitigate the issues of ChatGPT's use in higher education by integrating stakeholders' insights into student-centered design.
• We articulate a list of broader takeaways beyond design, which include sociotechnical implications for universities.

RELATED WORK
2.1 Human-AI Interaction in Education
The communities of HCI and AI in education (AIEd) have long endeavored to improve education by applying AI (e.g., educational robots [84,91], adaptive assessment [27]) and leveraging a variety of interfaces (e.g., voice, text) [55,92]. For example, AI has been used to adaptively manage students' learning [61], provide teachers with real-time analysis of students [30], or even predict students' dropout rates in higher education [18,83]. Among these diverse applications in AIEd, ITSs are the most prominent tool and well-established area of research [62,74,75] due to their scalability, accessibility, and personalization. Numerous successful ITSs (ANDES, AUTO-TUTOR, ATLAS, and Coach Mike, to name a few) are tailored to specific subjects (e.g., math, physics) or topics (e.g., basic algebra) and were developed to boost students' learning through various pedagogical and conversational techniques. Graesser et al. [22] listed the three major affordances ITSs always have: ITSs systematically handle students' actions (i.e., interactivity), provide information depending on the students' attributes, actions, and knowledge levels (i.e., adaptivity), and present feedback on the quality of students' performance and how to improve it (i.e., feedback). Some have regarded ITSs as superior to earlier forms of learning technology, such as computer-assisted instruction (CAI) or computer-based training [22,80,87]. While traditional tutoring systems select learning tasks (i.e., outer loop) and focus on students' answers by giving immediate error messages, ITSs capture and elicit intermediate steps and/or substeps within each task (i.e., inner loop), give guidance at each (sub)step, and further debrief by discussing students' reasoning [87]. Most ITSs borrow an essential pedagogical technique from human tutoring called scaffolding, which proactively "encourages students to extend a line of reasoning" [87]. Past research has shown that these strong pedagogical and conversational techniques (e.g., scaffolding and support for students' self-repair that an ITS suggests) often lead students to complete each learning task (e.g., problem-solving), whereas answer-based tutoring systems, which are unlikely to provide scaffolding, cause students to quit in the middle of learning [87]. Additionally, several meta-analyses suggest that certain ITSs better assist students' learning and performance than other CAI systems or a human teaching a large class [19,80,87]. VanLehn [87] has argued that human tutoring is not as effective as expected; in contrast, the results of ITSs are as good as human tutoring. Later, Steenbergen-Hu and Cooper [80] focused on the effect of ITSs in higher education. After conducting a meta-analysis of 39 empirical studies and 22 types of ITSs used in university settings, they found that ITSs have a moderate, positive effect on college students' learning relative to other learning approaches (e.g., CAI and traditional classroom instruction). Furthermore, their meta-analysis showed that ITSs' effectiveness was not dependent on the ITS type, intervention conditions, or subject matter [80].
The repeated evidence that ITSs are more effective for students' learning than humans teaching large classes has raised fundamental questions of whether education still needs human teachers in classrooms and, if so, what roles they would play in partnership with ITSs to maximize students' learning. Regarding this, several field studies discovered that human teachers gained more time and resources to check in on and give one-on-one guidance to students who needed extra help while an ITS took the teaching role. However, several researchers discovered a unique challenge: teachers would need to monitor and manage each student's educational activities because each student engages with different levels of content personalized to their pace by the AI tutor. To solve such problems, Holstein et al. [29] created Lumilo, smart glasses that help teachers assist students who are struggling to learn with an AI tutor by displaying real-time information about the students (i.e., their learning, metacognition, and behavior). They further found that students learn and perform better when teachers and AI collaborate in a partnership in the classroom [28].
While researchers in HCI and AIEd have long accumulated relevant knowledge of educational AI and even established a basis for human-AI partnership in the classroom [98] beyond the development of successful AI tutors, little is known about how people should approach and prepare for LLM-based chatbots that are being used in all areas of education. This is because, while ITSs under the name AI tutor are tailored to a specific educational context, LLM-based chatbots are built and used commercially for more general and broader purposes (e.g., searching). Thus, based on prior works targeted to specific educational contexts, it is difficult to fully understand how students use such LLM-based chatbots in the specific use context of academic work and what real opportunities and challenges emerge in the wild. For such reasons, recent studies [41,72,85] set out to list potential benefits and drawbacks of using ChatGPT, the most popular LLM-based chatbot at present. Kasneci et al. [41] wrote a position paper that highlights how LLMs could potentially be utilized to produce educational content, personalize students' learning, and induce deeper engagement or interaction from students. Tlili et al. [85] analyzed early adopters' sentiments (i.e., positive and negative emotions) towards ChatGPT use in educational settings by reviewing the public discourse on social media and showed potential use scenarios by interviewing three educators. However, this line of early studies did not deeply engage with university students who actually use ChatGPT for class, so it is difficult to know how exactly students are using LLMs in the wild, what real (not potential) benefits and challenges they face, and what new design features they need. Our user-centered approach and field study of a university class, where all the students had a chance to interact with ChatGPT for educational purposes, share their experiences with peers, and further design necessary and improved features for LLMs, provides deeper, more practical, and more student-centric insights for LLMs in education.

Large Language Models
The recent emergence of generative AI models has catalyzed significant transformations, not just by automating routine tasks but also by making notable contributions to creative fields. These large-scale models consume enormous volumes of data and increasingly surpass human performance across a range of activities [21]. As an illustration of this burgeoning capability, AI-generated works recently won first-place awards in art and photography competitions [24,109]. Such achievements underscore that the potential of generative AI to match or exceed human effort is already being realized, signaling the onset of significant changes in numerous domains. In this evolving generative AI landscape, ChatGPT, launched in November 2022, has significantly impacted the field of NLP. By leveraging massive text datasets and a human-involved feedback system known as "Reinforcement Learning from Human Feedback" (RLHF) [65], ChatGPT has elevated the capabilities of NLP to unprecedented levels. It generates text that not only closely mirrors human communication but also exhibits human-level proficiency in text-based tasks, such as information retrieval, summarization, translation, question answering, and sentiment analysis. As a result, ChatGPT has become the most rapidly expanding consumer application ever recorded, a noteworthy milestone for any AI-based solution [31,104].
Accordingly, recent HCI research has expanded LLMs' application to tasks such as co-writing with LLMs [15,32,49] and providing health interventions for socially isolated or marginalized individuals [37,90]. Research has further explored what challenges (e.g., prompt engineering) users face when interacting with LLMs [94,101], but the permeation of LLMs in academia and how it will change all aspects of education [44,82] has not been investigated to answer 1) how exactly college students are using LLMs in the wild, 2) what opportunities and challenges students face in the specific context of education, and 3) what unique needs they have after experiencing ChatGPT. In line with previous CHI studies [7,48,71,99] that do not specifically focus on ChatGPT but value exploration of students' unique experience with commercial AI (e.g., Alexa) and user-centered design, we explore students' real experiences and perceptions of ChatGPT in higher education and present student-centered solutions to their challenges.

METHODS
The purpose of our study is to discover the opportunities and challenges facing undergraduate students and to mitigate the challenges by providing student-centered design ideas. At the time of conducting this research, most universities in the United States had already released policies completely banning ChatGPT for any type of educational purpose, either in class or at home. Thus, conducting user studies with students who had used ChatGPT for educational purposes would have been unethical and dangerous because the recruited students would have violated university policy. For this reason, we conducted a user study at an American university based on an accredited, undergraduate-level AI class where a total of thirty students (P01-P30) were permitted to use AI-based tools (e.g., ChatGPT-3.5, DALL•E2) for legitimate educational purposes. The experiment occurred during the spring semester of 2023, which started in January, about one month after ChatGPT-3.5 was first released. User research on other updated or paid versions (e.g., ChatGPT-4 or Pro) is out of our research scope because ChatGPT-3.5 was the latest version at the time of carrying out our study and students could not pay for other versions.
Our method naturally followed the semester-long, three-phase curriculum of the class (see 3.1). The class and research received official permission from the university and the IRB to proceed, and the authors confirmed multiple times that no sensitive content regarding the students appeared in the research. We decided not to collect or publicize any personal information of our students (e.g., age, sex, ethnicity) due to the sensitive nature of a small class and the potential for accidental de-anonymization. We judged that using personal information was unnecessary because comparing views on ChatGPT by socioeconomic status was out of our research scope. We strictly followed the research and ethical protocols reviewed by the IRB, and all of our students agreed to publicize their assignments (e.g., presentations, design works) for research purposes. All of the students actively participated in the entire curriculum and successfully finished the semester, receiving credit for the class.

Procedures
We recognized that undergraduate students often lack a deep understanding of AI and sufficient design experience with AI tools (e.g., ChatGPT-3.5 or DALL•E2). Thus, the initial goal of the class was to give students ample chances to become acquainted with both AI technologies and essential design principles relevant to the participatory design processes [42,76,100]. Based on such experiences, the class objectives were to build a better understanding of AI and promote responsible use of AI.

3.1.1 Step 1: Familiarizing Students with AI as Users (4 weeks). This first-phase curriculum was designed as a hands-on, guided experience, which involved a series of carefully sequenced assignments and mini-projects.
Hands-on experiences: To enhance the students' proficiency and deepen their understanding of AI, the hands-on exercises enabled students to engage with state-of-the-art text generation (i.e., ChatGPT) and image generation AI models (i.e., DALL•E2, Stable Diffusion, and Midjourney). Instructors provided comprehensive lectures on the utilization of AI (e.g., prompt engineering tips) and best practices of AI technologies (e.g., creative applications of ChatGPT, analysis of exemplary prompts shared online). Then, students could either apply the AI models to various topics of their choice or participate in pre-defined tasks aimed at bolstering their AI experiences.
For instance, early in the course, students submitted personal essays and debated the merits and drawbacks of incorporating ChatGPT into the classroom environment. Subsequently, they had the opportunity to directly observe AI's capabilities by comparing ChatGPT-generated essays with collaborative student-written essays. Notably, no student-produced comments surpassed those generated by ChatGPT. In another illustrative task, students were instructed to write a 500-word outline for a short story of their own. This assignment allowed them to explore various methods of collaborating with ChatGPT, such as using the model for ideation or for making substantive edits to their manually written outlines. Students then compared their outcomes with and without the assistance of ChatGPT, gaining valuable insights into the capabilities and limitations of AI in creative endeavors.
Research Assignments: Through targeted assignments, the instructor guided students to explore and investigate text or image outputs generated by AI models that they found intriguing or noteworthy. Students were tasked with learning the specific prompt engineering techniques that led to these generated outputs. Subsequently, they attempted to replicate the outputs based on their research. This educational approach offered students insights into the complexities and challenges associated with effective prompt engineering. It also served as a practical lesson in how to interact with AI models to produce desired outputs. These assignments contributed to a more nuanced understanding of AI capabilities, underscoring the importance of precise human-AI interaction for achieving specific goals.
Creative Competitions: To foster effective and enjoyable collaboration with AI, two distinct creative competitions [see Figures 1 and 2 for students' design work] were organized. The first, "A Mini Art Contest," invited students to employ advanced AI image generators to produce their own unique artwork. To achieve the desired outcomes, students researched preexisting artwork for inspiration, studied prompt engineering techniques, and experimented with various AI image generators such as DALL•E2, Stable Diffusion, and Midjourney. Each student submitted one piece of AI-generated artwork, which was then peer-reviewed by classmates. The top-rated submissions were acknowledged with bonus points.
The second competition, titled "Becoming an Author Contest," required students to create a short, illustrated story utilizing both ChatGPT for text and AI image generators for illustrations. Students had the freedom to select any genre and storyline. Together with ChatGPT, they collaboratively developed detailed characters, settings, and key events, from which they then generated a complete story of approximately 1,000 words in length. AI image generators were used to craft illustrations for their narratives. The final deliverable was a 10-page short story book, featuring a brief paragraph and accompanying illustration on each page.
Throughout these competitions, students were encouraged to devise and execute their own winning strategies to compete for bonus points. This setup not only bolstered their proficiency in using AI for creative tasks but also imbued them with a greater understanding of effective human-AI interaction.

3.1.2 Step 2: Acquainting Students with AI-Related Concepts and Designs (8 weeks). While hands-on experience with AI is valuable, a theoretical understanding of AI and experience in design are also essential for meaningful participatory design outcomes. The second phase of our curriculum combined 1) lectures and 2) design exercises to delve into a wide range of AI-related topics, which are summarized below.
Lectures: Lectures consisted of an array of cutting-edge topics, including but not limited to fundamentals of machine learning (ML) and deep learning (DL), ethical considerations in AI, algorithmic bias and fairness, privacy concerns in AI, explainable AI and algorithmic transparency, generative AI, the future of work in the AI landscape, and AI in organizational settings. Each topic was delivered in one to three 50-minute sessions (see A.1.1 in the appendix for more details).
Design Exercises: To address the gap in methodological knowledge and experience in designing human-AI interactions, the course incorporated specialized design exercises (i.e., designing optimal collaboration flows and methods between humans and AI throughout the creative writing process). These exercises were crafted to deepen students' understanding of how to consider HCI methods when designing AI systems.
We selected a familiar topic for these exercises: human-AI collaboration in creative writing, which had been addressed in the "Becoming an Author Contest." This was to help students focus on learning and experiencing the design process. Students were divided into small groups (four to five members each) and were tasked with designing optimal collaboration flows and methods between humans and AI throughout the creative writing process, which is divided into four stages: ideation, outlining, writing, and editing. Each team came up with a detailed process flow and roles for humans and AI to derive an optimal human-AI collaboration for creative writing. These exercises culminated in presentations and discussions where students shared their findings and insights.

3.1.3 Step 3: Participatory Design Sessions (4 weeks). In the concluding phase, the objectives were twofold: 1) to clearly identify the opportunities and challenges that ChatGPT presents within higher education settings, and 2) to propose design solutions aimed at tailoring these opportunities for educational contexts, thereby maximizing the benefits that ChatGPT can bring to higher education.
Brainstorming session: Students were tasked with brainstorming the opportunities and challenges of ChatGPT in higher education. These take-home assignments required students to reflect on their personal experiences with ChatGPT and, where possible, provide real-world examples to substantiate their points. Subsequently, in class, students were divided into small groups of four to five members to discuss, refine, and prioritize their ideas. Through a collaborative process involving discussions and voting within their groups, each team generated a prioritized list of opportunities and challenges. The class then moved to a thematic analysis stage [8], in which both the instructor and students actively participated. This involved identifying points that were consistently prioritized across different teams. Additionally, qualitative analysis was conducted to recognize points that were not frequently mentioned in the group reports but were deemed significant enough to be included. As a result, the class synthesized a unified report detailing both the opportunities and challenges of ChatGPT in higher education, drawing on the collective insights and experiences of the students and the instructor alike.
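For readers who want to reproduce the cross-team comparison step on their own data, the following is a minimal Python sketch of how points "consistently prioritized across different teams" could be tallied. It is an illustration only: the team lists, the theme labels, the function name `consistently_prioritized`, and the `min_teams` threshold are all hypothetical and are not taken from the study, in which this step was carried out through discussion and thematic analysis rather than code.

```python
from collections import Counter


def consistently_prioritized(team_lists, min_teams):
    """Return themes that appear in the lists of at least `min_teams` teams.

    `team_lists` maps a team label to that team's prioritized themes.
    Both the data layout and the threshold are assumptions for illustration.
    """
    counts = Counter()
    for themes in team_lists.values():
        # Use a set so a theme is counted at most once per team.
        counts.update(set(themes))
    return sorted(theme for theme, n in counts.items() if n >= min_teams)


# Hypothetical prioritized lists (not study data).
team_lists = {
    "FG01": ["efficiency", "hallucination", "ease of use"],
    "FG02": ["personalized learning", "hallucination", "efficiency"],
    "FG03": ["efficiency", "fairness", "academic integrity"],
}

shared = consistently_prioritized(team_lists, min_teams=2)
print(shared)  # ['efficiency', 'hallucination']
```

A frequency count of this kind covers only the first half of the stage described above; the points that were rarely mentioned but judged significant still require qualitative judgment and cannot be recovered from a threshold.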
Idea sketching session: The purpose of this session was to propose various design solutions for a successful future education application of ChatGPT based on the opportunities and challenges identified during the brainstorming sessions. To do so, each group freely selected three topics (e.g., personalized learning experience, difficulties in prompt engineering, privacy concerns) from the collective topic list and proposed design solutions for them. They first clearly stated the problem of each topic. For example, what characteristics of ChatGPT prevent it from providing a personalized learning experience, or what features does ChatGPT have that could promote diversity, equity, and inclusion but are not utilized for such purposes? They then considered and proposed specific design ideas for each problem statement. Finally, they sketched out user experience ideas on how to apply these designs to actual development (see Figure 3 for students' sketches). Throughout these design sessions, the instructor provided real-time feedback and guidance, aiding in the creation of a range of problem statements, design concepts, and viable solutions.
Expert session: To add depth to the student-led efforts, we also consulted with five external experts (P31-P35) in the field, including AI engineers and scholars. Before reviewing the students' findings, these professionals were invited to freely share their thoughts on the challenges and opportunities of ChatGPT in higher education. After that, we presented them with the students' collected insights on these topics. The experts then offered feedback, selected key design themes, and made specific suggestions for refinement. This session enriched the class's conclusions by adding valuable external perspectives, leading to a more comprehensive view of ChatGPT's potential in higher education.

Data Analysis
Through our curriculum-based activities, the first two authors obtained a variety of student-generated content, including reports on opportunities and challenges of ChatGPT in higher education, individual/team reports, sketches, final presentations, and the focus group design activities and contest submissions. We used a qualitative approach [13,16] and thematic analysis [8] to examine all of the student-generated data and identify similar topics and categories. Based on an inductive and iterative process, we grouped together similar themes (e.g., opportunities and challenges) and matched them with design ideas that the students produced. We repeatedly looked through students' work (e.g., reports, presentation findings, design sketches) to connect them with their design solutions until we agreed on final themes. We then used thematic analysis and an inductive and iterative process to analyze our interviews with the experts, due to the similar nature of the qualitative studies. In line with similar qualitative and participatory design papers [25,69,99], we did not measure intercoder reliability, following the justification presented in [9].

Opportunities
In this section, we present three broad themes drawn from the opportunities the students identified: usability (4.1.1), user experience (4.1.2), and scalability (4.1.3).
4.1.1 Usability. Efficiency. All of the teams praised the speed of ChatGPT in answering students' questions, describing it as "immediate," "instantaneous," and "fast and quick," and further felt that it increased their productivity and efficiency when searching and learning. Surprisingly, they naturally compared the capability of ChatGPT with that of existing search engines and even human counterparts (e.g., teachers, colleagues). One example among similar others is "ChatGPT allows for faster and more effective collaboration than with a human counterpart. The AI can offer immediate feedback to the student without delay, thus increasing the productivity of the student and the quality of the work made." In particular, FG03 explained why they felt that using ChatGPT increased their learning efficiency and productivity, saying: "My peers and I have seen ChatGPT be useful in the context of asking it 'why' questions when it comes to coding. We have found it useful in asking what an error code means, why particular code is not running, or even the basic syntax of a query in any coding language. This is particularly useful because it saves time searching through the internet for answers, or waiting to get back to class to ask a question, or even waiting for an email response from a professor."

Availability and Accessibility. All of the teams liked that they can talk to ChatGPT anytime they want and used similar expressions: "ChatGPT is available 24/7," and "ChatGPT is accessible 24/7 to students." Participants naturally compared the availability of ChatGPT with that of busy professors, saying: "This is beneficial to students because they may only be able to receive assistance from their professors during their professors' office hours or during class time. Instant assistance to students is a very valuable aspect of ChatGPT."
Interactivity and Ease of Use.As learning is inherently an interactive process, students liked ChatGPT's conversational format, which makes it easy to use and more interactive.FG06 explained, "ChatGPT is easier than using humans for feedback because Chat-GPT can provide it immediately." Also, as a part of its interactivity, students emphasized ChatGPT's ability to answer additional and sequential questions.For instance, FG01 said, "ChatGPT has a conversational format and has an easy-to-use design, which makes it more effective than relying on the internet and teachers for questions." Similarly, FG04 described, "ChatGPT can be used as an interactive learning assistant.It is capable of responding to additional questions that students have while trying to complete homework or while studying." Simplicity and Conciseness.Many students were satisfied with ChatGPT's ability to simplify complex concepts as well as use simple words or language.Specifically, students said: "ChatGPT tends to give clearer and less convoluted answers than that of humans and some internet resources." "ChatGPT helped me understand the quadratic formula in very simple terms." "ChatGPT uses concise language and knows how to explain concepts in the simplest of terms." 4.1.2User Experience.Personalized Learning Experience.Many students felt that ChatGPT's feedback was personalized to their questions.For example, FG02 noted, "ChatGPT provides individualized feedback to students.This is beneficial because it gives objective feedback that is specific to each students' work or their questions." 
Also, students liked the flexible feedback that ChatGPT provides according to their preferences or interests.FG05 said: "You can expand or narrow your learning topics in any direction you want.ChatGPT has been flexible based on my experience with it because if I don't like something or I am confused about something, I can ask ChatGPT questions or give it different prompts to better understand the answer or get a better answer." Furthermore, students expressed it can enhance personalization in terms of learning styles and methods.FG02 said, "ChatGPT can teach through methods that appeal to different learners because of its design, which makes it more effective than teachers."Also, FG04 mentioned, "ChatGPT can identify students' weaknesses and strengths so that a class can be more personalized to different students." Objectiveness and Fairness.Surprisingly, many students perceived that ChatGPT is objective, fair, and consistent when treating students and delivering information to them.They criticized humans' subjective, emotional nature when compared with ChatGPT's objectivity.Especially, we found that students are very sensitive towards professors' attention or affection to certain students in class.FG03 said: "ChatGPT is fairer because it is not a human.The fact that ChatGPT is not a human means it cannot pick favorites within a classroom.[. ..]Humans will naturally pick favorites, so teachers will have favorite students and treat them better or dedicate more time to them." Also, we discovered that students value unbiased and diverse viewpoints in university-level lectures.FG05 described their education experience, saying: "ChatGPT does seem to be more objective than certain humans and people who write emotionally fueled articles on the internet.Even though ChatGPT has a bias, there are some teachers who become too emotional and subjective when teaching.Many teachers only want to teach their [own] viewpoints and will often force them onto students." 
Similarly, FG04 said, "This eliminates some bias that humans may provide because sometimes teachers express their different ideas in their lessons, which may confuse students.This eliminates some bias that humans may provide because teachers may not be consistent with each other in terms of their beliefs, so students in the same grade may learn different things depending on the teacher." For such reasons, many students felt that "over time, ChatGPT could become an everyday browser and a go-to for those wanting to look at an unbiased opinion, " and that "ChatGPT provides unbiased and objective guidance to students." A Sense of Privacy.We found that students are sensitive about being judged by the professors or students in class, which is similar to the situations in which professors favor certain students in class.Hence, students felt more comfortable asking questions to Chat-GPT.FG02 described, "ChatGPT can provide a sense of privacy to students who may prefer a more private learning experience.Students may be intimidated by certain teachers or feel judged.Many students are afraid of asking what others would consider dumb questions and are afraid of embarrassment.By using ChatGPT, students can ask any questions they want to the AI and don't have to worry about being judged or embarrassed." FG01 said, "Students with questions during class could ask ChatGPT for clarification.This would keep students from interrupting lectures, or reveal holes in the lecture that can be corrected by students." 

4.1.3 Scalability.
Diversity, Equity, and Inclusion. Many students believed that ChatGPT can foster diversity, equity, and inclusion in higher education. In particular, they found ChatGPT helpful for students with disabilities, saying: "ChatGPT can be easier for schools to work with students with disabilities and provide them with more assistance than a human could. [In the future], ChatGPT can provide text-to-speech capabilities, or vice versa, for students with hearing or vision impairments. Additionally, it can make real-time captions or sign language interpretation for lectures. It can accommodate students based on their specific disability, which can be hard for an actual teacher to do in a classroom setting." Also, students emphasized the monetary advantage of ChatGPT and its powerful impact on students who need free resources. Multiple groups praised this aspect: "ChatGPT is free, meanwhile many programs force people to pay. Having humans teach requires us to pay them, so using ChatGPT to teach saves this expense. ChatGPT could potentially be used to tutor for free as well."
Extensibility. We found that LLMs such as ChatGPT can have synergy with other educational technologies as plugins. In fact, many participants anticipated endless potential and opportunity to extend LLMs with other educational tools for purposes of summarization, lucid explanation, note-taking, Q&A, etc. For example, FG05 described, "ChatGPT can be used in virtual classrooms and act as an aid in teaching. Some students struggle to learn online, but ChatGPT can be used to help further explain concepts in virtual classrooms (or summarization)."
Especially, students wanted ChatGPT to be interconnected with online video learning or remote learning through videoconferencing tools (e.g., Zoom) so that they can ask it to summarize concepts, expound on or simplify the content, give examples, or take notes.FG03 complimented ChatGPT's role in making taking notes easier, saying: "When you ask ChatGPT questions and it gives you a response, that response is typed out and saved.This means that the user/student doesn't have to type up notes, because ChatGPT has already provided them.Students can then copy the information into their notes and use it to study.Students don't have to worry about losing this information because Chat-GPT requires users to make an account and saves everything.I sometimes have difficulty keeping up with everything the teacher is saying when I am taking notes, so having ChatGPT write everything out is very convenient and helps students who take longer to type."

Challenges
In this section, we present three broad themes of the challenges that the students discovered: algorithmic problems (4.2.1), human and social problems (4.2.2), and usability problems (4.2.3).

Algorithmic Problems.
Hallucination.The most critical problem of ChatGPT that our participants frequently mentioned was its hallucination, which is, "generated content that is nonsensical or unfaithful to the provided source content" [64].Many students reported puzzling experiences of ChatGPT telling them plausible but fake stories or giving them spurious sources.FG02 commented: "Whenever ChatGPT is asked to pull an article, it tends to pull an incorrect or non-active article.The problem arises of wondering where the information is coming from if the article does not exist.This is a drawback when compared to using the internet, [where] websites that a user has drawn information from can be cited." FG04 was concerned with ChatGPT's hallucinations potentially leading students to believe false information.FG03 noted, "Chat-GPT produces incorrect answers to questions.The limited data that the machine is trained on can lead to false information being provided to questions, leading to misinformation being spread and lack of understanding.Teachers will have to reteach topics that students learn incorrectly from Chat GPT." 
Algorithmic Bias.Many students repeatedly reported negative experiences of ChatGPT producing algorithmic bias, which is different from the hallucinations mentioned above.While hallucination may induce students to believe in false or nonexistent knowledge, algorithmic bias exposes students to biased information.Specifically, during the creative story writing session, students discovered interesting examples of algorithmic bias-that is, ChatGPT generally suggested a specific race when creating a character (e.g., if students put "the quarterback of an American football team, " Chat-GPT sets the character as a "white male"; if a character is a gangster or good at rapping, then it suggests "black male"; if the character is smart or nerdy, it sets the character as "Asian").FG06 strongly argued, "ChatGPT is prone to bias in its responses because it is trained on data that includes biases.This could be challenging because if students are not aware of these biases, they may not make the correct decisions based on the information ChatGPT provides them with.This could also perpetuate pre-existing biases in higher education." The aim of higher education is not only teaching a specific subject but also improving students' capacity for critical thinking in liberal arts, cultural studies, political viewpoints, etc.Thus, it is especially problematic that ChatGPT could reinforce prejudices and biased viewpoints.
Lack of Transparency and Interpretability.Surprisingly, we found that some students have tested how ChatGPT evaluates and gives feedback on their school assignments.However, they were disappointed with its answers or feedback due to its lack of transparency and interpretability.FG01 commented on their negative experience: "ChatGPT does not clearly show how it comes up with its answers/feedback.As a student, it is important to understand why you get certain grades and feedback on an assignment, but this is something that ChatGPT can't offer that a human instructor can.ChatGPT's lack of transparency can make it difficult for students to understand their strengths/weaknesses, making it hard for them to make improvements to their work." Other students further explained that a lack of transparency and interpretability led to distrust towards ChatGPT.Specifically, FG01 described "It is sometimes difficult to understand the decisionmaking process behind ChatGPT's responses.Transparency is very important, so not understanding this decision-making process may pose a challenge for students regarding trust and accountability." Especially, in the context of higher education, students found it essential to know why ChatGPT presented or prioritized certain sources over others.FG02 commented, "AI has limited visibility as to how it selects; this lack of clarity can lead to issues when it comes to how the model prioritizes certain references over another.For example, it could choose one statistic that better suits a point rather than the more reputable, more objective one." 
In addition, students judged that ChatGPT's lack of transparency and interpretability can not only invade students' privacy but also undermine academic integrity. FG06 explained, "Limited decipherability, the process that ChatGPT uses to present information, could be difficult for students to interpret. This also relates to situations that include accountability like academic integrity and student privacy information."
Lack of Recency. Many students were unsatisfied with ChatGPT's limited knowledge of recent events due to its heavy dependence on past training datasets. Because having access to up-to-date knowledge or keeping up with new trends is especially important in higher education, ChatGPT's lack of recency was regarded as a fatal weakness by the students. FG02 commented, "Oftentimes, ChatGPT may not be up to date and may not provide the latest course material. ChatGPT may also be unaware of unexpected events."
Limited Capability. Multiple students found that ChatGPT lacked understanding of the context or nuance in their dialogue. FG03 said, "ChatGPT lacks context and understanding, which causes it to struggle to answer harder questions that may require human input." FG05 said, "It struggles with humor and cannot necessarily understand nuance. For example, if you told ChatGPT a joke, it could likely tell you why the joke is funny, but not be able to produce its own unique joke." Students also perceived that "ChatGPT cannot handle multi-tasking," and that "the tool works best under limited circumstances, only processing one task at a time." FG06 gave an example: "For example, if you repeatedly ask it to shorten some paragraphs, it will edit all of them and then hit a strange string where it condenses the paragraphs into bullet points. Similarly, if you ask the model to edit, then rewrite, then draft a new paragraph, it would likely fail one of these tasks. This is an issue with the model's functionality."
We found that students were puzzled about why ChatGPT struggles when it has to explain a specific topic or complex knowledge that requires a higher level of expertise. For instance, FG03 shared, "ChatGPT takes a large amount of information from different sources, so the AI system is not an expert on specific topics. It cannot answer extremely deep and complex questions asked on a specific field, mainly just the basics." Unexpectedly, many students criticized the limited number of words that ChatGPT can produce or process in a prompt. FG01 complained, "ChatGPT cannot review an overly large document and answer questions. Since the input range is limited, if a student wanted to review or edit their long essay as a whole, it would have to be reviewed in parts." Students also perceived that ChatGPT has trouble producing long but well-organized content. FG06 shared their experience: "It's great at summaries or lists, but longer, structured stories it tends to struggle with, though its productions are typically rational and grammatically correct."

4.2.2 Human and Social Problems.
Decreased Learning and Academic Integrity. We discovered that students are worried about a serious deterioration in deep learning, critical thinking, and creativity that the convenience of ChatGPT will bring about. In particular, many students expected that they themselves would use ChatGPT to complete their assignments instead of using it to learn. P07 noted, "ChatGPT is easy to use and easily completes assignments for students, which may mean that students will try to use ChatGPT more to complete their work and will stop doing their own work. I know that when I use ChatGPT to get answers, I am not being as creative or learning how to think as hard as before."
Some students were concerned with overreliance on ChatGPT for learning: "For the student, there's a likelihood of developing an overreliance on the AI platform.There was a significant rise in the development and quality of the AI-produced work, but long term, this could hinder the output quality if the student isn't challenging themselves." Reduced Social Interaction.Many students were worried about the reduced social interaction that a heavy dependence on ChatGPT could bring about.They expressed concerns that reduced social interaction could deteriorate students' communication skills, mental health, and opportunities to interact with peers and instructors for group work.Surprisingly, students strongly emphasized the aspect of mental health that reduced social interaction can cause.P15 said: "Using ChatGPT instead of learning through humans will limit social interaction and may take a toll on students' mental health.For some students, class is the only social time they get during the day, and socialization is an important human need.[. ..]I know that I enjoy getting to know other people by working with them and it boosts my mood." Lack of Human Ingenuity.Many students criticized Chat-GPT's lack of emotional support as a drawback, because teachers' positive emotions often motivate students to explore and learn more.FG04 described, "ChatGPT does not include the emotions that humans provide [when working], so the students who need empathy and emotional motivation will be at a disadvantage." We found that students need more than feedback; they seek advice tailored to the academic circumstances surrounding their lives.FG06 noted, "Human instructors can give their students personalized and detailed feedback that takes their individual needs and circumstances into account; they are also able to encourage students rather than solely give them feedback." 
Also, students perceived that relying on ChatGPT could limit students' exposure to tacit knowledge, know-how, and wisdom by watching and learning from other people.FG01 emphasized, "Chat-GPT is unable to replace the value of human instructors.While it can provide automated answers and instant feedback, it cannot replace the wisdom and expertise of human instructors." However, students were also well aware of how much they can learn by interacting with not only teachers but also peers.FG05 explained, "ChatGPT cannot be as creative as humans in the brainstorming phase.Even if an idea seems new, it is a combination of old ideas and therefore not genuine in nature.When it comes to developing new story plot lines, humans are more creative." Privacy Concerns.We found that, while students felt they could ask ChatGPT questions without being judged, other privacy concerns emerged when they interacted with ChatGPT.Students perceived that ChatGPT could invade their privacy in two ways: by acquiring their personal information and by stealing their intellectual property.FG02 explained, "Students' data and everything they input to ChatGPT will be stored.This raises privacy concerns if the data gets into the wrong hands.This data may be vulnerable to theft, which may make some people uncomfortable to use it."Similarly, FG01 said, "ChatGPT has security concerns.It may be collecting data from users without the users knowing." 

4.2.3 Usability Problems.
Difficult Prompt Engineering. We found that many students face difficulties in prompt engineering when interacting with ChatGPT. Although students believed that "prompt engineering is essential to achieving high-quality results," many of them did not know where to start and what to say. This is due to the nature of ChatGPT's interface, which pursues dialogue by taking open-ended questions as input. FG02 described their experience, saying, "It is common for students to be unable to formulate prompts that yield their desired results. It's crucial that students learn how to engineer their prompts explicitly so that the responses generated by the AI are accurate and sufficient for the students' intended purposes." Meanwhile, students perceived that the difficulties of prompt engineering can be attributed to ChatGPT's failure to understand context. FG04 noted, "Due to its inability to understand context, the model can give factually correct information that fails to meet a prompt's specifications effectively because of this lack of context." FG06 explained, "Sometimes it may not know exactly what you are trying to say. When inputting complex ideas, it has a hard time understanding what you want the output to be. If this were a human interaction, a student would be able to further explain themselves without thinking of a new prompt or adjusting a written prompt."
Lack of Hands-On Learning. Many students perceived that ChatGPT is restricted to text-based interaction, which limits the use of diverse learning methods (e.g., hands-on learning) and of educational materials and equipment (e.g., diagrams on whiteboards). P22 pointed out: "ChatGPT can provide users with information in different formats, but as of right now, students cannot complete hands-on activities through ChatGPT because it is only online. This limits the learning of students who are kinesthetic learners because they need some sort of physical activity or movement to learn the best. I personally learn a lot through hands-on activities that require students to move around and physically work with the material." Similarly, FG01 mentioned, "ChatGPT's responses are limited to text-based interactions. This may not be sufficient for educational contexts that require more interactive and engaging learning methods."

Co-Design Results and Design Implications
In this section, we present functions to address hallucination (4.3.1),improve usability (4.3.2),provide less biased but more diverse perspectives (4.3.3), and foster social interactions (4.3.4), and include design implications at the end of each section.

Functions to Address Hallucination.
Prompt engineering, the crafting of specific prompts or inputs to guide an AI language model in generating desired responses, is one of the most important elements when interacting with ChatGPT, because it is the only means of communicating with ChatGPT. Thus, better prompt engineering has an impact on the results that ChatGPT produces. We found that some students believed better prompt engineering could prevent hallucination, but AI experts stated that better prompt engineering cannot solve the problems of hallucination. P31 explained, "Prompt engineering isn't that helpful for controlling hallucination. Although ChatGPT looks like understanding the context plausibly and responding to questions properly, LLMs are just mere models that stochastically generate texts based on learning data. There is no such fact-checking function within it in the first place." Thus, our participants suggested three ways to mitigate hallucination: fine-tuning, embedding, and incorporating external fact-checking algorithms.
Several AI experts emphasized the necessity of fine-tuning and designing LLMs specialized to unique subjects or domains (e.g., a math LLM or a history LLM), giving the example of Med-PaLM2, which Google developed to "align to the medical domain to more accurately and safely answer medical questions." P35 explained, "Fine-tuning based on domain-specific data is needed. We need smaller but safer algorithms in higher educational contexts. If fine-tuned and domain-specific LLMs like math or English LLMs are created, even though it's not perfect, it can solve many of the hallucination problems caused by ChatGPT for general purpose or use." Because fine-tuning is not a perfect solution to hallucination, other AI experts suggested embedding, which transforms text, images, and structured data into a lower-dimensional feature space and stores them in an external (vector) database. They conceived the following design for when ChatGPT receives a question as input: ChatGPT sends the question to an external vector database, which searches for the most similar response based on text similarity. Once ChatGPT receives the best answer from the external database, it delivers the matched response to the end user. Regarding this, P34 explained: "Easily speaking, it's mounting [an] external search system into ChatGPT. Fine-tuning may technically increase the accuracy but cannot fundamentally solve hallucination. So, there should be other technologies that can be used with ChatGPT. Embedding is currently at the center of attention in industry. ... In a nutshell, it's like establishing domain-specific search system external to ChatGPT, but using it by connecting it with ChatGPT." Similarly, P33 added, "Copilot developed by Microsoft uses such embedding to help code generation that is fitted to user context with high accuracy. Embedding can have a synergy with fine-tuning together."
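To make the embedding design described by the experts more concrete, the following is a minimal sketch of the retrieval step only. It rests on several assumptions that are not part of our participants' proposal: word-count vectors stand in for learned dense embeddings, an in-memory list stands in for an external vector database, and the names `embed`, `cosine`, `VectorStore`, and the `min_score` threshold are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in (assumption) for an external, domain-specific vector database."""

    def __init__(self):
        self._items = []  # list of (vector, passage) pairs

    def add(self, passage):
        self._items.append((embed(passage), passage))

    def search(self, question, min_score=0.3):
        """Return (best_passage, score); best_passage is None when nothing is similar enough."""
        query = embed(question)
        best_score, best_passage = 0.0, None
        for vector, passage in self._items:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_passage = score, passage
        if best_score < min_score:
            return None, best_score
        return best_passage, best_score


# Usage: index validated course material, then look up a student's question.
store = VectorStore()
store.add("The quadratic formula gives the roots of ax^2 + bx + c = 0.")
store.add("Photosynthesis converts light energy into chemical energy in plants.")
passage, score = store.search("What is the quadratic formula?")
```

Returning `None` below the threshold is one possible design choice: it would let the surrounding system say that no validated source was found rather than generate an unsupported answer.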
Additionally, several AI experts proposed incorporating external fact-checking algorithms into ChatGPT. P31 explained, "As generative AI gets popular these days, problems such as fake news or AI-generated public opinion manipulation have arisen. Various algorithms are being created to fact check and solve such issues. We can try such things out in the educational context."
[Design Implication] Starting from medical education [20,46], scholars have begun developing LLMs through fine-tuning that are specialized to unique subjects (e.g., math [78]) or specific domains (e.g., programming [53]). Because the education field is well segmented into specific subjects, curriculums, and domains, it might be easier to acquire well-structured, validated data and fine-tune on it. Moreover, education has a long history of specialized ITSs and AI tutors that one can extend or improve by applying LLMs. Thus, we suggest that both researchers and practitioners explore the development of specialized LLMs to mitigate hallucination. Moreover, the HCI community has long designed and studied fact-checking algorithms [4,26,35,39,89,103] to identify fake news, false information, and rumors [47,56,57,58]. Following these prior works, future research should attempt to incorporate fact-checking algorithms into LLMs and further investigate whether doing so could eventually reduce hallucination and help users discern the truth.

Functions to Improve Usability.
As discussed above, many students found prompt engineering for educational purposes difficult.Especially, students reported cognitive burdens when communicating with ChatGPT for personalized learning because they were unaware of what they do not know and where to start.P09 compared his prompt-engineering experience with "learning a completely new foreign language." P34 further added: "What I found after letting students use ChatGPT during class was the gap among students is huge.While a few students make good use of ChatGPT by prompt engineering in a creative way, most of the others don't know how to do prompt engineering.We perceive designing some features that reduce students' bottleneck burdens toward prompt engineering is necessary." Similarly, P34 stated, "People expect that ChatGPT enables personalized learning, but in fact, most of the time, students do not know what they know and don't know.Personalized learning is [when] learners lead their learning in the direction they want by asking questions or through chatbots' instruction, but I'm not sure if it's possible for anybody." We found that the inherent nature of text-based interactions confused students because ChatGPT sometimes answered sequential (additional) questions with more in-depth information but at other times gave off-topic answers.
Prompt Suggestions. To mitigate students' burdens towards prompt engineering, many participants emphasized the need for prompt suggestions, which would guide students' learning. They believed that prompt suggestions could induce exploration based on personalized needs. Participants designed button-style prompt suggestions: "show different perspectives," "explain in detail," "move to related topics," "help me ideate," "show me others' thoughts on this topic," etc. FG03 further explained why such prompt suggestions are needed: "When a user experiences ChatGPT for the first time, there is very little guidance provided to help them understand prompt engineering. ... There are a few prompt examples provided, but user interaction could be more guided." Similarly, FG02 suggested, "An automated prompt feature, similar to autocomplete on Google searches, could be added to ChatGPT. As users type in their prompts, ChatGPT could suggest how to finish the prompts; this way users would have a better idea of what they want generated." In addition to prompt suggestions, students thought that tutorials on how to do better prompt engineering are needed. FG05 explained, "The tutorial would start with the user inputting key words about whatever topic, then based on those keywords ChatGPT would generate sample prompts or questions. The user could then develop their own prompts based on the structure of the examples."
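As an illustration of how the two ideas above could fit together, the sketch below generates sample prompts from a keyword (in the spirit of FG05's tutorial) and completes a partially typed prompt (in the spirit of FG02's autocomplete). The button labels come from our participants' designs; the template wording, the function names `sample_prompts` and `autocomplete`, and the simple prefix matching are assumptions made for illustration, not a description of any existing ChatGPT feature.

```python
# Button labels are taken from the participants' designs; the prompt wording
# attached to each label is a hypothetical example.
SUGGESTION_TEMPLATES = {
    "show different perspectives": "What are some different perspectives on {topic}?",
    "explain in detail": "Explain {topic} in detail, step by step.",
    "move to related topics": "Which topics are closely related to {topic}?",
    "help me ideate": "Help me brainstorm ideas about {topic}.",
}


def sample_prompts(topic):
    """Return (button_label, full_prompt) pairs for a keyword the student typed."""
    topic = topic.strip()
    if not topic:
        return []
    return [
        (label, template.format(topic=topic))
        for label, template in SUGGESTION_TEMPLATES.items()
    ]


def autocomplete(partial, candidates, limit=3):
    """Suggest up to `limit` candidate prompts that start with the typed text."""
    prefix = partial.strip().lower()
    if not prefix:
        return []
    return [c for c in candidates if c.lower().startswith(prefix)][:limit]


# Usage: a student enters a keyword, sees sample prompts, then starts typing.
prompts = [prompt for _, prompt in sample_prompts("recursion")]
completions = autocomplete("expl", prompts)
```

A deployed version would likely rank completions by relevance rather than by prefix match alone; prefix matching is used here only to keep the sketch short.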
Learning Map and Navigation System.Both students and professors emphasized the need for summarizing and structuring a student's learning journey to display what a student has learned and will learn.This could help students better grasp their learning direction and not get lost when interacting through long text-based dialogues.P11 expressed such difficulties in a long text-based dialogue: "When I ask many additional questions, there are times when I had to scroll up the page to see what I had asked before.If ChatGPT starts being used in earnest and students begin to use it in complex and different ways, I believe such issues would be exacerbated." In this context, an instructor affirmed the need for a learning map: "It is exciting that students' learning can be personalized in any direction based on their diverse backgrounds and levels.But we generally have a common curriculum in class and there [are] specific and essential topics that are already decided and should be covered in each session.Of course, it's great to show diverse perspectives and dig deeper into curious parts, but going back to the main flow of the original curriculum after trying various searches is also important.For this, it would be helpful to have a learning map and navigation system that shows students where they are at, what topics they should go through, [and] which direction learning should go so that students do not get lost in their learning." 
AI-Helper Agents.Interestingly, many students designed other AI agents that play either a 1) translating role or a 2) curriculum designing role, and professors desired AI agents that manage students' engagement, prevent churning, and build rapport with the students.Regarding translation, students wanted an AI agent that understands students' intent even if they do not articulate clearly, and then, translates their intent into well-stated prompts.Also, students envisioned other AI agents that suggest an optimal curriculum based on their needs, learning environments, and academic achievements, which is another advantage of the learning path design mentioned above.For example, P13 stated, "I would like a camera that could detect my responses to see whether I keep up with the class or to grasp if I understand the content well enough, or something.If it notices that I don't know something, it can ask additional questions to pick out things that I don't know about." Other students also liked the idea of adopting cameras along with Chat-GPT, and they wanted to exercise discretion over its use.P19 said, "Students can choose whether they want to use a camera so that ChatGPT can see their reaction and understanding of the material given." Meanwhile, P34 (professor) presented a different example function saying, "It would be useful if there is a function that automatically diagnoses students' level.Rather than testing students to check their levels, students can upload their essays which they have submitted as assignments before, [and] it lets them know their levels by connecting the data to a school database." 
While students emphasized their individual needs, professors were concerned that ChatGPT could become a tool to carry out a task on behalf of students, without promoting students' learning. This is possible because students can be easily distracted or churn in online learning environments that fail to induce student engagement. Regarding this, our participants emphasized the roles of AI agents that elicit students' curiosity and participation by observing students' sequential interactions and providing feedback. Also, they felt a necessity for AI agents to moderate students by summarizing topics of their discussions, nudging silent students to talk and participate more, and giving emotional support. P26 explained, "ChatGPT can group students together based on similar interests, classes, or topics, and then can create conversation topics between the students. ChatGPT can connect students to other students, and influence interaction by providing icebreakers. This will keep social interaction between students alive." Also, a professor (P33) explained: "Whether students use ChatGPT at home or in class, if it doesn't draw active participation from students, it will only be used as a tool that does chores for students. In this sense, ChatGPT should constantly induce student engagement by giving feedback, hints, and questions. This is the method that I enjoy using to increase student participation in my physical classroom."
[Design Implications] Although prompt engineering is regarded as one of ChatGPT's main usability issues for general users in broader (non-educational) contexts¹, we found that this difficulty is exacerbated in an educational context. Current LLMs' distinctive reliance on free-form conversation about open-ended topics is contrary to ITSs, which have been systematically designed for education after analyzing the discourse and pedagogical strategies of human tutors. Although ChatGPT provides direct information, which is certainly an important part of tutoring, it lacks the core function of "co-constructing responses in specific tasks, such as solving problems, answering challenging questions, and creating artifacts" [22].
Surprisingly, most of the strategies and functions that our participants desired to improve the usability issues of ChatGPT, such as hints, feedback, prompt suggestions, a learning map and navigation system, and AI-Helper agents, resemble the traits that ITSs have long had and are already proven to be effective. For example, prompt engineering when using ChatGPT requires students to perform everything (e.g., forming a question and task) on their own, but ITSs operate through a systemic dialogue frame [22]: (1) both tutor and student can present a task; (2) the student handles the task at first; (3) the tutor provides short feedback on the quality of the response; (4) the tutor and student improve the responses together; and (5) the tutor checks whether the student comprehends the answer correctly and follows up when needed. In particular, during (4), the ITS leads students to generate correct answers by using pumps (e.g., "what else?"), hints, and prompts after checking their list of "expectations (anticipated good answers, steps in a procedure) and misconceptions (errors or bugs)" [22].
¹ We wish to note that a newly released version of Google's MakerSuite [110] includes a prompt suggestion feature and tutorial for developers: once a developer writes a prompt, it shows examples of both improvements and alternatives based on the original prompt. Although this new feature is useful for developers to write and fix their prompts, it is still insufficient for educational use by students. Thus, we suggest further releases and improvements to LLMs for specialized educational contexts by incorporating education-oriented features (e.g., buttons) that give hints, summarize students' questions, or proactively ask new questions back to students.
The proven effectiveness of ITSs' traits [23,80,87] aligns with the needs of our participants (both students and professors), so we can confidently claim that ChatGPT (including other LLMs) could benefit from ITSs' systemic approach by actively utilizing dialogue and pedagogical techniques: (1) (sub)step-based interactions, (2) feedback, pumps, and prompts, and (3) co-constructing, reasoning, and scaffolding. Specifically, we anticipate that the combinations of these three major techniques could solve most of the challenges our student participants voiced (see details in 4.3.2).
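To make the dialogue frame described above more concrete, the following is a minimal sketch of steps (3) to (5): the tutor gives short feedback, then chooses a correction, a pump, or a hint by checking the student's answer against a list of expectations and misconceptions. All names here (`TutoringTask`, `tutor_reply`) are hypothetical, and the plain keyword matching is an assumption standing in for the semantic matching that a real ITS or an LLM would perform; this is not the implementation of any existing ITS or of ChatGPT.

```python
from dataclasses import dataclass, field


@dataclass
class TutoringTask:
    """A task with anticipated good answers and anticipated errors (hypothetical structure)."""
    question: str
    expectations: dict[str, str]    # expected keyword -> hint that leads toward it
    misconceptions: dict[str, str]  # misconception keyword -> short correction
    covered: set[str] = field(default_factory=set)  # expectations met so far


def tutor_reply(task: TutoringTask, student_answer: str) -> str:
    """Return the tutor's next move for one student turn."""
    text = student_answer.lower()

    # Step 3: short feedback; anticipated misconceptions are addressed first.
    for keyword, correction in task.misconceptions.items():
        if keyword in text:
            return f"Not quite. {correction}"

    # Step 4: track which expectations the student has covered so far.
    newly_covered = {k for k in task.expectations if k in text}
    task.covered |= newly_covered
    missing = [k for k in task.expectations if k not in task.covered]

    # Step 5: once every expectation is covered, check comprehension.
    if not missing:
        return "Good. Can you restate the full answer in your own words?"

    # No progress this turn: use a pump to elicit more from the student.
    if not newly_covered:
        return "What else?"

    # Partial progress: give a hint toward the next missing expectation.
    return f"Right so far. Hint: {task.expectations[missing[0]]}"
```

In this sketch the student always moves first and the tutor only fills gaps, which mirrors the co-construction idea in the frame; an LLM-based version could keep the same control flow and replace only the matching and the wording of each reply.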

Functions to Provide Less Biased but More Diverse Perspectives. Contrary to the common speculation that 'ChatGPT may increase students' exposure to diverse opinions or perspectives', both students and professors were worried that using ChatGPT could convey uniform information or knowledge. High-quality prompt engineering is essential for students to gain diverse perspectives and knowledge, but it is still difficult for students to ask questions by precisely composing prompts. A professor described his experience in gaining uniform results from students' ChatGPT use: "Lots of people expect ChatGPT to bring diverse perspectives to students. However, when examining students' real uses of ChatGPT in my class, students find it difficult or tiresome to compose and try various prompts. So, they tend to ask common questions and it leads to gaining just general and common answers. I once gave students an assignment to use ChatGPT to discuss a certain topic, but I was really surprised that students submitted results that are so similar to one another. This means that students do not acquire diverse opinions but only uniform perspectives [by] asking similar questions." Regarding this, our participants (both students and professors) felt the need for multi-agents that foster diverse perspectives while mitigating algorithmic bias. A vast array of AI agents was envisioned: a moderator agent that collects and summarizes diverse perspectives from students, or introduces other opinions and fosters discussion; and multi-agents that show and advocate different political positions. P15 described multi-agents' roles, saying, "Multiagent AIs [could] generate suggestions [from] different perspectives and let them argue, discuss, and aggregate results."
In addition to the development of new multi-agents that reveal unique viewpoints, many students emphasized the agents' roles in fostering students' discussion and participation so that students can actively think and express their views. P29 noted: "Differing opinions aren't right or wrong, but discussions are a large part of learning in classes as they open up students' minds to think from more than one point of view. ... Feature that allows students to discuss a certain topic with each other and ChatGPT will analyze the discussion and come out with an aggregated response."
[Design Implications] Our findings imply that students are unlikely to leverage the free-form and open-ended conversation style of ChatGPT to explore diverse perspectives; instead, they ask common or standardized questions that lead to uniform thinking. To avoid this outcome, we suggest that both the type and pool of prompt suggestions be massively increased to lead students toward diverse perspectives, and further, that a feature (e.g., a button that says 'show other views') be included to easily nudge students to explore ideas. At the same time, both researchers and practitioners should think ahead to future educational scenarios in which multiple students interact with one agent (one-to-many communication) or multiple agents (many-to-many communication). Regarding this, prior CHI studies [43,50,77,79,93] have investigated potential scenarios of interacting with multi-party chatbots [77] as well as diverse roles of chatbots and their positive impact on familiarization among teammates [79], generating more ideas and diverse views [43,50,79], and nudging silent people to speak up [43]. By applying and extending this line of work to an educational context, future researchers and practitioners could reform current LLMs to foster discussions and elicit ideas among students, beyond ChatGPT generating and sharing diverse views with students.

4.3.4 Functions to Foster Social Interactions. Many students pointed out the risk of reduced social interactions with professors and peers and its subsequent negative effect on their wellbeing. In that sense, P14 said, "Students' mental health might become worse because they are less active socially and may not see their friends or classmates as much as before." To increase social interaction among students, many participants liked the idea of incorporating the Metaverse into ChatGPT and combining it with AI agents' active social roles (see design implications in 4.3.3 above) to moderate discussions, match students, etc. P26 suggested, "A solution to ChatGPT's lack of socialization could be to create a metaverse classroom or a similar online interactive space where students can interact, collaborate, edit, and create with other students using ChatGPT." Also, P33 mentioned: "Because the scenario that ChatGPT completely replaces human tutors or physical classrooms hasn't come yet, most of the ChatGPT uses would occur at individuals' homes when we're doing assignments. If we are surrounded by other friends and can see and share how others are interacting with ChatGPT in the metaverse, it is actually increasing opportunities to socialize with classmates beyond the physical classroom."
[Design Implications] Our findings imply that students are concerned with social isolation and subsequent mental health issues that a heavy dependence on ChatGPT can cause if 1) ChatGPT replaces human tutors or physical classrooms, or 2) ChatGPT is used as an assistance tool outside the classroom (e.g., home). In fact, Lee et al. [51] found that it is common for students to feel lonely when studying alone, so to mitigate loneliness and feel the illusory presence of others, an increasing number of students play "study with me" videos (that have scenes of others studying, writing, and flipping pages) while studying alone. Some students also self-study with others by using video-conferencing tools (e.g., Zoom) to share their appearance or study materials, which creates social presence and ambience [51]. In this context, the reduced opportunities for social interaction and the mental health issues that our participants reported are not a trivial matter but a reality that is already here. Thus, participants pointed out that "when using ChatGPT, all the conversations are between the human user and the AI program" and emphasized "social interaction [that] facilitates social learning." We first suggest that LLMs incorporate features that facilitate social interaction, such as expanding the current form of ChatGPT (which is often called a one-on-one LLM or dyadic chatbot) to support one-to-many communication. This would not only increase students' chances of social interaction with peers but also promote social learning: that is, students can learn tacit knowledge by simply observing how other peers prompt-engineer or leverage ChatGPT for learning.
However, this upgraded, social form of ChatGPT may still lack social presence due to the nature of its text-based interface. Thus, we call for bolder industrial trials that combine the strength of LLMs with virtual reality (VR) in a bigger technology ecosystem. We also suggest that instructors provide socialization opportunities for students by utilizing current computer-based VR technology or platforms, rather than leaving students to use ChatGPT alone at home. Among recent CHI studies [36,68,95,96] that demonstrate the strong effects of social presence of working or learning in the metaverse, Jin et al. [36] argued that although university students wanted VR adoption when learning to increase social presence, utilizing VR in higher education is still challenging due to the inaccessibility caused by its high price. However, prior works have shown that 2D- or 3D-based metaverse platforms that run on personal computers (PC) still do increase social presence. Thus, professors' efforts to introduce PC-based metaverse platforms would be a valuable way to provide social learning environments where students can explore LLMs together.

Towards the Complementarity of ITSs and LLMs
Our study provides valuable design insights for blending ITSs and ChatGPT for educational purposes; that is, the differing strengths of ITSs and ChatGPT can supplement each other's weaknesses. For example, while the systemic approach (see 2.1) that ITSs leverage is proven to be effective for learning, ITSs have four main weaknesses (choice, nonlinear access, linked representations, and open-ended learner inputs) [22]. Intelligent tutoring systems do not always allow students to decide what to learn (i.e., self-regulated learning) or empower students to choose learning activities that deviate from the rigidly structured orders and scripts. Additionally, ITSs do not offer quick links between representations that stress differing viewpoints, pedagogical techniques, and media, and they do not enable students to present sudden opinions or questions through natural language or free-form/open-ended communication. Such weaknesses lead to ITSs' fundamental problem: they cannot handle or support unexpected questions, tasks, or topics that students suddenly raise unless such spontaneous content was already anticipated and prepared. For such reasons, Graesser et al. [22] have argued that learning with an ITS is somewhat instruction- or tutor-centered rather than student-centered.
Our findings demonstrate that ChatGPT has the potential to solve some of the aforementioned issues of ITSs. For example, our student participants could decide what tasks and topics to learn, request diverse viewpoints, and ask any questions at any time in free-form open-ended communication. However, ChatGPT gave students too much freedom, which in turn placed high-level cognitive burdens on students in situations such as prompt engineering, learning without any learning map or (sub)steps, etc. Thus, a first step toward designing LLMs for educational purposes is to revisit, leverage, and actively revalidate the dialogue and pedagogical techniques that have been established in the ITS literature and to blend the strengths of LLMs with these to reduce the weak points of each. Concurrently, researchers and practitioners can pioneer new research to explore whether LLMs can integrate and provide various pedagogical techniques beyond a text-based interface by utilizing different interactions, modalities (e.g., student drawings), and media. By doing so, we envision more advanced but responsible education technology that benefits students, instructors, and society.

Emerging Roles of Human Tutors
Multiple meta-analyses have already suggested that students learn better with AI tutors than with human tutors [19] and that students are enthusiastic about ChatGPT's great availability and capability. Given this, do we truly need human tutors? What are the genuine roles of human tutors in circumstances where ChatGPT can assist students, such as editing their essays [15,32,44,49], fixing their computer code [14], and searching and retrieving information [63]? Our findings answer these questions by emphasizing the essential roles of human tutors in three ways: 1) human tutors' emotional support is valuable for motivating, complimenting, and guiding students; 2) students value social interaction and social learning with peers to gain tacit knowledge or know-how; and 3) some students need physical and hands-on activities to learn by doing. Thus, providing an environment and opportunities that foster these three human roles will become indispensable for human tutors.
Although many students criticized ChatGPT's lack of emotional support, no one suggested that ChatGPT should give emotional support. Instead, they were concerned with the possibility of reduced social interaction with their professors that comes from interacting too much with ChatGPT. Such findings are in line with the previous works of Holstein et al. [28,30], which discovered that the least preferred feature of AI tutors among both teachers and students was providing automated emotional support to frustrated students. Also, in another field study of an AI-enhanced classroom, the authors observed a human teacher approach a student who struggled to learn with the AI tutor and found out that the student's struggle did not result from interaction with AI but from a breakup with his significant other [28]. The current form of ChatGPT, which affords dyadic interaction between a student and a machine, could easily reinforce social isolation, so human tutors' role in checking on students' academic status, guiding careers, and motivating them will be essential.
In addition, our findings showed that students rediscovered the importance of both social (i.e., social presence, learning, and interaction with peers) and physical (i.e., hands-on learning) aspects in learning. To meet their needs, human instructors can play a critical role in facilitating social presence, learning, and interaction both offline and online. For offline interactions, human instructors should provide more opportunities in the classroom where students learn how to present their ideas and explore diverse hands-on activities. This can address students' concerns about deteriorated human skills (e.g., speaking, socializing, etc.) or reduced chances for hands-on learning caused by overuse of ChatGPT. For online interactions, we suggest human instructors utilize existing virtual platforms to arrange a space (e.g., PC-based Metaverse or VR [36]) where students can interact with peers while feeling stronger social presence. Otherwise, they would be learning at home in isolation. Studies have shown that the presence of others can affect not only the productivity of students in learning [51] but also the effectiveness of social interaction [40]. This is because, according to social learning theory, students learn more by simply observing and mimicking others' behavior [5]. Thus, increasing and augmenting meeting places for students beyond the physical classroom will become an important role of human instructors.

Emerging Goals (i.e., Skillsets) in Higher Education
AI is profoundly transforming our work environments [33,34,66,67,102], reshaping both the nature of work and the skills required.
According to a report by the World Economic Forum, AI is expected to eliminate 85 million jobs globally by 2025 while concurrently creating 97 million new ones [97]. This shift suggests that the skills and competencies needed in the age of AI will differ significantly from those currently in demand. Given this shift, there is growing concern about whether institutions of higher education are adequately preparing students for an AI-driven future. Available data indicates that these institutions are not meeting the burgeoning demand for AI expertise. For example, while 80% of senior IT leaders express a need for employees to be proficient in generative AI, 63% of managers report that their organizations lack sufficient staff with expertise in AI and machine learning [70]. Despite this evident "talent gap," approximately 69% of recent college graduates fear that AI could make their jobs obsolete or irrelevant in the near future.
The rise of generative AI technologies like ChatGPT has accelerated these transformations, rendering AI applications more prevalent. Bloomberg recently reported that prompt engineers at ChatGPT are earning annual salaries ranging from $175,000 to $300,000 [73]. Furthermore, AI software could boost the productivity of the average knowledge worker by nearly 2.4 times, widening the gap between those proficient in AI and those who are not [81]. Faced with this rapid evolution, universities find themselves challenged to provide specific guidelines or policies, often resorting to either vague discouragements against using platforms like ChatGPT or issuing generic reminders about the importance of academic integrity.
Pinpointing the exact skill sets required in the AI era is challenging, but there is broad consensus among experts and scholars that critical thinking and AI literacy are key competencies [54]. Critical thinking is indispensable for human ingenuity [2,17,54,59,86,105]; it equips individuals to question, analyze, and interpret the rapidly evolving sociotechnical landscape, where complex information, emerging technologies, social systems, and diverse stakeholders are intricately interconnected. AI literacy is crucial for effective collaboration and understanding of AI systems. In addition to these core competencies, some scholars emphasize the importance of human-oriented skills like empathy, communication, and creativity, especially in ambiguous and poorly defined settings where AI falls short [106].
The rapid expansion of human knowledge and technological capabilities presents a paradox: while there is increasingly more to learn, the time available for formal education remains limited. Although the concept of lifelong learning is gaining momentum, the traditional university system may still necessitate substantial curriculum adjustments, such as designing programs to have a more selective focus of subject matter. Regarding research imperatives, the advent of generative AI technologies like ChatGPT underscores the urgency of this subject, demanding extensive future studies to better understand and navigate this evolving landscape.

Sociotechnical Policy Design
Concerns about compromised learning outcomes and academic integrity have intensified with the emergence of ChatGPT in higher education. Given ChatGPT's advanced text-generating capabilities and its capacity to aggregate and present vast amounts of information effortlessly, the tool presents a highly tempting shortcut for students otherwise engaged in laborious and intricate learning processes. In response to this urgent challenge, some have endeavored to develop algorithms that can accurately detect cheating. A wave of such technological solutions has been released in recent months, gaining widespread adoption in academic settings.
However, these techno-centric approaches are increasingly being scrutinized for their limitations and unintended consequences. Technological challenges in accurately detecting AI-generated text persist and are escalating with rapid advancements in LLMs [6,45,111]. For example, even leading detection solutions have been found to produce false positives that flag innocent students and to miss text generated by newer AI models like GPT-4 [112]. Moreover, the psychological strain inflicted on wrongly accused students remains a concern [88]. Therefore, while technological solutions may mitigate threats to academic integrity, they should not be the sole focus in addressing this multi-faceted issue. To develop more holistic and effective solutions, it is crucial to adopt a sociotechnical perspective that considers human psychology, organizational culture, and social norms [1].
The advent of transformative technologies has consistently generated tensions between educational paradigms and their practical applications in the workforce. For instance, the introduction of calculators once sparked significant debate and concern among a wide range of educational stakeholders [82]. This historical episode in some ways mirrors the contemporary challenges we face with the integration of ChatGPT into higher education, notably regarding concerns about compromised learning outcomes and academic integrity. Today, however, manual calculation persists in educational settings even though calculators have largely replaced it in practical applications. This dichotomy reflects a societal consensus that acknowledges the differing skills required in educational and practical settings. The consensus broadly supports the view that educational processes should involve labor-intensive, iterative experiences that facilitate trial and error. Such well-designed "intended inconveniences" allow students to cultivate critical thinking skills. Additionally, a limited understanding of the components and processes of technology constrains our ability to leverage its full potential. Consequently, arguments like 'I don't see why I need to practice calculating when I have a calculator' are rarely voiced.
Just as society reached a consensus about the role of calculators, a similar discourse is urgently needed for integrating LLM technologies like ChatGPT into educational settings. To initiate this discourse, we must articulate the core values and principles that should guide education in the AI era, making sure these resonate with academia and other stakeholders in our society. Key questions to consider include: What skills should be prioritized in education during the AI era? Given the rapid expansion of information and knowledge, how can educational institutions make focused choices? And how can we harmonize traditional educational paradigms with rapidly evolving societal needs? After establishing a direction and foundational principles, it becomes imperative to communicate them clearly and consistently to all members of the educational community. Through iterative and sustained dialogue, these values can be embedded into institutional culture, eventually shaping public perception and establishing a new common sense.

LIMITATIONS AND FUTURE RESEARCH
Some limitations merit note. First, our research is based on English-speaking university students in a Western culture, so our results may not reflect students who speak English as a second language, live in other regions (e.g., Eastern countries, or those that are slow in technological advancement), or are unable to enter university for various reasons (e.g., economic). At this time, our research cannot answer whether ChatGPT will become a useful educational tool for students who (1) cannot receive a university-level education (considering that some of them may not have regular access to technology) or (2) must interact with ChatGPT via a second language. Thus, future research could investigate how underprivileged students or students who speak English as a second language perceive and use ChatGPT for educational purposes.
Second, our study focused on the benefits and drawbacks of ChatGPT when used by undergraduate students, so the views of graduate students or students with higher or lower AI literacy could yield different outcomes. In addition, due to the nature of a classroom-based study, students might not have felt comfortable sharing their frank opinions (e.g., ChatGPT is better than a professor) in front of the professors and other peers. We anticipate future research that can elicit more honest opinions from students who have used ChatGPT for diverse educational purposes.
Third, we did not systematically observe or measure whether the AI tools helped students achieve the main objectives of their studies. However, we believe that analyzing course grades or other performance metrics to examine the real impact of AI tools on student learning will be a promising area of research.
Last but not least, at the time of conducting this research, we experimented with ChatGPT-3.5. However, as the technological advancement of ChatGPT accelerates, we will need fruitful follow-up studies that extend our research. For example, while ChatGPT-3.5 had issues with recency of information, the ChatGPT Pro version resolved most of the recency issues. Thus, it is important to research how recent information satisfies student needs. Also, the updated ChatGPT-4 can handle longer texts and provide a multimodal setup. Such technological advancements have extended the possibilities of comprehending concepts or contexts in long texts (e.g., student's long reports) and increased chances of supporting various interaction types (e.g., drawing pictures) that ITSs cannot adequately support [22]. Thus, we call for more agile follow-up studies that investigate the new prospects these recent functions could open in the field of human-AI interaction in education.

A APPENDICES
A.1 Method
A.1.1 Step 2: Acquainting Students with AI-Related Concepts and Designs (8 weeks). Lectures: Lectures consisted of an array of cutting-edge topics, including but not limited to fundamentals of machine learning (ML) and deep learning (DL), ethical considerations in AI, algorithmic bias and fairness, privacy concerns in AI, explainable AI and algorithmic transparency, generative AI, the future of work in the AI landscape, and AI in organizational settings. Each topic was delivered in one to three 50-minute sessions.
During the lecture period, to stimulate critical thinking and encourage students to form their own perspectives, the instructor assigned a series of ten different questions that touched on current and contentious issues within the AI field. These assignments were crafted to enrich the topics covered in lectures. For example, one assignment required students to read a recent article about an AI model winning an art competition prize. In response, students composed one-page essays and then participated in discussions about the broader implications of such events. Key debate questions included, "Should AI be prohibited from such competitions?" "What does this mean for AI in the workplace?" and "How can human artists maintain their uniqueness in a landscape increasingly influenced by AI?" The course also employed multimedia resources to introduce ethical complexities associated with AI technologies. Students watched "Coded Bias," a Netflix documentary, and read a research paper that claimed AI could more accurately discern individuals' sexual orientation through facial expressions than humans could. These pieces served as a catalyst for intensive discussions concerning the privacy and ethical challenges inherent in applying AI to sensitive areas.
In a hands-on segment, the course also featured a rapid prototyping exercise named "Developing AI Surveillance in Three Minutes." This segment aimed to highlight how easily AI can be harnessed for potentially malicious purposes, such as intrusive employee monitoring and supervision. Utilizing Google's Teachable Machine, a tool that enables transfer learning with minimal data, students recorded a range of typical workplace behaviors, such as sleeping at desks, working, and smartphone use. These recordings served as training data for a model capable of real-time behavior prediction via webcam footage.
These assignments and discussions gave students a nuanced understanding of the evolving AI landscape, focusing on ethical and societal issues. This enhanced their ability to discuss AI's contemporary role and the opportunities and challenges in human-AI interaction, which we ultimately expected to improve the quality of our participatory design outputs.

Figure 1: Examples of artwork from a mini art contest

Figure 2: Examples of storybooks from the Becoming an Author contest

Figure 3: Examples of design sketches from participatory design sessions

Table 1: A summary table of opportunities and challenges identified from student-ChatGPT interaction in higher education