ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions

Mathematical questioning is crucial for assessing students' problem-solving skills. Since manually creating such questions requires substantial effort, automatic methods have been explored. Existing state-of-the-art models rely on fine-tuning strategies and struggle to generate questions that heavily involve multiple steps of logical and arithmetic reasoning. Meanwhile, large language models (LLMs) such as ChatGPT have excelled in many NLP tasks involving logical and arithmetic reasoning. Nonetheless, their applications in generating educational questions remain underexplored, especially in the field of mathematics. To bridge this gap, we take the first step to conduct an in-depth analysis of ChatGPT in generating pre-university math questions. Our analysis is categorized into two main settings: context-aware and context-unaware. In the context-aware setting, we evaluate ChatGPT on existing math question-answering benchmarks covering elementary, secondary, and tertiary classes. In the context-unaware setting, we evaluate ChatGPT in generating math questions for each lesson from pre-university math curriculums that we crawl. Our crawling results in TopicMath, a comprehensive and novel collection of pre-university math curriculums collected from 121 math topics and 428 lessons from elementary, secondary, and tertiary classes. Through this analysis, we aim to provide insight into the potential of ChatGPT as a math questioner.


INTRODUCTION
Math problems are essential educational tools for evaluating students' logical and problem-solving abilities [10,36]. Engaging students in answering those expert-designed questions has been shown to improve their learning outcomes [12,28]. Nonetheless, manually crafting such questions demands substantial human effort and expertise, making it time-consuming, non-generalizable, and impractical for scalability [22]. Therefore, automatic tools to generate mathematical questions have received growing attention [20,37]. Existing state-of-the-art frameworks primarily rely on fine-tuning strategies [34,35,37,42]. However, these approaches are criticized for their limitations in generating questions that necessitate multi-step reasoning [15]. Recent progress in large language models (LLMs), like ChatGPT [24], has garnered significant interest and demonstrated remarkable efficacy in numerous natural language processing (NLP) tasks through the use of prompts. Nevertheless, their potential and benefits in crafting educational questions, especially within mathematics, remain underinvestigated.
In this work, we take the first step to conduct an in-depth analysis of the potential of applying ChatGPT in automatically generating pre-university math questions. We categorize our analysis into two main scenarios: (1) context-aware, where the model is given a context to generate math questions either with or without an expected answer, and (2) context-unaware, where the model generates math questions based solely on an instructional prompt. Under the context-aware setting, ChatGPT is evaluated on 3 math question-answering benchmarks from elementary, secondary, and tertiary classes, respectively. In the context-unaware scenario, where no prior context is available, assessing ChatGPT is more challenging because model performance varies significantly with the instructional prompt. Nonetheless, this setting is more realistic and helpful, since teachers may not have any contexts or stories at hand when asking for math questions.
In addition, our evaluation reveals that the performance of the model varies when generating questions in different math topics. Therefore, to exhaustively evaluate the model in the context-unaware setting, we hire university students who were high-school national math olympiad medalists to crawl 428 math lessons from Khan Academy (https://www.khanacademy.org/), with their mathematical definitions and exemplary problems, from 121 math topics covering most of the topics from grade 1 to tertiary classes. We then instruct ChatGPT to generate question-answer pairs for each lesson, given desired difficulty levels. Through our analysis, we derive a number of noteworthy findings. Our contributions are summarized below:
(i) We are the first to conduct a comprehensive analysis of the feasibility of leveraging ChatGPT in generating pre-university math questions.
(ii) We study two main settings in generating mathematical questions. We further extend our evaluation to a large number of math topics and lessons covering most of the pre-university curriculum.
(iii) We contribute TopicMath, a novel and comprehensive collection of expert-authored pre-university math curriculums.
(iv) We provide 11 findings about the capability of ChatGPT in generating pre-university math questions. We hope these findings can offer good insights for teachers and researchers in utilizing modern AI technologies like ChatGPT for educational purposes.

RELATED WORK

2.1 Large Language Models & Prompting
Recently, LLMs have shown remarkable zero-shot and few-shot abilities in various language generation contexts [2,25,39]. However, they still face challenges in more complex tasks like mathematical reasoning [9,11], often requiring expensive computational resources for fine-tuning. To address this, researchers have been exploring novel prompting methods to instruct LLMs in these tasks, including chain-of-thought (CoT) prompting [40]. This enables LLMs to perform intermediate reasoning steps, which significantly enhances their reasoning abilities, especially for complex mathematical and decision-making tasks.

Pre-university Math Problems Generation
Pre-university math problems have received increasing attention from the AI research community, with benchmarks such as SVAMP [27] for elementary-level math, GSM8K [5] for secondary-school-level math with diverse solution templates, and MATH [10], which provides complex reasoning problems at the tertiary/olympiad level along with step-by-step solutions. Recently, interest has grown in other tertiary math topics like geometry problems and mathematical theorem proving [3,32]. Additionally, automatic question generation (QG) in education has gained attention for enhancing teaching activities [16]. In particular, the use of LLMs, especially ChatGPT, to generate practice questions in various subjects has attracted significant interest [13,38]. However, its potential for generating pre-university mathematics problems remains largely unexplored. This study, therefore, evaluates ChatGPT's performance using three well-established datasets: SVAMP, GSM8K, and MATH, covering pre-university grades and various difficulty levels.

PROBLEM FORMULATION
We study ChatGPT on generating math problems in both context-aware and context-unaware settings across various pre-university difficulty levels, including elementary, secondary, and tertiary.
• Context-aware. We evaluate models in both answer-aware and answer-unaware modes. In the answer-aware setting, each sample is represented as a triple $(C, Q, A)$, where $C$ is the context, $Q$ is the question, and $A$ is the answer. The models are then fine-tuned or run in inference mode to generate $Q$ given $C$ and $A$. In the answer-unaware setting, models generate questions conditioned solely on $C$, with $A$ being unavailable.
• Context-unaware. The absence of context $C$ poses a unique challenge for assessing ChatGPT's math problem generation capabilities. Nonetheless, this scenario is essential, as teachers often seek to prompt language models like ChatGPT for math questions without prior context. To address this, we manually collect math curricula for three pre-university levels and propose a prompting framework to create PRE-UMATH, a novel dataset with 16K question-answer pairs spanning 121 pre-university math topics and 428 lessons. Our evaluation of PRE-UMATH provides valuable insights into ChatGPT's math question generation capability.

CONTEXT-AWARE METHODOLOGY

4.1 Fine-tuning Baselines
We fine-tune the baselines to generate the question $Q$, given the context $C$ with or without the expected answer $A$, by concatenating the input in the format: Context: C [with/without] Answer: A. The model then learns to generate $Q$.
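The input format above can be sketched as a small helper. This is a minimal illustration, not the authors' preprocessing code: the function name `build_finetune_input` and the example answer value are assumptions introduced here.

```python
from typing import Optional


def build_finetune_input(context: str, answer: Optional[str] = None) -> str:
    """Concatenate the context and, in answer-aware mode, the expected answer.

    Hypothetical helper: the paper only specifies the "Context: C Answer: A"
    layout, not the exact separators or whitespace handling.
    """
    source = f"Context: {context.strip()}"
    if answer is not None:  # answer-aware mode
        source += f" Answer: {answer.strip()}"
    return source


context = "A football team played 22 games. They won 8 more than they lost."
aware = build_finetune_input(context, "15")   # "15" is an illustrative answer
unaware = build_finetune_input(context)       # answer-unaware mode
print(aware)
print(unaware)
```

The target sequence for the model is the ground-truth question in both modes; only the source string changes.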

Prompting ChatGPT
We prompt ChatGPT to generate a math question using $C$, with or without $A$. Empirical experiments in Table 1 demonstrate that imposing constraints produces questions closer to the ground truth. Hence, we propose the following constraints for this task. To ensure coherence and comparability with ground-truth questions, we instruct ChatGPT to generate concise questions (1) without excessive context repetition. This constraint minimizes disparities with the ground-truth question and improves fluency. To maintain consistency, we emphasize that the generated question should (2) match the tense of the provided context. This constraint helps to ensure that the question appears grammatically correct and coherent within the given context (e.g., before: [Past-tense Context] [Present-tense Question]; after: [Past-tense Context] [Past-tense Question]). To promote brevity and clarity in the generated questions, we set a (3) word limit of no more than 20 words.
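As a rough sketch of how the three constraints might be attached to a prompt and how constraint (3) could be checked afterwards, consider the following. The constraint wording, the function names, and the example answer are assumptions made for illustration; the paper's exact prompt text is not reproduced here.

```python
from typing import Optional

# Illustrative wording only; these strings are not the paper's exact prompt.
CONSTRAINTS = (
    "Do not repeat information from the context.",
    "Match the tense of the context.",
    "Use no more than 20 words.",
)
MAX_WORDS = 20


def build_constrained_prompt(context: str, answer: Optional[str] = None) -> str:
    """Assemble a question-generation prompt with the three constraints appended."""
    lines = [f"Context: {context}"]
    if answer is not None:  # answer-aware mode
        lines.append(f"Answer: {answer}")
    lines.append("Generate one math question for this context.")
    lines.extend(f"- {constraint}" for constraint in CONSTRAINTS)
    return "\n".join(lines)


def within_word_limit(question: str, limit: int = MAX_WORDS) -> bool:
    """Check constraint (3) using simple whitespace tokenization."""
    return len(question.split()) <= limit


prompt = build_constrained_prompt(
    "A football team played 22 games. They won 8 more than they lost.",
    answer="15",  # illustrative expected answer
)
print(prompt)
print(within_word_limit("How many games did the football team win?"))
```

A post-hoc check such as `within_word_limit` is useful because, as the findings later show, the model does not always respect constraints stated in the prompt.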
(1) Curriculum Collection. We hire six undergraduate students in mathematics who won high-school national mathematical olympiad medals. They are divided into three groups of two students each. Each group is instructed as follows to collect the math curriculums from Khan Academy. First, they collect all the math topics (chapter titles) and lesson titles from 14 courses on Khan Academy, ranging from elementary school to tertiary level. In addition to the titles, students are asked to collect one exemplary question per lesson from the Example section, rate its difficulty following our definitions in Section 6.3, and collect the lesson's definition from the About section or from the FAQ and Review sections at the end of each chapter. If students cannot find a definition in these sections, they are asked to look for the lesson's definition in the introductory Video. If they cannot find an appropriate definition or example for a lesson, the lesson is not collected. Among the 14 courses available on Khan Academy, spanning from elementary school to tertiary level, the rate of removed lessons is 52.60% (Table 2).
(2) Create Examples' Answers. After obtaining the topics, lessons, definitions, and exemplary questions with their difficulties, we ask annotators to prompt ChatGPT via zero-shot Chain-of-Thought [14] to obtain explainable solutions to the questions. These solutions are then reviewed and edited as needed. As shown in Table 3, the average token-level edit rate is 12.74%.
Table 3: Data collecting and annotating process with the edit rate of 12.75%. Red denotes the deleted part of ChatGPT's answer, green denotes the corrected part.
(3) Curriculum Expert Verification. In our final step, we hire three educators who hold degrees in education and currently teach math in elementary, secondary, and tertiary schools. They are invited to verify the correctness and appropriateness of all the collected topics, lessons, definitions, and question-answer pairs with their difficulties. If any collected data is found to be inappropriate or theoretically incorrect, educators have the option to edit or discard it. Their approval rate is 87.70%. Finally, we collect 121 topics and 428 lessons with 428 examples. We name this dataset TopicMath.
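The paper reports a token-level edit rate for step (2) but does not spell out how it is computed. One plausible reading, sketched below under that assumption, counts the tokens touched by replace, delete, or insert operations in a token alignment and divides by the length of the original ChatGPT answer. The function name and the whitespace tokenization are choices made for this example.

```python
from difflib import SequenceMatcher


def token_edit_rate(original: str, edited: str) -> float:
    """Fraction of tokens changed when turning `original` into `edited`.

    Assumed definition (not given in the paper): tokens covered by any
    non-equal alignment operation, normalized by the original length.
    """
    a, b = original.split(), edited.split()
    if not a:
        return 0.0 if not b else 1.0
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return changed / len(a)


# Simplified version of the kind of correction an annotator might make.
chatgpt_answer = "x + 90 + 39 = 180 so x = 51"
edited_answer = "x + 88 + 39 = 180 so x = 53"
rate = token_edit_rate(chatgpt_answer, edited_answer)
print(f"{rate:.2%}")
```

Other definitions (for example, normalizing by the edited length or using character-level distance) would give different figures, so the reported rate should be read against the authors' own definition.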

TopicMath Analysis
Topic & Subtopic Distribution. We observe that grade 1 has the smallest number of collected topics: its difficulty levels are not diverse and the number of mathematical operators and methods is limited, so fewer grade-1-level math questions are collected than at other levels. Meanwhile, grade 5 has the highest number of topics, since a significant number of grade-5-level math questions proved to be highly compatible with our collection criteria and constraints.
Removal Ratio Analysis. In the process of collecting math definitions, questions, and answers in grade 1, our annotators observe the absence of definitions and an overabundance of similar question types. Consequently, a significant portion of grade-1-level questions had to be excluded from our collection due to the stringent criteria and constraints we employ.

Prompting ChatGPT to Generate Educational Questions from Math Topics
Table 4: PRE-UMATH statistics by grades.
We prompt ChatGPT to generate QA pairs from TopicMath for our evaluation purposes. Our inference strategy involves using prompts that promote diversification in tokens, topic alignment, and difficulty. The algorithm is presented in Algorithm 1. Specifically, we create a list of generated QA pairs for each lesson in TopicMath. Given a lesson, our prompt consists of its definition, the name of the topic it belongs to, and one demonstration randomly selected from its list of generated QA pairs. After obtaining a newly generated QA pair, we accept it if its question has a ROUGE-L score below 0.7 with every question generated so far across all lessons; otherwise, we filter it out. To promote token diversity in generating educational questions, we utilize two strategies. For grades 1-8, we ask ChatGPT to enrich the generated questions by providing objects and stories, adding "You could introduce characters, objects or scenarios to make your math problem context more diverse in terms of token" to the prompts. For tertiary classes, the problems might be complicated and require more rigorous and abstract thinking. Therefore, instead of requiring a real-life context, we instruct the model to introduce more variables in naming the objects by adding "Your questions are required to be diverse in terms of tokens, which can be achieved by paraphrasing the question or introducing/renaming variables" to the prompts, so that the abstract contexts can be generalized. We name the dataset generated by ChatGPT for our evaluations PRE-UMATH; it consists of 16K QA pairs. Our prompt template is below.
Base prompt for generating pre-university math questions.
Your questions are required to be diverse ...
You are also given an example: [prompt demonstration]
Generated question: ...
We also conduct an in-depth analysis of PRE-UMATH to better understand how large and diverse our evaluations are in terms of topic, lesson, and difficulty. Its statistics are presented in Table 4. Our analysis offers several key insights. First, regarding difficulty, PRE-UMATH encompasses question-answer pairs from five distinct difficulty levels, with Level 4 being the most prevalent at 54.7% and Level 3 accounting for 13.5%. In terms of topic and lesson distribution, grade 1 exhibits the fewest topics (2), while tertiary classes have the most (51), likely due to the broad subject range. Additionally, lesson distribution mirrors topic distribution, with grade 1 having the fewest lessons and tertiary classes having the most. Regarding the distribution of generated QA pairs, we observe that in certain lessons, such as Polynomial factorization (437 QA pairs) and Trigonometry (516 QA pairs), ChatGPT can generate substantial numbers of QA pairs, whilst in other lessons, such as Absolute value & piecewise functions (19 QA pairs), these numbers are significantly lower. This is because, in certain lessons, problems can have multiple conditions and mathematical scenarios, which results in a high number of variants being generated, while questions in other lessons can be either too narrow or too specific, leading to limited variants. Therefore, the number of generated QA pairs does not always increase monotonically with the number of lessons. By grade, tertiary classes have the highest number of generated QA pairs (11,032); among secondary and elementary classes, grades 6 and 8 have the highest numbers, whilst grade 1 has the lowest.
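The ROUGE-L acceptance rule used during generation can be illustrated with a small self-contained sketch. It assumes the F1 variant of ROUGE-L over lowercased whitespace tokens; the paper does not state which variant or tokenizer is used, and the function names here are illustrative.

```python
from typing import Iterable, List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between two strings (assumed variant; tokenization is whitespace)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def accept(candidate: str, pool: Iterable[str], threshold: float = 0.7) -> bool:
    """Accept a new question only if it scores below the threshold against every pooled question."""
    return all(rouge_l_f1(candidate, question) < threshold for question in pool)


pool = ["What is 3 plus 4?"]
print(accept("What is 3 plus 5?", pool))            # near-duplicate of the pooled question
print(accept("Solve for x in 2x + 1 = 7.", pool))   # shares no tokens with the pool
```

Comparing each candidate against the whole pool is quadratic in the number of questions, which is manageable at the scale described but would call for batching or indexing on much larger pools.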
• Data Pre-processing. While the SVAMP and GSM8K datasets provide context and question separately, the MATH dataset lacks this separation. To address this, we first segment MATH problems into individual sentences; the annotators then identify the most suitable sentence for forming a question, and the rest becomes the context. In cases where the information is insufficient, the annotators can exclude these samples. Finally, in contrast to GSM8K and MATH, which provide separate train and test sets, SVAMP has no such division, so we split it into train and test sets ourselves.
• Automatic Evaluation. In the answer-aware setup, our aim is to generate questions that resemble the ground-truth ones as closely as possible. Following previous works [7,8,23], we use BLEU-4 [26], ROUGE-L [19], and METEOR [1] as our n-gram evaluation metrics, and use BERTScore [43] to measure the similarity between the generated candidate and ground-truth questions. In the answer-unaware setting, where the answer and the ground-truth question are unavailable, we follow [6,33] and measure the Diversity of generated questions by Distinct-1,2 [18] and the Relevancy with respect to the context using BERTScore.
• Human Evaluation. To further assess the quality of the generated questions against human preferences, we conduct a human study on 200 randomly selected cases from each dataset. The best-performing fine-tuned baseline (based on the average of all metrics) and ChatGPT are selected for evaluation. Then, three native English-speaking educators are hired to rate the models on a 1-5 scale based on 5 criteria: Difficulty, Relevancy, Grammaticality, Answerability, and Usefulness. The detailed scoring criteria for metrics are provided in Section 6.3.
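For the Diversity metric named under Automatic Evaluation, Distinct-n is commonly computed as the number of unique n-grams divided by the total number of n-grams across all generated outputs. The sketch below follows that corpus-level definition; the whitespace tokenization and lowercasing are assumptions, since the paper does not describe its tokenizer.

```python
from typing import List


def distinct_n(texts: List[str], n: int) -> float:
    """Corpus-level Distinct-n: unique n-grams divided by total n-grams."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()  # assumed tokenization
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


questions = ["how many apples", "how many pears"]
print(distinct_n(questions, 1))  # 4 unique unigrams out of 6
print(distinct_n(questions, 2))  # 3 unique bigrams out of 4
```

Because the denominator is the total n-gram count, very short outputs can score high on Distinct-n without being useful, which is relevant when interpreting diversity scores of models that produce short or noisy questions.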

Context-unaware Experimentation
In the context-unaware setting, since there are no ground-truth questions, we rely only on human evaluations. We perform human evaluation on 500 randomly selected samples, with 100 questions from each prompted difficulty level. We hire three educators who are native English speakers to evaluate ChatGPT on 5 criteria: (1) Grammaticality, to assess the grammatical accuracy of the generated question; (2) Answerability, to measure whether the generated question can plausibly be answered; (3) Topic Alignment, to assess the question's relevancy to the topic; (4) Difficulty Alignment, to compare the expected and generated difficulty; and (5) Usefulness, to assess the mathematical usefulness of the generated question for education in general. The scoring criteria for metrics are provided in Section 6.3.
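Drawing 100 questions per prompted difficulty level amounts to stratified sampling. A minimal sketch follows; the record layout (a dict with a `difficulty` key), the function name, and the fixed seed are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def stratified_sample(items: List[Dict], per_level: int = 100, seed: int = 0) -> List[Dict]:
    """Sample up to `per_level` items from each prompted difficulty level."""
    by_level = defaultdict(list)
    for item in items:
        by_level[item["difficulty"]].append(item)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for level in sorted(by_level):
        pool = by_level[level]
        sample.extend(rng.sample(pool, min(per_level, len(pool))))
    return sample


# Synthetic stand-in for generated QA pairs: 150 items at each of 5 levels.
items = [{"id": i, "difficulty": 1 + i % 5} for i in range(750)]
sample = stratified_sample(items)
print(len(sample), Counter(item["difficulty"] for item in sample))
```

Sampling per level rather than uniformly keeps rare difficulty levels represented in the human study even when the generated data is skewed toward one level.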

Human Rating System
This section presents the human evaluation criteria employed to assess the quality of generated questions in both context-aware and context-unaware settings. These criteria were selected for their widespread usage, to ensure an effective evaluation of question quality.
For evaluating both answer-aware and answer-unaware settings, our human evaluators assess questions based on multiple criteria. These criteria encompass Difficulty, Relevancy, Grammaticality, and Answerability. When evaluating difficulty, we employ a 1 to 5 scale, with 1 signifying suitability for lower elementary school students (grades 1-3) and 5 representing a level of challenge appropriate for mathematics contests and tertiary-level students. Relevancy scores span from 1 to 5, with 1 indicating low relevance (0-20%) and 5 denoting high relevance (80-100%) between the context and the generated question. Grammaticality is rated with options of 1, 3, or 5, where 1 reflects the presence of severe grammatical errors, 3 suggests the question is good but contains minor grammatical errors, and 5 indicates a question that is both grammatically and factually correct. As for answerability, we consider two scenarios. In the answer-aware setting, a score of 1 means the question is not answerable, a score of 3 indicates that the question is answerable but its answer does not match the ground-truth answer, and a score of 5 means the answer matches the ground truth. In the answer-unaware setting, only a score of 5 is used, indicating that the question is answerable.
To evaluate questions within the PRE-UMATH framework, human evaluators employ a set of diverse criteria, encompassing Difficulty, Grammaticality, Answerability, Topic Alignment, Difficulty Alignment, Answer Quality, and Usefulness. Difficulty is rated on a 1 to 5 scale, indicating the question's suitability for varying educational levels, from elementary to olympiad. Grammaticality is scored at 1, 3, or 5, reflecting the presence of grammatical errors. Answerability is rated either 1 (unanswerable) or 5 (answerable). Topic alignment is rated as either 1 (not relevant to either topic or lesson), 3 (relevant to the topic but not the lesson), or 5 (relevant to both topic and lesson). Difficulty alignment is assessed with options of 1 (deviation of more than 1 level from the given difficulty), 3 (one-level deviation), and 5 (matches the given difficulty level). Answer Quality is either 1 (incorrect step-by-step explanation), 3 (partially correct step-by-step explanation but incorrect final answer), or 5 (both a correct step-by-step explanation and the correct final answer). Finally, usefulness scores gauge the utility of generated questions and solutions, and are either 1 (not useful), 3 (useful but requires editing), or 5 (useful and no editing required).
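Two of the rubric items above are fully rule-based once a rater has made the underlying judgment, so they can be written down directly. The sketch below encodes Difficulty Alignment and Topic Alignment; the function names are illustrative, and the handling of a question judged relevant to the lesson but not the topic (a case the rubric does not list) is an assumption.

```python
def difficulty_alignment(expected_level: int, rated_level: int) -> int:
    """Score 5 for an exact match, 3 for a one-level deviation, 1 otherwise."""
    gap = abs(expected_level - rated_level)
    if gap == 0:
        return 5
    if gap == 1:
        return 3
    return 1


def topic_alignment(relevant_to_topic: bool, relevant_to_lesson: bool) -> int:
    """Score 5 if relevant to both topic and lesson, 3 if topic only, 1 otherwise.

    Assumption: 'lesson but not topic' is not covered by the rubric and is
    mapped to 1 here.
    """
    if relevant_to_topic and relevant_to_lesson:
        return 5
    if relevant_to_topic:
        return 3
    return 1


# A level-5 question was requested, but the rater judged the output as level 3.
print(difficulty_alignment(expected_level=5, rated_level=3))
print(topic_alignment(relevant_to_topic=True, relevant_to_lesson=False))
```

Encoding the rubric this way makes it easy to check rater sheets for scores that fall outside the allowed values 1, 3, and 5.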

RESULTS AND DISCUSSIONS

7.1 Automatic Evaluation
It is worth noting that our automatic evaluations are conducted only in the context-aware setting. In the answer-aware setting, fine-tuning baselines consistently outperform ChatGPT across all automatic metrics on the three benchmarks. However, in the answer-unaware setting, we derive interesting insights. Firstly, ChatGPT generates more diverse questions at the token level on the challenging GSM8K and MATH benchmarks, which underscores its practical potential for educational purposes. Conversely, GPT-2 excels on the SVAMP dataset, yielding higher Distinct scores than ChatGPT. This might be because the questions generated by GPT-2 are generally short and contain nonsensical tokens. For example: Context: "At the stop 8 more people got on the train. There were 11 people on the train."; Question: "@@ Is there a limit on the number of people on the bus?"

Context-aware Human Evaluation
Through our careful manual evaluations, we have obtained 6 insightful findings.
(1) ChatGPT generates questions with minimal grammatical errors. As shown in Table 7, ChatGPT consistently attains grammaticality scores above 4.9, underscoring its proficiency in generating grammatically correct texts across all pre-university levels. Notably, we observe that grammatical errors predominantly emerge when ChatGPT attempts to generate highly complex problems.
(2) ChatGPT generates questions that are highly relevant to the input context. Our manual evaluations reveal that the questions generated by ChatGPT are highly relevant to the input contexts. Remarkably, these questions exhibit minimal presence of unrelated characters or variables not found in the context, resulting in nearly perfect relevancy scores across most datasets and sub-settings (see Table 7). However, an intriguing observation emerges with a lower relevancy score in the answer-aware setting for MATH compared to its answer-unaware counterpart.
(3) ChatGPT frequently repeats information from the context. Despite explicit constraints outlined in the prompt template regarding repetition, the model occasionally reiterates random segments (tertiary level) or the whole context (lower levels). This repetition leads to the generation of overly lengthy questions, as exemplified by instances such as: "Context: A football team played 22 games. They won 8 more than they lost.", "Generated Question: How many games did the football team win if they played 22 games and won 8 more than they lost?". Subsequent human evaluations reveal that this issue predominantly afflicts the GSM8K dataset, occurring about 50% of the time.
(4) With an expected answer, ChatGPT tends to generate answerable questions whilst, without it, this likelihood is lower. When additional hypotheses are required to construct a complete question (e.g., "Context: Mary is two years younger than Joan, who is five years older than Jessa"), our empirical evaluation indicates that ChatGPT tends to struggle in the absence of an expected answer as a reference. For instance, within the answer-unaware setup and considering the context mentioned, ChatGPT only asks "How old is Jessa?".
(5) Without an expected answer, ChatGPT frequently generates trivial questions. In the answer-unaware scenario, ChatGPT often exhibits a combination of the aforementioned behaviors (3) and (4), where it redundantly repeats information from the context and formulates it as a question. This behavior occurs irrespective of the context's complexity, resulting in the generation of simplistic questions that merely require looking up information in the context itself. For instance, when the context is as straightforward as "Darrell and Allen's ages are in the ratio of 7:11", ChatGPT redundantly repeats the entire context and asks, "What is the ratio of Darrell's age to Allen's age?". This phenomenon occurs less frequently than (4), in about 5% of the generated questions.
(6) Even with good contextual understanding, ChatGPT struggles to understand the relationship between mathematical objects. This problem manifests in both the answer-aware and answer-unaware settings. In the answer-aware mode, ChatGPT tends to order subtraction operations incorrectly, while in the answer-unaware mode, it exhibits reluctance in generating questions related to the relationships between objects. This phenomenon occurs about 2% of the time.
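To see why the context quoted in finding (4) needs an additional hypothesis, the stated relations can be written out. The symbols $M$, $J$, and $S$ are labels introduced here for the ages of Mary, Joan, and Jessa; they do not appear in the original benchmark.

```latex
\begin{align*}
M &= J - 2, \\
J &= S + 5 \\
\Rightarrow\quad M &= S + 3 .
\end{align*}
```

The context gives two independent equations in three unknowns, so every age can be expressed in terms of $S$ but $S$ itself remains free. A question such as "How old is Jessa?" therefore has no unique answer unless a further condition fixes one of the ages.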

Context-unaware Human Evaluation
Along with the same observation about question grammaticality, we provide 5 more findings in the context-unaware setting.
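(1) ChatGPT generates questions with high diversity in terms of context. Across all three class levels, ChatGPT consistently excels at crafting real-life scenarios that seamlessly integrate with the context. Notably, at the secondary and tertiary school levels, ChatGPT demonstrates its versatility by drawing connections to other subjects, such as Physics and Biology. For example: "A population of bacteria doubles every 3 hours. If there are initially 1000 bacteria, what will be the population after 12 hours? Is this an example of exponential growth or decay?"
(2) ChatGPT sometimes generates questions that are not well-aligned with some provided topics. ChatGPT occasionally interprets complex concepts as more familiar ones. For instance, when tasked with creating questions related to "Divide whole numbers to get a decimal quotient", it might generate questions whose answers are whole numbers instead of decimals. Similarly, in lessons containing the term "modeling", ChatGPT tends to generate Linear Programming questions. Although this misunderstanding is relatively infrequent, occurring in only about 5% of cases, it is a noteworthy aspect to keep in mind.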
(3) ChatGPT generates questions with difficulty solely depending on the difficulty of demonstration. When presented with a straightforward example (level 3) but tasked with generating a more complex question (level 5), ChatGPT often replicates the initial demonstration and struggles to enhance the question's difficulty. We noticed that lessons with overlapping content between secondary school and high school levels were generated similarly because of the shared demonstration, despite distinct required difficulty levels in the prompt. This pattern was consistently observed in all our attempts to generate questions with varying difficulty levels from the provided demonstrations.
(4) When ChatGPT generates hard questions, it cannot handle the complexity and generates nonsense. ChatGPT demonstrates proficiency in introducing new objects within questions but struggles to establish meaningful connections or inquire about genuine relationships between these objects. For instance, in Geometry questions, ChatGPT often generates random points (A, B, C), states some relationships (e.g., AX is the bisector of angle BAC), and includes unrelated quantitative properties (e.g., angle AXB = angle AXC), resulting in suboptimal performance.
(5) ChatGPT can generate questions that are not truly mathematical. While instructions and demonstrations in prompts have been effective in mitigating the issue, they are not foolproof. In primary school topics such as "Measurement", ChatGPT occasionally generates questions that do not necessitate mathematical knowledge (e.g., "How long is a ruler?"). However, in higher-level classes where lesson names are more math-specific, this phenomenon is notably less prevalent.

CONCLUSIONS
In this work, we provide an in-depth analysis of the capability of ChatGPT to generate pre-university math questions. Our experimentation is categorized into two main settings: (1) context-aware, when an input context/background is provided; and (2) context-unaware, when no context is provided. The evaluation results in the context-aware setting show that although the generated questions are highly grammatically correct and relevant to the context, they tend to have lower difficulty than expected. Additionally, we find that with an expected answer as input, ChatGPT is more likely to generate an answerable question than without any answer provided. In the context-unaware setting, to exhaustively evaluate ChatGPT in pre-university math topics, we first crawl TopicMath, an expert-authored pre-university math curriculum consisting of 121 math topics and 428 lessons with their definitions and question-answer examples. We then prompt ChatGPT to generate math questions within each topic. Our human evaluations reveal that in some topics, the generated questions are not well-aligned with the topics and are combined with knowledge from other scientific fields. Furthermore, we find that when both difficulty requirements and demonstrations are provided, ChatGPT is highly likely to generate questions that are aligned with the demonstrations instead of the difficulty instructions. We hope these findings can provide good insights for teachers and researchers in utilizing large language models such as ChatGPT for generating math questions, boosting the applications of AI in education.
Source: Khan Academy (https://www.khanacademy.org/)
Question: What is the value of x in the figure shown below: ...
Edited ChatGPT's answer: Based on the information given.. and Congruence of Triangles -SSS to ... According to the Congruence of Triangles -SSS ... Hence, angle ABC = angle BCD = 88 Substituting ... + [deleted: 90] [corrected: 88] degrees + 39 degrees = 180 degrees. Therefore, the value of x in the figure is [deleted: 51] [corrected: 53] degrees.
Difficulty Level is: ...
Define the [subtopic name] topic as: [subtopic definition].
Generate a math problem with its answer at Difficulty Level [difficulty] in the topic of [topic]: [subtopic name].
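The recoverable parts of the base prompt can be assembled programmatically, as in the sketch below. The two diversity instructions are quoted from the paper; everything else is an assumption for illustration, including the function name, the ordering of the parts, the `tertiary` flag, and the example topic name, definition, and demonstration. Elided parts of the template (such as the difficulty-level description) are deliberately left out rather than guessed.

```python
# Diversity instructions quoted from the paper.
STORY_HINT = (
    "You could introduce characters, objects or scenarios to make your math "
    "problem context more diverse in terms of token"
)
VARIABLE_HINT = (
    "Your questions are required to be diverse in terms of tokens, which can be "
    "achieved by paraphrasing the question or introducing/renaming variables"
)


def build_generation_prompt(topic: str, subtopic: str, definition: str,
                            difficulty: int, demonstration: str,
                            tertiary: bool = False) -> str:
    """Fill the visible template slots; part ordering is an assumption."""
    hint = VARIABLE_HINT if tertiary else STORY_HINT
    return "\n".join([
        f"Define the {subtopic} topic as: {definition}.",
        f"Generate a math problem with its answer at Difficulty Level {difficulty} "
        f"in the topic of {topic}: {subtopic}.",
        f"{hint}.",
        f"You are also given an example: {demonstration}",
        "Generated question:",
    ])


prompt = build_generation_prompt(
    topic="Divide decimals",  # hypothetical topic name
    subtopic="Divide whole numbers to get a decimal quotient",
    definition="dividing two whole numbers when the result is not a whole number",  # hypothetical
    difficulty=3,
    demonstration="Q: What is 7 divided by 2? A: 3.5",  # hypothetical demonstration
)
print(prompt)
```

Since the findings indicate that the demonstration dominates the difficulty instruction, a practical variation is to select the demonstration so that its difficulty matches the requested level.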

Table 1: Comparisons between performances of ChatGPT with and without constraints: Contextual Independence, Tense Matching, and Word Limit.

Table 5: Answer-aware question generation experimental results in the context-aware setting.

Table 6: Answer-unaware question generation results in the context-aware setting.

Table 7: Human evaluation results in the context-aware setting.

Table 8: Human evaluation results in the context-unaware setting. The format is  + 
at crafting real-life scenarios that seamlessly integrate with the context. Notably, at the secondary and tertiary school levels, ChatGPT demonstrates its versatility by drawing connections to other subjects, such as Physics and Biology. For example: "A population of bacteria doubles every 3 hours. If there are initially 1000 bacteria, what will be the population after 12 hours? Is this an example of exponential growth or decay?"