Curriculum Learning: Theories, Approaches, Applications, Tools, and Future Directions in the Era of Large Language Models

This tutorial focuses on curriculum learning (CL), an important topic in machine learning that has gained increasing attention in the research community. CL is a learning paradigm that enables machines to learn from easy data to hard data, imitating the meaningful procedure of human learning with curricula. As an easy-to-use plug-in, CL has demonstrated its power in improving the generalization capacity and convergence rate of various models in a wide range of scenarios such as computer vision, natural language processing, data mining, and reinforcement learning. Therefore, it is essential to introduce CL to more scholars and researchers in the machine learning community. However, there have been no tutorials on CL so far, which motivates the organization of our tutorial on CL at WWW 2024. To give a comprehensive tutorial on CL, we plan to organize it around the following aspects: (1) theories, (2) approaches, (3) applications, (4) tools, and (5) future directions. First, we introduce the motivations, theories, and insights behind CL. Second, we advocate novel, high-quality approaches, as well as innovative solutions to the challenging problems in CL. Then we present the applications of CL in various scenarios, followed by some relevant tools. Finally, we discuss open questions and future directions in the era of large language models. We believe this topic is at the core of the scope of WWW and is attractive to audiences interested in machine learning from both academia and industry.


TOPIC AND RELEVANCE
We first describe the topic of our proposed tutorial and its relevance to The Web Conference.

Description
Curriculum learning is a machine learning strategy that trains a model from easy to hard, mimicking the way that humans learn with curricula. In this tutorial, we give a comprehensive outline of the development of curriculum learning, following the schedule illustrated in Table 1.

Scope
The scope of the tutorial includes theories, approaches, applications, tools, and future directions of curriculum learning. We will try our best to comprehensively cover all relevant aspects and advocate novel, high-quality research findings of CL.

Importance
We believe this tutorial is important and should be included in the tutorial program at WWW 2024 for two reasons.
(1) CL is a research topic worth studying: it can help models generalize better and converge faster, and it is easy to use. (2) This tutorial can ease the use of CL both theoretically and practically, helping to form potentially novel solutions.

Relevance to WWW
The Web Conference is a premier venue to present and discuss progress in research, development, standards, and applications of topics related to the Web. With the development of web technology, methods from machine learning and many other relevant fields have also been applied to it. Curriculum learning, as an easy-to-use training strategy in machine learning, makes it possible to handle noisy multimedia data, and data collected from the Web is usually noisy to some degree. Therefore, CL can be an essential technology when training a machine learning model on large amounts of Web data, and is thus highly relevant to this conference.

STYLE
It is a lecture-style half-day tutorial (lasting 3 hours).

SCHEDULE
The detailed program of the tutorial is organized as follows.

Introduction
Curriculum learning (CL) has continuously gained attention since it was first introduced by Bengio et al. CL borrows the idea of human learning curricula, which proceed from easy content to hard content, forming a general training strategy for various machine learning models and applications. Given that CL can help models generalize better and converge faster, researchers have proposed numerous CL algorithms and shown their effectiveness in a wide range of tasks. Therefore, we believe that it is necessary to introduce CL to more researchers in the machine learning community and provide an overall picture of CL, which includes comprehensible and elaborate answers to the following questions: (1) Theories: what are the definitions of CL and why is it effective? (2) Approaches: what methods should be included? (3) Applications: how to design different curricula for different scenarios? (4) Tools: what tools are available for ease of use and understanding of CL? (5) Future directions: what role does curriculum learning play in this era of large language models?
• $Q_T(z) = P(z)$.
Since the concept of Original Curriculum Learning was formally proposed as above, the academic community has followed and further extended the definition of CL within the spirit of "training from easier data (tasks) to harder data (tasks)", i.e., relaxing the conditions in its definition to enable more flexible CL strategies. For example, Data-level Generalized Curriculum Learning is defined as a sequence of reweightings of the target training distribution over $T$ training steps, discarding the three conditions. Generalized Curriculum Learning is defined as a sequence of training criteria over $T$ training steps, discarding the three conditions and the definition of $Q_t$.
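For reference, the original definition mentioned above is commonly written as follows, in the formulation attributed to Bengio et al. The symbols used here ($C$ for the curriculum, $Q_t$ for the training criterion at step $t$, $W_t$ for the example weights, $P$ for the target training distribution, $D$ for the training set, and $H$ for entropy) are the usual notation and should be read as assumptions of this sketch:

```latex
% Original Curriculum Learning: a sequence of training criteria over T steps,
% each a reweighting of the target training distribution P(z).
\begin{align*}
  C &= \langle Q_1, \dots, Q_t, \dots, Q_T \rangle, \\
  Q_t(z) &\propto W_t(z)\, P(z) \quad \forall z \in D, \\
  \text{subject to}\quad
  & H(Q_t) < H(Q_{t+1})            && \text{(entropy gradually increases)}, \\
  & W_t(z) \le W_{t+1}(z) \quad \forall z \in D && \text{(weights never decrease)}, \\
  & Q_T(z) = P(z)                  && \text{(the final criterion is the target distribution)}.
\end{align*}
```

The generalized definitions relax this statement: the data-level variant keeps the reweighting form of $Q_t$ but drops the three conditions, and the fully generalized variant drops the reweighting form as well.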
Another important question is why this human-curriculum-like training strategy works. Existing analyses uncover the essence of CL from the perspectives of the optimization problem and the data distribution, based on which we can further summarize the two main motivations for applying CL: to guide and to denoise.
To begin with, from the perspective of the optimization problem, Bengio et al. initially point out that CL can be seen as a particular continuation method, which shares the spirit of simulated annealing: it provides a sequence of optimization objectives throughout training, starting with a heavily smoothed objective, and thereby guides the optimization. On the other hand, researchers also analyze the CL mechanism from the perspective of the data distribution. Since a CL strategy encourages training more on the easier data, an intuitive hypothesis is that a CL learner wastes less time on the harder and noisier examples and so trains faster, reducing the negative impact of low-confidence examples and thus denoising the training process.

Approaches
A general framework for curriculum design consists of two core components, the Difficulty Measurer and the Training Scheduler, which decide two things respectively: 1) What kind of training data is supposed to be easier than other data? 2) When should we present harder data for training, and how much more? Thus, we can divide existing CL methods into two types: when both the Difficulty Measurer and the Training Scheduler are designed from human prior knowledge with no data-driven algorithms involved, we call the CL method predefined CL. If either (or both) of the two components is learned by data-driven models or algorithms, we call the CL method automatic CL.
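The two components can be sketched in a few lines of Python. This is a minimal illustration of a predefined curriculum, not a reference implementation: the function names (`sort_by_difficulty`, `linear_pacing`, `curriculum_subset`), the use of sentence length as the difficulty score, and the linear pacing rule with its `start_fraction` parameter are all assumptions chosen for the example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sort_by_difficulty(data: Sequence[T], difficulty: Callable[[T], float]) -> List[T]:
    """Difficulty Measurer: order examples from easiest to hardest."""
    return sorted(data, key=difficulty)


def linear_pacing(step: int, total_steps: int, start_fraction: float = 0.2) -> float:
    """Training Scheduler: fraction of the sorted data available at a given step.

    Grows linearly from start_fraction at step 0 to 1.0 at total_steps.
    """
    progress = min(1.0, step / max(1, total_steps))
    return min(1.0, start_fraction + (1.0 - start_fraction) * progress)


def curriculum_subset(sorted_data: Sequence[T], step: int, total_steps: int) -> List[T]:
    """Return the easiest examples that the scheduler exposes at this step."""
    fraction = linear_pacing(step, total_steps)
    size = max(1, round(fraction * len(sorted_data)))
    return list(sorted_data[:size])


if __name__ == "__main__":
    # Toy corpus; sentence length stands in for difficulty (an assumption).
    corpus = ["a b c d e", "a", "a b c", "a b", "a b c d"]
    ordered = sort_by_difficulty(corpus, difficulty=lambda s: len(s.split()))
    for step in (0, 5, 10):
        # A real training loop would sample mini-batches from this subset.
        print(step, curriculum_subset(ordered, step, total_steps=10))
```

Both functions here are hand-designed, which is what makes the sketch a predefined CL method; replacing either one with a learned component would turn it into an automatic CL method in the sense used above.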
In the early stages, predefined CL was the mainstream. However, this type of predefined approach is not flexible and general enough for widespread applications. In 2010, Kumar et al. proposed self-paced learning (SPL), enabling automatic curriculum scheduling by ordering data according to their training loss. Subsequently, a variety of automatic curriculum learning methods have continued to emerge. For example, transfer learning methods employ teacher models to offer curricula to student models. Reinforcement learning methods allow teacher models to adapt the curriculum based on feedback from student models. In addition, there are other methods based on Bayesian optimization, meta-learning, and adversarial learning for implementing automatic curriculum learning. All representative approaches of both categories will be reviewed and discussed in this tutorial.
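The loss-based scheduling idea behind SPL can be illustrated with a small sketch. It assumes the common hard-threshold form of SPL, in which an example is used only if its current loss is below an age parameter that grows over time. The one-parameter model `y = w * x`, the function names, and the values of `lam` and `growth` are hypothetical choices for this example and are not taken from any particular library.

```python
from typing import List, Sequence, Tuple


def spl_weights(losses: Sequence[float], lam: float) -> List[int]:
    """Hard self-paced weights: keep an example only if its loss is below lam."""
    return [1 if loss < lam else 0 for loss in losses]


def fit_slope(xs: Sequence[float], ys: Sequence[float], weights: Sequence[int]) -> float:
    """Weighted least squares for the one-parameter model y = w * x."""
    num = sum(v * x * y for x, y, v in zip(xs, ys, weights))
    den = sum(v * x * x for x, v in zip(xs, weights))
    return num / den if den > 0 else 0.0


def self_paced_fit(
    xs: Sequence[float],
    ys: Sequence[float],
    lam: float = 4.0,
    growth: float = 2.0,
    rounds: int = 4,
) -> Tuple[float, List[Tuple[float, int, float]]]:
    """Alternate between selecting low-loss examples and refitting the model."""
    w = fit_slope(xs, ys, [1] * len(xs))  # warm start on all data
    history = []
    for _ in range(rounds):
        losses = [(y - w * x) ** 2 for x, y in zip(xs, ys)]
        weights = spl_weights(losses, lam)  # easy examples under current model
        if any(weights):
            w = fit_slope(xs, ys, weights)  # refit on the selected examples only
        history.append((lam, sum(weights), w))
        lam *= growth  # admit harder examples in the next round
    return w, history


if __name__ == "__main__":
    # Four points on y = 2x plus one noisy outlier at x = 5.
    xs, ys = [1, 2, 3, 4, 5], [2, 4, 6, 8, 30]
    w, history = self_paced_fit(xs, ys)
    for lam, n_selected, w_t in history:
        print(f"lam={lam:5.1f} selected={n_selected} w={w_t:.3f}")
```

In this toy run the outlier keeps a large loss and is never admitted within the given number of rounds, which mirrors the denoising motivation discussed earlier; with a larger `lam` or more rounds it would eventually be included.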

Applications
CL has a wide range of applications. In this tutorial, we discuss them in terms of the following aspects:
• Combinatorial optimization problems for the web: traveling salesman problem, secretary problem, etc.
• Computer vision for the web: image classification, object detection, semantic segmentation, face recognition, image generation and translation, video processing, etc.
• Natural language processing for the web: text classification, machine translation, question answering, etc.
• Graph machine learning for the web: node classification, graph classification, link prediction, etc.
• Robotics: navigation, control, games, etc.

Tools
This tutorial will also introduce our contributed Curriculum Machine Learning library, CurML, which is the first public open-source library for CL. We implement a considerable number of existing CL algorithms through a unified and extensible framework.

Future Directions
It would also be valuable to discuss the future directions for CL.
Evaluation benchmarks. In the existing literature, datasets and metrics differ across applications. It is necessary but challenging to design a unified dataset with unified metrics to evaluate and compare CL algorithms.
More advanced theories. Existing theoretical analyses provide different angles for understanding CL. Nevertheless, more theory is still required to guarantee the effectiveness of CL in general and of its application to specific tasks.
Application to LLMs. Previous research on CL has mainly focused on smaller models. With the rise of LLMs, there is a pressing question: how can we effectively combine CL with LLMs? Specifically, can CL help with tasks like pretraining, fine-tuning, and prompting in LLMs, ultimately speeding up learning, improving generalization, and making the models more adaptable to new tasks or domains? We will explore some examples and analyze how CL can be applied to LLMs.

Q&A
This tutorial includes 15 minutes for questions and answers. We welcome any questions about CL from the audience.

AUDIENCE AND BACKGROUND
Target Audience. This tutorial will be highly accessible to the whole machine learning community, including researchers, scholars, engineers, and students with related backgrounds in computer vision, natural language processing, graph machine learning, reinforcement learning, meta-learning, etc. We expect around 100 attendees for this tutorial.
Prerequisite Knowledge. The tutorial is self-contained and designed for introductory and intermediate audiences. No special prerequisite knowledge is required to attend.

Potential Learning Outcomes.
(1) To promote the importance of CL in advancing machine learning research, as well as to reduce the marginal cost of studying CL in a variety of scenarios. (2) To encourage the audience to combine their research with CL, which can be a promising solution for problems involving difficulty measurement and noisy data. (3) To push forward the development of CL research by pointing out future directions.
To summarize, CL is a promising research area that can have a strong positive impact on machine learning, so we believe this tutorial can greatly benefit audiences interested in machine learning and inspire them to produce exciting research results on this topic.

PREVIOUS EDITIONS
The tutorial has not been given before. Besides, to the best of our knowledge, there have been no related or similar tutorials presented in the past 3 years at WWW or other venues.

Table 1: Detailed schedule for the division of the presentation (split unit: minute).