CLEB: A Continual Learning Energy Bidding Framework For An Energy Market Bidding Application

Energy trading in the day-ahead and continuous energy markets enables market participants, such as utility companies/suppliers and residential/industrial consumers, to maximize their profits. However, in practice, the AI-based decision-making process for accepting or rejecting bids/offers from customers/suppliers, commonly referred to as bidding decisions, often experiences performance degradation due to the fluctuation of renewable energy resources and the intermittent demand behavior of customers. This phenomenon is widely recognized as a data distribution shift in machine learning. One conventional approach involves training the model from scratch over an extended historical period, incurring significant computational and storage costs. To address this challenge more effectively, we propose a Continual Learning-based Energy Bidding framework (CLEB). This framework employs a relay-based continual learning method, utilizing a combination of a small portion of historical data and the most recent data with different distributions to enhance the accuracy of bidding decisions. The framework consists of a predictive neural network, specifically a Multi-Layer Perceptron (MLP), as well as data buffers for storing newly acquired data from a non-stationary data stream within an application. Subsequently, the evolving probability distribution of the data stream identified by the framework is utilized to retrain the model. Our evaluation on a public European energy trading dataset shows that the framework significantly improves the accuracy of the prediction model when data distribution shifts occur, allowing the model to adapt itself to non-stationary data distributions in dynamic environments.


INTRODUCTION
Energy trading involves transactions between utility companies/suppliers (a.k.a. power generators) that produce electricity and industrial suppliers, who purchase power from suppliers to sell it to residential consumers, as depicted in Figure 1. Energy trading plays a vital role in alleviating power shortages and ensuring the stable operation of the electricity market. Energy trading in both the day-ahead and continuous markets is susceptible to various uncertainties, including the intermittent behavior of renewable energy resources influenced by climate change and fluctuations in customer demand [1]. This phenomenon is commonly recognized as a data distribution shift in the field of machine learning. An effective energy trading strategy should maximize the profits of all participants while maintaining an instantaneous energy balance amid various uncertainties [2].
Energy bidding strategies have been widely studied, and the existing work falls into two primary approaches: conventional optimization [3] and deep learning-based methods [5]. While the approach proposed in [3] finds an optimal bidding strategy for each participant by solving a multi-objective optimization problem through a central entity (i.e., an energy market operator), reinforcement learning (RL) [5] is employed to determine an optimal bidding strategy, aiming to maximize rewards by inferring the best action for a given state. Nonetheless, most of these approaches do not consider the challenge of data distribution shifts, leading to previously trained models performing suboptimally with real-time data. Continual learning (CL) algorithms are considered promising solutions to address this challenge. CL algorithms learn concepts and tasks sequentially without degrading performance on prior tasks. Several CL strategies are described in [6]: (1) regularization strategies, (2) rehearsal strategies (also known as relay-based buffer/memory), and (3) architectural strategies. CL algorithms have demonstrated success in various domains since the models continuously adapt and update their parameters to account for such dynamic situations within the application.
Based on these observations, this work introduces a framework called CLEB (Continual Learning-based Energy Bidding) that leverages continual learning algorithms, specifically the relay buffer strategy, to continually adapt to distribution changes in non-stationary data streams. The CLEB framework includes a multi-layer perceptron (MLP) network for bidding prediction and buffers for storing new data. Addressing data distribution shifts involves the use of an adaptive sliding window method [8], with a relay buffer employed to store detected distribution shifts. Additionally, the prediction output is estimated and compared to an adjustable threshold to identify performance shifts. Similarly, if model performance degrades suddenly, the present input data is stored in the relay buffer; otherwise, it is stored in a sampling buffer. When the relay buffer is full, model adaptation is triggered. The MLP is subsequently retrained on the data retrieved from the relay buffer and sampling buffer, with the goal of mitigating catastrophic forgetting [9]. Finally, the CLEB framework is benchmarked in terms of model accuracy under a continual learning strategy within the context of energy bidding, using a publicly available European energy trading dataset. The results indicate that the CLEB framework has the potential to effectively manage energy bidding in non-stationary data streams.

RELATED WORK

Energy Bidding Strategy
Numerous recent studies have explored various methodologies within the realm of energy bidding strategy research.
A game theory-based approach [4] has been introduced for conducting transactions within the energy market. Participants submit information to a central entity, which then formulates the energy bidding strategy and communicates it to all stakeholders. However, as the number of participants increases, the data volume grows exponentially, intensifying the challenges associated with real-time resource scheduling in energy management. In [10], the authors introduced an energy market framework in which participants disclose their cost functions to the market operator. Subsequently, the optimization of the distribution network is collectively resolved through a decentralized approach. An additional investigation [11] unveiled a framework for a distributed system operator within the context of an energy market. This framework has demonstrated its ability to reduce supply costs for prosumers within a localized distributed area while simultaneously enhancing the payoffs of generation companies. Moreover, several research endeavors have delved into the application of reinforcement learning techniques to refine bidding strategies in the energy market. In [12], the authors harnessed Deep Reinforcement Learning (DRL) within the wholesale market, aiming to optimize generator bids based on limited, readily available information. Meanwhile, in [13], the authors employed a deep neural network to ascertain the best power trading dynamics amongst multiple microgrids featuring batteries and power generation. In [14], the authors adopted the multiagent deep deterministic policy gradient (DDPG) methodology to estimate the Nash equilibrium within the competitive bidding landscape involving power suppliers.

Continual Learning
This section provides an overview of several studies on classification and regression tasks within the domain of CL.
In [15], the MNIST dataset is divided into five distinct tasks, with each of them containing a pair of unique labels representing different digits. The proposed model continually learns to effectively solve a sequence of tasks using a transfer learning strategy. CL algorithms have been studied in various energy-related sectors. For example, in [7], neural networks are developed to predict renewable energy generation using a CL architecture-based strategy. This forecasting depends on power consumption, which is subject to fluctuations due to various factors, such as acquiring new electrical devices or an increase in occupants in a building or factory. Additionally, weather data exhibits non-stationary behavior, including extreme weather conditions. Moreover, in [16], two distinct CL application scenarios are outlined in the context of setting up local smart grids. These scenarios encompass the task-domain incremental scenario and the data-domain incremental scenario. Both scenarios are relevant to power forecasting, covering aspects such as predicting energy generation and load consumption levels. The research also delves into the performance evaluation of various regularization-based CL algorithms, specifically Elastic Weight Consolidation (EWC) and Online-EWC.
However, none of these studies consider employing CL algorithms for energy bidding applications, where the bidding decisions are significantly impacted by the uncertainty factors of the environment, such as power generation, load consumption, or electricity market prices. Furthermore, we adopt a relay-based continual learning strategy in the Adaptation component to update the model and dynamically adjust the threshold following each update. The CLEB framework can flexibly integrate various data distribution shift detection methods as well as prediction models, tailoring its approach to specific application scenarios and data types.

Model
The bidding decision prediction task is regarded as a classification problem, specifically determining which bids/offers are accepted or rejected. To address this, an MLP neural network is employed, comprising an input layer, multiple hidden layers, and an output layer, with full connections between consecutive layers from the input to the output. We consider a data stream $x_t$ for $t \in \{1, 2, \ldots, T\}$, where each input $x_t$ is a sequence of bidding transactions and offer transactions submitted by consumers and suppliers. Let $\theta \in \mathbb{R}^d$ be the parameters of our model. The total binary cross-entropy loss on the training set $D = \{(x_i, y_i) \mid i < t\}_{i=1}^{n}$ is represented as:

\[ \mathcal{L}(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left(1 - h_\theta(x_i)\right) \right] \]

where $n$ stands for the number of training examples, $y_i$ refers to the target label for training example $i$, $x_i$ represents the input for training example $i$, and $h_\theta$ denotes the model with learnable weights $\theta$. The predictor $h_{\theta^*}$ is used to infer the bidding decision $\hat{y}_t$, given that the MLP neural network is fully trained with optimized weights $\theta^*$ and the current input at time $t$ is $x_t$.
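As an illustration of the Model component, the following is a minimal sketch of an MLP forward pass with a mean binary cross-entropy loss, written with the Python standard library only. The layer sizes, the ReLU hidden activation, the weight initialization, and the function names (`init_mlp`, `forward`, `bce_loss`) are assumptions made for this example and are not taken from the CLEB implementation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_mlp(sizes, seed=0):
    # sizes = [n_inputs, hidden..., 1]; returns one (weights, biases) pair per layer.
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers

def forward(layers, x):
    # ReLU on hidden layers (assumed), sigmoid on the single output unit.
    a = list(x)
    for k, (w, b) in enumerate(layers):
        z = [sum(wi * ai for wi, ai in zip(row, a)) + bj for row, bj in zip(w, b)]
        is_last = k == len(layers) - 1
        a = [sigmoid(v) for v in z] if is_last else [max(0.0, v) for v in z]
    return a[0]  # estimated probability that the bid/offer is accepted

def bce_loss(layers, xs, ys, eps=1e-12):
    # Mean binary cross-entropy over the training examples.
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(forward(layers, x), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(xs)
```

A bidding decision would then be obtained by thresholding the output, for example accepting when `forward(layers, x) >= 0.5`; that cut-off is likewise an assumption.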

Input Distribution Shift Detection
A non-stationary data stream undergoes changes in properties over time, posing challenges for energy bidding prediction models. To determine the stationarity of the data stream, meaning whether the distribution of the current data is similar to that of the old data, various data distribution shift detection methods have been proposed in the literature, such as Adaptive Windowing (ADWIN) [8], Page-Hinkley [17], and the Drift Detection Method (DDM) [18]. Since ADWIN is more tolerant of various drift types (such as concept drift, covariate drift, and others) compared to the remaining methods [19], we employ ADWIN as our primary method for detecting data distribution shifts. Essentially, the ADWIN method utilizes sliding windows of variable sizes based on observed data changes. When the difference between the statistics within these windows, such as the mean of the observed data window, surpasses a predetermined threshold, it indicates the detection of a data distribution drift. Figure 3 illustrates how ADWIN operates, with the red lines denoting data sample indexes at which distribution shifts are detected. One of the features in the training dataset is the "Price" attribute, which is described in more detail in section 4.
When a data distribution shift is detected, the data is stored in the relay buffer. The relay buffer retains samples that exhibit a probability distribution shift in the data stream and uses them as inputs for retraining the model when model updates are triggered. Conversely, the sampling buffer stores data with a distribution similar to the old data. The data in the sampling buffer is employed to preserve previous knowledge for the updated model. A continually learning energy bidding model can be seen as an accumulation of knowledge aimed at enhancing prediction performance. It's important to note that the relay buffer has a finite size, whereas the sampling buffer is designed to be larger than the relay buffer.
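The windowing idea behind ADWIN can be sketched as follows. This is a simplified illustration, not the reference algorithm from [8]: it checks every split of the window directly instead of using compressed buckets, it assumes the monitored values (for example the Price attribute) have been scaled to [0, 1], and the class name `SimpleAdwin` and the parameters `delta`, `max_window`, and `min_sub` are hypothetical. The cut threshold follows the Hoeffding-style bound commonly stated for ADWIN.

```python
import math
from collections import deque

class SimpleAdwin:
    """Simplified ADWIN-style detector for values scaled to [0, 1]."""

    def __init__(self, delta=0.002, max_window=500, min_sub=5):
        self.delta = delta            # confidence parameter (assumed value)
        self.min_sub = min_sub        # minimum size of each sub-window
        self.window = deque(maxlen=max_window)

    def update(self, value):
        """Add one value; return True if a distribution shift is detected."""
        self.window.append(value)
        drift = False
        while self._cut():
            drift = True
        return drift

    def _cut(self):
        # Compare the means of every (older, newer) split of the window.
        data = list(self.window)
        n = len(data)
        total = sum(data)
        head = 0.0
        for n0 in range(1, n):
            head += data[n0 - 1]
            n1 = n - n0
            if n0 < self.min_sub or n1 < self.min_sub:
                continue
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) > eps:
                # Drop the older sub-window and report a shift.
                for _ in range(n0):
                    self.window.popleft()
                return True
        return False
```

In the setting described above, samples arriving while `update` returns `True` would be routed to the relay buffer, and the remaining samples to the sampling buffer.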

Output Distribution Shift Detection
Even if the data exhibits a distribution similar to that of the previous data, there is still a chance that the model's performance may significantly deteriorate if the data input contains noise. For instance, this could occur if essential attributes are missing due to errors in the data collection process, leading to prediction shift issues. Therefore, it is crucial to inspect the prediction output to ensure that prediction shifts are effectively managed.
Given the prediction output $\hat{y}_t$, the framework assesses the model's performance using the following Supply-Demand Equilibrium (SDE) equation: where $b_i$ and $o_j \in \{0, 1\}$ are elements of the output vector $\hat{y}_t$ and represent the bidding decisions corresponding to the sequence of bidding and offer transactions in the input $x_t$. Meanwhile, $q^{b}_i$ and $q^{o}_j$ denote the power quantity of the bidding and offer transactions, respectively, acquired from the data input $x_t$. Detailed input attributes are further described in section 4. It is important to note that $M + N = |\hat{y}_t|$, where $M$ and $N$ are the numbers of bidding and offer transactions. The value of $\mathrm{SDE}(\hat{y}_t)$ serves as an indicator of the supply-demand balance once the bidding decisions are determined by the prediction model.
Subsequently, $\mathrm{SDE}(\hat{y}_t)$ is compared to a threshold $\tau$, which is denoted by: where the value of $\tau$ establishes the boundary for the quality of an energy bidding prediction output. If $\mathrm{SDE}(\hat{y}_t) < \tau$, the current input $x_t$ is stored in the sampling buffer; otherwise, it is stored in the relay buffer. The threshold value $\tau$ is adjusted after each model update, based on the new data used for retraining the model.
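The output check can be illustrated with a short sketch. The exact SDE expression and the threshold formula are not recoverable from this text, so the code below assumes one plausible form: the absolute gap between the accepted demand quantity and the accepted supply quantity, compared against a fixed threshold supplied by the caller. The function names `sde_score` and `route_by_output` are hypothetical.

```python
def sde_score(bid_decisions, bid_quantities, offer_decisions, offer_quantities):
    # Assumed form: |accepted demand - accepted supply|, in the unit of the quantities (MW).
    accepted_demand = sum(d * q for d, q in zip(bid_decisions, bid_quantities))
    accepted_supply = sum(d * q for d, q in zip(offer_decisions, offer_quantities))
    return abs(accepted_demand - accepted_supply)

def route_by_output(score, threshold):
    # Below the threshold the prediction is treated as balanced enough and the
    # input goes to the sampling buffer; otherwise it goes to the relay buffer.
    return "sampling" if score < threshold else "relay"

# Example: two accepted bids (10 MW and 20 MW) against two accepted offers (15 MW and 12 MW).
score = sde_score([1, 0, 1], [10.0, 5.0, 20.0], [1, 1], [15.0, 12.0])
target_buffer = route_by_output(score, threshold=5.0)
```

Under this assumed form, a score of zero means the accepted bids and offers balance exactly; the adjustment of the threshold after each model update is left out of the sketch.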

Adaptation
This component plays a crucial role in the CLEB framework by retraining the model to adapt to the non-stationary data stream within the dynamic environments of the energy bidding application. As mentioned earlier, we employ a relay-based continual learning approach that utilizes a small portion of old data (i.e., data with a distribution similar to what is learned by the current model at time $t$) along with newly acquired data (i.e., data with distribution shifts or noise) to update the model. This strategy is considered a suitable method for the energy bidding prediction task, addressing distribution shifts and mitigating catastrophic forgetting. It allows the model to acquire new knowledge without forgetting past knowledge. Once the model update is completed, the current model is replaced with the latest version, $h_{\theta^*}$. It's important to note that, since the energy bidding application in this work doesn't have access to ground-truth labels in practice, unlike other applications such as recommendation systems or energy load forecasting, we rely on a heuristic-based labeling technique (i.e., the electricity market clearing process), as mentioned in [20], to label the data in the relay buffer before retraining the model.

OMIE Submitted Bids: Includes bids/offers submitted to an electricity market operator in the continuous intraday market for Spain.
OMIE Transactions Made [21]: Contains successfully matched pairs of bids and offers in the OMIE Submitted Bids dataset.
ENTSOE Energy Consumption & Generation [22]: Consists of electrical consumption, generation, and weather data for Spain.
ESIOS Energy Market Price [23]: Includes the electrical market price for Spain.
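The buffer logic of the Adaptation component, as described in this section, can be sketched as below. The class name `ClebBuffers`, the buffer sizes, and the `old_fraction` parameter (the "small portion" of old data mixed into each update) are assumptions for illustration; labeling the relay samples and the actual retraining call are left to the caller.

```python
import random
from collections import deque

class ClebBuffers:
    """Finite relay buffer plus a larger sampling buffer (illustrative sizes)."""

    def __init__(self, relay_size=4, sampling_size=16, seed=0):
        self.relay = []                             # samples flagged as shifted
        self.relay_size = relay_size
        self.sampling = deque(maxlen=sampling_size)  # samples similar to old data
        self.rng = random.Random(seed)

    def add(self, sample, shifted):
        """Store a sample; return True when the relay buffer is full."""
        if shifted:
            self.relay.append(sample)
        else:
            self.sampling.append(sample)
        return len(self.relay) >= self.relay_size

    def build_retraining_set(self, old_fraction=0.25):
        """Mix all relay samples with a random portion of old samples."""
        n_old = int(old_fraction * len(self.sampling))
        old = self.rng.sample(list(self.sampling), n_old)
        batch = list(self.relay) + old
        self.relay.clear()  # the relay buffer is emptied after each update
        return batch
```

A caller would invoke `build_retraining_set` as soon as `add` returns `True`, label the returned samples, retrain the MLP on them, and then swap in the updated model.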

EXPERIMENT

Dataset
To conduct an empirical study, we have considered a total of four electricity datasets from the Spanish market, as outlined in Table 1.
All four datasets span a 1-year period, from January 1, 2019, to December 31, 2019, and are organized in hourly intervals. From the first two datasets presented in Table 1, we extracted the ground-truth labels for each bid and offer transaction, indicating whether they were accepted or rejected. Subsequently, we merged this dataset with attributes selected from the ENTSOE and ESIOS datasets, including total energy generation (supply), forecasted load consumption (demand), and energy market prices, ensuring time synchronization. The final dataset comprises over 5 million trading transaction samples and includes 16 attributes, with key attributes listed as follows:
• Price (€/MWh): This refers to the price at which a utility supplier or industrial consumer is willing to sell or buy a certain quantity of electricity during a specific period.
• Quantity (MW): This indicates the amount of electricity that market participants are willing to buy or sell at a specific price during a particular period.
• Energy Generation (MW): This pertains to the total amount of electrical energy generated within a specific region, country, or system over a given period.
• Forecasted Load Consumption (MW): This represents forecasts of expected electricity consumption or demand within a specific region, country, or electrical system over a defined time period.
• Energy Market Price (€/MWh): This refers to the price at which electricity is bought and sold in the electricity market in Spain.

Synthetic Dataset
To simulate the data distribution shift scenario, we created a synthetic dataset with the goal of altering the probability distribution of the numeric columns in the dataset at specific points. The underlying generation of the synthetic dataset is as follows:

\[ x_j(t) \sim \mathcal{N}\left(\mu_j(t), \sigma_j^2(t)\right) \]

where $x_j(t)$ is sampled from a Gaussian distribution with a time-dependent mean $\mu_j(t)$ and variance $\sigma_j^2(t)$, representing the $j$-th attribute of the dataset.
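A minimal sketch of such a generator is shown below for a single numeric attribute. It assumes the simplest possible schedule, a single step change of the mean and standard deviation at one index; the schedule used for the synthetic dataset in this work is not specified here, and the function name `synth_stream` and all parameter values are hypothetical.

```python
import random

def synth_stream(n, shift_at, mean_before, mean_after, std_before, std_after, seed=0):
    """Draw n Gaussian values whose mean and std change at index shift_at."""
    rng = random.Random(seed)
    values = []
    for t in range(n):
        if t < shift_at:
            mu, sigma = mean_before, std_before
        else:
            mu, sigma = mean_after, std_after
        values.append(rng.gauss(mu, sigma))
    return values

# Example: a price-like attribute whose mean jumps from 50 to 80 halfway through.
prices = synth_stream(2000, shift_at=1000, mean_before=50.0, mean_after=80.0,
                      std_before=5.0, std_after=5.0)
```

Applying the same procedure to each numeric column, with its own shift points, would yield a stream in which a drift detector has known change locations to find.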

Metrics
Since energy bidding prediction is treated as a classification task, we employ the Jaccard score to measure accuracy:

\[ J(y_i, \hat{y}_i) = \frac{|y_i \cap \hat{y}_i|}{|y_i \cup \hat{y}_i|} \]

where $y_i$ represents the true label set of the $i$-th input sample $x_i$, and $\hat{y}_i$ refers to the predicted label set of the $i$-th input sample $x_i$. A higher value of this evaluation metric indicates better model performance.
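The Jaccard score for one input sample can be computed as in the sketch below. It assumes that a label set is the set of indices of the transactions marked as accepted, and that two empty sets count as a perfect match; both choices are assumptions for illustration.

```python
def jaccard_score(true_labels, predicted_labels):
    """Size of the intersection divided by size of the union of two label sets."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    union = true_set | pred_set
    if not union:
        return 1.0  # assumed convention when both sets are empty
    return len(true_set & pred_set) / len(union)

def mean_jaccard(pairs):
    """Average the per-sample score over (true, predicted) pairs."""
    scores = [jaccard_score(t, p) for t, p in pairs]
    return sum(scores) / len(scores)

# Transactions 0, 2, 3 were truly accepted; the model accepted 0, 3, 4.
example = jaccard_score({0, 2, 3}, {0, 3, 4})  # 2 shared / 4 total = 0.5
```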

Result
Figure 5 illustrates the accumulated daily accuracy of the CLEB framework using the synthetic dataset as the data streaming input. We can observe that, starting from December 1, 2019, the model's performance undergoes a significant degradation. This decline is caused by a data distribution shift. If the model is not updated, its performance steadily deteriorates.
On the other hand, Figure 6 demonstrates that the performance of the CLEB framework experiences a remarkable improvement through the utilization of continual learning methods.As the performance degrades suddenly, the CLEB framework starts recording samples that indicate a distribution shift in the relay buffer.When the relay buffer is full, the adaptation component shown in Figure 2 is triggered to retrain the model using the relay-buffer continual learning strategy.The newly trained model, once retraining is complete, is deployed to perform model inference for new incoming sequence data.Since the updated model has learned new data distribution patterns, it quickly enhances accuracy performance.

DISCUSSION
The experiments showed that the performance of the relay-based continual learning strategy depends on the size of the buffers. If the buffer is larger, the CLEB framework requires a longer time to reflect changes in data distribution, as the retraining process is only invoked when the buffer is full, and vice versa. Therefore, the buffer size is a factor that affects the efficiency of the proposed method. This generates a need for modeling the buffer size adaptively, allowing the buffer size to adjust itself during the continual learning process by observing the circumstances.
Due to the absence of preliminary experiments on a dataset related to continual learning-based energy bidding, a direct comparison of our experiment results with those of other studies is not feasible.Consequently, the outcomes of our experiments and the performance metrics for energy bidding are exclusively based on the utilization of a synthetic dataset and algorithms tailored specifically for this research.
Several challenges had to be overcome to implement this study. One of the main limitations is labelling data stored in the relay buffer due to limited information on bidding constraints, such as regulations, policies (e.g., renewable energy quotas), or capacity constraints (i.e., the maximum amount of energy that can be bid or generated within a specific time frame) in the Spanish market. Since the quality of a supervised learning model depends on the quality of its labels, the results of the experiment may not fully reflect the applicability of continual learning to bidding decisions when a data distribution shift occurs. Additionally, the artificial dataset generated to demonstrate the concept of continual learning algorithms in this study is not generalizable in practice, which might necessitate starting our experiments from scratch for different energy bidding datasets.

CONCLUSION
We propose the CLEB framework, which is designed to enhance the performance of the energy bidding decision model under non-stationary data streams using the relay buffer continual learning strategy. The CLEB framework monitors the data stream to detect changes in probability distribution using the adaptive sliding window method, namely ADWIN. All samples that cause a distribution shift are stored in the finite relay buffer. When this buffer is full, the adaptation component is invoked to update the model using the data stored in the relay buffer. The experimental results illustrate that the CLEB framework enables the prediction model to adapt itself to uncertain conditions in the energy bidding application.
However, the framework can be further improved in our future research. For instance, data distribution shifts in practice may exhibit various types, and it is necessary to propose a new approach for effectively detecting distribution shifts with low detection delay and high precision. The labeling technique is also of importance to explore further since it impacts the performance of model retraining. Additionally, rigorous experiments are required to evaluate the performance of the CLEB framework in terms of forgetting ratio. Moreover, conducting more ablation experiments is necessary to simulate and identify the impact of thresholds on overall performance.

Figure 1: The energy bidding application scenario.
The proposed CLEB framework consists of four components: (i) Model, (ii) Input Distribution Shift Detection, (iii) Output Distribution Shift Detection, and (iv) Adaptation, as shown in Figure 2. In this context, the Model component incorporates an MLP neural network to predict bidding decisions. For example, it determines which bidding transactions from industrial consumers and offer transactions from utility suppliers should be accepted or rejected based on the incoming trading transaction data stream. The Input Distribution Shift Detection component is utilized to identify changes in the probability distribution of the input data, $P(X)$. Similarly, the Output Distribution Shift Detection component is designed to detect changes in bidding prediction values, $P(Y \mid X)$.

Figure 3: An example output of ADWIN method.

Figure 4: The four subfigures illustrate the data distribution shift generated for the Price, Quantity, Forecasted Load, and Total Generation attributes, respectively.

Figure 5: The performance of the CLEB framework when a distribution shift occurs, without taking the concept of CL into account.

Figure 6: The performance of the CLEB framework when a distribution shift occurs, considering the concept of CL.

Table 1: Summary table of the considered datasets