An evolutionary triclustering approach to discover electricity consumption patterns in France

Electricity consumption patterns are critical in shaping energy policies and optimizing resource allocation. In pursuing a more sustainable and efficient energy future, uncovering hidden consumption patterns is paramount. This paper introduces an innovative approach, leveraging evolutionary triclustering techniques, to unveil previously undisclosed electricity consumption patterns in France. By harnessing the power of triclustering algorithms, this research provides a comprehensive analysis of electricity usage across various dimensions, shedding light on intricate relationships among variables. Using this novel method, the study reveals concealed patterns and offers insights that can inform decision-makers and stakeholders in the energy sector. The findings contribute to a better understanding of electricity consumption dynamics, aiding in developing more targeted and effective energy management strategies. This research represents a significant step forward in the quest for sustainable energy solutions and underscores the potential of evolutionary triclustering as a valuable tool in uncovering complex consumption patterns.


INTRODUCTION
Electricity consumption data analysis is crucial for understanding and optimizing energy usage, a fundamental component of sustainable development and resource management. Extracting meaningful patterns from vast and complex electricity consumption datasets is challenging, but it can provide valuable insights for applications such as energy forecasting, demand-side management, and anomaly detection. This paper presents a novel approach to extracting behavior patterns from France's electricity consumption data.
Our methodology is built upon triclustering, an extension of clustering and biclustering tailored for three-dimensional (3D) datasets. In the context of electricity consumption data analysis, our 3D dataset consists of layers representing years, instances representing time slots, and attributes representing months. A tricluster is a subset of instances, attributes, and layers that exhibit similar behavior patterns under a given criterion. Unlike clustering and biclustering, which operate on 2D datasets of instances and attributes, triclustering adds a third dimension, allowing us to discover behavior patterns that span time slots, months, and years. In our case, the layer dimension corresponds to the temporal development of electricity consumption instances across attributes, and the criterion for forming triclusters is based on the similarity of their behavior patterns.
To implement triclustering, we employ the TriGen algorithm [8], an evolutionary metaheuristic inspired by genetic algorithms. TriGen takes a 3D dataset as input and aims to find a set of triclusters by optimizing specific fitness functions. The algorithm operates by evolving an initial population of potential tricluster solutions over generations using genetic operators such as selection, crossover, and mutation. The fitness function is a crucial aspect of TriGen, as it defines how well a tricluster captures a behavior pattern. We employ three fitness functions: MSR3D [6], LSL [5], and MSL [7], each designed to measure the similarity of behavior patterns across instances, attributes, and layers. These fitness functions enable TriGen to adapt to different behavior patterns and capture meaningful insights from the electricity consumption data.
The results of our analysis provide valuable insights into behavior patterns within the dataset covering the years from 2012 to 2020. We executed the TriGen algorithm with various control configurations to discover triclusters. Each run aimed to identify ten triclusters, each representing subsets of continuous time slots, months, and years that exhibit similar electricity consumption behavior patterns.
Our analysis of the triclustering results focused on two key aspects: coverage and quality. We assessed the coverage of the dataset dimensions by the generated triclusters, analyzing their impact on time slots, months, and years. Additionally, we evaluated the quality of individual triclusters using the TRIQ index [4], which combines graphical quality, Pearson's correlation quality, and Spearman's correlation quality.
In the following sections, we describe our methodology, experimental setup, results, and discussion, and we outline the implications of our approach for electricity consumption data analysis.

RELATED WORKS
One of the pioneering works in this field introduced an algorithm based on the symmetry property of triclusters [31]. A more extended and generalized version of this proposal was presented in another study [12]. Later, Liu et al. [11] took an evolutionary computation approach, defining the fitness function as a multi-objective measure.
The work in [30] redefined triclusters to uncover gene regulatory relationships and was tested on synthetic and real data. Another method, LagMiner [28], found time-lagged 3D clusters, while [14] used evolutionary computation to find multi-objective triclusters in real data.
The work in [23] introduced a strategy for mining 3D clusters in real-valued data, addressing item count limitations, while [10] focused on low-variance triclusters in quantitative data. Liu et al. [15] discovered temporal dependency association rules in microarrays, representing gene relationships. In a different application, [27] developed a triclustering algorithm using statistical methods for short 3D gene expression time series datasets.
In 2014, TriGen was introduced in [8]. The authors used an evolutionary approach to discover triclusters in the biological context. Since then, TriGen has been successfully applied to other domains: earthquake zonification in [17] and [1], precision agriculture in [21] and [20], and the streaming environment in [18] and [19].
Various triclustering algorithms have been developed, each with its own approach. These approaches include iterative searches, distribution parameter identification, pattern mining, and evolutionary multi-objective optimization [9]. In 2018, Kakati et al. [13] introduced a robust triclustering technique focused on condition-specific change analysis in HIV-1 progression data, identifying the 38 genes most responsible for this deadly disease.
Cuckoo search and particle swarm optimization for triclustering temporal gene expression data were compared in [25], showing the superiority of the latter on three real-world datasets. Subsequently, the same authors proposed a hybrid cuckoo search with clonal selection to discover meaningful triclusters in breast cancer gene expression data [26]. They discovered certain genes with a clear influence on this cancer.
Another interesting triclustering-based approach can be found in [3]. The authors analyzed gene expression microarray data using a restricted neighborhood search during the triclustering process. Later, they used a coarse-grained and dynamic deme-based parallel genetic approach to mine, again, gene expression microarray data [2].
In 2022, the authors in [24] proposed a triclustering method for finding biomarkers in human immunodeficiency virus-1 gene expression data, discovering relevant patterns for different types of patients.
Finally, Mondal et al. [22] introduced a novel data structure, the suffix forest, to design a triclustering algorithm. The approach was applied to the Indian Forest Dataset. Its main achievement is the discovery of triclusters that monitor forest and mangrove cover changes over time.

METHODOLOGY
This section presents triclustering as a methodology to extract behavior patterns from France's electricity consumption data and the TriGen algorithm [8], an evolutionary metaheuristic that applies triclustering to three-dimensional datasets.

Triclustering
This methodology consists of finding triclusters in a three-dimensional (or 3D) dataset, that is, a set of layers composed of instances, each described by several attributes (or features). Given such a 3D dataset, a tricluster is defined as a subset of instances, attributes, and layers whose values are similar under a given criterion.
Triclustering is typically interpreted as an extension of clustering and biclustering. In a clustering problem, given a common two-dimensional (2D) dataset (a collection of instances composed of features), the goal is to find clusters, or subsets of instances whose values along all the features show a similar behavior under a given criterion. Biclustering extends clustering to discover biclusters, which are subsets of instances whose values along a subset of features depict a similar behavior under a given criterion. Finally, triclustering adds a third dimension (layers) to the dataset, and the objective is to find triclusters, meaning subsets of instances whose values along a subset of features depict a similar behavior along a subset of layers under a given criterion.
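The progression from clusters to biclusters to triclusters can be illustrated with index subsets on an array. The following is a minimal sketch on a synthetic 3D array; the shapes, index choices, and variable names are illustrative assumptions and are not taken from the experiments.

```python
import numpy as np

# Synthetic 3D dataset: 6 instances x 4 attributes x 3 layers (random values).
rng = np.random.default_rng(0)
dataset = rng.random((6, 4, 3))

# A single 2D layer, as used by clustering and biclustering.
layer = dataset[:, :, 0]

# Cluster: a subset of instances over ALL attributes.
cluster = layer[[0, 2, 5], :]

# Bicluster: a subset of instances over a subset of attributes.
bicluster = layer[np.ix_([0, 2, 5], [1, 3])]

# Tricluster: subsets of instances, attributes, and layers.
tricluster = dataset[np.ix_([0, 2, 5], [1, 3], [0, 2])]

print(cluster.shape, bicluster.shape, tricluster.shape)
```

Whether the selected values actually form a pattern is decided by the formation criterion; the index subsets only delimit the candidate.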
This technique, as demonstrated in Section 2, is highly versatile and can therefore be applied to various problems. The definition of instances, attributes, and layers within the 3D dataset, as well as the criterion for grouping triclusters, are the key factors in solving the problem. In most triclustering problems, the layer dimension of the 3D dataset corresponds to the time points representing the temporal development of its instances across attributes, and the criterion for forming triclusters is that their values exhibit a behavior pattern.
As triclustering is considered an NP-complete problem, metaheuristic approximations are a suitable way to perform it; accordingly, the TriGen algorithm, an evolutionary implementation of the triclustering technique, has been used in this research.

The TriGen algorithm
This algorithm, whose initial version was presented in [8], implements a triclustering technique. Given a 3D dataset as input, TriGen finds a set of triclusters by optimizing a specific criterion, as shown in Eq. 1.

𝑇 𝑟𝑖𝐺𝑒𝑛(𝐷
TriGen was implemented following the genetic algorithm paradigm; its general workflow can be found in Figure 1. For each requested solution (tricluster), an evolutionary process is applied based on the evaluation performed by a fitness function, referred to above as the criterion. In other words, starting from an initial population of potential solutions (individuals), the algorithm evolves them over a given number of generations using genetic operators. These operators include:
• Initial population: In this phase, the initial individuals of the population are built. For each new individual in the initial population, a subset of instances, attributes, and layers is randomly selected from the input 3D dataset.
• Selection: This operator implements elitism within the population by advancing the best individuals to the next generation. A portion of the remaining individuals is also promoted to ensure genetic diversity. The algorithm employed for executing this operator is the tournament method; population members are initially divided into three groups and subsequently sorted by their fitness levels. A percentage of the population is chosen from these three ordered groups. These selected individuals are directly advanced to the next generation and serve as parents for reproduction through the crossover operator.
• Mutation: This process makes minor modifications to the genetic makeup of a randomly selected group of individuals. These alterations introduce variations in the solutions, potentially resulting in improved outcomes for future generations. The individuals produced through the crossover operation are candidates for mutation. With a certain probability, an individual is changed by removing or adding a random instance, attribute, or layer coordinate.
In the context of the TriGen genetic process, an individual represents a tricluster, with its genetic material consisting of a subset of instances, attributes, and layers from the input 3D dataset. Consequently, the described evolutionary process is applied to a population of triclusters and their collections of instances, attributes, and layers (genetic material).
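The individual representation, the initial population step, and the mutation operator described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the TriGen implementation: the `Tricluster` class, the function names, and the default mutation probability are hypothetical, and the sketch does not enforce the continuity of time slots used in the experiments.

```python
import random
from dataclasses import dataclass


@dataclass
class Tricluster:
    """Hypothetical individual: coordinate subsets along the three dimensions."""
    instances: set
    attributes: set
    layers: set


def random_individual(shape, rng):
    """Pick a random non-empty subset of coordinates along each dimension."""
    def pick(n):
        k = rng.randint(1, n)
        return set(rng.sample(range(n), k))

    n_inst, n_attr, n_lay = shape
    return Tricluster(pick(n_inst), pick(n_attr), pick(n_lay))


def mutate(ind, shape, rng, prob=0.1):
    """With probability `prob` (assumed value), add or remove one random coordinate."""
    if rng.random() >= prob:
        return ind
    dims = [("instances", shape[0]), ("attributes", shape[1]), ("layers", shape[2])]
    name, size = rng.choice(dims)
    coords = set(getattr(ind, name))
    coord = rng.randrange(size)
    if coord in coords and len(coords) > 1:
        coords.remove(coord)   # drop the coordinate, keeping the subset non-empty
    else:
        coords.add(coord)      # add it (no change if it was the only coordinate)
    new = Tricluster(set(ind.instances), set(ind.attributes), set(ind.layers))
    setattr(new, name, coords)
    return new


rng = random.Random(1)
shape = (1488, 12, 9)  # time slots x months x years, as in the dataset used here
individual = random_individual(shape, rng)
mutated = mutate(individual, shape, rng, prob=1.0)
```

Returning a new object from `mutate` keeps the parent intact, which is convenient when the same parent is reused in several reproduction steps.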
The fitness function is a key aspect of the TriGen algorithm (and of genetic algorithms in general). This function, the objective optimized by the algorithm, defines how good a potential solution (individual) is. It corresponds to the tricluster formation criterion mentioned in Section 3.1.
TriGen implements the fitness function of a tricluster as the weighted average shown in Eq. 4, which combines three terms. The size term accounts for the size of the tricluster regarding its instances, attributes, and layers. The overlap term measures the overlapping of the evaluated tricluster with the solutions already discovered.
The main component of the TriGen evaluation procedure is the criterion term, which measures how well the tricluster meets the formation criterion. Finally, three weights establish the influence of each term in the evaluation process; by default, TriGen sets them to the values shown in Eq. 5.
0.8, 0.1, 0.1 (5)
Focusing on the criterion term, TriGen implements three tricluster criteria, all designed to measure the similarity of the behavior patterns depicted by the values of a tricluster across its subset of instances, features, and layers. These fitness functions are:
• MSR3D: This function measures the homogeneity of triclusters across all three dimensions (instances, attributes, and layers) and was introduced in [6], initially for solving biological problems.
• LSL: Introduced in [5], this function uses the angle of the least-squares line of each pattern line within the tricluster to assess the similarity of all lines, representing the overall pattern.
• MSL: For each line in the pattern depicted by the tricluster, this function computes the angles between each pair of points and compares them to gauge the similarity of the pattern. This function and its computation process are detailed in [7].
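The idea behind a criterion based on least-squares lines can be illustrated with a simplified sketch. It is not the published definition from [5]: the function names are hypothetical, and it assumes that the tricluster has already been flattened into rows, one series per pattern line. Each row gets a fitted line, and the spread of the line angles indicates how similar the trends are.

```python
import numpy as np


def line_angles(series):
    """Angle (radians) of the least-squares line fitted to each row of `series`."""
    x = np.arange(series.shape[1])
    slopes = np.polyfit(x, series.T, deg=1)[0]  # one slope per row
    return np.arctan(slopes)


def angle_spread(series):
    """Mean absolute pairwise difference between line angles (0 = same trend)."""
    angles = line_angles(series)
    n = len(angles)
    if n < 2:
        return 0.0
    diffs = np.abs(angles[:, None] - angles[None, :])
    return float(diffs.sum() / (n * (n - 1)))


# Same trend at different scales: the spread is (numerically) zero.
same_trend = np.array([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
# Opposite trends: angles of +45 and -45 degrees.
opposite = np.array([[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]])
print(angle_spread(same_trend), angle_spread(opposite))
```

A measure of this kind is insensitive to vertical offsets between lines, which matches the behavior patterns discussed later, where months share a profile at different scales.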
In summary, TriGen takes a 3D dataset as input and aims to find a specific number of triclusters, as defined by the input parameters. It does so by optimizing a particular fitness function (described above) through an evolutionary process for each requested tricluster.
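The weighted combination of the criterion, size, and overlap terms can be sketched as below. The exact form of Eq. 4 is not reproduced here, so this is only an assumption-laden illustration: the function name is hypothetical, each term is assumed to be already normalized to [0, 1], and assigning 0.8 to the criterion term and 0.1 to each of the other two is an assumption based on the criterion being described as the main component.

```python
def weighted_fitness(criterion, size_term, overlap_term, weights=(0.8, 0.1, 0.1)):
    """Weighted average of three normalized terms (assumed form, not Eq. 4 itself)."""
    w_criterion, w_size, w_overlap = weights  # assumed order of the default weights
    total = w_criterion + w_size + w_overlap
    return (w_criterion * criterion
            + w_size * size_term
            + w_overlap * overlap_term) / total


# With these weights the criterion dominates the evaluation.
score = weighted_fitness(criterion=0.5, size_term=0.2, overlap_term=0.0)
```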

RESULTS
Several experiments were conducted to discover behavior patterns (triclusters) in France's electricity consumption data using the TriGen algorithm. The following sections describe the materials used for running the experiments and the analysis of the discovered patterns.

Experimental Setup
The input for the experiments consists of France's electricity consumption data, which was obtained from the website of France's transmission system operator (www.rte-france.com/). These data were downloaded in separate files, one per year, covering the years from 2012 to 2020 (inclusive). Each file contains the consumption data (in TWh) as a half-hourly (30-minute frequency) time series spanning from the first to the last day of the respective year. In other words, each file (corresponding to a year) contains a time series ranging from January 1st at 00:00h to December 31st at 23:30h.

3D-Dataset Definition.
The data has a 3D-array structure, resembling a cube, where the rows (instances) represent the time slots of a half-hourly time series for a month. Specifically, the rows cover the time slots from the 1st day at 00:00h, through each half-hour interval, until the 31st day at 23:30h. As Figure 2 shows, the columns (attributes) represent the months of the year, and the layers correspond to the years, from 2012 to 2020. Therefore, a cell of the 3D dataset at coordinates [time slot, month, year] contains the electricity consumption value for France for that year, in that month, at that time slot (a specific day of the month at a specific time).

Execution.
The TriGen algorithm was executed multiple times to identify behavior patterns within the 3D dataset. The algorithm was executed with various control configurations for the evolutionary process, and each run requested ten triclusters. In this research, the obtained triclusters represent subsets of continuous time slots (rows/instances), subsets of months (columns/attributes), and subsets of years (layers) whose electricity consumption values reveal behavior patterns.
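The cube layout described in the 3D-dataset definition can be sketched as follows. This is an illustration with synthetic readings rather than the RTE files; the function names are hypothetical, and filling the slots of days that do not exist in shorter months with NaN is an assumption, since the handling of those cells is not specified here.

```python
from datetime import datetime, timedelta

import numpy as np

SLOTS_PER_DAY = 48             # half-hourly readings
N_SLOTS = 31 * SLOTS_PER_DAY   # 1488 rows: day 1 at 00:00 ... day 31 at 23:30


def slot_index(ts):
    """Row of the cube for a timestamp: position within its month."""
    return (ts.day - 1) * SLOTS_PER_DAY + ts.hour * 2 + ts.minute // 30


def build_cube(readings, first_year, last_year):
    """Arrange (timestamp, value) pairs as time slots x months x years."""
    cube = np.full((N_SLOTS, 12, last_year - first_year + 1), np.nan)
    for ts, value in readings:
        cube[slot_index(ts), ts.month - 1, ts.year - first_year] = value
    return cube


def synthetic_readings(first_year, last_year):
    """Yield a constant half-hourly series covering whole years (placeholder data)."""
    ts = datetime(first_year, 1, 1)
    end = datetime(last_year + 1, 1, 1)
    step = timedelta(minutes=30)
    while ts < end:
        yield ts, 1.0
        ts += step


cube = build_cube(synthetic_readings(2012, 2020), 2012, 2020)
print(cube.shape)
```

With this layout, a tricluster is a selection of rows, columns, and layers of `cube`, and the same row always refers to the same day and time of the month regardless of the month or year.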

Tricluster Evaluation.
The results of each TriGen experiment, which consist of a collection of triclusters, are evaluated from two perspectives: the overall coverage of the input 3D dataset dimensions by the generated triclusters and the individual quality of the triclusters.
Concerning the first aspect, coverage graphs for the dimensions of the 3D dataset (time slot, month, and year) are generated. These graphs illustrate which coordinates are selected by each obtained tricluster, providing insight into the influence of each tricluster on each dimension.
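The coverage of one dimension can also be summarized numerically as the fraction of coordinates selected by at least one tricluster. The sketch below is a hedged illustration: the dictionary layout, the function name, and the example triclusters are hypothetical and do not reproduce the results of the experiments.

```python
def dimension_coverage(triclusters, dim, size):
    """Return the covered fraction of a dimension and its uncovered coordinates."""
    covered = set()
    for tricluster in triclusters:
        covered |= set(tricluster[dim])
    uncovered = sorted(set(range(size)) - covered)
    return len(covered) / size, uncovered


# Two made-up triclusters; months are 0-based (0 = January).
example = [
    {"slots": range(682, 732), "months": [6, 9], "years": [3, 5, 6]},
    {"slots": range(483, 984), "months": [0, 10], "years": [1, 6, 8]},
]
month_fraction, missing_months = dimension_coverage(example, "months", 12)
```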
The second aspect involves assessing the individual quality of the triclusters. For this purpose, we use the TRIQ index, a tricluster quality metric introduced in [4]. TRIQ is defined in Eq. 6 as a weighted average of three indexes:
• GRQ: It evaluates the graphical quality of the tricluster, a quantitative representation of its qualitative aspect, namely how consistent and similar its members are. This approach is commonly employed in research to visually validate results by graphically illustrating the triclusters across their three components: instances, attributes, and layers [16,29,31].
• PEQ and SPQ: These metrics assess the correlation among the tricluster's values and are commonly referenced in the literature [12]. Correlation quantifies the interdependence among the instances, attributes, and layers within the triclusters. Specifically, PEQ relies on Pearson's correlation coefficient, while SPQ is grounded in Spearman's correlation coefficient.
These three indexes are combined into a final value, TRIQ, by applying the weights described in Eq. 7. TRIQ ranges from 0.0 (lowest quality) to 1.0 (highest quality).
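The correlation-based part of such a quality index can be illustrated with a simplified sketch. It does not reproduce the definitions in [4]: the function names are hypothetical, the tricluster is assumed to be flattened into rows (one series per pattern line), ties are not averaged when ranking, and the weights of the final combination must be supplied by the caller because the values of Eq. 7 are not reproduced here.

```python
import numpy as np


def mean_pairwise_corr(series):
    """Mean Pearson correlation over all pairs of rows."""
    corr = np.corrcoef(series)
    upper = np.triu_indices_from(corr, k=1)
    return float(corr[upper].mean())


def to_ranks(series):
    """Rank the values of each row (0 .. n-1); ties are not averaged in this sketch."""
    return np.argsort(np.argsort(series, axis=1), axis=1).astype(float)


def weighted_average(indexes, weights):
    """Combine quality indexes with caller-supplied weights."""
    indexes = np.asarray(indexes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((indexes * weights).sum() / weights.sum())


series = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 4.0, 9.0, 16.0],
])
pearson_like = mean_pairwise_corr(series)             # linear agreement
spearman_like = mean_pairwise_corr(to_ranks(series))  # monotonic agreement
```

Spearman's coefficient is Pearson's coefficient computed on ranks, which is why the same helper serves both; in the example, the third row is monotonic but not linear with respect to the others, so only the rank-based value reaches 1.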
Additionally, the behavior patterns of the triclusters are visually represented in two views: the month view, where the consumption curves for specific periods (time slots on the x-axis) of each selected month are depicted, separated by the selected years, and the year view, where the consumption curves for specific periods (time slots on the x-axis) of each selected year are depicted, separated by the selected month.

Discussion
In this section, we discuss the effectiveness of applying the triclustering methodology through the TriGen algorithm to extract consumption behavior patterns by examining the results of two experiments: Experiment A and Experiment B.

Coverage Analysis.
In terms of the time slot dimension (rows), Figure 3a illustrates how the triclusters in Experiment A cover short periods while leaving long periods without coverage. This behavior occurs because TriGen prioritizes searching in short periods, as guided by its control configuration. This pattern contrasts with the month coverage, as shown in Figure 4a, where the triclusters cover all the months except January.
On the other hand, Experiment B exhibits complete coverage of time slots, as demonstrated in Figure 3b. In this case, the triclusters are larger in terms of time slots, indicating that TriGen diversifies its search along this dimension to identify longer periods. However, Figure 4b illustrates that the month coverage is lower than in Experiment A.
Both experiments complement each other; Experiment A offers good coverage of months, while Experiment B excels in covering time slots.Consequently, their respective triclusters provide valuable insights into the dimensions being analyzed.
Regarding the year dimension, Figure 5 reveals that both experiments achieve complete coverage across all layers.

Quality analysis.
The TRIQ values for Experiment A's triclusters are presented in Table 1. The average TRIQ is 0.90 with a standard deviation of 0.016, indicating a satisfactory experiment in terms of quality. The best solution, tricluster #4, achieves a TRIQ of 0.93, demonstrating excellent graphical quality (GRQ at 0.96) and a high level of correlation among its values, as confirmed by its PEQ and SPQ values.
Regarding Experiment B, its TRIQ values, as shown in Table 2, have a mean of 0.866 with a standard deviation of 0.014.
Tricluster #4 from Experiment A exhibits a behavior pattern spanning from the 15th at 5:00h to the 16th at 5:30h. In the month view, depicted in Figure 6a, the consumption curves for the tricluster months, specifically July and October, are examined across each tricluster year (2015, 2017, and 2018). July and October both exhibit a consistent pattern throughout all years. From the 15th at 5:00h to the 15th at 12:30h, consumption displays an upward trend, followed by a gradual descent with smooth peaks until the 15th at 19:30h. Another peak then emerges, followed by a further peak at 22:30h on the same day. Finally, consumption declines until the 16th at 5:30h. This behavior is consistent between July and October, albeit at different scales, as October's values are higher than those of July.
In the year view, shown in Figure 6b, the consumption curves for the tricluster years (2015, 2017, and 2018) are depicted for each tricluster month (July and October). The behavior patterns in the year view align with those observed in the month view. This emphasizes the similarity in behavior between the yearly series for each month and highlights that the scale differs while the behavior remains the same across the years.
Regarding Experiment B, the month view of Tricluster #6's behavior patterns can be found in Figure 7a. Here, the consumption curves for the tricluster months, specifically January and November, are depicted across each tricluster year (2013, 2018, and 2020). The consumption curves span from the 11th at 1:30h to the 21st at 11:30h and depict a pattern characterized by prominent peaks and descents. Notably, this tricluster covers 501 time slots, significantly more than the 50 time slots grouped by the previous tricluster. The observed peaks and descents repeat approximately every 48 time slots (one day) in both series (January and November) and maintain a similar profile for each year. As in the previous tricluster, there is a notable change in scale, with January exhibiting higher values. Moreover, similar consumption values are evident in the central period of 2020 and at the beginning and end of 2013.
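The sizes quoted for the two triclusters follow directly from the half-hourly slot layout, as the short calculation below shows. The helper names are hypothetical, and the 0-based slot numbering is an assumption that does not affect the counts.

```python
SLOTS_PER_DAY = 48  # half-hourly slots in one day


def slot_index(day, hour, minute):
    """0-based slot of a (day of month, hour, minute) position."""
    return (day - 1) * SLOTS_PER_DAY + hour * 2 + minute // 30


def slots_between(start, end):
    """Number of slots from `start` to `end`, both inclusive."""
    return slot_index(*end) - slot_index(*start) + 1


# Tricluster #4 (Experiment A): the 15th at 5:00h to the 16th at 5:30h.
size_a = slots_between((15, 5, 0), (16, 5, 30))
# Tricluster #6 (Experiment B): the 11th at 1:30h to the 21st at 11:30h.
size_b = slots_between((11, 1, 30), (21, 11, 30))
print(size_a, size_b)  # 50 501
```

The second tricluster therefore spans a little more than ten days, which is consistent with a daily profile repeating about every 48 slots.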
The year view, represented in Figure 7b, confirms the recurring peaks and descents across the three year lines for both months. This view highlights a significant scale change for January between 2018 and the other two years, from the 17th at 8:30h to the 19th at 13:00h. A similar behavior is observed for November, from the 13th at 21:00h to the 16th at 17:00h, again between 2018 and the other two years.

CONCLUSIONS AND FUTURE WORKS
In this study, we have introduced an approach based on evolutionary triclustering to analyze electricity consumption patterns in France, and we have shed light on how consumption patterns change over time. Our findings offer insights into the underlying dynamics of electricity consumption and have implications for energy policy formulation and efficient resource allocation. However, it is important to acknowledge certain limitations of our study. Firstly, the availability and quality of electricity consumption data can vary, which may affect the precision of our results. Additionally, while we have identified significant patterns, we have not delved deeply into their underlying causes. For this reason, in future work, we will investigate the economic and/or climatic causes of the identified electricity consumption patterns by incorporating exogenous variables. Moreover, the efficiency of the algorithm will be improved to tackle even larger and more complex datasets.

Figure 3: Triclustering coverage in the time slot dimension.
• Crossover: This process merges the genetic material of two individuals to create two new ones, potentially generating improved solutions for the next generation. Two individuals are randomly selected for reproduction from those chosen by the selection operator. From these two parents, two offspring are derived, as illustrated in Eq. 2 and Eq. 3. The first offspring is formed by combining the instance coordinates of the first parent, the attribute coordinates of the second parent, and the layer coordinates of the first parent. Conversely, the second offspring comprises the instance coordinates of the second parent, the attribute coordinates of the first parent, and the layer coordinates of the second parent.
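The recombination described for the crossover operator can be sketched as below. The `Tricluster` tuple and the function name are hypothetical and serve only to illustrate how the coordinates of the two parents are exchanged; this is not the TriGen source code.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Tricluster(NamedTuple):
    """Hypothetical individual: coordinate subsets along the three dimensions."""
    instances: FrozenSet[int]
    attributes: FrozenSet[int]
    layers: FrozenSet[int]


def crossover(parent1: Tricluster, parent2: Tricluster) -> Tuple[Tricluster, Tricluster]:
    """Swap the attribute coordinates of the parents to build two offspring."""
    child1 = Tricluster(parent1.instances, parent2.attributes, parent1.layers)
    child2 = Tricluster(parent2.instances, parent1.attributes, parent2.layers)
    return child1, child2


parent_a = Tricluster(frozenset({0, 1, 2}), frozenset({6, 9}), frozenset({3, 5}))
parent_b = Tricluster(frozenset({40, 41}), frozenset({0, 10}), frozenset({1, 8}))
child_a, child_b = crossover(parent_a, parent_b)
```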

Table 1: TRIQ values of Experiment A

Table 2: TRIQ values of Experiment B