A Multi-Objective Evolutionary Approach to Discover Explainability Tradeoffs when Using Linear Regression to Effectively Model the Dynamic Thermal Behaviour of Electrical Machines

Modelling and controlling heat transfer in rotating electrical machines is very important as it enables the design of assemblies (e.g., motors) that are efficient and durable under multiple operational scenarios. To address the challenge of deriving accurate data-driven estimators of key motor temperatures, we propose a multi-objective strategy for creating Linear Regression (LR) models that integrate optimised synthetic features. The main strength of our approach is that it provides decision makers with a clear overview of the optimal tradeoffs between data collection costs, the expected modelling errors and the overall explainability of the generated thermal models. Moreover, as parsimonious models are required for both microcontroller deployment and domain expert interpretation, our modelling strategy contains a simple but effective step-wise regularisation technique that can be applied to outline domain-relevant mappings between LR variables and thermal profiling capabilities. Results indicate that our approach can generate accurate LR-based dynamic thermal models when training on data associated with a limited set of load points within the safe operating area of the electrical machine under study.


INTRODUCTION
As the use of data-driven decision-making systems is becoming commonplace today, users are increasingly demanding some form of understanding on how these systems make decisions. This can be particularly important when the goal is to obtain novel scientific insights from observational or simulated data [Roscher et al. 2020]. Roscher et al. [2020] also propose three highly relevant core characteristics that facilitate human understanding and trust of machine learning (ML) models: transparency, interpretability, and explainability. While primarily derived from applications that employ fairly complex ML and deep learning techniques to gain scientific knowledge in the natural sciences, these three core characteristics offer a valuable framework for studying explainable artificial intelligence (XAI) systems in general as they provide both a welcome distinction between often intertwined concepts and a way of understanding interactions between these concepts. In the case of ML, Roscher et al. [2020] posit that:
• Transparency concerns the different ingredients of a model: structure, individual components, learning algorithm, and how a specific solution is obtained by the algorithm. This aligns closely with the views in Lipton [2018].
• Interpretability refers to the ability to "make sense" of a model (and its results) by presenting some of its properties in a way that is understandable to humans. In contrast to transparency, data is always involved when ascertaining interpretability.
• Explainability is fairly subjective, often context-dependent, but could be reasoned about using the prior definition from Montavon et al. [2018]: "An explanation is a collection of features of the interpretable domain, that have contributed for a given example to produce a decision".
Based on this taxonomy, it is easy to understand why linear (regression) models are seen as defining the upper (asymptotic) threshold of explainability for ML: their weight values can directly identify attributes that are relevant for prediction making as well as their relative importance. For this reason, linear models have been used to construct understandable proxies of more complex ML approaches, as in the Local Interpretable Model-Agnostic Explanations (LIME) approach [Ribeiro et al. 2016], where linearity is used to characterise the local neighbourhood of a datum. Given that the good explainability of linear models is often contrasted by their poor performance across numerous modelling scenarios, the main XAI research focus naturally falls on improving the explainability of complex high-performance approaches (e.g., deep neural networks).
Motivated by the characteristics of our real-life application domain, in this study we propose a slightly counter-intuitive approach to developing effective and explainable data-driven models. In essence, we first use synthetic features to augment the modelling power of linear regression models in order to increase their performance on a well-known non-linear task (dynamic thermal modelling). Given that adding a large set of synthetic features to the interpretable domain is likely to impact the explainability of the resulting thermal models, the second step of our approach is to apply an iterative model reduction (i.e., regularisation) strategy to reduce the size of the best performing LR models (and thus mitigate the aforementioned explainability impact). More importantly, the entire thermal modelling process is governed by a multi-objective optimisation approach that aims to provide decision makers with an overview of the optimal tradeoffs between data collection costs, expected modelling errors, and model explainability. A high-level overview of the key components of our approach alongside their interactions is provided in Figure 1.
In order to maximise trust in the generated data-driven thermal models, we have also sought to maximise the transparency and explainability of the proposed multi-objective approach itself by (i) working with electrical engineers to integrate domain knowledge in the data-driven modelling problem formulation right from the start and (ii) opting for a step-wise formalisation of the final multi-objective modelling task that aims to build confidence by incrementally validating key modelling assumptions.
Fig. 1. High-level overview of our data-driven strategy to construct explainable dynamic thermal models.
The rest of this article is structured as follows: Section 2 provides a background to thermal modelling for electrical machines and describes the modelling scenario and the requirements that motivate the present work. In Section 3, we describe our multi-objective thermal modelling approach, including data preparation and experimental setup. Section 4 demonstrates the results and provides their interpretation, and finally, Section 5 contains conclusions and an outlook on future work.

BACKGROUND TO THERMAL MODELLING OF ELECTRICAL MACHINES
Our industrial case concerns the heat that is produced by electrical machines during their operation. When an electrical machine, e.g., a motor, is running, heat is produced as a result of friction when electrical energy is being converted to mechanical energy. Electrical engineers consider this heat as problematic because, first, it represents losses in efficiency, which may reach up to 25% [Boglietti et al. 2009]; and second, it gradually reduces the lifespan of the electrical machine, and in a worst-case scenario can damage it [Choudhary et al. 2018].
Our case study considers a 3-phase brushless outer rotor permanent magnet synchronous motor, commonly used in low-cost fans. The motor has six key component temperatures that are of interest when wishing to monitor and manage heat (see Figure 2). Domain experts have categorised the components as high (denoted H), medium (M), and low (L) priority depending on the importance of monitoring their temperature within the general thermal context of the assembly. The high-priority temperatures are those of the winding, T_w, and the static ring of the inner ball bearing, T_bi. The temperatures of the mounting flange, T_f, and the rotor, T_r, are considered of medium priority, whereas those of the outer ring of the outer ball bearing, T_bo, and the steel stator yoke, T_s, are of low priority.
Domain experts have also identified five input variables that are highly relevant for thermal modelling. Depending on the ease and cost of collecting (real-time) sensor data during regular operation, these inputs can be categorised into three groups as follows: A: data always available; R: data rarely available; and N: data never available. The inputs are: rotor speed, v, (A); electric current, I, (A); torque, τ, (R); ambient temperature, T_amb, (R); and electric power input, P, (N). A summary of the modelling requirements is provided in Figure 3.
Traditionally, engineers would turn to the lumped parameter thermal network (LPTN) analytical technique to model heat in electrical machines, especially when it comes to accurately modelling the transient thermal processes in the assembly [Boglietti et al. 2009]. However, using LPTN for this kind of motor is known to be challenging [Wöckinger et al. 2020]. While studies like Kirchgässner et al. [2019] and Zăvoianu et al. [2020] have demonstrated the potential of data-driven thermal modelling, researchers also caution that, because most data-driven models are black boxes in nature, it is not possible for electrical engineers to obtain particular machine-specific information and thus gain insights from them [Wöckinger et al. 2020]. Therefore, explainability is a key requirement for this data-driven modelling scenario. Further compounding the complexity of the modelling task, data availability restrictions are usually associated with the low-cost applications of these types of motors.
To summarise, our aim is to construct explainable data-driven thermal models that can be used to accurately characterise the real-time dynamic thermal behaviour of electrical machines under different operational scenarios. Using a limited set of data regarding only speed (v) and current (I), the developed thermal models must be able to predict the temperatures of the six above-mentioned output motor components. Furthermore, the models are required to have a simple architecture and be resource efficient in order to facilitate deployment on a microcontroller. For the models to be effective, they should have an average temperature estimation error of less than +/−2°C when the motor is used within its safe operating area (SOA).

PROPOSED APPROACH TO EXPLAINABLE THERMAL MODELLING

Data Preprocessing
To enable the creation of models that are applicable under different operational scenarios, domain experts have provided 20 datasets, each containing time series data of temperature profiles that correspond to common usage patterns (load points) of the motor under study. Each dataset contains the two inputs/features (v, I) and six outputs/targets (T_w, T_bi, T_f, T_r, T_bo, T_s) measured simultaneously at an interval of 2 seconds. Sample sizes for the 20 datasets (marked DS_01 ... DS_20) range from 571 to 16,201. In total, the 20 datasets contain 240,200 samples (i.e., ≈ 133.5 hours worth of testing data). Details of the setup of the test bench, the sensors and the cameras used to collect data are described in Wöckinger et al. [2020] and Wöckinger et al. [2021]. The data itself can be accessed at: https://github.com/czavoianu/TELO_2023. We used the two provided inputs, speed (v) and current (I), to create two sets of synthetic features as follows:
• Based on expert knowledge of electrical machines, torque (τ) is directly proportional to current (I) [Nash 1997] and the total power losses are directly proportional to speed (v) and I [Chalmers and Spooner 1999]. Thus, from a physical point of view, input variables based on several multiplicative combinations of v and I are considered suitable for thermal modelling. We thus created four expert-suggested additional features: v^2, v^3, I^2, and v · I. The inclusion of these features is the main channel of incorporating expert knowledge in our modelling approach and arguably improves overall explainability by expanding the interpretable domain of our thermal models in a way that is directly aligned with user knowledge and expectations.
• We applied Exponentially Weighted Moving Averages (EMAs) [Holt 2004] to all 6 features (2 original + 4 expert-suggested) based on v and I in an effort to smooth random fluctuations in the time series data and complement data samples with information regarding trends. All EMA features were calculated using the formula in Equation (1):
EMA_t = α · r_t + (1 − α) · EMA_{t−1}    (1)
where α is the weight, t is the current period, and r_t is the value of the time series r in the current period. A key aspect when using EMA is to decide how much weight to give to older observations. We initially used weights of 0.001, 0.005, and 0.04 to capture long-, medium-, and short-term trends in the data. A further 18 synthetic inputs were thus created using EMAs, and in total, each of the 20 datasets contained 24 features.
It is important to note that the usage of synthetic EMA features is both a necessity for capturing temporal aspects and a common practice for time series modelling in other fields (e.g., financial and economic modelling). However, the particular number and choice of EMA weights was subjective and largely informed by the authors' modelling experience. As such, this can be seen as negatively impacting the (design) transparency of our thermal models.
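The feature construction described in this subsection can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `ema` and `build_features` are hypothetical, the smoothing uses the standard EMA recursion, and starting that recursion from the first observation is an assumption rather than a detail taken from the study.

```python
from typing import Dict, List, Sequence

# Long-, medium- and short-term weights quoted in the text.
EMA_WEIGHTS = (0.001, 0.005, 0.04)


def ema(series: Sequence[float], alpha: float) -> List[float]:
    """Standard EMA recursion: s_t = alpha * r_t + (1 - alpha) * s_(t-1).

    Assumption: the recursion is initialised with the first observation.
    """
    smoothed: List[float] = []
    for value in series:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def build_features(v: Sequence[float], i: Sequence[float],
                   weights: Sequence[float] = EMA_WEIGHTS) -> Dict[str, List[float]]:
    """Return the 6 base features plus one EMA feature per (base feature, weight)."""
    base = {
        "v": list(v),
        "I": list(i),
        "v^2": [x ** 2 for x in v],
        "v^3": [x ** 3 for x in v],
        "I^2": [y ** 2 for y in i],
        "v*I": [x * y for x, y in zip(v, i)],
    }
    features = dict(base)
    for name, values in base.items():
        for alpha in weights:
            features[f"EMA({name}, {alpha})"] = ema(values, alpha)
    return features
```

With the three default weights, `build_features` yields 6 + 6 · 3 = 24 columns, matching the feature count stated above.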

Preliminary Modelling Insights
To determine the effectiveness of the provided datasets in modelling the target temperatures, we carried out preliminary modelling of the high priority temperatures (T_w and T_bi). We combined all the 20 datasets into a single dataset, shuffled it, and, using a simple train-test split, randomly partitioned it into a training set containing 90% of the samples and a test set with the remaining 10% of the samples. Then, we trained four learning algorithms: Linear Regression (LR) [Kutner et al. 2005], Random Forest (RF) [Breiman 2001], K-Nearest Neighbour (KNN) [Cover and Hart 1967] and a shallow Artificial Neural Network (ANN) [Haykin 1999]. We identified the best parameters for RF (i.e., number of features and maximum depth) and KNN (leaf size and the number of neighbours) using GridSearchCV with 10-fold cross validation available in Scikit-learn [Pedregosa et al. 2011].
For ANN, we used RandomizedSearchCV with 3-fold cross validation (also available in Scikit-learn) to identify the best configuration for hidden layer sizes, activation, alpha and learning rate.
Results from the preliminary modelling are presented in Table 1 and are largely consistent with previous findings in the sense that non-linear techniques are more accurate in predicting the target temperatures when compared to LR [Zăvoianu et al. 2020]. However, Linear Regression is able to produce competitive models with a Mean Squared Error (MSE) and a Mean Absolute Error (MAE) on test data well below the +/−2°C threshold imposed by domain experts for the considered application scenarios. Furthermore, linear models are strongly preferred by domain experts because they are explainable and can be directly deployed on low-cost microcontrollers with ease.

Modelling Task as Multi-Objective Optimisation Problems
It is important to highlight that while the results above show that LR is a suitable technique for our data-driven dynamic thermal modelling scenario, collecting the 20 datasets (i.e., temperature profiles based on likely operational scenarios) was a very time-consuming exercise that also required specialised expertise. As such, there is a primary modelling imperative to discover if (and under which conditions) a more limited data collection stage can yield equally good LR models as this would significantly reduce modelling costs (especially when aiming to analyse more motor designs). Given an expected positive correlation between data availability and model accuracy, we opted to explore the aforementioned data collection inquiry through a set of three multi-objective optimisation problems (MOOPs), each designed to provide a holistic answer to a modelling question grounded on the efficient usage of the 20 datasets (DS_01 ... DS_20) in a manner that is likely to generate explainable thermal models:
Q1: Which combination of datasets should be used to train an LR thermal model that is able to accurately estimate a given target temperature across all operational scenarios? Given the cost and complexity of collecting data, it would be important to know which load points are likely to help characterise the thermal behaviour of a particular motor component and the accuracy tradeoffs related to their usage during modelling.
Q2: What EMA weights used for creating synthetic (input) features can improve the accuracy of LR thermal models for each of the six target temperatures in the context of reduced training sample availability? Instead of limiting synthetic feature generation to the three weights that capture short-, medium- and long-term trends as described in Section 3.1, the idea is to attempt to improve LR accuracy by extending the Q1 modelling problem to include the identification of the best weights or combination of weights from a predefined range. Therefore, we generated 10 additional EMA weights by using the formula α_i = 0.001 · 2^i, i ∈ {0, 1, 2, ..., 9} to capture a wider range of trends in the data. We then used the 10 weights to create 60 synthetic EMA features, i.e., one per weight for each of the 6 features based on speed (v) and current (I).
After replacing the 18 original EMA synthetic features with the 60 new ones, each of the 20 datasets we used for answering this question had a total of 66 input features. In terms of XAI characteristics, the optimisation of EMA weights can be seen as an attempt to mitigate the loss of LR (design) transparency induced by the initial arbitrary fixing of EMA settings.
Q3: Which combination of datasets and EMA weights should be used when wishing to train accurate LR thermal models for all six target temperatures? Besides discovering the modelling tradeoffs for a particular target temperature (i.e., answering Q2), it would also be very useful to investigate how an optimal combination of datasets and EMA weights can be used to best model all six target temperatures via LR.
Formally, all three data modelling MOOPs that we aim to solve can be defined as:
minimise F(x) = (f_1(x), f_2(x))^T    (2)
where x is an n-dimensional vector of real-valued variables, i.e., x ∈ [0, 1)^n, f_1(x) is the size of the training set encoded by x, and f_2(x) is the test error of the LR model trained on that set. As illustrated in Figure 4, in order to enable x to easily encode the training-test data split across our 20 datasets, we have formulated the three MOOPs as typical 0-1 Knapsack problems, codified with real values [Russell and Norvig 2010].
In the case of MOOP1 (the problem designed to answer Q1), a candidate solution is a vector of 20 real values between 0 and 1 (i.e., x ∈ [0, 1)^20) with the interpretation that each variable x_i represents its associated dataset DS_i. If x_i ≥ 0.5, then DS_i is selected and added to the training set of the modelling experiment. On the other hand, if x_i < 0.5, DS_i is added to the test set of the modelling experiment. In order to evaluate F(x), the total number of samples in the training set is counted (i.e., f_1) and an LR model is first trained on the training set and then tested on the test set to inform f_2. It is noteworthy that, since we are interested in the independent modelling of 6 different component temperatures, we are considering six instances of this problem: MOOP1−T_w, MOOP1−T_bi, MOOP1−T_f, and so on.
MOOP2 was formulated by adding 10 more variables to the decision vector used in MOOP1. Each new variable represents a predefined EMA weight. If a given weight is to be used (i.e., x_i ≥ 0.5, 21 ≤ i ≤ 30), all the associated synthetic features (i.e., all 6 EMA features created with α_{i−21}) are used for training and testing the LR model that informs the accuracy of f_2. In other words, the usage of each EMA weight will add 6 independent variables to the resulting LR model.
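The encoding used by MOOP1 and MOOP2 can be illustrated with a small decoding routine. This is a sketch under stated assumptions: `decode` and `training_set_size` are hypothetical helper names, the per-dataset sample counts passed to `training_set_size` are placeholders supplied by the caller, and LR training (which would produce f_2) is left out.

```python
from typing import Dict, List, Sequence, Tuple

N_DATASETS = 20
# alpha_i = 0.001 * 2^i for i = 0..9, as defined for Q2.
EMA_ALPHAS = [0.001 * 2 ** i for i in range(10)]


def decode(x: Sequence[float]) -> Tuple[List[int], List[int], List[float]]:
    """Map a decision vector to (training datasets, test datasets, EMA weights).

    Dataset indices are 1-based (DS_01 ... DS_20). A vector of length 20 is
    treated as a MOOP1 solution and therefore selects no EMA weights.
    """
    if len(x) not in (N_DATASETS, N_DATASETS + len(EMA_ALPHAS)):
        raise ValueError("expected 20 (MOOP1) or 30 (MOOP2/MOOP3) variables")
    train = [k + 1 for k in range(N_DATASETS) if x[k] >= 0.5]
    test = [k + 1 for k in range(N_DATASETS) if x[k] < 0.5]
    # The variable at 1-based position i (21 <= i <= 30) switches alpha_(i-21).
    alphas = [a for a, gene in zip(EMA_ALPHAS, x[N_DATASETS:]) if gene >= 0.5]
    return train, test, alphas


def training_set_size(train: Sequence[int], sizes: Dict[int, int]) -> int:
    """First objective f_1: number of samples in the selected training datasets."""
    return sum(sizes[k] for k in train)
```

Each selected weight contributes 6 EMA features, so the resulting LR model has 6 + 6 · len(alphas) input variables under this encoding.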
MOOP3 is a variant of MOOP2 that features a minimax optimisation approach. For each candidate solution x we trained and tested independent LR models for all six target temperatures, recording model test errors individually. We then defined f_2(x) as the maximum test error observed across the six LR models, meaning that the accuracy objective of this MOOP aims to minimise the largest error across all component temperatures of interest.
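The minimax accuracy objective of MOOP3 reduces to taking the worst of six per-target test errors. The sketch below assumes those errors have already been computed elsewhere; the function name `minimax_error` and the dictionary layout are illustrative choices, not part of the original implementation.

```python
from typing import Dict, Tuple

TARGETS = ("T_w", "T_bi", "T_f", "T_r", "T_bo", "T_s")


def minimax_error(test_errors: Dict[str, float]) -> Tuple[str, float]:
    """Return the target with the largest test error and that error (f_2)."""
    worst = max(TARGETS, key=lambda target: test_errors[target])
    return worst, test_errors[worst]
```

Returning the name of the worst target alongside the error makes it possible to record which temperature determines f_2 for a given solution.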

Experimental Setup
Given the characteristics of our MOOPs (i.e., two objectives, unknown PF_t, medium number of decision variables), we opted to use the NSGA-II solver [Deb et al. 2002]. The second version of the Nondominated Sorting Genetic Algorithm (NSGA-II) is one of the most widely used multi-objective evolutionary algorithms (MOEAs) and is known to be robust across different types of real-life and benchmark MOOPs. This means that NSGA-II is generally able to discover Pareto-optimal sets (PN) that very accurately approximate the true Pareto Front (PF_t) of the problem, i.e., the objective-space projection of all the optimal tradeoff solutions of the MOOP.
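To make the notion of a Pareto-optimal set concrete, the following sketch filters a list of objective vectors down to its non-dominated members when every objective is minimised. It illustrates only the dominance relation, not NSGA-II itself, and the names `dominates` and `non_dominated` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the MOOPs considered here, each point would be a pair such as (test error, training set size), both of which are to be minimised.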
We applied NSGA-II with its standard genetic operators, i.e., Simulated Binary Crossover (SBX) [Deb et al. 1995] and polynomial mutation [Deb et al. 1996], and we used the literature-recommended settings for these operators: crossover probability rate of 0.8, crossover distribution index of 20, mutation probability of 1/n and a mutation distribution index of 20. Across all optimisation runs, we set both the population and offspring size to 200 and used a computational budget of 50,000 fitness evaluations, thereby evolving 250 generations. Given the stochastic nature of MOEAs, we initially carried out five independent repeats of each optimisation run. The limited number of runs is motivated by the fact that, even after parallelising the fitness evaluations, a typical optimisation would take 10-15 hours on a high-end PC. In the case of MOOP2, each modelling experiment was repeated 30 times in order to enable statistical significance testing of the importance of optimising the EMA weights.
Our numerical experiments integrated algorithm implementations from jMetalPy, a Python-based framework for multi-objective optimization with metaheuristics [Benítez-Hidalgo et al. 2019], and Scikit-learn, a library for machine learning in Python [Pedregosa et al. 2011].

MOOP1: Optimising Data Requirements for Thermal Modelling
Figure 5 shows typical optimisation results for MOOP1. The top subplots show the training set size vs accuracy tradeoff for LR models of T_w when using the MAE (left) and the MSE (right) on the test set as model quality indicators. Similarly, the bottom subplots from Figure 5 indicate the sought modelling tradeoffs for T_bi, the other high priority component temperature. Across all subplots, we marked with black squares the Pareto-optimal solutions identified by NSGA-II (i.e., the objective space projection of the PN obtained at the end of the run). The x-axis is trimmed at 2 in light of the +/−2°C modelling accuracy constraint imposed by our thermal modelling scenario. Across both high-priority temperatures, test errors decrease with increasing training set size. However, this decrease is very gradual and somewhat limited as models trained with fewer than 50,000 samples have MAE values smaller than 1°C and MSE values smaller than 1.5°C, while models trained with more than 200,000 samples have MAE and MSE values smaller than 0.5°C. On the one hand, this behaviour is expected because when there is a very limited set of samples to learn from, the LR model lacks the ability to properly model all the underlying patterns when presented with unseen temperature profiles. On the other hand, the fact that even models trained on less than 10% of the available data satisfy the accuracy constraint (i.e., generalise well) validates that LR is effective for modelling the dynamic thermal behaviour of the studied electrical machine. Thus, while not directly linked to explainability, the holistic view provided by the MOOP1 formulation and its associated results from Figure 5 reinforce user trust in the choice of regression model. We mention that these experiments were conducted for the four medium and low priority target temperatures as well and the results follow a very similar pattern.
Generally, MOOP1 modelling results show that an LR model trained on a subset of the original 20 datasets can be used to accurately predict target temperatures across different operational scenarios. For example, the Pareto optimal solution indicated with an arrow in the bottom left subplot from Figure 5 represents an LR model trained only using datasets DS_03 and DS_14 (i.e., ≈ 9.5% of all available data) that yielded a test MAE of 0.8610 on the other 18 datasets. This particular LR model is given in Equation (3) and, in light of its simplicity and accuracy, is a very interesting contender for installation on a microcontroller to estimate T_bi (the temperature of the inner ball bearing) when only provided with data regarding v (the rotor speed) and I (the electric current).
We proceeded to compare the performance of LR models for T_w and T_bi trained only using DS_03 and DS_14 with the non-linear alternatives considered in Section 3.2. To make this comparison, we first applied the previously outlined strategies for identifying the best parameters for each non-linear modelling technique when considering only the 22,862 samples from the two training datasets. We then trained the non-linear models using all the 22,862 samples and finally tested them on the remaining 217,338 samples from the other 18 datasets. The results are shown in Table 2 and indicate that, when compared with the preliminary results from Table 1, the MAE and MSE performance degradation is an order of magnitude higher for the non-linear approaches. This can be interpreted as further evidence towards the robustness and overall suitability of LR models for our considered modelling tasks, especially when aiming to reduce data collection requirements.

MOOP2: Optimising the EMA Weights Used for Synthetic Feature Generation
In MOOP2, we included EMA weights into the optimisation and the obtained results follow a similar pattern as those obtained for MOOP1. In the two subplots from Figure 6, we illustrate all the Pareto-optimal solutions discovered by NSGA-II for MOOP1 and MOOP2 across the five initial independent runs when modelling T_w. Graphically, it is clear that test errors decrease when both training set composition and EMA weights are optimised and as a result the Pareto fronts associated with MOOP2 are shifted to the left. In order to further investigate this empirical observation, we carried out 25 more independent optimisation runs and proceeded to quantitatively measure the quality of the obtained Pareto fronts. Several specialised indicators are commonly used for this task: the generational distance [Van Veldhuizen and Lamont 1998], the inverted generational distance [Coello et al. 2007], the epsilon indicator [Zitzler et al. 2003] and the hypervolume indicator [Zitzler and Thiele 1998].
We chose to use the hypervolume indicator (Hv) as our unary PF quality measure because it is widely accepted in the MOEA community, has a theoretical proof of a monotonic convergence behaviour, and can be easily used on problems with an unknown PF_t. This is because Hv(PF_c) measures the size of the objective space that PF_c dominates when considering an anti-optimal reference point [Zitzler and Thiele 1998]. Based on this, larger Hv values are preferred, but in order to make the numerical values more meaningful, computing the relative hypervolume as Hr(PF_c) = Hv(PF_c) / Hv(PF_t) is advisable. In our case, as PF_t is unknown, we have decided to assume it only contains the ideal point (0, 0) that would denote an LR model that requires 0 training data and yields 0 errors. Conversely, the anti-optimal reference point was set at (5, 240200), denoting a hypothetical LR model that is trained using 100% of the data but falls well out of acceptable accuracy thresholds.
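For two minimised objectives, the hypervolume and its relative variant can be computed with a simple sweep over the front. The sketch below is an illustrative implementation rather than the routine used in the experiments; it assumes points are ordered as (test error, training set size), and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]

# Anti-optimal reference point quoted in the text: (error, training samples).
REFERENCE: Point = (5.0, 240200.0)


def hypervolume_2d(front: Iterable[Point], reference: Point = REFERENCE) -> float:
    """Area dominated by `front` and bounded by `reference` (minimisation)."""
    # Only points strictly better than the reference in both objectives count.
    points = sorted(p for p in front if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_f2 = reference[1]
    for f1, f2 in points:
        # Sweeping by increasing f1, a point adds area only if it improves f2.
        if f2 < best_f2:
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def relative_hypervolume(front: Iterable[Point], reference: Point = REFERENCE,
                         ideal: Point = (0.0, 0.0)) -> float:
    """Hr = Hv(front) / Hv({ideal}), with the ideal point standing in for PF_t."""
    return hypervolume_2d(front, reference) / hypervolume_2d([ideal], reference)
```

As a worked example, a front holding the single point (1.0, 24020.0) dominates an area of (5 − 1) · (240200 − 24020) = 864,720, while the ideal point dominates 5 · 240200 = 1,201,000, which gives Hr = 0.72.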
Across 30 independent runs aimed at modelling T_w, we obtained:
• an average Hr of 78.00% and a median Hr of 77.96% in the case of MOOP1 (i.e., when only optimising the temperature profiles used for training);
• an average Hr of 81.59% and a median Hr of 81.62% in the case of MOOP2 (i.e., when optimising both profiles and EMA-based synthetic features).
This general improvement of modelling outcomes suggested by the difference in Hr central tendency indicators between MOOP1 and MOOP2 was confirmed as statistically significant by a one-sided Mann-Whitney U test [Mann and Whitney 1947] with a 0.01 significance level (p-value = 1.5099 · 10^−11). This means that we can say with 99% confidence that the inclusion of EMA weights in the optimisation improves the data requirement vs. accuracy tradeoffs of our LR thermal models for T_w. The impact of this decision on model explainability is discussed at length in Section 4.4.
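A one-sided Mann-Whitney U test of this kind can be sketched without external dependencies by using the normal approximation with a continuity correction. This is an assumption-laden illustration: the function name `mann_whitney_greater` is hypothetical, ties are not corrected for in the variance, and any values passed to it in examples are synthetic rather than the Hr values from the study. In practice a library routine such as `scipy.stats.mannwhitneyu` with `alternative="greater"` would normally be preferred.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_greater(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """One-sided test of H1: values in `a` tend to be larger than values in `b`.

    Returns (U statistic of `a`, approximate p-value). The p-value uses the
    normal approximation with continuity correction and no tie correction.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # U counts the pairs where a beats b; ties count as half a win.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean - 0.5) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return u, p_value
```

Here `a` would hold the per-run Hr values of the configuration expected to be better (MOOP2) and `b` those of the baseline (MOOP1); the null hypothesis is rejected at the 0.01 level when the returned p-value falls below 0.01.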

MOOP3: Simultaneously Optimising All Six Target Temperatures
Figure 7 shows a typical optimisation result for MOOP3 where, given our minimax approach described at the end of Section 3.3, for each evaluated solution, the colour and shape (as per the legend) correspond to the target temperature for which the solution's maximum LR test error was obtained. The winding temperature (T_w) error is dominant across all evaluated solutions and among plotted Pareto optimal solutions, the max error value is associated with (i) T_w in 49/62 of cases, T_r in 10/62 of cases, and T_f in 3/62 of cases (for MAE) and with (ii) T_w in 53/59 of cases, T_r in 5/59 of cases, and T_f in 1/59 of cases (for MSE).
Based on this, we can infer that a combination of datasets (i.e., sample temperature profiles) and EMA weights that can lead to an accurate LR model for predicting the winding temperature will equally yield accurate LR models for predicting all six component temperatures under a wide range of operational scenarios. This observation and the high-priority modelling status of T_w motivate the T_w significance testing focus in Section 4.2.

Balancing Model Accuracy and Explainability
We are aware that an excessive use of EMA synthetic features (in solutions to MOOP2 and MOOP3) will increase the complexity of the LR models, thus compromising our stated objective of obtaining simple and explainable models that can help electrical engineers gain insights related to the dynamic thermal behaviour of the studied electrical machine. For example, in Figure 8, we re-plot all the T_w-based Pareto optimal solutions from the MOOP3 run depicted in Figure 7 with a marker size proportional to the size of each LR model of T_w. These results indicate that the improved accuracy brought by including EMA weights in the multi-objective optimisation tends to come at the expense of generating larger (i.e., more complex) models when increasing the amount of training data. This is especially obvious when using MSE as an optimisation goal and is likely due to the fact that the usage of the same quadratic loss function within the MSE and LR formulae enables a larger set of EMA weights to bring marginal modelling improvements when training on larger sets of temperature profiles. When the loss functions used in the optimisation and model training are well aligned but not identical (i.e., when f_2(x) is based on MAE), the increase of optimal model size is more subdued.
The fact that complexity increase affects MAE and MSE modelling differently is also evidenced by the plots in Figure 9 that display the comparative performance of the Pareto optimal LR models from Figure 8 after a step-wise regularisation procedure that removes 30%, 50%, and 70% of the original regression model coefficients, starting with the least important ones (i.e., those with the smallest absolute values).
Regularisation results indicate that a reduction of LR model size (complexity) by 50% to 70% affects MSE optimal models more (i.e., they determine larger error increases). The fact that a 30% reduction of model sizes appears to have a negligible effect on estimated accuracy for most optimal models can be explained by our MOOP formulation described in Section 3: when an EMA weighting is selected, six new synthetic features (corresponding to two original + four expert-suggested base features) are created and all six features will feature in the final LR thermal model even if just one feature has a meaningful contribution to improving model accuracy. This approach was a design tradeoff itself as:
• we wished to limit the size of our MOOPs. By allowing the multi-objective solver to select individual EMA synthetic features, the sizes of MOOP2 and MOOP3 search space would increase to 70 instead of 20, likely requiring a more complicated solver + parameterisation selection process alongside extended run-times;
• we wanted to aim the modelling exercise towards identifying EMA weights that capture temporal trends that are relevant for multiple base features as these weights could provide more insights to electrical engineers (thus improving overall explainability). Meaningful EMA weights can be identified by domain experts that analyse relative temperature profiling differences on LR models where a reduction of complexity is more strongly correlated to a corresponding reduction of global and/or local modelling accuracy, e.g., the T_w model from Figure 10.
Regarding the relative modelling performance shown in Figure 10, it is noteworthy that features that are in the 50% to 70% range of importance (based on their associated absolute coefficient values in the original Pareto optimal LR model) seem crucial for correctly modelling temperature peaks associated with constant medium and high utilisation scenarios. Conversely, the least important 50% of original model features have an incremental, but overall very limited, impact on general modelling performance. These observations indicate that by further tailoring the regularisation procedure (e.g., making it more fine grained or dependent on the relative loss of global/local accuracy across the 20 analysed scenarios), the explainability of the original model could be enhanced by constructing a more detailed mapping of features or groups of features to particular thermal profiling capabilities. This in turn would give decision makers a clear view of all the modelling tradeoffs associated with a given Pareto optimal thermal model: training costs vs accuracy vs explainability.
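The step-wise model reduction discussed in this section can be sketched as a pruning step over the LR coefficients. The sketch rests on several assumptions that the text does not settle: importance is taken to be the absolute coefficient value, the least important coefficients are removed first, the intercept is kept outside the dictionary, and the remaining coefficients are not refitted. The function name `prune_coefficients` is hypothetical.

```python
from typing import Dict


def prune_coefficients(coefficients: Dict[str, float], fraction: float) -> Dict[str, float]:
    """Drop the given fraction of coefficients with the smallest absolute values.

    Assumption: remaining coefficients keep their original values (no refit).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    n_remove = int(round(fraction * len(coefficients)))
    # Rank feature names from least to most important.
    ranked = sorted(coefficients, key=lambda name: abs(coefficients[name]))
    removed = set(ranked[:n_remove])
    return {name: w for name, w in coefficients.items() if name not in removed}
```

Calling the function with fractions of 0.3, 0.5 and 0.7 on the same original model would produce the three reduction levels, each of which can then be re-evaluated on the test profiles to relate groups of removed features to changes in accuracy.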

CONCLUSIONS AND FUTURE WORK
The present research demonstrates how three 0-1 Knapsack multi-objective formulations of data modelling tasks coupled with the usage of an effective evolutionary solver (i.e., NSGA-II) can be used to outline optimal costs vs accuracy tradeoffs when aiming to discover high-quality Linear Regression (LR) models that can estimate the dynamic thermal behaviour of six electrical motor components under various operational scenarios. Case study results indicate that the ability to generate highly explainable models coupled with the holistic data modelling perspective provided by our multi-objective approach provides electrical engineers with useful data-driven insights regarding the thermal profile of the studied electrical machine.

Fig. 3. Summary of modelling requirements with input and output variables.
• f_1(x) = the total number of data samples in the training set encoded by x that are used for creating the LR model;
• f_2(x) = the MAE or the MSE obtained by the trained LR model on the test set encoded by x.

Fig. 5. Pareto fronts (black squares) of single NSGA-II optimisation runs on MOOP1 when aiming to model T_w (top subplots) and T_bi (bottom subplots).

Fig. 6. All end-of-the-run Pareto-optimal solutions for MOOP1 (datasets optimisation) and MOOP2 (datasets + EMA weights optimisation) across five independent runs on each problem that aimed to model the winding temperature (T_w).

Fig. 8. Size vs Pareto Front (PF) position of T_w-based solutions at the end of a MOOP3 run.

Fig. 10. Performance of a T_w model for MOOP3 at different levels of regularisation across several operational scenarios (the grey area denotes part of the samples used during model training).

Table 1. Performance Comparison on Test Data for Different Regression Modelling Techniques