Disaggregation of Heat Pump Load Profiles From Low-Resolution Smart Meter Data

As the number of heat pumps installed in residential buildings increases, their energy-efficient operation becomes increasingly important to reduce costs and ensure the stability of the power grid. The deployment of smart electricity meters results in large amounts of smart meter data that can be used for heat pump optimization. However, sub-metering infrastructure to monitor heat pumps’ energy consumption is costly and rarely available in practice. Non-intrusive load monitoring addresses this issue and disaggregates appliance-level consumption from aggregate measurements. However, previous studies use high-resolution data of active and reactive power and do not focus on heat pumps. In this context, our study is the first to disaggregate heat pump load profiles using commonly available smart meter data with energy measurements at 15-minute resolution. We use a sliding-window approach to train and test deep learning models on a real-world data set of 363 Swiss households with heat pumps observed over a period of 8 years. Evaluating our approach with a 5-fold cross-validation, our best model achieves a mean $R^2$ score of 0.832 and an average RMSE of 0.169 kWh, which is similar to previous work that uses high-resolution measurements of active and reactive power. Our algorithms enable real-world applications to monitor the energy efficiency of heat pumps in operation and to estimate their flexibility for demand response programs.


INTRODUCTION
Heat pumps (HPs) have become a popular technology in residential buildings. In 2020, about 180 million units were in operation worldwide, potentially rising to 600 million by 2030 [14]. When operating efficiently, HPs can significantly reduce energy consumption and greenhouse gas emissions compared to conventional heating systems. Nonetheless, global energy demand for heating buildings has increased by 10% since 2010 [13]. This makes counter-measures such as changes in consumer behavior, improvements in building envelopes, and more efficient operation of clean heating technologies highly necessary. These should also address HP operation, as many HPs in the field have a lower performance than specified by the manufacturer [6]. For example, a study including 297 Swiss households with HPs showed that, after optimization, half of the households had average annual energy savings of 1,805 kWh (15.2%) [25]. Common causes of low performance are incorrect configurations or sizing, and installation errors [23]. Low HP performance can result in high operating costs for HP owners and an increase in total electricity and peak demand, which may require costly upgrades in grid infrastructure [19]. On the other hand, HPs have proven to be suitable for demand response (DR) programs [19]. Therefore, their operational flexibility can compensate for fluctuations in power generation from solar or wind energy.
To enable wide-spread energy efficiency services for HPs, the advanced metering infrastructure can serve as an important technology. In 2016, China installed over 350 million smart electricity meters (SMs), while the United States installed 94.8 million in 2019. Also, ten European countries have already achieved deployment rates of at least 80% [6]. In the context of HPs, smart meter data (SMD) may be used to predict faults in operation, misconfigurations, inappropriate sizing, and atypical behavior [23]. In particular, this also allows monitoring HPs that are not connected to the internet [5,23].
However, monitoring the energy consumption of HPs is particularly challenging due to their dynamic and intermittent nature [26]. Many HPs are able to modulate their compressor speed, resulting in varying patterns in SMD [5]. On the other hand, sub-metering, which measures HPs separately, is costly and thus rarely implemented [3]. For this reason, non-intrusive load monitoring (NILM), also known as load disaggregation, algorithmically decomposes a building's overall electricity usage into the consumption of individual appliances. While existing studies in this area mostly use high-resolution SMD with sampling rates around 1 Hz, there is little work on commonly available low-resolution SMD [27]. More importantly, there is no study available with a particular focus on HPs. In this paper, we address this research gap by focusing on NILM to disaggregate HP load profiles from aggregated SMD energy measurements at 15-minute resolution.

RELATED WORK
Evaluating the energy efficiency of individual appliances requires knowledge of their energy consumption. Therefore, NILM has become a vivid research area, which is discussed in [12,21]. Existing work uses data sets with separate measurements of individual appliances. However, only the AMPds data set [18] includes an HP. Additionally, we note that the toolkit NILMTK [2] implements a set of baseline models commonly used for benchmarking.
Temporal Resolution: In general, SMD can have different temporal resolutions. Usually, SMD with sampling intervals of a few seconds is called high-resolution and provides a high level of detail that allows identifying typical device signatures [20]. However, the data available in the field generally has resolutions of a few minutes to hours, which is called low-resolution [4]. Despite the importance of low-resolution data for practical applications, most work on NILM uses high-resolution data and measurements of active and reactive power and their phase angles, e.g., [22]. In this case, each data point is 3-dimensional, which facilitates the separation of resistive and non-resistive loads. In contrast, SMs in practice often provide only energy measurements [27], i.e., 1-dimensional data points that are more difficult to disaggregate.
Categories of NILM Approaches: NILM approaches can be categorized as event-based or event-less. The former focus on detecting switching events (i.e., on-off transients) of appliances, whereas the latter focus on activity states [10,11]. In general, event-based methods are considered to be rather suitable for SMD with high sampling rates [1]. However, deep learning models such as those used in our work can also be categorized as event-based and have been successfully applied to low-resolution SMD before [11,17].
Heat Pump Load Disaggregation: Two studies show that HP installations can be detected with low-resolution SMD [8,24], and [5] predicts the capability of an HP to modulate compressor speed. However, none of these studies extracts the load profiles of the HPs. In this context, we note that HPs are particularly challenging for load disaggregation, especially when only low-resolution energy measurements are used. In contrast to other appliances, they show a high variety in patterns, which depend on multiple factors such as HP type, sizing, configurations, building characteristics, and weather conditions [6]. In addition, variable-speed HPs in particular can operate in power ranges that may be overlaid by other devices [5]. We found neither any work with a focus on NILM for HPs nor a publicly available data set that contains HPs, measures energy in low resolution, and covers multiple households. Nonetheless, a few studies use the AMPds data set [18] and report disaggregation scores for an HP [7,22,26]. This data set contains active and reactive power readings at 1-minute resolution of a single house with an HP observed over two years. For example, the authors of [7] use Factorial Hidden Markov Models (FHMMs) and dynamic time warping (DTW) and achieve $R^2$ scores around 0.8 to 0.9 for the HP, whereas [26] presents an approach based on clustering and support vector regression, and reports an $R^2$ score of 0.975 for HP performance. We can summarize that previous work detects HP installations with low-resolution SMD or breaks down HP load profiles with high-resolution SMD of active and reactive power. However, existing NILM studies lack an evaluation on multiple real households and do not use low-resolution energy measurements, although this is the most common case in practice [27].

METHODS
Data Set: We use a real-world data set, which covers 363 households in Switzerland observed from January 2012 to March 2020 (i.e., 8 years). All households are single-family houses that use an HP for heating and are not equipped with a photovoltaic (PV) system. In 231 cases (i.e., 63.64%), we further know through the utility that the HP is additionally used for domestic hot water (DHW) production. For all households, SMD is available at 15-minute resolution, which measures energy consumption in kWh. Each household has two SMs installed, with one SM measuring the HP separately from all other appliances. The average data availability per household is 1,299 days (i.e., 3.5 years). The households have an average yearly electricity consumption of 12,515 kWh, of which on average 7,495 kWh (i.e., 59.8%) is caused by the HP. Additionally, we use temperature data from each household's nearest weather station.
Problem Formulation: We define the HP energy in kWh of a household at timestamp $t$ as $E^{HP}_t$. Similarly, the rest of the household's energy consumption can be expressed as $E^{HH}_t$. Consequently, the aggregate measurements $E_t$ are given by the sum of both measurements, $E_t = E^{HP}_t + E^{HH}_t$. Our models are supposed to approximate $E^{HP}_t$ when only $E_t$ is assumed to be known, leading to a prediction $\tilde{E}^{HP}_t$. The prediction error $\epsilon_t$ is then given by the difference between the original measurement and the estimate, $\epsilon_t = E^{HP}_t - \tilde{E}^{HP}_t$. Since we isolate the full load profile of each HP, i.e., predict continuous energy values, we treat the problem as a regression task. Note that other NILM studies often only predict whether an appliance is on or off, which leads to the use of different metrics, e.g., the Matthews correlation coefficient. Further, note that we extract the load profiles only for households where an HP is installed. We consider determining whether or not a household has an HP a separate preliminary task, as in [8,24].
Sequence-to-Sequence Learning: We slide a window of width $w$ over the SMD with a step size $s$ (both expressed as a number of observations). Choosing $s < w$ produces an overlap $o = w - s$ between two consecutive windows. We apply this approach to both the aggregate and separate SMD to create input-output pairs for a model to learn (Figure 1). Within each window, we derive the following features.
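The window construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `sliding_windows` and `make_pairs`, the toy readings, and the chosen width and step are all assumptions made for the example.

```python
from typing import List, Tuple


def sliding_windows(series: List[float], width: int, step: int) -> List[Tuple[int, List[float]]]:
    """Return (start_index, window) pairs of fixed width, taken every `step` observations."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    windows = []
    # Stop at the last start index that still yields a full window.
    for start in range(0, len(series) - width + 1, step):
        windows.append((start, series[start:start + width]))
    return windows


def make_pairs(aggregate: List[float], heat_pump: List[float], width: int, step: int):
    """Pair each aggregate window (input) with the HP window (target) at the same position."""
    if len(aggregate) != len(heat_pump):
        raise ValueError("both series must have the same length")
    inputs = sliding_windows(aggregate, width, step)
    targets = sliding_windows(heat_pump, width, step)
    return [(x, y) for (_, x), (_, y) in zip(inputs, targets)]


# Toy data: eight 15-minute readings in kWh (invented values).
aggregate = [0.5, 0.9, 1.2, 0.4, 0.3, 1.1, 1.0, 0.2]
heat_pump = [0.2, 0.6, 0.9, 0.1, 0.0, 0.8, 0.7, 0.0]

# Width 4 and step 2 give an overlap of 4 - 2 = 2 observations.
pairs = make_pairs(aggregate, heat_pump, width=4, step=2)
```

With a step smaller than the width, most time points appear in several windows, which is what later makes it necessary to merge multiple predictions per time point.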
Energy features: We use the sequence of values from the aggregate SMD and their average to reduce the impact of variations.
Temporal features: HPs perform heating cycles that result in regular periods of activity and rest [6]. Because HPs may have programmed schedules, e.g., night setback configurations, cycling may differ over time. Therefore, we assume that temporal information helps a model with contextualization. To this end, we use the time of day, day of the week, and day of the year of the start time of a window. To better reflect periodicity in the mathematical space, we map each one-dimensional feature to a two-dimensional cyclic space with continuous values. The sequential values for the day of the year and the day of the week can be used directly as input to the transformation functions. However, the time of day (formatted as hh:mm) must first be encoded as a value in minutes: time = hour × 60 + minute. Let $x$ be the values to be transformed and $k$ a scaling factor ($k = 24 \times 60 = 1{,}440$ for time of day, $k = 7$ for day of the week, and $k = 365$ for day of the year). Then we can transform the values as follows: $x_{\sin} = \sin(2\pi x / k)$ and $x_{\cos} = \cos(2\pi x / k)$.
Weather features: The heat demand of a building usually increases as the temperature decreases. Therefore, we use average, minimum, and maximum outdoor temperatures to learn the distinction between temperature-dependent and temperature-independent loads.
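As an illustration of the cyclic transformation of the temporal features, the following sketch encodes the start time of a window. The helper names `cyclic_encode` and `temporal_features` are hypothetical, and the indexing of weekdays (Monday = 0) and days of the year (1 to 366) is an assumption, since the text does not specify it.

```python
import math
from datetime import datetime
from typing import Dict, Tuple


def cyclic_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def temporal_features(window_start: datetime) -> Dict[str, Tuple[float, float]]:
    """Encode time of day, day of week, and day of year of a window's start time."""
    minutes = window_start.hour * 60 + window_start.minute  # time = hour x 60 + minute
    day_of_week = window_start.weekday()                    # assumption: Monday = 0
    day_of_year = window_start.timetuple().tm_yday          # assumption: 1..366
    return {
        "time_of_day": cyclic_encode(minutes, 24 * 60),
        "day_of_week": cyclic_encode(day_of_week, 7),
        "day_of_year": cyclic_encode(day_of_year, 365),
    }


# Example: a window starting on Monday, 7 January 2019 at 06:00.
features = temporal_features(datetime(2019, 1, 7, 6, 0))

# 23:45 and 00:00 are neighbors in the cyclic space, although 1425 and 0 are far apart.
late = cyclic_encode(23 * 60 + 45, 24 * 60)
midnight = cyclic_encode(0, 24 * 60)
distance = math.dist(late, midnight)
```

The point of the two-dimensional encoding is visible in the last lines: values at the end and the start of a period end up close together, which a single linear feature cannot express.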
Meta feature: We use an additional binary feature to encode whether or not the HP is responsible for DHW production because we suspect that the SMD patterns slightly differ. However, we recognize that this type of information is typically unknown in a real-world scenario. Therefore, we do not use this feature in all cases and experimentally evaluate whether it improves performance.
We normalize the features by scaling them to unit variance. We fit the normalization parameters exclusively on the training data to prevent bias from the test data and then apply the same transformation to the test data features.
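A minimal sketch of this normalization step is given below. It assumes that each feature is only divided by its standard deviation estimated on the training data (the text does not state whether the mean is also removed); `fit_scale` and `apply_scale` are hypothetical helper names and the values are invented.

```python
import statistics
from typing import List


def fit_scale(train_column: List[float]) -> float:
    """Estimate the scaling factor (population standard deviation) on training data only."""
    std = statistics.pstdev(train_column)
    # Guard against constant features, which would otherwise cause a division by zero.
    return std if std > 0 else 1.0


def apply_scale(column: List[float], std: float) -> List[float]:
    """Apply a previously fitted scaling factor to any split (training or test)."""
    return [value / std for value in column]


# Toy feature column, split into training and test households (invented values).
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [3.0, 8.0]

std = fit_scale(train)                  # fitted on the training data only
train_scaled = apply_scale(train, std)  # has unit variance by construction
test_scaled = apply_scale(test, std)    # reuses the training statistics
```

Because the test data never enters `fit_scale`, no information about unseen households leaks into the features used for training.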
Deep learning models have performed well in NILM applications in the past [15,16]. As a tradeoff between training time, computational resource requirements, and model complexity, we investigate two different feedforward neural networks (FFNNs) with five hidden layers. We use the highest number of nodes in the middle layer as this allows the model to learn complex relationships in the data, but is less prone to overfitting [9]: Model 1: [50, 100, 200, 100, 50]; Model 2: [200, 400, 800, 400, 200]. As the activation function of the last layer, we use the rectified linear unit (ReLU) because the energy consumption can be zero but not negative. We reserve 10% of the training data set for validation, train the models for up to 200 epochs, and allow early stopping if performance on the validation data set does not improve for more than 3 epochs. We also use a batch size of 2048 and the Adam optimizer (smoothing parameter 0.9; learning rate 0.001). We use the mean squared error (MSE) as loss function. Using previous definitions and $n$ as the number of energy values to be predicted, we can write it as: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (E^{HP}_i - \tilde{E}^{HP}_i)^2$.
The overlap of the sliding windows creates multiple predictions per time point $t$. To obtain a final HP prediction $\tilde{E}_t$, we compute the average of all predictions at $t$. We thus reconstruct an SMD time series of the original length from multiple sliding windows and compare it to the original HP measurements.
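The averaging of overlapping window predictions can be sketched as follows. The function name `reconstruct`, the use of `None` for time points not covered by any window, and the toy predictions are assumptions made for this example only.

```python
from typing import List, Optional, Sequence


def reconstruct(window_predictions: Sequence[Sequence[float]],
                starts: Sequence[int],
                length: int) -> List[Optional[float]]:
    """Average all window predictions that cover the same time point."""
    sums = [0.0] * length
    counts = [0] * length
    for start, window in zip(starts, window_predictions):
        for offset, value in enumerate(window):
            sums[start + offset] += value
            counts[start + offset] += 1
    # Time points that no window covers have no prediction and are marked with None.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Two predicted windows of width 4 with step 2 over a series of length 6 (invented values).
predictions = [
    [0.25, 0.50, 0.75, 1.00],  # covers time points 0..3
    [0.25, 0.50, 0.50, 0.00],  # covers time points 2..5
]
series = reconstruct(predictions, starts=[0, 2], length=6)
```

Time points 2 and 3 are covered by both windows, so their final values are the means of two predictions each, while the remaining points keep their single prediction.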

RESULTS AND DISCUSSION
We evaluate our approach using a 5-fold cross-validation with splits on household level, where four folds (i.e., 80%) are used for training and the rest for testing. Hence, we evaluate performance on households that were not seen during training. This procedure reflects a real-world application well, where a single model is trained on households with separate HP measurements and is then applied to different households with only aggregate measurements. We compare the performance of our approach to baseline models that are provided in the toolkit NILMTK [2]. In particular, we use the implementations of a Combinatorial Optimization (CO) and an FHMM because other algorithms (e.g., Hart) require the use of active and reactive power measurements, which are unavailable in our setting. To evaluate the performance of each model, we follow the recommendations in NILMTK [2]. Therefore, we report the mean absolute error (MAE), median absolute error (MDAE), and root mean square error (RMSE) in kWh relative to a 15-minute measurement interval, as well as the $R^2$ score. The work of [17] explains that the window width should be chosen in such a way that it captures the majority of device activations, while too large window sizes can negatively affect performance. In our case, the window size is best chosen to capture some heating cycles from start to finish. As typical HP cycles are in the range of minutes to hours [6], we experiment with different window sizes and overlaps. Additionally, for each of these configurations, we evaluate whether the meta feature affects performance. Table 1 shows the performance scores of the evaluated parameters. Unlike in [17], the selected window size has no effect on performance in our setup. A reason for this could be that the heating cycle duration of each HP depends on the outdoor temperature and that the variance of the different HPs in our data set is large enough. On the other hand, we see that higher window overlaps tend to perform better, which may be explained by the increase in training samples.
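The evaluation protocol described above, i.e., splitting by household and scoring with MAE, MDAE, RMSE, and $R^2$, can be sketched as follows. This is an illustrative sketch only: `household_folds` and `regression_scores` are hypothetical helpers, the random assignment of households to folds is an assumption, and the example values are invented.

```python
import math
import random
import statistics
from typing import Dict, Iterable, List, Sequence


def household_folds(household_ids: Iterable[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign every household to exactly one of k test folds (split on household level)."""
    ids = sorted(set(household_ids))
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def regression_scores(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MDAE, and RMSE (in the unit of the inputs, e.g. kWh) and the R^2 score."""
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)
    mean_actual = statistics.fmean(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": statistics.fmean(abs_errors),
        "MDAE": statistics.median(abs_errors),
        "RMSE": math.sqrt(ss_res / len(errors)),
        # R^2 is undefined for a constant target series.
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# 363 households split into five folds; each fold serves once as the test set.
folds = household_folds(range(363), k=5)
for test_ids in folds:
    train_ids = [h for fold in folds if fold is not test_ids for h in fold]
    # ... train on train_ids, predict for test_ids, then call regression_scores(...)

# Scores for a toy HP profile in kWh per 15-minute interval (invented values).
scores = regression_scores([0.0, 0.5, 1.0, 0.5], [0.0, 0.5, 0.5, 0.5])
```

Splitting by household rather than by time ensures that all readings of a test household are unseen during training, which matches the intended use of one model applied to new households.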

CONCLUSION
Heat pumps (HPs) are considered a clean heating and cooling technology. However, in practice, the energy efficiency of HPs can vary widely from building to building. The increasing availability of smart meter data offers new opportunities for HP monitoring and optimization. An important prerequisite is to isolate the energy consumption of the HP from aggregate measurements. Existing work on load disaggregation rarely uses energy measurements at low resolutions, even though this is the most common use case in the field, and does not focus on HPs. In our study, we successfully disentangle HP patterns from 15-minute SMD energy data using sequence-to-sequence learning. Using a real-world data set of 363 households in Switzerland (observed over 8 years), our approach achieves performance similar to previous work that uses high-resolution SMD of active and reactive power.

Figure 1: Example of the sliding window applied to the aggregate and separate SMD to create input and target sequences.

Table 1: Means and standard deviations of performance scores.

Our best model achieves an $R^2$ score of 0.832 (0.066) and an RMSE of 0.169 (0.030) kWh. In comparison, the best-performing NILMTK model (FHMM) only achieves an $R^2$ score of 0.627 (0.256) and an RMSE of 0.211 (0.116) kWh. This makes our best model better by 32.7% in terms of $R^2$ and better by 19.9% with regard to the RMSE. Additionally, the performance is similar to related work that uses high-resolution measurements of active and reactive power, e.g., [7]. For both window sizes, the best model uses the meta feature. In general, its use improves $R^2$ by about 3% and RMSE by about 7%.
Limitations and Future Work: Our study utilizes SMD from Swiss single-family homes heated by HPs without PV systems. Future research should assess our algorithms on a broader data set encompassing diverse buildings and geographical conditions. Additionally, testing the robustness against other substantial energy loads and exploring performance variations between households remain open tasks. Further investigation into different architectures and the factors contributing to our method's success, including a deeper understanding of individual features, could enhance performance and expand its applicability to different scenarios.