Unlocking the Potential of Deep Learning in Peak-Hour Series Forecasting

Unlocking the potential of deep learning in Peak-Hour Series Forecasting (PHSF) remains a critical yet underexplored task in various domains. While state-of-the-art deep learning models excel in regular Time Series Forecasting (TSF), they struggle to achieve comparable results in PHSF. This can be attributed to the challenges posed by the high degree of non-stationarity in peak-hour series, which makes direct forecasting more difficult than standard TSF. Additionally, manually extracting the maximum value from regular forecasting results leads to suboptimal performance due to models minimizing the mean deficit. To address these issues, this paper presents Seq2Peak, a novel framework designed specifically for PHSF tasks, bridging the performance gap observed in TSF models. Seq2Peak offers two key components: the CyclicNorm pipeline to mitigate the non-stationarity issue and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that utilizes both the original series and peak-hour series as supervised signals. Extensive experimentation on publicly available time series datasets demonstrates the effectiveness of the proposed framework, yielding a remarkable average relative improvement of 37.7% across four real-world datasets for both transformer- and non-transformer-based TSF models.



INTRODUCTION
Variations in peak-hour values within daily time series cycles (see Figure 1a) are crucial across domains. In telecommunications, engineers calibrate base station capacities against maximum traffic volumes to optimize communication quality [17,26]. In energy, daily peak power consumption affects the provisioning of raw materials and power production [8,12], and some utilities base billing on peak demand. Likewise, understanding peak-hour traffic patterns in the transportation sector is crucial to effective urban planning [11]. Hence, precise Peak-hour Series Forecasting (PHSF) is significant for diverse industries, underlining the necessity of focused research in this area.

Despite the importance of PHSF, existing research is underdeveloped, often treating PHSF as a conventional sequence-to-sequence problem [1][2][3] and ignoring the vital relationship between the peak-hour and original series. These approaches often fall short in predictive scope and accuracy. Recent advancements in TSF techniques [21], such as LogTrans [13], Informer [27], SageFormer [25], and DLinear [23], offer promise but have limitations when directly applied to PHSF (Figure 4). They struggle with the specific information embedded in peak-hour values and with weak correlations, emphasizing the need for an innovative method tailored for PHSF.

We identify three paradigms in applying TSF methodologies to PHSF tasks (Figure 1b). The first paradigm (PFP) relies solely on the historical peak-hour series to forecast the peak-hour series, but often underperforms because the peak-hour series has lower autocorrelation, as measured by the Autocorrelation Function (ACF) [7], than the full series (Figure 2). Research has demonstrated a positive correlation between ACF and data predictability [4]. The second (SFP) uses the full series to forecast the peak-hour series directly but faces similar challenges. The third (SFS) forecasts the full series and then manually extracts the daily maximum values, which compromises the ability to predict extremes because the loss function minimizes the average error. For effective PHSF, it is necessary to: 1) address the wide forecast span and poor temporal dependencies; 2) utilize the relationship between the peak-hour and original series.
We introduce Seq2Peak, a framework bridging general time series and peak-hour series forecasting. It consists of two components: Cyclic Normalization, which models inter-cycle relationships and extracts statistical measures tailored for PHSF; and the Peak-hour Decoder, which takes the historical series and outputs both the original and peak-hour series, trained with a hybrid loss function that prioritizes the mapping relationship between them. Easily integrated into various models, Seq2Peak enhances forecasting without adding major computational complexity. Its efficacy is demonstrated by improved performance across models and real-world datasets, backed by ablation studies and parameter experiments, marking a new phase in the under-researched PHSF field.
The contributions of this paper are summarized as follows: 1) We systematically introduce the task of peak-hour series forecasting, a vital yet under-researched problem, and highlight its challenges. 2) We propose Seq2Peak, a PHSF framework that can be easily integrated into, and generalizes across, most forecasting models. 3) We validate our framework on four real-world datasets, showing a 37.7% average improvement over the original TSF models.

RELATED WORKS
Peak-hour series forecasting: Current forecasting methods primarily rely on traditional TSF techniques. Various studies have applied traditional TSF methods to predict peak time, including [10], which uses LSTM [18] to forecast the peak load of the current week, and [12], which compares the performance of ARIMA [24], SVR [5], and LSTM on power load data. Similarly, [17] utilizes ARIMA and LSTM, among other methods, to predict traffic patterns clustered for each base station during a week. [15,22] introduce Bi-LSTM to forecast the day-ahead peak electricity. [6] compares the performance of CNN and LSTM on peak-load data. [3] and [16] use linear regression and ARIMA, respectively, to model the relationship between the peak load of the next day and typical load patterns identified through clustering. However, because they rely solely on historical peak-time information and traditional time series prediction models, these methods often yield poor prediction performance.

Time series forecasting: In recent years, innovative sequence processing models like the Transformer have outperformed traditional models like ARIMA and LSTM. Yet, their application in the domain of PHSF remains limited. Noteworthy developments in this area include SageFormer [25], which integrates the Transformer and Graph Neural Networks into a series-aware framework. Informer [27] introduces sparsity into the attention mechanism to reduce the complexity of the Transformer. Additionally, [23] proposes DLinear, a family of linear prediction models that perform comparably to the Transformer but are significantly smaller in scale. Unfortunately, our experiments indicate that directly applying these models to PHSF tasks yields unsatisfactory results. However, integrating these models with the proposed Seq2Peak framework can more effectively harness their potential, significantly advancing PHSF tasks.

METHODOLOGY
Given the limited research on PHSF tasks, we formally define them and introduce Seq2Peak (Figure 3), a pioneering approach to PHSF. The framework includes two components: the Cyclic Normalization mechanism and the Seq2Peak decoder, working together to predict peak-hour series from historical data.
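In this section, we formally define the Peak-hour Series Forecasting task. An illustrative diagram of this problem is provided in Figure 1a. We denote a historical input data window of length $L$ as $X^{(t)} = \{x_t, \dots, x_{t+L-1}\}$ (the superscript $(t)$ is omitted hereafter), where $x_t \in \mathbb{R}^{C}$ and $C$ represents the number of channels. The objective is to forecast the peak-hour series $Y_{peak}$ derived from the original future series $Y = \{x_{t+L}, \dots, x_{t+L+H-1}\}$ with window size $H$. The peak-hour value is the maximum value downsampled from each consecutive interval of $P = 24$ hours, for each time series. The predicted output is denoted as $\hat{Y}_{peak} = \mathcal{F}(X)$.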
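The downsampling step in this definition can be illustrated with a minimal sketch. It assumes a single-channel series stored as a plain Python list and a window that starts at the beginning of a daily cycle; the function name `peak_hour_series` and the choice to drop a trailing incomplete cycle are illustrative assumptions, not part of the original formulation.

```python
from typing import List, Sequence


def peak_hour_series(series: Sequence[float], period: int = 24) -> List[float]:
    """Return the maximum of each complete, non-overlapping window of `period` steps.

    Assumption: a trailing incomplete cycle is dropped rather than padded.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    n_cycles = len(series) // period
    return [max(series[d * period:(d + 1) * period]) for d in range(n_cycles)]


# Toy data: two days of hourly values (not taken from any dataset in the paper).
day1 = [float(h) for h in range(24)]   # rises from 0 to 23
day2 = [10.0] * 23 + [50.0]            # flat, with a spike in the last hour
peaks = peak_hour_series(day1 + day2)  # one value per day
```

With hourly data and `period=24`, an input of $L$ hours yields $L/24$ peak values, which is why a multi-day peak-hour horizon corresponds to a much longer hourly horizon.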

Cyclic Normalization
A longer forecasting span and weaker self-correlation are the main characteristics of PHSF tasks. Our study addresses these concerns by introducing the Cyclic Normalization (CyclicNorm) pipeline, which aims to learn data correlation and generate features more appropriate for PHSF tasks. CyclicNorm achieves this by modeling the correlation of distributions across varying hours within a cyclical interval.

First, we perform Cyclic Normalization on the raw input time series to learn intrinsic correlations between different hours within the same cycle, separating out the non-stationary content. Given the sub-sequences $X^{(k)} = \{x_k, x_{k+P}, x_{k+2P}, \dots\}$, where $P = 24$ is the cycle length, $L$ is the input length, $1 \le k \le P$, and $k + 2P < L$, we apply standard normalization $\mathrm{Norm}(\cdot)$ to the 24 sub-sequences individually, subtracting the means and dividing by the standard deviations:

$$X'^{(k)} = \mathrm{Norm}\left(X^{(k)}\right) = \frac{X^{(k)} - \mu_k}{\sigma_k}, \quad k = 1, \dots, P.$$

Here, $\mu = \{\mu_1, \mu_2, \dots, \mu_P\}$ and $\sigma = \{\sigma_1, \sigma_2, \dots, \sigma_P\}$ are the sets of means and standard deviations, respectively. The normalized $X'^{(k)}$ serves as the forecasting model's input, offering more stable data correlation.
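The first stage described above can be sketched as follows. This is a minimal, single-channel illustration under stated assumptions: positions are 0-indexed (the text uses $1 \le k \le P$), the population standard deviation is used, and a small `eps` is added to avoid division by zero. None of these details are specified in the text, and the function name `cyclic_normalize` is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def cyclic_normalize(
    x: Sequence[float], period: int = 24, eps: float = 1e-8
) -> Tuple[List[float], List[float], List[float]]:
    """Normalize each within-cycle position separately.

    The sub-sequence for position k is x[k], x[k + period], x[k + 2 * period], ...
    Returns (normalized series, per-position means, per-position std devs).
    """
    if len(x) < period:
        raise ValueError("need at least one full cycle")
    means: List[float] = []
    stds: List[float] = []
    for k in range(period):
        sub = x[k::period]          # all observations at position k of the cycle
        means.append(fmean(sub))
        stds.append(pstdev(sub))    # population std dev (an assumption)
    normalized = [
        (v - means[i % period]) / (stds[i % period] + eps)
        for i, v in enumerate(x)
    ]
    return normalized, means, stds


# Toy data: three "days" with a cycle length of 4 for brevity.
hourly = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0, 3.0, 6.0, 9.0, 12.0]
normalized, means, stds = cyclic_normalize(hourly, period=4)
```

After this step, every within-cycle position has approximately zero mean and unit variance across cycles, so differences in level between, for example, night and midday hours are removed before the series reaches the forecasting model.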
The second stage of CyclicNorm addresses the non-stationary statistics of the input sequence. Inspired by [9,14], we propose a non-stationary shifting module to model the distribution shift in the time series: a shifting function $T$ is applied to the means $\mu$ and standard deviations $\sigma$ obtained in the first stage, producing the post-processed statistics $\mu'$ and $\sigma'$. Depending on the forecasting model and the dataset characteristics, $T$ can be instantiated as a trainable linear layer or simply as a set of multiplication factors.

The third stage of CyclicNorm denormalizes the results obtained from the forecasting model (i.e., the baseline models used in the experiments) using the post-processed statistics $\mu'$ and $\sigma'$. The output of CyclicNorm is treated as a standard TSF result, which also serves as the input for the subsequent peak-hour inference.
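The second and third stages can be sketched as below, taking the simpler of the two instantiations mentioned above (a set of multiplication factors). The exact form of the shifting function is not given in the text, so the element-wise product, the helper names, the `start` phase argument, and all numeric values are illustrative assumptions.

```python
from typing import List, Sequence


def shift_statistics(stats: Sequence[float], factors: Sequence[float]) -> List[float]:
    """Assumed shifting function: one multiplicative factor per cycle position."""
    if len(stats) != len(factors):
        raise ValueError("stats and factors must have the same length")
    return [f * s for f, s in zip(factors, stats)]


def cyclic_denormalize(
    y_norm: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
    start: int = 0,
) -> List[float]:
    """Invert the per-position normalization on a model output.

    `start` is the cycle position of the first forecast step (an assumption:
    0 when the forecast begins at the start of a cycle).
    """
    period = len(means)
    return [
        v * stds[(start + i) % period] + means[(start + i) % period]
        for i, v in enumerate(y_norm)
    ]


# Toy statistics for a cycle of length 2 (illustrative values only).
means = [10.0, 20.0]
stds = [2.0, 4.0]
shifted_means = shift_statistics(means, [1.5, 1.0])  # [15.0, 20.0]
shifted_stds = shift_statistics(stds, [1.0, 0.5])    # [2.0, 2.0]

model_output = [0.0, 1.0, -1.0, 0.5]  # normalized forecast from some model
restored = cyclic_denormalize(model_output, shifted_means, shifted_stds)
```

In a trainable setting the factors would be learned jointly with the forecasting model; fixed factors are used here only to keep the sketch dependency-free.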

Seq2Peak Decoder
The optimization objective of standard TSF tasks is to minimize the mean forecasting error, which is at odds with the goal of peak-hour series forecasting. However, PHSF models that directly optimize over peak-hour values have been shown to generalize poorly to the test set.
To overcome this dilemma, our Seq2Peak Decoder provides a simple yet highly effective optimization strategy: optimizing the loss on the original time series and on its corresponding peak-hour series simultaneously. To execute this strategy without introducing more trainable parameters, we attach a max-pooling layer with a stride and kernel size of 24 to the end of the standard forecasting result. The distinction between the max-pooling operation and manually processing the peak-hour series is that max-pooling allows back-propagation. Thus, we optimize the parameters of PHSF models via a hybrid loss function $\mathcal{L}_{h}$ that combines two terms, where $\mathcal{L}_{o}$ is the MSE loss between the ground-truth original time series and the output of the penultimate layer, and $\mathcal{L}_{p}$ is the MSE loss between the ground-truth peak-hour series and the final output of the Seq2Peak decoder. The two terms are balanced by a weighting factor $\alpha$ that varies between 0 and 1. By employing this decoder and the corresponding loss function, forecasting models can achieve stronger generalization, fully capturing the original series' information while emphasizing forecasting performance for the peak-hour series.
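The decoder and loss can be sketched in plain Python as follows. The exact form of the hybrid loss is not recoverable from the text here, so the sketch assumes a convex combination, $\mathcal{L}_{h} = \alpha \mathcal{L}_{p} + (1 - \alpha) \mathcal{L}_{o}$, with $\alpha$ weighting the peak term; this is an assumption, as are all function names. The hand-written gradient is included only to illustrate why max-pooling permits back-propagation: the peak loss sends its gradient to the hourly position that attained each daily maximum.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def max_pool(series: Sequence[float], kernel: int = 24) -> List[float]:
    """Non-overlapping max-pooling (stride == kernel size)."""
    n = len(series) // kernel
    return [max(series[d * kernel:(d + 1) * kernel]) for d in range(n)]


def hybrid_loss(pred, true, alpha: float = 0.5, kernel: int = 24) -> float:
    """Assumed form: alpha * peak-series MSE + (1 - alpha) * original-series MSE."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    loss_o = mse(pred, true)
    loss_p = mse(max_pool(pred, kernel), max_pool(true, kernel))
    return alpha * loss_p + (1.0 - alpha) * loss_o


def hybrid_loss_grad(pred, true, alpha: float = 0.5, kernel: int = 24) -> List[float]:
    """Gradient of hybrid_loss with respect to each predicted hourly value."""
    n, n_days = len(pred), len(pred) // kernel
    # Original-series term: every hourly prediction receives a gradient.
    grad = [(1.0 - alpha) * 2.0 * (p - t) / n for p, t in zip(pred, true)]
    true_peaks = max_pool(true, kernel)
    for d in range(n_days):
        window = pred[d * kernel:(d + 1) * kernel]
        j = max(range(kernel), key=window.__getitem__)  # argmax within the day
        # Peak term: only the position that attained the maximum is updated.
        grad[d * kernel + j] += alpha * 2.0 * (window[j] - true_peaks[d]) / n_days
    return grad


# Toy example with a cycle length of 3 instead of 24.
pred = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]
true = [1.0, 2.0, 4.0, 4.0, 5.0, 5.0]
loss = hybrid_loss(pred, true, alpha=0.5, kernel=3)
grad = hybrid_loss_grad(pred, true, alpha=0.5, kernel=3)
```

Setting `alpha=0.0` reduces this to the standard TSF objective, while `alpha=1.0` trains on peak values only; in an automatic-differentiation framework the gradient routing shown in `hybrid_loss_grad` is handled by the max-pooling layer itself.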

EXPERIMENTS

Experimental Setup
Datasets. We evaluate our methods on four large-scale real-world time series datasets: ETTh1, ETTh2, Electricity, and Traffic. These datasets have an hourly granularity and belong to the domains of energy and traffic. They exhibit daily periodicity, making the PHSF task meaningful in this context.
Baselines. We apply Seq2Peak to transformer-based and non-transformer-based TSF models (Transformer [19], Informer [27], Autoformer [20], and DLinear [23]) to investigate the performance enhancement. For the baselines, we adopt the SFS paradigm. We use mean squared error (MSE) and mean absolute error (MAE) as metrics. We conduct a comprehensive comparison of various competitive models across different prediction lengths, specifically peak-hour series of 5, 10, 15, and 30 days. For all datasets, the input sequence length is consistently set to 30 days. The 'Avg' value is the average over all four prediction lengths.
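The evaluation protocol above can be summarized in a short sketch. The metric definitions are the standard ones; the conversion of day-level lengths into hourly window sizes is inferred from the hourly granularity and the 24-hour cycle, and the constant and function names are hypothetical.

```python
from typing import Dict, Sequence

HOURS_PER_DAY = 24
INPUT_DAYS = 30
HORIZON_DAYS = (5, 10, 15, 30)


def _check(pred: Sequence[float], target: Sequence[float]) -> None:
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error."""
    _check(pred, target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def average_over_horizons(scores: Dict[int, float]) -> float:
    """The 'Avg' entry: unweighted mean of a metric over all prediction lengths."""
    return sum(scores.values()) / len(scores)


# Hourly window sizes implied by the day-level settings.
input_len_hours = INPUT_DAYS * HOURS_PER_DAY
output_len_hours = {d: d * HOURS_PER_DAY for d in HORIZON_DAYS}
```

Under these settings, a model following the SFS paradigm or using the Seq2Peak decoder produces `output_len_hours[d]` hourly values for a `d`-day peak-hour forecast, and the metrics are computed on the `d` resulting peak values.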

Main Results
We first examine the four paradigms depicted in Figure 1b, implemented on Transformer. The results of these experiments (Figure 4) demonstrate the superiority of Seq2Peak over the other three paradigms in accurately predicting future peak-hour series. Given the poor autocorrelation of the peak-hour series mentioned earlier, the methods that directly predict peak-hour values (SFP, PFP) fail to capture the trend of the peak-hour series, resulting in underfitting. Among the three existing paradigms, SFS performed best, leading us to select it as a strong baseline for subsequent experiments.
Furthermore, we applied our Seq2Peak framework to four commonly used forecasting models. Table 1 compares the forecasting accuracy of the baselines and Seq2Peak. The results consistently show that Seq2Peak significantly outperforms all four baselines. Moreover, Seq2Peak demonstrates stable performance, in sharp contrast to the baselines, whose error increases markedly as the prediction length grows. These experiments affirm that the Seq2Peak framework can effectively enhance the accuracy and robustness of peak-hour series forecasting.

Ablation Studies
We delve deeper into the effectiveness of each module in the proposed framework. Ablation experiments are conducted using the Transformer and DLinear models on the ETTh1 dataset. As shown in Figure 2, both CyclicNorm and the Seq2Peak Decoder can enhance performance individually, and using them together yields even greater improvements. This demonstrates that the two modules, by focusing on different challenges and empowering the forecasting model from different perspectives, complement each other.

Our code will be made publicly available at https://github.com/zhangzw16/Seq2Peak. In Table 1, the term "+ Seq2Peak" indicates the addition of the complete framework, which includes both the CyclicNorm and Decoder components.
In addition, we provide a hyper-parameter study for $\alpha$ in Eq. 5 by plotting dynamic curves. As shown in Figure 5, we examine the performance on the ETTh1 and ETTh2 datasets using the Seq2Peak-enhanced DLinear model. The trend of the plot indicates that the best-performing $\alpha$ occurs around 0.5, which validates the necessity of applying the hybrid loss function.

CONCLUSION
In conclusion, this paper addresses the crucial but often overlooked issue of PHSF. We proposed Seq2Peak, a novel framework tailored for PHSF tasks, bridging the gap between TSF methods and PHSF. Seq2Peak has demonstrated significant performance improvements across datasets for transformer- and non-transformer-based state-of-the-art TSF models. This study effectively solves the PHSF problem and paves the way for further explorations in this critical and challenging field.

Figure 1: The peak-hour series forecasting task and its paradigms

Figure 4: Display of forecasting results for three paradigms and Seq2Peak (with the red line representing the ground truth)

Figure 5: Effect of the peak weighting factor $\alpha$ (tested with DLinear on ETTh1/2)

Notation for the task definition: $x_t \in \mathbb{R}^{C}$, where $C$ represents the number of channels. The objective is to forecast the peak-hour series $Y_{peak}$ derived from the original future series $Y = \{x_{t+L}, \dots, x_{t+L+H-1}\}$ with window size $H$, where $L$ is the input length. The peak-hour value is the maximum value downsampled from each consecutive interval of $P = 24$ hours, for each time series.

Table 1: Performance improvement from applying the proposed framework to four TSF models.