On Robust Electric Network Frequency Detection Using Huber Regression

A robust regression technique known as Huber regression is incorporated into the Electric Network Frequency (ENF) detection task. This novel framework is based on the assumption of a mixture noise model, which combines Gaussian and Laplacian noise for ENF detection in short-length audio recordings. The effectiveness of the proposed ENF detector is assessed through detection accuracy and through the Receiver Operating Characteristic curve, summarized by the Area Under the Curve. Real-world benchmark data from the ENF-WHU dataset are utilized for this evaluation. The experimental results indicate that integrating the Huber regression method leads to a significant enhancement in ENF detection for short-length audio recordings, outperforming existing state-of-the-art techniques.


INTRODUCTION
The Electric Network Frequency (ENF), which originates from the fluctuations of the power grid frequency, serves as a distinctive and intrinsic "fingerprint" within multimedia content, such as audio recordings [2]. ENF holds a nominal value of 50 Hz in Europe and 60 Hz in the United States/Canada, serving as a forensic criterion in the realm of multimedia forensics [9,13,25]. The comparison of the estimated ENF from a multimedia recording with a ground truth ENF derived from power mains can find application in tasks such as time stamp verification [5,6,12,14,35], geo-localization [4,32,33], and tampering detection [11,28,31,34]. Additionally, the integration of ENF analysis with Non-Intrusive Load Monitoring techniques [3,21,30] further expands forensic capabilities, allowing for the disaggregation of energy consumption patterns that may corroborate the time and date of a recording.
To achieve optimal performance in the aforementioned applications, an important prerequisite before ENF estimation is the detection of ENF signals in multimedia recordings. A significant contribution that tackles ENF detection is developed in [15]. Six distinct detectors are developed and assessed, with three of them specifically tested using real-world data alongside synthetic data. The detectors are evaluated on both long-length and short-length audio recordings. For ENF detection in short-length audio recordings, a detector is proposed to enhance the Likelihood Ratio Test (LRT) performance employing the Least Absolute Deviations (LAD) regression [23]. LAD regression operates under the assumption of a Laplacian noise model, alternating between solving for the regression weights with the frequency estimate fixed and solving for the frequency estimate with the regression weights fixed, until convergence. In [24], a multi-tone time-frequency detector is developed employing a combination of multiple harmonics to identify the presence of valid ENF traces within a recording. Additionally, this detector provides insights into the overall quality of the ENF signal and the count of available harmonic components. An ENF detector employing a superpixel approach for videos is developed in [29]. This method involves ENF signal estimations derived from stable superpixel regions to determine the presence or absence of an ENF signal in brief video clips. In [27], a linear discriminant is employed to create an automated detector for ENF disturbances. Prior to assessing the detector, ENF extraction is carried out using the Estimation of Signal Parameters by Rotational Invariant Techniques.
Significant research is also directed toward the ENF estimation task. A multi-tone harmonic model for estimating the ENF is introduced in [1]. To enhance the accuracy of ENF signal estimation, multiple harmonics are combined, and the Cramér-Rao bound is utilized to bound the variance of the ENF estimator. In [10], a spectral estimation method is presented that integrates the ENF across various harmonics. The extraction of ENF takes into account the local signal-to-noise ratio at each harmonic. Instead of using the conventional Short-Time Fourier Transform (STFT) [26], the process of ENF extraction is approached as a data-dependent filtering problem. A framework for robust extraction of ENF from real audio recordings, which encompasses multi-tone ENF harmonic enhancement and the utilization of a graph-based approach to optimize harmonic selection, is introduced in [16]. Drawing insights from both [7] and [8], the frequency demodulation process in [18] utilizes the intermediate frequency signal's spectrum to deduce the highest achievable frequency of the ENF. In [19], filter-bank Capon spectral estimators and non-rectangular temporal windowing are developed to enhance ENF estimation accuracy for authenticity verification. Moreover, a non-parametric approach that embeds a customized lag window into the Blackman-Tukey method is proposed in [20], reducing speech content interference and improving forensic analysis precision. The evaluation of ENF extraction involves the assessment of non-parametric and parametric spectral estimation techniques.
Here, ENF detection is approached through the perspective of Huber regression [17]. A robust regression technique is proposed that combines the strength of the $\ell_2$ norm in Least Squares (LS) and the robustness of the $\ell_1$ norm in LAD regression. The proposed method assumes a mixture noise model that combines Gaussian and Laplacian components. This choice is motivated by the observation that the noise distribution resembles a Gaussian distribution in its central region while exhibiting double-exponential behavior in its tails. By employing a mixture noise model, a balance is obtained between the smoothness of Gaussian noise and the heavy-tailed behavior of Laplacian noise. This approach advances ENF detection, surpassing the robust ENF detector in [23] and achieving higher detection accuracy than state-of-the-art techniques.
The rest of the paper is structured as follows. In Section 2, the proposed framework is analyzed, while in Section 3, experimental findings are presented. Section 4 concludes the paper and provides directions for future research.

PROPOSED FRAMEWORK
In this Section, the proposed framework, depicted in Figure 1, is analyzed. The signal model and the problem formulation are described in Section 2.1, while the proposed ENF detection framework is discussed in Section 2.2.

Signal Model
The ENF is considered a deterministic signal denoted as $s[n]$ in the presence of noisy observations $x[n]$. The ENF waveform $s[n]$ can be expressed as:
$$s[n] = A[n]\cos\Big(2\pi T \sum_{i=1}^{n} f[i] + \phi\Big), \tag{1}$$
where $A[n] > 0$ and $f[n]$ represent the time-varying amplitude and ENF frequency, respectively. The parameter $\phi$ signifies the unknown initial phase. Here, $T = 1/f_s$ corresponds to the sampling interval, with $f_s$ denoting the sampling frequency.
Due to the slow variation of the ENF signal over time, $A[n]$ and $f[n]$ can be approximated by constants $A_c$ and $f_c$, respectively, resulting in the simplified expression of the ENF waveform
$$s[n] = A_c\cos(2\pi f_c nT + \phi). \tag{2}$$
The task of detecting ENF involves a binary hypothesis scenario. This binary framework facilitates the precise detection of the ENF presence within the signal model as follows:
$$\mathcal{H}_0: x[n] = w[n], \qquad \mathcal{H}_1: x[n] = s[n] + w[n], \qquad n = 0, 1, \ldots, N-1, \tag{3}$$
where $w[n]$ denotes the additive noise and $N$ the number of samples.
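As an illustration of the two hypotheses, the following minimal sketch generates a constant-amplitude, constant-frequency tone and adds noise to it. The function names, the default parameter values, and in particular the way Gaussian and Laplacian samples are mixed are assumptions made for this example only; they are not taken from the paper.

```python
import math
import random

def enf_tone(n_samples, fs=400.0, amp=1.0, freq=100.0, phase=0.3):
    """Simplified ENF waveform: constant amplitude and constant frequency."""
    T = 1.0 / fs  # sampling interval
    return [amp * math.cos(2.0 * math.pi * freq * n * T + phase)
            for n in range(n_samples)]

def mixture_noise(n_samples, rng, sigma=1.0, p_laplace=0.2, b=3.0):
    """Illustrative noise: mostly Gaussian, occasionally Laplacian (assumed mix)."""
    out = []
    for _ in range(n_samples):
        if rng.random() < p_laplace:
            # Laplace(0, b) sample as the difference of two exponentials of mean b
            out.append(rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b))
        else:
            out.append(rng.gauss(0.0, sigma))
    return out

def observe(n_samples, enf_present, rng, **tone_kwargs):
    """Return noise only (H0) or tone plus noise (H1)."""
    w = mixture_noise(n_samples, rng)
    if not enf_present:
        return w
    s = enf_tone(n_samples, **tone_kwargs)
    return [si + wi for si, wi in zip(s, w)]
```

For instance, `observe(2000, True, random.Random(0))` would correspond to a 5-second clip at 400 Hz under the ENF-present hypothesis.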

Huber-LRT ENF Detection
The core of the detection problem is the assumption of a mixture noise model, to which Huber regression is well matched. By incorporating both $\ell_1$ norm and $\ell_2$ norm elements, Huber regression adapts to the varied noise patterns inherent in the mixture noise model. The $\ell_1$ norm component handles the heavy-tailed Laplacian part of the noise and is therefore robust to outliers [36]. Simultaneously, the $\ell_2$ norm component handles the Gaussian part of the mixture, in line with the principles of least squares. ENF detection begins by computing the maximum likelihood estimates [22] of the unknown parameters, obtained by solving the optimization problem:
$$(\hat{f}_c, \hat{\boldsymbol{\theta}}, \hat{\sigma}) = \arg\min_{f_c,\,\boldsymbol{\theta},\,\sigma > 0} Q(f_c, \boldsymbol{\theta}, \sigma), \tag{4}$$
where $\boldsymbol{\theta} \in \mathbb{R}^{2\times 1}$ denotes the regression coefficients and $\sigma > 0$ the regression scale. The objective function, grounded in Huber regression, is:
$$Q(f_c, \boldsymbol{\theta}, \sigma) = N\alpha\sigma + \sum_{n=0}^{N-1} \rho_c\!\left(\frac{x[n] - \mathbf{h}_n^\top(f_c)\,\boldsymbol{\theta}}{\sigma}\right)\sigma. \tag{5}$$
In (5), $\mathbf{h}_n(f_c) = [\cos(2\pi f_c nT),\ \sin(2\pi f_c nT)]^\top$ and $\boldsymbol{\theta} = [A_c\cos\phi,\ -A_c\sin\phi]^\top$, so that $\mathbf{h}_n^\top(f_c)\,\boldsymbol{\theta}$ within $\rho_c$ corresponds to the term $A_c\cos(2\pi f_c nT + \phi)$. The Huber loss function
$$\rho_c(e) = \begin{cases} \tfrac{1}{2}e^2, & |e| \le c, \\ c|e| - \tfrac{1}{2}c^2, & |e| > c, \end{cases} \tag{6}$$
is a hybrid error measure that combines the quadratic penalty of the $\ell_2$ norm (i.e., LS loss) for small errors with the linear penalty of the $\ell_1$ norm (i.e., LAD loss) for large errors, providing a robust and balanced approach to error minimization. Moreover, $\alpha > 0$ ensures Fisher-consistency of $\hat{\sigma}$ under i.i.d. Gaussian errors and is calculated through:
$$\alpha = \frac{c^2}{2}\left(1 - F_{\chi^2_1}(c^2)\right) + \frac{1}{2}F_{\chi^2_3}(c^2),$$
where $F_{\chi^2_k}$ is the cumulative distribution function of the $\chi^2_k$ distribution, while $c = 1.345$ stands for a user-defined tuning threshold influencing the robustness level.
The optimization problem (4) involves iterative estimation of $f_c$, the regression coefficients $\boldsymbol{\theta}$, and the scale parameter $\sigma$. The iteration begins by setting $f_c$ to the frequency corresponding to the peak of the periodogram, computed as:
$$\hat{f}_c = \arg\max_{f} \left|\sum_{n=0}^{N-1} x[n]\,e^{-j2\pi f nT}\right|^2.$$
Then the parameters $\boldsymbol{\theta}$ and $\sigma$ are estimated by solving the Huber regression problem:
$$(\hat{\boldsymbol{\theta}}, \hat{\sigma}) = \arg\min_{\boldsymbol{\theta},\,\sigma > 0} Q(\hat{f}_c, \boldsymbol{\theta}, \sigma), \tag{10}$$
where (10) is jointly convex in the regression vector and the scale parameter. This convexity property holds under the assumption that the loss function $\rho(\cdot)$ is itself convex. The estimates $\hat{\boldsymbol{\theta}}$ and $\hat{\sigma}$ are obtained with a block-wise Minimization-Majorization (MM) algorithm [36]. The MM algorithm iteratively seeks to minimize the Huber loss (6) by constructing a sequence of surrogate functions that majorize the original Huber loss function. These surrogate functions are typically chosen to be quadratic approximations near the current parameter estimates, making the optimization more tractable. At each iteration, the MM algorithm updates the parameter estimates by minimizing the surrogate function, with the goal of minimizing the original Huber loss. This iterative process continues until a convergence criterion is met, resulting in parameter estimates that minimize the Huber criterion (10).
Subsequently, after obtaining the parameter estimates $\hat{\boldsymbol{\theta}}$ and $\hat{\sigma}$, these values are held fixed, and the frequency is estimated through a separate optimization problem:
$$\hat{f}_c = \arg\min_{f} Q(f, \hat{\boldsymbol{\theta}}, \hat{\sigma}). \tag{11}$$
The minimization of (11) is carried out by a dense grid search around the ENF center frequency $\hat{f}_c$. Once the grid search has produced an initial solution for the frequency, the optimization problem (10) is solved again with the updated frequency. The two steps are alternated iteratively so that the frequency parameter converges to its optimal value $\hat{f}_c$ while the parameters $\boldsymbol{\theta}$ and $\sigma$ are concurrently optimized using the Huber regression method. While the optimization problem in (10) is inherently convex and can be efficiently solved, the grid search in (11) may lead to suboptimal results.
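The alternating procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the Huber loss with the 1/2 factor on the quadratic branch, uses closed-form chi-square distribution functions for 1 and 3 degrees of freedom, replaces the periodogram initialization with a user-supplied starting frequency `f0`, and uses a simple clipped-residual least squares step for the coefficients. All function names, the grid width, and the grid step are hypothetical choices.

```python
import math

C = 1.345  # tuning threshold quoted in the text

def huber_rho(e, c=C):
    """Huber loss: quadratic for |e| <= c, linear beyond."""
    a = abs(e)
    return 0.5 * e * e if a <= c else c * a - 0.5 * c * c

def huber_psi(e, c=C):
    """Derivative of the Huber loss (residual clipped to [-c, c])."""
    return max(-c, min(c, e))

def fisher_alpha(c=C):
    """Consistency factor from chi-square CDFs with 1 and 3 degrees of freedom."""
    x = c * c
    cdf1 = math.erf(math.sqrt(x / 2.0))
    cdf3 = cdf1 - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    return 0.5 * x * (1.0 - cdf1) + 0.5 * cdf3

def design(freq, n_samples, fs):
    """Rows (cos, sin) of the two-column regression matrix."""
    w = 2.0 * math.pi * freq / fs
    return [(math.cos(w * n), math.sin(w * n)) for n in range(n_samples)]

def residuals(x, H, theta):
    return [xi - (h[0] * theta[0] + h[1] * theta[1]) for xi, h in zip(x, H)]

def fit_theta_sigma(x, H, theta=(0.0, 0.0), sigma=None, iters=50):
    """Alternate scale and coefficient updates for a fixed frequency."""
    n, alpha = len(x), fisher_alpha()
    # Entries of the 2x2 Gram matrix H^T H and its determinant
    a = sum(h[0] * h[0] for h in H)
    b = sum(h[0] * h[1] for h in H)
    d = sum(h[1] * h[1] for h in H)
    det = a * d - b * b
    theta = list(theta)
    r = residuals(x, H, theta)
    if sigma is None:
        sigma = math.sqrt(sum(v * v for v in r) / n) or 1.0
    for _ in range(iters):
        # Scale update: fixed-point step on the scale estimating equation
        energy = sum(huber_psi(v / sigma) ** 2 for v in r)
        sigma = max(sigma * math.sqrt(energy / (2.0 * n * alpha)), 1e-12)
        # Coefficient update: least squares step on the clipped residuals
        rp = [huber_psi(v / sigma) * sigma for v in r]
        g0 = sum(h[0] * v for h, v in zip(H, rp))
        g1 = sum(h[1] * v for h, v in zip(H, rp))
        theta[0] += (d * g0 - b * g1) / det
        theta[1] += (a * g1 - b * g0) / det
        r = residuals(x, H, theta)
    return theta, sigma

def objective(x, freq, theta, sigma, fs):
    """Sum of Huber losses of the scaled residuals at a candidate frequency."""
    H = design(freq, len(x), fs)
    return sum(huber_rho(v / sigma) for v in residuals(x, H, theta))

def huber_enf_fit(x, fs=400.0, f0=100.0, half_width=0.1, step=0.005, rounds=3):
    """Alternate the convex fit and a grid search over frequency."""
    n_grid = int(round(2.0 * half_width / step)) + 1
    grid = [f0 - half_width + k * step for k in range(n_grid)]
    f = f0
    theta, sigma = fit_theta_sigma(x, design(f, len(x), fs))
    for _ in range(rounds):
        f = min(grid, key=lambda g: objective(x, g, theta, sigma, fs))
        theta, sigma = fit_theta_sigma(x, design(f, len(x), fs), theta, sigma)
    return f, theta, sigma
```

With this convention for the loss, `fisher_alpha()` evaluates to roughly 0.355 for the threshold 1.345. The amplitude estimate can be read off as `math.hypot(theta[0], theta[1])`.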
Once the unknown parameters $\hat{f}_c$, $\hat{\boldsymbol{\theta}}$, and $\hat{\sigma}$ have been estimated, a Huber-LRT detector is established to assess whether the observation vector $\mathbf{x}$ falls into the $\mathcal{H}_1$ scenario relative to a threshold denoted as $\gamma$, i.e.,
$$T_{\text{Huber}}(\mathbf{x}) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \gamma. \tag{12}$$
The threshold for the Huber-LRT detector is determined, following the process in [23], by computing the median of the test statistic values for each duration across all recordings under both $\mathcal{H}_0$ and $\mathcal{H}_1$. Let $R$ be the number of recordings for each duration, which in this case is $R = 100$. This is in accordance with the 100 audio recordings available for analysis, which are divided into two groups, each containing 50 recordings (see Section 3.1). When $R$ is even, the threshold is calculated as follows:
$$\gamma = \frac{1}{2}\left(T_{\text{Huber},(R/2)} + T_{\text{Huber},(R/2+1)}\right), \tag{13}$$
where $T_{\text{Huber},(\cdot)}$ represents the order statistics of the test statistic values. Because the median is used, the threshold is less influenced by anomalous observations, such as extreme values or outliers among the test statistics, making it a more reliable and robust choice.
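The median-based threshold and the resulting decision rule can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that larger values of the test statistic favor the ENF-present hypothesis, which the text does not state explicitly.

```python
def median_threshold(stats):
    """Median of the pooled test statistics via their order statistics."""
    t = sorted(stats)
    R = len(t)
    if R % 2 == 0:
        # Average of the (R/2)-th and (R/2 + 1)-th order statistics (1-based)
        return 0.5 * (t[R // 2 - 1] + t[R // 2])
    return t[R // 2]

def detect(stat, threshold):
    """Declare ENF present when the statistic exceeds the threshold (assumed direction)."""
    return stat > threshold
```

For example, pooling the statistics `[1, 2, 3]` and `[10, 11, 12]` gives a threshold of 6.5, so a recording with statistic 10 would be declared as containing ENF.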

EXPERIMENTAL EVALUATION
In this Section, the experimental evaluation is conducted to compare the proposed method with the state-of-the-art detectors developed in [15,23]. A comprehensive description of the real-world audio recordings within the ENF-WHU dataset, followed by a detailed account of the preprocessing steps undertaken and the experimental results, is presented.

Dataset
The evaluation of the Huber-LRT detector utilizes the ENF-WHU dataset [15], which comprises 100 real-world audio recordings. These recordings are captured at a sampling rate of 44.1 kHz on the Wuhan University campus. Among these 100 recordings, 50 are placed under the folder labeled H1 as they contain the ENF signal, while the remaining 50 recordings, characterized by severe corruption or the absence of the ENF signal, are placed in the H0 folder.

Data Preprocessing
Prior to the evaluation process, a four-step preprocessing procedure summarized in Table 1 is applied to the audio recordings in the ENF-WHU dataset. Initially, the recordings are cropped into audio clips, with durations ranging from 5 to 10 seconds. Subsequently, the cropped audio clips undergo downsampling to 8 kHz using appropriate antialiasing filtering, followed by resampling at 400 Hz. Finally, a bandpass filter is implemented with the passband centered at the second harmonic of ENF (i.e., 100 Hz), and the cut-off frequencies are set at 99.9 Hz and 100.1 Hz. The transition bands have a width of 0.1 Hz. This passband choice aligns with the methodology outlined in [15], which centers around the second harmonic of ENF due to its robustness and prominence.
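The bookkeeping implied by these steps can be worked out with a short sketch. It only computes rate-change ratios, sample counts, and band edges from the values quoted above; it does not perform any filtering. The function name and dictionary keys are hypothetical, and placing the stopband edges one transition width outside the passband is an assumption.

```python
from fractions import Fraction

FS_ORIGINAL = 44_100      # Hz, sampling rate of the ENF-WHU recordings
FS_INTERMEDIATE = 8_000   # Hz, after the first downsampling stage
FS_FINAL = 400            # Hz, after the second resampling stage
PASSBAND = (99.9, 100.1)  # Hz, around the second harmonic of a 50 Hz ENF
TRANSITION = 0.1          # Hz, width of each transition band

def preprocessing_plan(duration_s):
    """Rate-change ratios, sample counts, and band edges for one clip."""
    return {
        "stage1_ratio": Fraction(FS_INTERMEDIATE, FS_ORIGINAL),
        "stage2_ratio": Fraction(FS_FINAL, FS_INTERMEDIATE),
        "samples_original": duration_s * FS_ORIGINAL,
        "samples_final": duration_s * FS_FINAL,
        "nyquist_final_hz": FS_FINAL / 2,
        # Assumed: stopbands start one transition width outside the passband
        "stopband_edges_hz": (round(PASSBAND[0] - TRANSITION, 6),
                              round(PASSBAND[1] + TRANSITION, 6)),
    }
```

For a 5-second clip this gives 220500 samples at the original rate and 2000 samples after resampling, with rate-change ratios of 80/441 and 1/20. The 100 Hz harmonic remains below the 200 Hz Nyquist frequency of the final rate.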

Experimental Results
In Figure 2, a comparative performance analysis of the proposed Huber-LRT detector against the existing methods [15,23] is presented. Figure 2a displays the detection accuracy across different recording durations, spanning from 5 to 10 seconds. Notably, the Huber-LRT detector outperforms its competitors in terms of accuracy, which is defined as the ratio of correctly detected instances to the total number of recordings. More specifically, the Huber-LRT detector achieves an accuracy of 80% for 5-second audio durations and surpasses 90% for audio recordings lasting 7 seconds. However, a decline in accuracy is noticeable for all competing methods when dealing with 8-second audio segments. Nevertheless, the Huber-LRT detector maintains a satisfactory detection accuracy of 85%. In Figure 2b, the Receiver Operating Characteristic (ROC) curves for the proposed detector and its competitors are illustrated, accompanied by their corresponding Area Under the Curve (AUC). The AUC for the Huber-LRT detector is calculated to be 0.942, which is higher than that of the other methods. The improved performance of the Huber-LRT detector is attributed to the incorporation of robust statistical techniques in the ENF detection problem. By utilizing Huber regression for the estimation of the unknown parameters and assuming a mixture noise model, the ENF detection task becomes more robust. This assumption represents a significant departure from the LAD-LRT detector [23], which centered around the assumption of a Laplacian noise model. The Huber-LRT provides a more efficient and robust method for modeling noise within the ENF-WHU dataset by combining elements of the Gaussian distribution in the middle and the Laplacian distribution in the tails. Consequently, the Huber-LRT, in conjunction with the previous methods [15,23], contributes to a deeper understanding of ENF detection by leveraging the power of robust statistical methods.
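The two evaluation metrics can be computed from per-recording test statistics as in the sketch below. The function names are hypothetical, the numbers in the usage note are made up for illustration and are not results from the paper, and the sketch assumes that larger statistics favor the ENF-present hypothesis. The AUC is computed through its rank interpretation, i.e., the fraction of (H1, H0) pairs in which the H1 statistic is larger, with ties counted as one half.

```python
def accuracy(stats_h0, stats_h1, threshold):
    """Ratio of correctly classified recordings to the total number of recordings."""
    correct = sum(t <= threshold for t in stats_h0)
    correct += sum(t > threshold for t in stats_h1)
    return correct / (len(stats_h0) + len(stats_h1))

def auc(stats_h0, stats_h1):
    """Area under the ROC curve via pairwise comparison of the statistics."""
    wins = 0.0
    for p in stats_h1:
        for q in stats_h0:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(stats_h0) * len(stats_h1))
```

With the toy values `stats_h0 = [0.1, 0.4, 0.35, 0.8]` and `stats_h1 = [0.9, 0.7, 0.6, 0.3]`, a threshold of 0.5 yields an accuracy of 0.75 and the AUC is 0.6875.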

CONCLUSION AND FUTURE WORK
Here, a novel ENF detector termed Huber-LRT has been proposed, incorporating robust statistical techniques into the ENF detection task. This framework is built upon the assumption of a mixture noise model, combining Gaussian and Laplacian noise components to enhance ENF detection accuracy in short-length audio recordings. The effectiveness of the proposed ENF detector has been assessed through detection accuracy and through ROC curves, summarized by the AUC. Real-world benchmark data from the ENF-WHU dataset have been employed for this evaluation. The experimental findings have indicated a significant enhancement in ENF detection for short-length audio recordings, outperforming state-of-the-art techniques. Further research will consider the integration of other robust regression methods in the ENF detection task, aiming at further improvements in detection accuracy.

Table 1: Summary of preprocessing steps and values for ENF detection.