Confidence Intervals for the Mean of Birnbaum-Saunders Distribution with Application to Wind Speed Data

Thailand is dealing with air pollution, particularly from small particulate matter (PM), which significantly affects public health. Wind speed is pivotal in the dispersion of these particles. Because wind speed is unpredictable, we are interested in estimating the confidence interval (CI) for the mean of wind speed data using a Birnbaum-Saunders (BS) distribution. We constructed five intervals: the bootstrap confidence interval (BCI), the percentile bootstrap confidence interval (PBCI), the generalized confidence interval (GCI), the Bayesian credible interval (BayCI), and the highest posterior density (HPD) interval. A simulation study in the R statistical software evaluated their coverage probabilities (CP) and average lengths (AL). GCI emerged as the most effective method overall. As the sample size and shape parameter increased, the average lengths of these intervals decreased. Applying these intervals to wind speed datasets from Nong Prue subdistrict, Chonburi province, Thailand, demonstrated their effectiveness.


INTRODUCTION
Thailand faces an annual problem with particulate matter less than 2.5 micrometers in diameter (PM2.5), most noticeable from October to April, when still winds trap dust. These particles originate from sources such as open burning, transportation, industry, and cross-border smog. Factors such as pressure, wind speed, rainfall, and temperature compound their impact by making it harder for these tiny particles to disperse, which increases dust levels during the early months of the year. In 2023, PM2.5 levels surged past safe standards, peaking at 180 micrograms per cubic meter [6]. This situation significantly affects health, especially for those at higher risk, including children, pregnant women, the elderly, and individuals with chronic conditions such as asthma and other respiratory issues. Paneangtong et al. [20] indicated that individuals living near industrial estates are at greater risk of health problems from exposure to dust and smoke than those residing farther away. Ammuaylojaroen et al. [1] showed that increased wind speed is associated with decreased PM2.5 concentrations. Because wind speed is unpredictable, estimating it with a suitable method helps indicate whether wind speed is likely to increase or decrease, which in turn helps anticipate changes in PM2.5 levels. Additionally, Mohammadi et al. [18] assessed the two-parameter Birnbaum-Saunders (BS) distribution for modelling wind speed in long-term time series recorded at ten distinct stations. Their results showed that the BS distribution performed well at all ten stations. A CI is more useful than a point estimate because it offers a range of plausible values. Therefore, there is interest in using this distribution to study the CI for the mean wind speed in industrial areas. This study uses daily average wind speed data collected from August to October 2023 in the Nong Prue subdistrict of Chonburi province, Thailand. Chonburi province was selected because it is classified as an industrial area, where PM2.5 levels surpass standard limits.
Birnbaum and Saunders [27] were interested in finding a distribution that described how long a material specimen subjected to fatigue would last before failing. As a result, they developed the fatigue-life distribution based on a model that measures the total time until the cumulative damage, caused by the formation and growth of a dominant crack, exceeds a specific threshold and leads to the specimen's failure [29]. The BS distribution has since been widely studied and applied in various fields, including earth and environmental sciences [9-11, 13, 21]. Although it was originally formulated to address material fatigue, Leiva et al. [12] justified its use as a model for environmental data through the proportionate effect law. This adaptation stems from how pollutants spread or accumulate within a volume under the influence of environmental factors, which move them from their initial location while conserving their original quantity. Several researchers have contributed parameter estimation methods for the BS distribution. Birnbaum and Saunders [28] introduced the maximum likelihood estimators (MLEs) for the parameters ($\alpha$, $\beta$). Ng et al. [8] devised modified moment estimators (MMEs) and a straightforward bias correction technique to improve the MLEs and MMEs. Jantakoon and Volodin [19] presented percentile bootstrap and generalized pivotal approaches for constructing CIs for the shape and scale parameters of the BS distribution. Taking a different approach, Lu and Chang [14] formulated a bootstrap method for prediction intervals for the BS distribution. Wang et al. [25] considered Bayesian inference for the parameters of the BS distribution, basing their methodology on inverse-gamma (IG) priors and computing Bayesian estimates. Lastly, Paggard et al. [22] presented CIs for the variance and the difference of variances of BS distributions.
We focus on inference for the mean, or expected value, of a random variable. This value represents the long-run average of the random variable and is obtained by integrating the product of the variable and its probability density. Because the mean is a widely used measure, our attention lies in constructing CIs to estimate the population mean. Thangjai et al. [23] presented CIs for the mean and the difference between means of two normal distributions with unknown coefficients of variation. Maneerat et al. [16] introduced Bayesian techniques for constructing highest posterior density (HPD) intervals for the mean and the difference between means of two delta-lognormal distributions. However, to our knowledge, no available literature investigates CIs for the mean of BS distributions. Therefore, we propose CIs for the mean of BS distributions. We provide five approaches: the bootstrap confidence interval (BCI), the percentile bootstrap confidence interval (PBCI), the generalized confidence interval (GCI), the Bayesian credible interval (BayCI), and the highest posterior density (HPD) interval. To demonstrate the effectiveness of the suggested methods, we also applied them to wind speed data from Nong Prue subdistrict, Chonburi province, Thailand, gathered from August to October 2023.

METHODS
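A random variable $X$ is said to follow the two-parameter BS distribution with shape parameter $\alpha$ and scale parameter $\beta$ ($\alpha, \beta > 0$), denoted as $X \sim BS(\alpha, \beta)$, if its cumulative distribution function (c.d.f.) is given by
$$F(x; \alpha, \beta) = \Phi\left[\frac{1}{\alpha}\left(\sqrt{\frac{x}{\beta}} - \sqrt{\frac{\beta}{x}}\right)\right], \quad x > 0, \tag{1}$$
where $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution, and the probability density function (p.d.f.) of the BS distribution can be written as
$$f(x; \alpha, \beta) = \frac{1}{2\alpha\beta\sqrt{2\pi}}\left[\left(\frac{\beta}{x}\right)^{1/2} + \left(\frac{\beta}{x}\right)^{3/2}\right]\exp\left[-\frac{1}{2\alpha^2}\left(\frac{x}{\beta} + \frac{\beta}{x} - 2\right)\right]. \tag{2}$$
The BS random variable $X$ can be generated from its relationship with a standard normal random variable $Z$: by Equation (1), $Z = \frac{1}{\alpha}\left(\sqrt{X/\beta} - \sqrt{\beta/X}\right) \sim N(0, 1)$, so that $X = \beta\left[\frac{\alpha Z}{2} + \sqrt{\left(\frac{\alpha Z}{2}\right)^2 + 1}\right]^2$. The mean (expected value) and variance of $X$ are $E(X) = \beta\left(1 + \frac{1}{2}\alpha^2\right)$ and $Var(X) = (\alpha\beta)^2\left(1 + \frac{5}{4}\alpha^2\right)$, respectively. Therefore, the mean, denoted as $\theta$, can be defined as
$$\theta = \beta\left(1 + \frac{1}{2}\alpha^2\right). \tag{3}$$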
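As a rough illustration of the normal-variate construction and the moment formulas above, the following Python sketch draws BS variates and compares the sample mean and variance with $\beta(1 + \alpha^2/2)$ and $(\alpha\beta)^2(1 + 5\alpha^2/4)$. The study itself used R; the function names `rbs`, `bs_mean`, and `bs_var` are hypothetical, and the transform used is the standard one implied by the c.d.f., namely $X = \beta[\alpha Z/2 + \sqrt{(\alpha Z/2)^2 + 1}]^2$ with $Z \sim N(0, 1)$.

```python
import math
import random
import statistics


def rbs(n, alpha, beta, rng):
    """Draw n BS(alpha, beta) variates by transforming standard normal draws."""
    draws = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        draws.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return draws


def bs_mean(alpha, beta):
    """Mean of BS(alpha, beta): beta * (1 + alpha^2 / 2)."""
    return beta * (1.0 + alpha ** 2 / 2.0)


def bs_var(alpha, beta):
    """Variance of BS(alpha, beta): (alpha * beta)^2 * (1 + 5 alpha^2 / 4)."""
    return (alpha * beta) ** 2 * (1.0 + 5.0 * alpha ** 2 / 4.0)


rng = random.Random(2023)  # fixed seed so the illustration is repeatable
sample = rbs(200_000, alpha=0.5, beta=1.0, rng=rng)

# Empirical moments next to the theoretical ones.
print(statistics.fmean(sample), bs_mean(0.5, 1.0))
print(statistics.variance(sample), bs_var(0.5, 1.0))
```

With a sample this large, the empirical and theoretical values should agree to about two decimal places, which is a quick sanity check on both the sampler and the formulas.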

Bootstrap Confidence Interval
The bootstrap technique was first introduced by Efron [3] as a resampling method based on randomly drawing new samples from the original sample. Ng et al. [8] showed by Monte Carlo simulation that the MLEs of $\alpha$ and $\beta$ are biased. The constant-bias-correcting (CBC) parametric bootstrap method, introduced by Mackinnon and Smith [15], was found by Lemonte et al. [2] to be the most effective in terms of bias reduction. Let $x = (x_1, x_2, \ldots, x_n)^\top$ be a random sample of size $n$ from $BS(\alpha, \beta)$.
The MLEs of $\alpha$ and $\beta$ are denoted as $\hat{\alpha}$ and $\hat{\beta}$. $B$ bootstrap samples, each denoted by $x^* = (x_1^*, x_2^*, \ldots, x_n^*)^\top$, are obtained from $BS(\hat{\alpha}, \hat{\beta})$, and each yields bootstrap estimates $\hat{\alpha}^*$ and $\hat{\beta}^*$. The bootstrap estimator and the percentile bootstrap estimator of the mean are then obtained by substituting the corresponding bootstrap estimates of $\alpha$ and $\beta$ into Equation (3).
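The resampling idea described above can be sketched in Python as follows (the study used R). This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps in the closed-form modified moment estimators of Ng et al. [8] in place of the MLEs to keep the code short, it applies no constant-bias correction, and the names `rbs`, `mme`, `bs_mean`, and `percentile_bootstrap_ci` are hypothetical.

```python
import math
import random
import statistics


def rbs(n, alpha, beta, rng):
    """Draw n BS(alpha, beta) variates by transforming standard normal draws."""
    draws = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        draws.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return draws


def mme(x):
    """Modified moment estimates (alpha, beta); an assumed stand-in for the MLEs."""
    s = statistics.fmean(x)          # arithmetic mean
    r = statistics.harmonic_mean(x)  # harmonic mean
    alpha = math.sqrt(max(0.0, 2.0 * (math.sqrt(s / r) - 1.0)))
    beta = math.sqrt(s * r)
    return alpha, beta


def bs_mean(alpha, beta):
    """Mean of BS(alpha, beta)."""
    return beta * (1.0 + alpha ** 2 / 2.0)


def percentile_bootstrap_ci(x, n_boot=500, level=0.95, rng=None):
    """Percentile interval for the mean from a parametric bootstrap."""
    rng = rng or random.Random()
    a_hat, b_hat = mme(x)
    boot_means = []
    for _ in range(n_boot):
        x_star = rbs(len(x), a_hat, b_hat, rng)    # resample from fitted BS
        boot_means.append(bs_mean(*mme(x_star)))  # plug-in mean estimate
    boot_means.sort()
    lo = boot_means[math.floor((1.0 - level) / 2.0 * n_boot)]
    hi = boot_means[math.ceil((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi


rng = random.Random(1)
x = rbs(30, alpha=0.5, beta=1.0, rng=rng)  # simulated data, not wind speeds
lower, upper = percentile_bootstrap_ci(x, rng=rng)
print(lower, upper)
```

One side effect of the swapped-in estimators is worth noting: with the modified moment estimators, the plug-in mean $\tilde{\beta}(1 + \tilde{\alpha}^2/2)$ reduces algebraically to the sample mean, so the interval here is a percentile interval for the sample mean under the fitted BS model.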

Generalized Confidence Interval
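Weerahandi [26] formulated the GCI approach using the generalized pivotal quantity (GPQ) as its foundation. Suppose $X = (X_1, X_2, \ldots, X_n)^\top$ is a random sample of size $n$ from the BS distribution in Equation (1), and $x = (x_1, x_2, \ldots, x_n)^\top$ is its observed value. Sun [30] and Wang [24] derived the GPQs of $\beta$ and $\alpha$ as follows:
$$R_\beta = \begin{cases} \max(\beta_1, \beta_2), & T \le 0, \\ \min(\beta_1, \beta_2), & T > 0, \end{cases} \qquad R_\alpha = \left[\frac{S_2 R_\beta^2 - 2n R_\beta + S_1}{R_\beta V}\right]^{1/2},$$
where $T \sim t(n-1)$ follows a t-distribution with $n-1$ degrees of freedom, $V \sim \chi^2(n)$ follows a chi-squared distribution with $n$ degrees of freedom, $S_1 = \sum_{i=1}^{n} X_i$ and $S_2 = \sum_{i=1}^{n} 1/X_i$. The two solutions for $\beta$, denoted as $\beta_1$ and $\beta_2$, are obtained by solving
$$A\beta^2 - 2B\beta + C = 0,$$
where $A = (n-1)J^2 - \frac{1}{n}LT^2$, $B = (n-1)IJ - (1 - IJ)T^2$, $C = (n-1)I^2 - \frac{1}{n}KT^2$, $I = \frac{1}{n}\sum_{i=1}^{n}\sqrt{X_i}$, $J = \frac{1}{n}\sum_{i=1}^{n} 1/\sqrt{X_i}$, $K = \sum_{i=1}^{n}(\sqrt{X_i} - I)^2$ and $L = \sum_{i=1}^{n}(1/\sqrt{X_i} - J)^2$. Replacing $\alpha$ and $\beta$ in Equation (3) with the pivotal quantities $R_\alpha$ and $R_\beta$ gives the GPQ of the mean, denoted as $R_\theta$:
$$R_\theta = R_\beta\left(1 + \frac{1}{2}R_\alpha^2\right).$$
The $100(1-\gamma)\%$ GCI for the mean is then formed from the $\gamma/2$ and $1-\gamma/2$ quantiles of $R_\theta$.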
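The GPQ construction can be sketched in Python as below (the study used R). This is a hedged illustration rather than the authors' code: it assumes the quadratic $A\beta^2 - 2B\beta + C = 0$ for $\beta$ that follows from treating $\sqrt{n}\,\bar{Y}/S_Y$ as a $t(n-1)$ pivot with $Y_i = \sqrt{X_i/\beta} - \sqrt{\beta/X_i}$, it simply redraws $T$ whenever the quadratic does not have two positive real roots (a simplification of this sketch), and all function and variable names are hypothetical.

```python
import math
import random
import statistics


def rbs(n, alpha, beta, rng):
    """Draw n BS(alpha, beta) variates by transforming standard normal draws."""
    draws = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        draws.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return draws


def gci_mean(x, n_draws=5000, level=0.95, rng=None):
    """Generalized CI for the BS mean from simulated pivotal quantities."""
    rng = rng or random.Random()
    n = len(x)
    roots = [math.sqrt(v) for v in x]
    i_bar = sum(roots) / n                           # mean of sqrt(x)
    j_bar = sum(1.0 / v for v in roots) / n          # mean of 1/sqrt(x)
    k_ss = sum((v - i_bar) ** 2 for v in roots)
    l_ss = sum((1.0 / v - j_bar) ** 2 for v in roots)
    s1 = sum(x)
    s2 = sum(1.0 / v for v in x)
    pivots = []
    while len(pivots) < n_draws:
        chi2_t = rng.gammavariate((n - 1) / 2.0, 2.0)        # chi-square(n-1)
        t = rng.gauss(0.0, 1.0) / math.sqrt(chi2_t / (n - 1))  # t(n-1)
        v = rng.gammavariate(n / 2.0, 2.0)                   # chi-square(n)
        a = (n - 1) * j_bar ** 2 - l_ss * t * t / n
        b = (n - 1) * i_bar * j_bar - (1.0 - i_bar * j_bar) * t * t
        c = (n - 1) * i_bar ** 2 - k_ss * t * t / n
        disc = b * b - a * c
        if a <= 0.0 or c <= 0.0 or disc < 0.0:
            continue  # assumption of this sketch: redraw when roots are unusable
        r1 = (b - math.sqrt(disc)) / a
        r2 = (b + math.sqrt(disc)) / a
        r_beta = max(r1, r2) if t <= 0.0 else min(r1, r2)
        r_alpha_sq = (s2 * r_beta ** 2 - 2.0 * n * r_beta + s1) / (r_beta * v)
        pivots.append(r_beta * (1.0 + r_alpha_sq / 2.0))  # GPQ of the mean
    pivots.sort()
    lo = pivots[math.floor((1.0 - level) / 2.0 * n_draws)]
    hi = pivots[math.ceil((1.0 + level) / 2.0 * n_draws) - 1]
    return lo, hi


x = rbs(20, alpha=0.5, beta=1.0, rng=random.Random(7))  # simulated data
gci_lower, gci_upper = gci_mean(x, n_draws=2000, rng=random.Random(11))
print(gci_lower, gci_upper)
```

The t and chi-squared variates are built from `random.gauss` and `random.gammavariate` so that the sketch needs only the standard library.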

Bayesian Credible Interval
Wang et al. [25] introduced priors with known hyperparameters by using inverse-gamma (IG) priors for $\beta$ and $\alpha^2$, denoted as $IG(\beta \mid a_1, b_1)$ and $IG(\alpha^2 \mid a_2, b_2)$. Because a dependence between $\beta$ and $\alpha^2$ is difficult to establish, the two priors are assumed to be independent. If $Y$ follows an IG distribution with parameters $a$ and $b$, denoted as $IG(y \mid a, b)$, then the p.d.f. of the IG distribution is
$$\pi(y \mid a, b) = \frac{b^a}{\Gamma(a)}\, y^{-a-1} \exp\left(-\frac{b}{y}\right), \quad a, b > 0.$$
Let $X = (X_1, X_2, \ldots, X_n)^\top$ be a random sample from $BS(\alpha, \beta)$ and $x = (x_1, x_2, \ldots, x_n)^\top$ be observations of $X$. Ignoring the constant term, the likelihood function is expressed as
$$L(\alpha, \beta \mid x) \propto \frac{1}{\alpha^n \beta^n} \prod_{i=1}^{n}\left[\left(\frac{\beta}{x_i}\right)^{1/2} + \left(\frac{\beta}{x_i}\right)^{3/2}\right] \exp\left[-\frac{1}{2\alpha^2}\sum_{i=1}^{n}\left(\frac{x_i}{\beta} + \frac{\beta}{x_i} - 2\right)\right].$$
The joint posterior of $(\beta, \alpha^2)$ is obtained by combining the likelihood function with the IG priors of $\beta$ and $\alpha^2$:
$$\pi(\beta, \alpha^2 \mid x) \propto L(\alpha, \beta \mid x)\, IG(\beta \mid a_1, b_1)\, IG(\alpha^2 \mid a_2, b_2).$$
Consequently, the marginal posterior distribution of $\beta$ (Equation (17)) and the conditional posterior distribution of $\alpha^2$ given $\beta$ (Equation (18)) follow from this joint posterior. Samples from Equations (17) and (18) are obtained using Markov chain Monte Carlo methods. Because the marginal distribution in Equation (17) is analytically intractable, the generalized ratio-of-uniforms method, as expounded in the subsequent subsection, is used to generate posterior samples of $\beta$. In contrast, the posterior samples of $\alpha^2$ can be readily obtained using a package within the R software suite.
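For readers who want to experiment with the inverse-gamma building block used in the priors, a minimal Python sketch follows (the study used R). It relies only on the standard fact that the reciprocal of a gamma variate with shape $a$ and rate $b$ follows $IG(a, b)$; the names `inv_gamma_pdf` and `r_inv_gamma` and the parameter values are illustrative assumptions, not the hyperparameters or package used in the study.

```python
import math
import random
import statistics


def inv_gamma_pdf(y, a, b):
    """Density b^a / Gamma(a) * y^(-a-1) * exp(-b / y), evaluated on the log scale."""
    log_pdf = a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(y) - b / y
    return math.exp(log_pdf)


def r_inv_gamma(a, b, rng):
    """One IG(a, b) draw as the reciprocal of a Gamma(shape=a, rate=b) draw."""
    # random.gammavariate takes (shape, scale); rate b corresponds to scale 1 / b.
    return 1.0 / rng.gammavariate(a, 1.0 / b)


rng = random.Random(42)
a, b = 5.0, 2.0  # illustrative values only
ig_draws = [r_inv_gamma(a, b, rng) for _ in range(100_000)]

# For a > 1 the IG(a, b) mean is b / (a - 1), which the draws should reproduce.
print(statistics.fmean(ig_draws), b / (a - 1.0))
print(inv_gamma_pdf(0.5, a, b))
```

The same sampler could in principle be used for a conditional posterior of inverse-gamma form once its shape and scale are known.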
To construct the highest posterior density (HPD) interval for the mean, we used an HPD interval function available within a package of the R software suite. This was applied after obtaining the Bayesian estimator of the mean in step 4.
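Given posterior draws of the mean, the two Bayesian intervals can be computed along the following lines. This Python sketch (the study used an R function) takes the equal-tailed credible interval from sample quantiles and the HPD interval as the shortest window containing the required fraction of sorted draws, a common empirical approach that assumes a unimodal posterior. The function names are hypothetical, and the gamma draws are a right-skewed stand-in, not the posterior from the study.

```python
import math
import random


def equal_tailed_interval(draws, level=0.95):
    """Credible interval from the lower and upper tail quantiles of the draws."""
    s = sorted(draws)
    n = len(s)
    lo = s[math.floor((1.0 - level) / 2.0 * n)]
    hi = s[math.ceil((1.0 + level) / 2.0 * n) - 1]
    return lo, hi


def hpd_interval(draws, level=0.95):
    """Shortest interval containing ceil(level * n) of the sorted draws."""
    s = sorted(draws)
    n = len(s)
    k = max(1, math.ceil(level * n))  # number of draws inside the interval
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


rng = random.Random(5)
# Stand-in "posterior draws" of the mean: right-skewed gamma variates.
posterior_draws = [rng.gammavariate(4.0, 0.5) for _ in range(20_000)]

et_lo, et_hi = equal_tailed_interval(posterior_draws)
hpd_lo, hpd_hi = hpd_interval(posterior_draws)
print(et_lo, et_hi)
print(hpd_lo, hpd_hi)
```

For a skewed posterior the HPD interval is shifted toward the mode and is no longer than the equal-tailed interval, which is why the two are reported separately.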

SIMULATION RESULTS
The study examined five methods (GCI, BCI, PBCI, BayCI, and HPD) for constructing CIs for the mean of BS distributions. The analysis was conducted through a Monte Carlo simulation implemented in the R statistical program. R is a free and open-source programming language for statistical computing. It allows researchers and practitioners to design and implement simulation studies, from straightforward to highly sophisticated, by combining built-in functions with numerous user-created packages [4]. The performance of the five methods was assessed based on their coverage probabilities (CPs) and average lengths (ALs). Two criteria determine the preferred method: a CP equal to or near the nominal confidence level of 0.95, and the shortest AL. The simulation used 5,000 replications, with 5,000 pivotal quantities for GCI, $B = 500$ bootstrap samples for BCI and PBCI, and 1,000 posterior samples for BayCI and the HPD interval. For the mean of the BS distribution, the data were generated from $X \sim BS(\alpha, \beta)$ with sample size $n = 10, 20, 30, 50$ or $100$ and shape parameter $\alpha = 0.10, 0.25, 0.50, 0.75$ or $1.00$. The scale parameter $\beta$ was fixed at 1 in all cases. For BayCI and HPD, we set $r = 2$ and the hyperparameters $a_1 = b_1 = a_2 = b_2 = 10^{-4}$, as recommended by Wang et al. [25].
The simulation results for the mean of a BS distribution, displayed in Table 1, revealed that GCI exhibited CPs higher than or near 0.95, even with large sample sizes and high $\alpha$ values. Conversely, although PBCI showed the shortest ALs, its CPs fell below 0.95, although they improved with larger $n$. Moreover, the ALs of the five methods decreased and became similar as the sample size increased.
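The two performance criteria can be estimated with a loop of the following shape. This Python sketch (the study used R) is a scaled-down illustration: it evaluates only a percentile parametric bootstrap interval built on modified moment estimators, which is an assumed stand-in for the methods compared in the study, and it uses far fewer replications than the 5,000 reported so that it runs quickly. All names are hypothetical.

```python
import math
import random
import statistics


def rbs(n, alpha, beta, rng):
    """Draw n BS(alpha, beta) variates by transforming standard normal draws."""
    draws = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        draws.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return draws


def mme(x):
    """Modified moment estimates (alpha, beta); an assumed stand-in for the MLEs."""
    s = statistics.fmean(x)
    r = statistics.harmonic_mean(x)
    return math.sqrt(max(0.0, 2.0 * (math.sqrt(s / r) - 1.0))), math.sqrt(s * r)


def bs_mean(alpha, beta):
    """Mean of BS(alpha, beta)."""
    return beta * (1.0 + alpha ** 2 / 2.0)


def percentile_ci(x, rng, n_boot=200, level=0.95):
    """Percentile interval for the mean from a parametric bootstrap."""
    a_hat, b_hat = mme(x)
    means = sorted(
        bs_mean(*mme(rbs(len(x), a_hat, b_hat, rng))) for _ in range(n_boot)
    )
    lo = means[math.floor((1.0 - level) / 2.0 * n_boot)]
    hi = means[math.ceil((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi


def coverage_and_length(ci_method, n, alpha, beta, n_rep, rng):
    """Monte Carlo estimates of coverage probability and average length."""
    theta = bs_mean(alpha, beta)  # true mean of the generating distribution
    hits = 0
    total_length = 0.0
    for _ in range(n_rep):
        x = rbs(n, alpha, beta, rng)
        lo, hi = ci_method(x, rng)
        hits += lo <= theta <= hi
        total_length += hi - lo
    return hits / n_rep, total_length / n_rep


rng = random.Random(123)
cp, al = coverage_and_length(percentile_ci, n=20, alpha=0.5, beta=1.0,
                             n_rep=200, rng=rng)
print(cp, al)
```

Passing the interval method as a function keeps the loop unchanged when another method is evaluated; with only 200 replications the CP estimate carries a Monte Carlo standard error of roughly 0.02, so it should be read as indicative only.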

AN EMPIRICAL APPLICATION
Wind energy is an eco-friendly and sustainable power source that produces no carbon emissions or pollution [17]. According to Mohammadi et al. [18], the BS distribution performs well in modelling wind speed distributions. We used datasets containing daily wind speed records from August to October 2023 in Nong Prue subdistrict, Chonburi province, Thailand [7], to demonstrate the performance of CIs for the mean of BS distributions obtained through GCI, BCI, PBCI, BayCI, and HPD.
Since the data contain only positive values, they can be fitted to BS, uniform, Cauchy, exponential (Exp), Weibull or normal distributions. We therefore compared the fit of these distributions to the wind speed datasets using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The results in Table 2 show that the wind speed datasets from Chonburi province are best fitted by a BS distribution, because the AIC and BIC values for the BS distribution were the smallest. The essential statistics computed for the daily wind speed data are displayed in Table 3. The mean for Chonburi province was calculated as 1.8141. The 95% CIs for this mean, obtained using GCI, BCI, PBCI, BayCI, and HPD, are given in Table 4. Consistent with the simulation outcomes, PBCI gave the shortest interval, followed by BCI. However, in the simulation study the CP of PBCI was below 0.95, whereas that of GCI was near or above 0.95. In summary, GCI emerges as the most suitable method for constructing a CI for the mean of a BS distribution.
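The model comparison step can be illustrated with a short Python sketch (the study used R). It computes $AIC = 2k - 2\ell$ and $BIC = k\ln n - 2\ell$ for a BS fit and an exponential fit on simulated data, not the Chonburi wind speeds. The BS maximum likelihood fit here is an assumption of the sketch: it profiles out $\alpha$ via $\hat{\alpha}^2(\beta) = s/\beta + \beta/r - 2$ (with $s$ and $r$ the arithmetic and harmonic means) and runs a golden-section search for $\beta$ between $r$ and $s$. All names are hypothetical.

```python
import math
import random
import statistics


def rbs(n, alpha, beta, rng):
    """Draw n BS(alpha, beta) variates by transforming standard normal draws."""
    draws = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        draws.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return draws


def bs_loglik(x, alpha, beta):
    """Log-likelihood of BS(alpha, beta) for the observations x."""
    const = -math.log(2.0 * alpha * beta * math.sqrt(2.0 * math.pi))
    total = 0.0
    for v in x:
        ratio = beta / v
        total += (const + math.log(math.sqrt(ratio) + ratio ** 1.5)
                  - (v / beta + beta / v - 2.0) / (2.0 * alpha ** 2))
    return total


def bs_mle(x, tol=1e-10):
    """Approximate MLEs via a golden-section search on the profile likelihood."""
    s = statistics.fmean(x)
    r = statistics.harmonic_mean(x)

    def alpha_given(beta):
        return math.sqrt(s / beta + beta / r - 2.0)

    def profile(beta):
        return bs_loglik(x, alpha_given(beta), beta)

    lo, hi = r, s  # assumed bracket for the scale estimate
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol * s:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if profile(c) > profile(d):
            hi = d
        else:
            lo = c
    beta = (lo + hi) / 2.0
    return alpha_given(beta), beta


def aic_bic(loglik, k, n):
    """Return (AIC, BIC) for a model with k parameters fitted to n points."""
    return 2.0 * k - 2.0 * loglik, k * math.log(n) - 2.0 * loglik


x = rbs(100, alpha=0.3, beta=2.0, rng=random.Random(9))  # simulated data
n = len(x)

alpha_hat, beta_hat = bs_mle(x)
aic_bs, bic_bs = aic_bic(bs_loglik(x, alpha_hat, beta_hat), k=2, n=n)

# Exponential MLE is rate = 1 / mean, giving loglik = -n * (log(mean) + 1).
exp_loglik = -n * (math.log(statistics.fmean(x)) + 1.0)
aic_exp, bic_exp = aic_bic(exp_loglik, k=1, n=n)

print(aic_bs, bic_bs)
print(aic_exp, bic_exp)
```

The distribution with the smaller AIC and BIC is preferred, which for data simulated from a BS distribution with a small shape parameter should be the BS fit.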

CONCLUSION
We constructed CIs for the mean of BS distributions using GCI, BCI, PBCI, BayCI, and HPD. The performance of the CIs was evaluated in terms of their CPs and ALs. The simulation results show that the CPs of GCI were greater than or close to the nominal confidence level of 0.95. Although BCI and PBCI had shorter ALs than GCI, their CPs were the lowest and below 0.95, so they cannot be recommended. Therefore, the GCI method is the most effective for constructing CIs for the mean of the BS distribution. Furthermore, when the proposed methods were used to calculate CIs for the mean of wind speed datasets from Chonburi province, Thailand, the GCI method provided the best results in the empirical scenario. This method is particularly useful for estimating the mean wind speed in the months of August to October, which provides essential information for predicting whether PM2.5 levels will increase or decrease. Such predictions are crucial for preparing for and effectively addressing the PM2.5 dust problem.

Table 1: The CPs and ALs of 95% two-sided CIs for the mean of the BS distribution ($\beta = 1$).
Notes: Bold represents values that satisfy the criteria and the best-performing method.

Table 2: AIC and BIC values for fitting the six candidate distributions.

Table 3: Descriptive statistics for the wind speed data from the Chonburi dataset.

Table 4: Confidence intervals for the mean of wind speed for the Chonburi datasets.