Machine Learning-Based Price Forecasting for Polypropylene Granules in Thailand

The plastic industry plays a vital role in Thailand, with a significant dependence on plastic materials for a majority of industrial products. Among the various types of plastics, polypropylene (PP) emerges as the most extensively used, making it indispensable for the country's plastic industry. This research focuses on presenting and comparing forecasting models for the price of PP granules in Thailand. The primary objective is to identify the most accurate forecasting model, with the mean absolute percentage error (MAPE) serving as the criterion for assessing the forecast model's performance. Three machine learning forecasting models, namely Support Vector Regression (SVR), eXtreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN), are employed in the study. The findings reveal that the ANN model demonstrates the highest accuracy, achieving a MAPE of 5.89% on the test dataset.


INTRODUCTION
The plastic industry is highly important worldwide because plastics are widely used as a raw material and are manufactured into many types of products. The fundamental raw material for plastic products is plastic granules, and their price is not stable: it fluctuates with uncertain factors. If the price patterns of plastic granules can be anticipated, manufacturers can reduce costs by timing their granule purchases more efficiently.
Based on the structure of Thailand's exports in 2020, the categories ranked in descending order of share were: principal manufacturing products (80.23%), agricultural products (9.14%), agro-industrial products (7.8%), and mineral and fuel products (2.83%). As shown in Table 1, the principal manufacturing product category had the highest export share from 2013 to 2020. Examples of products in this category include computers, cars, plastic granules, and electrical circuit boards. These products either are plastic granules or use them as components or raw materials, highlighting the importance of plastic granules to Thailand's export industry.
Polypropylene (PP) is a versatile plastic known for being strong, flexible, and heat-resistant (melting point: 165 °C). Its resistance to chemicals and to sterilization processes (up to 100 °C) makes it suitable for applications such as medical products, food packaging, bags, and automotive components. In Thailand, PP plays a vital role in the manufacturing and export industries and has the highest annual consumption of all plastic types. As demand for plastics grows each year, businesses order increasing quantities of plastic granules. The price of PP granules, depicted in Figure 2, is volatile, so purchases timed to coincide with favorable prices can yield cost savings. Accurate forecasting enables businesses to estimate PP granule prices and plan their purchasing strategies accordingly. Furthermore, to the authors' knowledge, there is currently no machine learning model for forecasting PP granule prices in Thailand. This research therefore forecasts PP granule prices with machine learning techniques to address this gap in the Thai market.

REVIEW OF RELATED LITERATURE
The literature contains a substantial body of research on price forecasting, covering products ranging from agricultural commodities to industrial goods. Accurate forecasts help decision-makers make informed choices and optimize business planning to maximize profitability. Traditionally, time series methods such as ARIMA and Holt-Winters have been commonly used for price forecasting and have produced reasonably accurate results in specific cases. For example, Nowneow and Rungreunganun [4] applied the ARIMA model to forecast polyvinyl chloride pellet prices based on crude oil prices and exchange rates, while Dooley and Lenihan [5] compared the ARIMA model with a lagged forward price model for forecasting lead and zinc cash prices. However, with advances in technology and computing power, data-driven methods, particularly machine learning, have gained popularity because they can handle complex problems and often provide more accurate predictions than traditional approaches. Several studies have compared machine learning algorithms with traditional statistical models for price forecasting. For instance, Sabu and Kumar [6] compared time series models (SARIMA and Holt-Winters) with a machine learning model (LSTM) for arecanut price forecasting and found that LSTM performed best. Similarly, Lago et al. [7] analyzed the accuracy of 27 approaches, including machine learning and statistical models, for electricity price forecasting and found that machine learning models generally outperformed statistical models. Machine learning algorithms frequently used in price forecasting include Artificial Neural Network (ANN) [8][9], Long Short-Term Memory (LSTM) [10][11][12], eXtreme Gradient Boosting (XGBoost) [13][14], and Support Vector Regression (SVR) [15][16].

Figure 2: PP granule price from 2011 to 2021 [3]

Data Preprocessing
All feature data, except the features used for XGBoost, were standardized using StandardScaler from the scikit-learn library. The transformation is

$$z = \frac{x - \bar{x}}{\sigma}$$

where $x$ is the original feature value, $\bar{x}$ is the mean of the training data, and $\sigma$ is the standard deviation of the training data. The features for XGBoost were intentionally left unscaled because, as a tree-based algorithm, XGBoost is insensitive to the scale of features.
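The following is a minimal sketch of this standardization step in scikit-learn. It assumes the scaler is fitted on the training data only and then reused on later data, which matches the definition of $\bar{x}$ and $\sigma$ above. The two-column arrays are hypothetical placeholders, not the paper's features. The manual computation confirms that StandardScaler uses the population standard deviation (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrices: rows are months, columns are features
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from training data only
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

# Same transformation by hand: z = (x - mean) / std, with population std (ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual = (X_test - mean) / std

# XGBoost would instead receive the unscaled X_train and X_test directly.
```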

Machine Learning Algorithms
Machine learning has three main categories: supervised learning, in which algorithms are trained on labeled data; unsupervised learning, in which algorithms are trained on unlabeled data; and reinforcement learning, in which an agent learns through interaction with an environment. In this study, we apply three supervised learning algorithms, namely Support Vector Regression (SVR), eXtreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN), to forecast the price of PP granules in Thailand. These algorithms were selected because they have proven effective in previous price forecasting studies [8][9][13][14][15][16]. The objective is to evaluate their performance and identify the most accurate model for PP granule price prediction.

Support Vector Regression.
The Support Vector Machine (SVM) algorithm, originally proposed by Vapnik [17], is widely used for linear and nonlinear classification problems. SVM constructs a decision boundary that maximizes the margin between the boundary and the nearest data points, thereby achieving the greatest separation between classes. For regression problems, SVM extends to Support Vector Regression (SVR), which lets users specify an acceptable error tolerance for the model. SVR then finds a line, or a hyperplane in higher dimensions, that fits the data within this tolerance [18]. The most widely used variant is ε-SVR, in which predictions are fitted to lie within a predefined margin of the targets, known as the maximum error or ε (epsilon), and only deviations beyond ε are penalized [19][20]. This formulation gives the user direct control over the error tolerance of the regression model.
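To make the role of ε concrete, the sketch below writes out the ε-insensitive loss, max(0, |y − ŷ| − ε), and fits scikit-learn's `SVR`, whose `epsilon` parameter sets the width of this tube. The toy data, the linear kernel, and the values of `C` and `epsilon` are illustrative assumptions, not the settings tuned in this study.

```python
import numpy as np
from sklearn.svm import SVR


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero loss inside the epsilon-tube, linear loss outside it."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)


# Toy, noise-free linear data (hypothetical; not the PP price series)
X = np.linspace(0.0, 5.0, 30).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# epsilon is the half-width of the tube in which errors are ignored;
# C weighs the penalty on points that fall outside the tube.
model = SVR(kernel="linear", C=100.0, epsilon=0.1)
model.fit(X, y)
pred = model.predict(X)

# An error of 0.05 lies inside the tube (loss 0); an error of 0.5 costs 0.5 - 0.1
losses = epsilon_insensitive_loss([1.0, 1.0], [1.05, 1.5], epsilon=0.1)
```

Because errors smaller than ε cost nothing, the fitted line may tilt slightly away from the true slope as long as every point stays roughly within ±ε of it.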

eXtreme Gradient Boosting.
The eXtreme Gradient Boosting (XGBoost) algorithm, proposed by Chen and Guestrin [21], is an enhanced version of the gradient boosting decision tree (GBDT) algorithm. It uses boosting to train decision trees iteratively, with each new tree fitted to the residuals of the preceding trees. During training, XGBoost optimizes the objective function with a second-order Taylor expansion, using both first- and second-order gradients [22]. Widely recognized as one of the most popular gradient boosting tree implementations, XGBoost performs well across a wide range of problems and reduces the reliance on feature engineering [23]. It offers high accuracy while limiting the risk of overfitting [24]. As a flexible and portable tree-based ensemble model, XGBoost provides efficient solutions for various data science tasks, including regression and classification. It incorporates built-in L1 and L2 regularization to prevent overfitting and handles missing values robustly [25].
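The second-order step can be illustrated by hand for squared-error loss. Following Chen and Guestrin's formulation, each sample has gradient g_i = ŷ_i − y_i and Hessian h_i = 1, and the optimal weight of a leaf is w* = −G / (H + λ), where G and H are the sums of g_i and h_i in the leaf and λ is the L2 penalty. This is a hand-written sketch, not the xgboost library's code. The function names are hypothetical, and `reg_lambda` only mirrors the name of the corresponding XGBoost parameter.

```python
import numpy as np


def squared_error_grad_hess(y_true, y_pred):
    # For L = 0.5 * (y - y_hat)^2: first derivative g = y_hat - y, second derivative h = 1
    g = y_pred - y_true
    h = np.ones_like(y_true)
    return g, h


def optimal_leaf_weight(g, h, reg_lambda):
    # Minimizer of G*w + 0.5*(H + lambda)*w^2, the second-order objective for one leaf
    return -g.sum() / (h.sum() + reg_lambda)


y = np.array([10.0, 12.0, 14.0])
y_hat = np.array([11.0, 11.0, 11.0])  # predictions accumulated from previous trees

g, h = squared_error_grad_hess(y, y_hat)
w_no_reg = optimal_leaf_weight(g, h, reg_lambda=0.0)
w_reg = optimal_leaf_weight(g, h, reg_lambda=1.0)
mean_residual = (y - y_hat).mean()
```

Without regularization, the leaf weight equals the mean residual of its samples, which is why boosting is often described as fitting residuals. A positive λ shrinks the weight toward zero, here from 1.0 to 0.75.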

Artificial Neural Network.
The Artificial Neural Network (ANN) is a machine learning approach that mimics the functioning of the human brain and is composed of interconnected layers of nodes [26]. Typically, an ANN consists of an input layer, one or more hidden layers, and an output layer [27][28]. Information propagates from the input layer through the hidden layer(s) to the output layer. Each hidden node computes a weighted sum of its inputs and passes it through an activation function to produce its output. The number of hidden layers can vary with the complexity of the problem. The output layer produces the final prediction [29].
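The flow of information described above can be shown with a single forward pass. This sketch assumes one hidden layer with a ReLU activation and a single linear output node suitable for regression. The paper does not state its architecture or activation function, and the weights here are hand-picked so the result is easy to check.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum of the inputs, then a nonlinear activation
    hidden = relu(x @ W1 + b1)
    # Output layer: one linear node producing the regression prediction
    return hidden @ W2 + b2


x = np.array([1.0, 2.0])              # two input features
W1 = np.array([[1.0, -1.0, 0.5],
               [0.5, 1.0, -1.0]])     # 2 inputs -> 3 hidden nodes
b1 = np.zeros(3)
W2 = np.array([1.0, 2.0, 3.0])        # 3 hidden nodes -> 1 output
b2 = 0.5

y_hat = forward(x, W1, b1, W2, b2)
```

Here the hidden pre-activations are [2, 1, −1.5], ReLU turns them into [2, 1, 0], and the output is 2·1 + 1·2 + 0·3 + 0.5 = 4.5. Training would adjust W1, b1, W2, and b2 to reduce the forecast error.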

Hyperparameter Tuning
Hyperparameter tuning was performed using GridSearchCV from the scikit-learn library with 5-fold cross-validation. Table 3 lists the candidate hyperparameter values searched for each algorithm.
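A minimal sketch of such a search for SVR follows. The grid values are hypothetical, because Table 3's values are not reproduced here. The synthetic data only mimics the shape of the training set (84 monthly rows for 2011 to 2017, nine features). Scoring by negative MAPE is an assumption, chosen to match the evaluation metric. Placing StandardScaler inside the pipeline means each fold's scaler is fitted only on that fold's training portion.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: 84 months x 9 features, with a positive target
rng = np.random.default_rng(0)
X = rng.normal(size=(84, 9))
y = 50.0 + X @ rng.normal(size=9) + rng.normal(scale=0.1, size=84)

pipeline = make_pipeline(StandardScaler(), SVR())  # SVR step is named "svr"
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [1, 10, 100],
    "svr__epsilon": [0.01, 0.1],
}
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,  # for a regressor this means KFold(5) without shuffling
    scoring="neg_mean_absolute_percentage_error",
)
search.fit(X, y)
print(search.best_params_)
```

With an integer `cv`, scikit-learn splits a regressor's data into contiguous, unshuffled folds. `TimeSeriesSplit` is an alternative splitter that keeps every validation fold later in time than its training data.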

Evaluation Metric
The accuracy of each algorithm is evaluated using the mean absolute percentage error (MAPE), defined as

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$$

where $A_t$ and $F_t$ are the actual and forecast values at time $t$, respectively, and $n$ is the number of forecasting periods. MAPE is used as the evaluation metric because it is easy to interpret and scale-independent, which makes it convenient for comparing forecasts across different datasets. However, it has limitations. It is sensitive to outliers. It is biased toward underestimation: for positive prices, an under-forecast contributes at most 100% error per period, while an over-forecast is unbounded. It is also undefined when an actual value is zero. These limitations can affect its reliability in certain situations.
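The formula translates directly into a few lines of NumPy. The function below is a sketch, and the sample values are illustrative. The last two calls show the asymmetry mentioned above: forecasting 0 for an actual price of 100 scores 100%, whereas forecasting 300 scores 200%.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))


overall = mape([100.0, 200.0], [110.0, 180.0])  # two periods, each 10% off
under = mape([100.0], [0.0])     # the worst possible under-forecast
over = mape([100.0], [300.0])    # over-forecasts have no upper bound
```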

RESULTS AND DISCUSSION
The hyperparameters of the three machine learning algorithms, tuned with GridSearchCV on the training dataset, are summarized in Table 4.
During the cross-validation period (January 2018 to December 2020), monthly PP granule prices were forecast using the three algorithms with their tuned hyperparameters. The forecasts of SVR, XGBoost, and ANN are shown in Figure 3, and their accuracies, measured by MAPE, are compared in Table 5. The figure shows that in the initial period all three algorithms forecast values close to the actual prices. However, from the end of 2019, PP granule prices declined abnormally as a consequence of the COVID-19 pandemic. The lockdown measures implemented to control the spread of the virus halted economic activity, sharply reducing the use of PP granules. As a result, all three algorithms made substantial forecasting errors during this period. Comparing MAPE across the three algorithms shows that ANN provides the most accurate forecasts, with a MAPE of 6.99%. ANN was therefore selected and evaluated on the 2021 PP granule price data, which served as the test dataset, to estimate its generalization error on unseen data. The forecasts are presented in Figure 4, and the resulting MAPE was 5.89%. This level of accuracy falls within an acceptable range and is slightly better than the cross-validation MAPE of 6.99%, indicating that the model makes reliable predictions on unseen data.

CONCLUSION
The plastic industry is of great importance to Thailand because plastics are a vital raw material for most of its industrial products. Among the various types of plastics, polypropylene (PP) is the most widely used. The price of PP granules fluctuates, making accurate forecasting crucial for reducing costs in the industrial sector. This research developed and compared forecasting models for the price of PP granules using three machine learning algorithms: SVR, XGBoost, and ANN. The models were trained on a dataset spanning 2011 to 2017 that comprises nine features related to the price of PP granules. In cross-validation over 2018 to 2020, the ANN model was more accurate than the other methods, achieving a MAPE of 6.99%. When applied to the 2021 test dataset, the ANN model achieved a MAPE of 5.89%, indicating its effectiveness and reliability.
Forecasting models for PP granule prices therefore have valuable applications in business planning, cost reduction, and risk mitigation. However, their effectiveness depends on data quality, assumptions of stationarity, consideration of external factors, model complexity, and the risk of overfitting. Businesses should carefully assess these factors before using such models in practice.
Figure 1 illustrates the annual consumption of the top 5 plastic granules used in Thailand from 2012 to 2019, sourced from the Plastic Institute of Thailand. The top 5 plastic granules are PP (Polypropylene), HDPE (High-Density Polyethylene), LLDPE (Linear Low-Density Polyethylene), PVC (Polyvinyl Chloride), and PET (Polyethylene Terephthalate). The data show that PP had the highest annual consumption of these plastic granules throughout this period.

Figure 1: Quantities of the top 5 plastic granules used in Thailand from 2012 to 2019 [2]

Figure 3: Actual and forecast values of PP granule price from SVR, XGBoost and ANN during cross validation periods

Figure 4: Actual and forecast values of PP granule price from ANN during test periods

Table 2: Label and features for the forecasting models

This research collected PP granule price data from the Plastic Institute of Thailand, which provides average prices across different manufacturers. These prices serve as the label for the forecasting models. In addition, features that influence the price of PP granules were gathered, as presented in Table 2. The data span 11 years, from January 2011 to December 2021, and were divided into three datasets as follows:
• Training dataset (7 years): January 2011 to December 2017, used to build the forecasting models.
• Cross-validation dataset (3 years): January 2018 to December 2020, used to compare the forecasting models.
• Test dataset (1 year): January 2021 to December 2021, used to evaluate the generalization error of the selected model on unseen data.
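A chronological split like this can be sketched with pandas date slicing. The sketch assumes a monthly series indexed by the first day of each month. The `price` column and its dummy values are hypothetical placeholders, not the Plastic Institute of Thailand data.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series, January 2011 to December 2021 (month-start dates)
dates = pd.date_range("2011-01-01", "2021-12-01", freq="MS")
df = pd.DataFrame({"price": np.arange(len(dates), dtype=float)}, index=dates)

# Partial-string slicing on a sorted DatetimeIndex includes every day of the end month
train_set = df.loc["2011-01":"2017-12"]  # build the models
cv_set = df.loc["2018-01":"2020-12"]     # compare the models
test_set = df.loc["2021-01":"2021-12"]   # estimate generalization error
```

Slicing by date rather than shuffling ensures that no later month leaks into an earlier dataset.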

Table 3: Hyperparameters and their corresponding values for each algorithm

Table 4: Tuned hyperparameters of each algorithm

Table 5: Forecasting accuracy comparison