Deployment of Random Forest Algorithm for prediction of ammonia in river water

A fascinating aspect of machine learning (ML) is the diversity of its applications. ML models are particularly useful for conserving natural resources through sustainable usage. Water is an essential natural resource, vital to life as we know it. Ammonia poses a serious hazard to aquatic life and is a primary source of pollution in waterways. In this study, machine learning algorithms are used to estimate the ammonia content in river water. After training and testing several ML regression models, the model that best fits the data is deployed using the Flask API. Based on the values of pH, DO (dissolved oxygen), and COD (chemical oxygen demand), the website shows the amount of ammonia in the river water.


INTRODUCTION
The management of water quality depends heavily on the precise evaluation of contaminants. Ammonia, a nitrogen-based molecule, is a prominent contaminant that has a considerable impact on both aquatic ecosystems and human health. It has therefore become crucial to build prediction algorithms that can appropriately forecast Ammonia concentrations in water bodies such as rivers, ponds, and aquaculture systems. Industries, farmlands, and garbage treatment plants discharge their waste into water bodies like rivers and oceans [1]. The presence of Ammonia in freshwater bodies like rivers results in detrimental effects such as eutrophication and low oxygen levels [2]. pH, Dissolved Oxygen (DO), and Chemical Oxygen Demand (COD) are the three water quality parameters that influence the Ammonia content in water either directly or indirectly. Hence, the prediction of Ammonia can be performed using these three parameters as independent variables.
The relationship between pH and Ammonia concentrations is an intricate interaction that determines Ammonia lethality. This link helps us understand the health of aquatic ecosystems, the efficiency of water treatment, and various ecological processes [3]. A network of interconnected processes, including microbial activity, nutrient enrichment, and the breakdown of organic materials, may raise COD, which in turn raises the concentration of Ammonia in water. Although COD does not directly create Ammonia, it may influence the conditions that make Ammonia compounds more likely to be released and converted [4]. Low DO concentrations diminish aquatic organisms' capacity to tolerate or metabolize Ammonia, making them more vulnerable to harmful effects [5]. Traditional monitoring techniques necessitate extensive and time-consuming sampling, making it difficult to obtain real-time data for effective pollution management.
To find complex relationships and patterns between these independent variables and Ammonia concentrations, Machine Learning (ML) models are trained on a dataset of the three water parameters. This allows for the accurate prediction of Ammonia levels. The goal of protecting freshwater ecosystems is in accord with Sustainable Development Goals (SDG) 1, 2, and 14.

LITERATURE REVIEW
The study aims to highlight the detrimental effects of Ammonia in freshwater bodies and deploy the best-fit machine learning algorithm to estimate the Ammonia level in parts per million (ppm) in river waters.

Negative Impacts of Ammonia in River Water Bodies
A multitude of harmful impacts are associated with the presence of Ammonia in river waters, such as deterioration of the immune system and damage to the gills and to osmoregulation; it also adversely affects the reproductive system of fish. High Ammonia levels were shown to cause anxiety in guppy fish in Uganda, which in turn caused inflammation by altering the mRNA, resulting in oxidative stress [6]. The fish population was greatly impacted by water quality parameters such as temperature, pH, Ammonia, and carbon dioxide, which also resulted in smaller and lighter tilapia and catfish [7]. Additionally, cyanobacterial blooms and ecological decline have been brought on by high Ammonia levels [8]. The significance of NH3-N as a marker of water pollution highlights the need for precise prediction models to control and reduce NH3-N contamination. Datasets on water quality were produced using sensors connected to the Internet of Things (IoT), which measured water parameters such as pH and oxygen concentration. Machine learning (ML) algorithms were then trained on these datasets to forecast Ammonia levels [9]. An ultrasonic-strengthened breakpoint chlorination procedure with appropriate oxidizers is one of the measures taken so far to lower Ammonia content [10]. To protect the health of fish populations and the larger ecological balance, it is essential to regularly monitor Ammonia levels. By keeping these dangerous compounds at the right levels, fish farm productivity may be increased.

Regression Models
Predicting the Ammonia content is essential due to its detrimental effects on freshwater habitats, and this may be done using ML algorithms. Several regression techniques can maximize performance on a given dataset, each with its own set of constraints, and the prediction error is reduced through hyperparameter tuning [11]. Boosting regression models and other supervised regression approaches such as Multilayer Perceptron (MLP), Decision Tree, and Random Forest (RF) have previously been used to predict water quality in urban water systems [12]. Research on neural networks shows that Artificial Neural Network (ANN) algorithms often need large volumes of data [13]. The principle behind K-Nearest Neighbours (KNN) regression is that neighbouring objects are more likely to belong to the same class; however, it is a lazy learner, since it spends no time training and does nothing beyond storing the training data [14]. Ridge regression is extensively employed to cope with multicollinearity; it reduces the variance of the parameter estimators at the cost of introducing a bias into the regression equation [15].

Deployment of Random Forest Algorithm using Flask
The practicality of the regression models is discussed in light of the prerequisites for their application. The RF algorithm has been reported as the most accurate model for forecasting stream water quality [16]. The RF approach delivers trustworthy findings when the sample size is appropriate, and the accuracy of the method depends on the number of trees in the algorithm. It is commonly applied to environmental data, since such data sets are unpredictable [17]. In this work, the RF algorithm is deployed using the Flask API. Flask is a framework used to create web applications in Python [18]. The present work focuses on deploying ML regression algorithms for the prediction of Ammonia in river water specifically, rather than an overall water quality prediction as was done in past studies, because of the large harm caused by Ammonia in contrast to other generic drivers of water quality.

State-Of-The-Art Approaches
Many approaches have been studied so far for the prediction and eradication of Ammonia in freshwater bodies. Separation and purification technology [19] is one of the hybrid methods used for removing Ammonia from water. A case study on Ammonia nitrogen prediction in water highlighted that traditional methods require cumbersome sample handling and analysis. To avoid this, the study [20] utilized deep learning techniques for water quality monitoring; however, this requires a large amount of data. In this paper, the Random Forest algorithm is proposed and deployed for Ammonia prediction, as it gives good performance metrics with a comparatively small number of samples.

METHODOLOGY
This section discusses the methods and algorithms used for the prediction.

Significance of Machine Learning Model Deployment
One pressing need for sustainable fish production is the prediction of Ammonia levels in water bodies, due to the significant harm Ammonia poses to aquatic ecosystems and, specifically, fish populations. Machine learning plays a pivotal role in this context, as it can accurately forecast Ammonia concentrations once the model is trained with the required number of samples. This prediction enables preventive actions that avoid large-scale deterioration of water quality and support the survival of fish. These actions contribute to the protection of freshwater environments and meet the objectives set out in Sustainable Development Goal (SDG) 14.

Data Collection and Processing
The water quality data is collected by the China Environment Monitoring Centre [21]. pH, DO, COD, and Ammonia sensors are used to obtain the water quality parameters. This dataset, with the four variables from several rivers, is formatted into a CSV file. As a first step, the dataset is cleaned and pre-processed to avoid overfitting in the ML models and to arrive at accurate predictions. The dataset is cleaned by removing the samples that have NaN values, and the outliers are removed with the Isolation Forest algorithm [22].
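The cleaning step described above can be sketched as follows. This is a minimal illustration, not the exact pipeline used in this study: the column names (pH, DO, COD, Ammonia), the contamination rate of 0.05, the helper name clean_water_data, and the synthetic stand-in data are all assumptions made for the example. It relies on pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Assumed column names; the real CSV headers may differ.
FEATURES = ["pH", "DO", "COD"]
TARGET = "Ammonia"


def clean_water_data(df, contamination=0.05, random_state=0):
    """Drop rows with NaN values, then drop rows flagged as outliers.

    `contamination` (the expected share of outliers) is an assumed value,
    not one reported in the paper.
    """
    cols = FEATURES + [TARGET]
    df = df.dropna(subset=cols).reset_index(drop=True)
    detector = IsolationForest(contamination=contamination,
                               random_state=random_state)
    labels = detector.fit_predict(df[cols])  # 1 = inlier, -1 = outlier
    return df[labels == 1].reset_index(drop=True)


# Synthetic stand-in for the monitoring data (illustrative values only).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "pH": rng.normal(7.5, 0.4, 200),
    "DO": rng.normal(8.0, 1.5, 200),
    "COD": rng.normal(4.0, 1.0, 200),
    "Ammonia": rng.normal(0.4, 0.1, 200),
})
raw.loc[0, "pH"] = np.nan                      # a sample with a missing value
raw.loc[1, ["COD", "Ammonia"]] = [60.0, 9.0]   # an obviously extreme sample

cleaned = clean_water_data(raw)
print(len(raw), "rows before cleaning,", len(cleaned), "rows after")
```

With real data, the synthetic frame would be replaced by the result of pd.read_csv on the formatted CSV file.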

Implementation of Machine Learning models
Four ML models were trained on 80% of the dataset and tested on the remaining 20%. The hyperparameters were tuned to get the best possible solution for each ML model.
• Neural networks: This algorithm is inspired by the functioning of the human brain and performs the task of prediction. In the case of Ammonia prediction, the activation function used is "relu" for three hidden layers in the feed-forward process. Considering the moderate size of the dataset, 500 epochs are used to obtain the prediction output.
• Linear Regression: Ridge regression, an extension of linear regression, is used for the multicollinear dataset, in which pH, DO, and COD influence the Ammonia content in the river waters. The regularization parameter is set to 1.
• K-Nearest Neighbours (KNN): KNN predicts the unknown value based on feature similarity. The model uses the 5 nearest neighbours to find the best possible predicted value. The choice of 5 nearest neighbours was determined iteratively.
• Random Forest Algorithm: Random decision trees are particularly good at accommodating nonlinearities in the data, resulting in improved predictive accuracy compared to alternative regression models. Ensemble methods such as random forests are well suited to moderate to large datasets. The RF technique is known for being an efficient learning algorithm. The method uses averaging to generate a mean forecast from numerous distinct trees. The selected RF algorithm is illustrated below:
  From a total of p predictor variables, randomly select a sample of n variables.
  for i ← 1 to s do
    if the splitting criterion is met with the i-th predictor then
      split the internal node into two child nodes;
      break;
    end if;
  end for;
  end while;
  end for;
  Index the generated M subtrees.
  Compute the ensemble tree of all the generated subtrees.
  End
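The four models and the 80/20 split described above could be set up with scikit-learn roughly as follows. This is a sketch under stated assumptions, not the authors' code: the synthetic data, the hidden-layer widths (32, 32, 32), the feature scaling in front of the neural network, the number of trees (100), and the random seeds are all illustrative choices. Only the ReLU activation, the three hidden layers, the 500 epochs, the Ridge regularization parameter of 1, and the 5 neighbours come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: columns are pH, DO, COD (illustrative only).
rng = np.random.default_rng(42)
n = 400
X = np.column_stack([
    rng.uniform(6.5, 9.0, n),    # pH
    rng.uniform(2.0, 12.0, n),   # DO
    rng.uniform(1.0, 15.0, n),   # COD
])
# Made-up nonlinear target standing in for the Ammonia concentration.
y = (0.05 * X[:, 2] + 0.8 / X[:, 1] + 0.1 * (X[:, 0] - 7.5) ** 2
     + rng.normal(0.0, 0.02, n))

# 80% of the samples for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    # Three ReLU hidden layers, 500 epochs; widths and scaling are assumed.
    "Neural network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32, 32), activation="relu",
                     max_iter=500, random_state=42)),
    "Ridge": Ridge(alpha=1.0),                  # regularization parameter 1
    "KNN": KNeighborsRegressor(n_neighbors=5),  # 5 nearest neighbours
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {"MSE": mse, "RMSE": float(np.sqrt(mse)),
                     "R2": r2_score(y_test, pred)}
    print(f"{name}: MSE={mse:.4f} RMSE={np.sqrt(mse):.4f} "
          f"R2={results[name]['R2']:.3f}")
```

The scores printed here describe only the synthetic data and say nothing about the results reported in this paper. The neural network may emit a convergence warning within 500 epochs, which does not stop the script.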

Flask API for Model Deployment
According to three performance criteria, namely Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the coefficient of determination (R², computed with r2_score), the RF model fits the data set the best. The RF algorithm is deployed using the Flask microframework. The independent variables are submitted through the POST method, and the HTML template is rendered to show the predicted value. The web page for Ammonia prediction is hosted locally; it obtains the predicted Ammonia value from the RF algorithm and displays it in the web browser. The inputs required are the pH, DO, and COD values; the output is displayed with the click of a button named "Ammonia content". Table 2 presents the predicted results for 20 samples obtained with the RF algorithm, hosted on a web page built with Python code on the Visual Studio platform.
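A minimal Flask application of the kind described above might look as follows. It is a sketch, not the deployed application: the inline HTML template, the form field names, the route, and the placeholder model trained on synthetic data are assumptions made so that the example is self-contained. In practice the trained RF model would be loaded from disk instead.

```python
import numpy as np
from flask import Flask, render_template_string, request
from sklearn.ensemble import RandomForestRegressor

# Inline template (assumed layout); a real project would likely keep it
# in a templates/ folder and call render_template instead.
PAGE = """
<form method="post">
  pH <input name="pH"> DO <input name="DO"> COD <input name="COD">
  <button type="submit">Ammonia content</button>
</form>
{% if prediction is not none %}
<p>Predicted ammonia: {{ prediction }} ppm</p>
{% endif %}
"""


def train_placeholder_model():
    """Fit an RF model on synthetic data so the sketch runs on its own."""
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.uniform(6.5, 9.0, 300),    # pH
                         rng.uniform(2.0, 12.0, 300),   # DO
                         rng.uniform(1.0, 15.0, 300)])  # COD
    y = 0.05 * X[:, 2] + 0.8 / X[:, 1]  # made-up relationship
    return RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)


app = Flask(__name__)
model = train_placeholder_model()


@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        # Read the three independent variables from the submitted form.
        features = [[float(request.form[name])
                     for name in ("pH", "DO", "COD")]]
        prediction = round(float(model.predict(features)[0]), 3)
    return render_template_string(PAGE, prediction=prediction)
```

Saved under a hypothetical file name such as app.py, the application can be started locally with the command flask --app app run and opened in a browser; submitting the form sends a POST request and the page is re-rendered with the predicted value.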

CONCLUSION
To sustain life and food sources for the generations to come, it is imperative to take preventive actions to sustain natural resources. The deployment of a best-fit ML model for Ammonia prediction in river waters can be useful in various conservation approaches, as it can predict the Ammonia content in the water and display it on the web page. Based on the displayed value, necessary actions can

Figure 1: Block diagram of the deployment of the ML model

Figure 2: The Actual and Predicted values of Ammonia with RF Model

Figure 4: Web page displaying the predicted Ammonia value

Table 1: Evaluation of regression models

Table 2: The results obtained by the RF algorithm