Leaf Area Index for Assessment of Urban Green Space and Tree Coverage to Support Elder Friendly Cities

Current assessments of walkability and living conditions lack an appropriate evaluation of urban green space and tree canopy, which improve the comfort of walking and living, mainly for the elderly. The study compares municipal registers of trees and green areas with the satellite-derived Leaf Area Index (LAI) computed from Sentinel-2 imagery processed in SNAP. For the evaluation, correlation analysis, multiple linear regression (MLR) and XGBoost modelling were applied to two types of pilot areas in Ostrava, CZ. Relationships between LAI and green, building and water areas, as well as the number and kernel density of registered trees, were explored. S-pilot areas with almost complete records of trees and green areas provide much better modelling results (R²=0.41) than U-pilot areas. For quantification of urban green space and trees, the LAI performs much better than the municipal records. XGBoost modelling outperforms MLR and overcomes issues with heteroscedasticity and non-normality of the models' residuals.


INTRODUCTION
Trees represent an important part of viable urban environments in that they are essential for the well-being of citizens, biodiversity, air quality and regulation of urban microclimates. Their presence positively influences inhabitants' propensity to walk and relax outside, especially for the elderly and other vulnerable groups.
An environmental mobility determinant represents one of five fundamental categories of elderly mobility, usually recognised as a part of the "comfort" aspect. A 10-year review of urban walkability [1] found that only 21% of papers discuss some components of comfort such as the weather, aesthetics of buildings, enclosure ratio, cleanliness, shade, and the presence of trees. Alvez et al. [2] designed a Walkability Index for Elderly Health and included the existence of trees/vegetation in the "urban scene" dimension, but with a simplified classification of urban conditions into three levels (no trees/vegetation, moderate, and strong occurrence). A walkability index which quantifies tree occurrence or even tree shade does not yet exist. The reason is simple: a lack of data. To overcome this issue, we evaluated two possible data sources: digital municipal records of trees and green space, and several satellite-derived vegetation indices, of which the LAI seems best suited to quantifying tree coverage.
Cities today recognize the value of trees for the urban environment: they invest in them carefully, expand green areas, water trees, and protect them against diseases and pests. The required digital register of public trees is commonly based on regular updating and monitoring. The problem is that this register covers only trees owned by municipalities, usually standalone trees; it does not comprise all trees in parks and forests within a city, nor does it include private gardens with many trees. Additionally, tree registers usually do not contain parameters important for assessing tree growth and canopy, such as age, height, and crown diameter.
The Leaf Area Index (LAI) is defined as the one-sided green leaf area per unit ground surface area. The LAI serves as a measure of the amount of plant canopy and its density. Both direct methods (e.g., leaf traps) and indirect methods (e.g., hemispherical photography, LIDAR or satellite imagery) [3][4] can be used to estimate LAI. LAI has been used, for example, to measure the impact of urban forest on reducing UV radiation and temperature, to assess the effects of surface cover types on the temperature and relative humidity of the urban environment, and to explore how different distributions of dense trees ameliorate urban microclimates. A satellite-derived LAI faces several challenges: it measures 'effective' leaf area, which is less than the actual LAI because a random distribution of leaves is assumed [5], and it underestimates very high LAI values because vegetation indices saturate owing to the limited penetration of sunlight [6]. Large amounts of data and their asymmetric distribution require advanced processing and the use of machine learning methods. Among others, the XGBoost (Extreme Gradient Boosting) method offers parallelized tree building, cache awareness and out-of-core computing, regularization to avoid overfitting, efficient handling of missing data and a built-in cross-validation capability [7].
This paper evaluates the possibility of using the LAI instead of municipal records of trees and green areas for the assessment of urban green/tree coverage, useful e.g. for walkability modelling.

STUDY AREA AND DATA SOURCES
The city of Ostrava (population 290,000; area 214 km²) lies in the NE corner of Czechia close to the Polish and Slovak borders. Its heterogeneous urbanization consists of an agglomeration of urban blocks separated by crop fields, forests, and industrial parks.
To assess LAI, a Sentinel-2 MSI image (9 September 2021, processing level 1C) was processed using SNAP (Sentinel Application Platform). The LAI was calculated with the Biophysical Processor, which uses radiative transfer models, from the reflectance bands B3, B4, B5, B6, B7, B8A, B11 and B12 and other information such as solar zenith and relative azimuth angles. For Sentinel-2 images, the processor is implemented with two different neural network architectures, NNET 10m and NNET 20m.
The Geographic Information System of the City of Ostrava (GISMO) comprises various data sets to support urban planning and development, maintenance of municipal property, and solving environmental and social issues in the city. The important part here is a register of trees, currently including 154,874 trees. Unfortunately, the register covers only trees which are municipal property, thus only trees occurring in public spaces such as streets or parks. Private trees (in gardens, industrial, agricultural or forested areas) are not included. Also, some public spaces (parks, riverbanks, etc.) are not fully covered due to the number of trees and their natural reproduction. To address these issues, two types of pilot areas were delimited in Ostrava: S-areas with almost complete tree records (dominated by public spaces with many registered trees), where a good correspondence between the tree records and real tree coverage is expected (9 suitable pilot areas, sizes 0.9-3.4 km²) (fig. 1), and U-areas with weak tree records due to prevailing family dwellings with private gardens, or natural or seminatural areas (i.e., river banks, parks, forests) (4 unsuitable pilot areas, sizes 0.9-1.6 km²) (fig. 2). Other auxiliary datasets include buildings, water bodies (OSM), and green areas (GISMO).

METHODOLOGY
Sentinel-2 MSI image processing covers atmospheric correction, resampling, subsetting and LAI calculation. Atmospheric correction using Sen2Cor (ESA) was applied to the input Sentinel-2 L1C data. Sen2Cor is a standalone application that allows atmospheric, terrain and cirrus correction of TOA level 1C input data. Further, all bands were resampled to a 10 m pixel size. Finally, the Biophysical Processor S2 was used to compute LAI (fig. 1, 2 down). All operations were conducted in the freely available SNAP software, version 9.0.0 (https://step.esa.int/main/download/snap-download/). According to [...] For the same spatial scheme, all other factors were enumerated. Point records of trees were aggregated to the cell units corresponding to LAI pixels (tree_count). Point locations of tree trunks are obviously not sufficient to represent a canopy. Trees create different crown sizes, on Ostrava's streets usually ranging from 0 to 18 m, with an average of 8 m. Accordingly, a kernel density estimation (KDE) of tree coverage was calculated (ArcMap v.10.8, Kernel density function, 10 m bandwidth and 10 m cell size) (tree_kde). Further supporting data are the green areas registered in GISMO, mainly for the purpose of lawn mowing and maintenance. The indicator was calculated as the percentage of the pixel area covered by green areas (green_area). Tree coverage is naturally suppressed by water bodies and buildings. Similarly to green areas, water and building areas were enumerated, with the expectation of negative correlations with tree coverage and LAI (water_area, build_area).
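The aggregation of tree points to LAI pixels and the kernel density surface described above can be illustrated with a minimal NumPy sketch. It is not the ArcMap workflow used in the study: the function names, the grid origin and the three sample trees are hypothetical, and the quartic kernel is an assumption about the kernel shape, chosen only because it is a common choice for this kind of density surface.

```python
import numpy as np

def tree_count_grid(x, y, x_min, y_min, n_cols, n_rows, cell=10.0):
    """Count tree points per raster cell (row 0 starts at y_min)."""
    x_edges = x_min + cell * np.arange(n_cols + 1)
    y_edges = y_min + cell * np.arange(n_rows + 1)
    counts, _, _ = np.histogram2d(y, x, bins=[y_edges, x_edges])
    return counts.astype(int)

def quartic_kde_grid(x, y, x_min, y_min, n_cols, n_rows, cell=10.0, bandwidth=10.0):
    """Quartic-kernel density (trees per square metre) at cell centres.

    Assumed kernel: K(u) = 3/pi * (1 - u^2)^2 for u < 1, scaled by 1/bandwidth^2.
    """
    cx = x_min + cell * (np.arange(n_cols) + 0.5)
    cy = y_min + cell * (np.arange(n_rows) + 0.5)
    gx, gy = np.meshgrid(cx, cy)          # both have shape (n_rows, n_cols)
    density = np.zeros_like(gx)
    for px, py in zip(x, y):
        u2 = ((gx - px) ** 2 + (gy - py) ** 2) / bandwidth ** 2
        inside = u2 < 1.0                 # kernel is zero beyond the bandwidth
        density[inside] += 3.0 / (np.pi * bandwidth ** 2) * (1.0 - u2[inside]) ** 2
    return density

# Hypothetical sample: three trees on a 3 x 2 grid of 10 m cells.
tree_x = np.array([5.0, 5.0, 25.0])
tree_y = np.array([5.0, 5.0, 15.0])
tree_count = tree_count_grid(tree_x, tree_y, 0.0, 0.0, n_cols=3, n_rows=2)
tree_kde = quartic_kde_grid(tree_x, tree_y, 0.0, 0.0, n_cols=3, n_rows=2)
```

The count grid records only where trunks stand, whereas the density grid spreads each tree over its neighbourhood, which is why tree_kde is the closer stand-in for canopy.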
An Exploratory Data Analysis provided basic statistical characteristics of the data sets, including histograms. Multiple scatter plots enabled evaluation of the types of relationships between variables. Bivariate correlation analysis included Pearson and Spearman coefficients of correlation to explore the pairwise correlations between all variables. Further, multiple linear regression (MLR, SPSS version 18) was applied to understand the dependence of LAI (the dependent variable) on this set of independent factors, including multicollinearity evaluation, ANOVA, and assessment of standardized beta coefficients.
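The statistics named above (Pearson and Spearman correlation, an MLR fit with adjusted R², and the variance inflation factor used for the multicollinearity check) can be sketched in plain NumPy. This is an illustration only, not the SPSS procedure used in the study; all function names are hypothetical and the data are synthetic.

```python
import numpy as np

def average_ranks(a):
    """Ranks starting at 1; tied values share their mean rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="stable")
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(a), average_ranks(b))

def ols(X, y):
    """Least-squares fit with intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, float(r2)

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2 of column j on the rest)."""
    factors = []
    for j in range(X.shape[1]):
        _, r2_j = ols(np.delete(X, j, axis=1), X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)

# Synthetic stand-in data: three independent predictors, one response.
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(500)
beta, r2 = ols(X, y)
```

VIF values close to 1, as reported for both pilot-area models, indicate that the predictors carry largely independent information.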
Finally, XGBoost (https://xgboost.readthedocs.io/en/stable/), an advanced machine learning method used here through its Python implementation, was applied to improve the model and overcome the issues of the MLR evaluation. XGBoost [9] is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. Boosting algorithms combine weak learners into a strong learner in an iterative way. The following parameters were tuned for XGBoost in this study: learning rate (learning_rate), number of gradient boosted trees (n_estimators), L1 regularization of leaf weights (reg_alpha), minimum loss reduction (gamma), maximum depth of a tree (max_depth), fraction of features sampled at each tree level (colsample_bylevel), subsampling rate (subsample), and random number seed (random_state).
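How boosting combines weak learners iteratively can be shown with a deliberately small sketch: plain gradient boosting for squared error, using one-split regression stumps on a single feature. This is not XGBoost itself, which additionally uses second-order gradient information and regularization; the function names and the sine-shaped toy data are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split stump (threshold, left mean, right mean) by squared error."""
    best = None
    for t in np.unique(x)[:-1]:           # excluding the max keeps the right side non-empty
        left = x <= t
        mean_l, mean_r = residual[left].mean(), residual[~left].mean()
        sse = ((residual[left] - mean_l) ** 2).sum() + ((residual[~left] - mean_r) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, mean_l, mean_r)
    return best[1:]

def boost(x, y, n_estimators=100, learning_rate=0.3):
    """Fit stumps one after another, each on the residuals of the current ensemble."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_estimators):
        t, mean_l, mean_r = fit_stump(x, y - pred)
        pred = pred + learning_rate * np.where(x <= t, mean_l, mean_r)
        stumps.append((t, mean_l, mean_r))
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    out = np.full(len(x), base)
    for t, mean_l, mean_r in stumps:
        out = out + learning_rate * np.where(x <= t, mean_l, mean_r)
    return out

# Toy data: a smooth curve that no single stump can fit.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)
model = boost(x, y)
mse = float(np.mean((predict(model, x) - y) ** 2))
```

Each stump alone is a poor predictor, but because every new stump is fitted to what the ensemble still gets wrong, the training error shrinks with each round; learning_rate and n_estimators play the same roles here as the parameters of the same name listed above.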
In order to interpret the model and its results, we used the unified framework SHAP (SHapley Additive exPlanations), a widely used method to explain the predictions of machine learning models [10], in which the model's prediction f(x) is represented as the sum of its computed SHAP values plus a fixed base value (Lundberg, Lee, 2017). To explain our model, we used the SHAP TreeExplainer() class. The feature importance bar plot displays the features in descending order of importance, together with the magnitude of their attributions. The summary plot combines feature importance with feature effects. Each point on the summary plot is a Shapley value for a feature and an instance. The features that influence the model's outcome in a positive way are highlighted in red, whereas the features that impact the model's outcome negatively are highlighted in blue [10].
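The additive representation mentioned above can be written out explicitly. The notation is ours rather than the paper's: \phi_0 denotes the base value, \phi_i(x) the SHAP value of feature i for instance x, M the number of features (five here: tree_count, tree_kde, green_area, water_area, build_area) and N the number of instances. The second expression assumes that the bar plot ranks features by their mean absolute SHAP value, which is the default behaviour of the shap library.

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x),
\qquad
I_i = \frac{1}{N} \sum_{n=1}^{N} \left| \phi_i(x_n) \right|
```

Read this way, an importance I_i is expressed in the units of the predicted variable, i.e. as an average absolute shift of the predicted LAI attributable to feature i.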

RESULTS
Results of the bivariate correlation analysis for the two main variables of interest, LAI and tree_kde, show important differences in correlation between suitable and unsuitable areas (fig. 3 up). While the Spearman coefficient of correlation for S-pilot areas ranges between 0.283 and 0.474 (p=0.001), the correlations for U-pilot areas are considerably lower (0.081-0.155).
MLR for S-pilot areas (N=160997) reached an adjusted R²=0.354 (p<0.001) and ANOVA indicates that the model explains 35% of data variability. MLR for U-pilot areas (N=49789) provides only R²=0.136 and ANOVA indicates that the model explains only 14% of data variability. Both models show satisfactorily low multicollinearity (VIF in the range of 1.181-1.355), but the residuals are not normally distributed and some of the predictors indicate issues of heteroscedasticity in partial regression plots (fig. 3 down).
To overcome the issues in the MLR models, XGBoost was applied. In order to evaluate the performance of the XGBoost model, we split our dataset into training and validation subsets (80:20) using the scikit-learn library (version 1.1.1) and its train_test_split() function. Training data is used for learning, while validation data enables an unbiased evaluation of the model while tuning model hyperparameters (R² and RMSE metrics). The XGBoost model is defined as follows:
model_xgb = xgb.XGBRegressor(n_estimators=100, learning_rate=0.3, max_depth=3, reg_alpha=0.5, gamma=0, subsample=1, booster='gbtree', colsample_bylevel=1, random_state=44)
The feature importance bar plots (fig. 5) illustrate feature importance as defined by the SHAP algorithm. For S-pilot areas, the green_area variable represents the most important feature, changing the predicted LAI value by 27 percentage points. It is followed by the build_area variable, changing the predicted value by 12 percentage points. The influence of registered trees is about 3 and 1.5 percentage points for KDE and tree counts, respectively. Surprisingly, the water_area variable has very little impact on the model's predictions.
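The 80:20 split and the two validation metrics used in this section, R² and RMSE, can be sketched in plain NumPy to make their definitions explicit. This is an illustration only, not the scikit-learn code used in the study; the function names are hypothetical, and the seed merely mirrors the random_state value quoted in the model definition.

```python
import numpy as np

def split_80_20(X, y, seed=44):
    """Shuffle row indices and return an 80:20 training/validation split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = (len(y) * 8) // 10          # integer arithmetic avoids float rounding
    train, val = idx[:n_train], idx[n_train:]
    return X[train], X[val], y[train], y[val]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target variable."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Tiny hypothetical example.
X_demo = np.arange(20.0).reshape(10, 2)
y_demo = np.arange(10.0)
X_train, X_val, y_train, y_val = split_80_20(X_demo, y_demo)

observed = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.0, 2.0, 3.0, 5.0])
demo_r2 = r2_score(observed, predicted)
demo_rmse = rmse(observed, predicted)
```

Both metrics are computed on the validation subset only, so that they reflect how the model performs on pixels it has not seen during training.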
Results for U-pilot areas show changes in the order of feature priorities.As expected, the building area variable had the highest impact (16%), followed by the green_area variable (13%).While

CONCLUSION
All methods used (correlation analysis, MLR and XGBoost modelling) confirm significant differences between S-pilot and U-pilot areas in Ostrava. S-pilot areas provide much higher parametric and non-parametric correlation coefficients between LAI and tree_kde than U-pilot areas. MLR shows at least double the adjusted R² for S-areas compared with U-areas. However, the MLR models failed several modelling requirements (homoscedasticity and normal distribution of residuals). XGBoost models achieved the best results. The model for S-pilot areas reached R²=0.41 (RMSE=0.48), where green and built-up areas play the main role, followed by the cumulative importance of tree_kde and tree_count. Results for U-pilot areas are much worse. This confirms our hypothesis that the LAI works better than simple records of individual trees (or kernel density estimation of tree coverage) for the evaluation of urban green space. It opens the possibility of using LAI as a suitable proxy for assessing this important urban factor in walkability evaluation as well as in other evaluations of urban living conditions.

Figure 1: Green space and trees in one S-pilot area (Ostrava-Poruba) (up) and LAI values (down)

Figure 2: Green space and trees in one U-pilot area (Ostrava-Silesian) (up) and LAI values (down)

Figure 3: Spearman r between LAI and tree_kde for both types of pilot areas (p=0.001) and partial regression plots (MLR) for S-pilot areas

Figure 4: Training and test losses for S-pilot (up) and U-pilot (down) areas