SOL-NeRF: Sunlight Modeling for Outdoor Scene Decomposition and Relighting

Outdoor scenes often involve large-scale geometry and complex, unknown lighting conditions, which makes it difficult to decompose them into geometry, reflectance and illumination. Recently, researchers have attempted to decompose outdoor scenes using Neural Radiance Fields (NeRF) together with learning-based lighting and shadow representations. However, the diverse lighting conditions and shadows in outdoor scenes remain challenging for learning-based models. Moreover, existing methods may reconstruct coarse geometry and normals and introduce notable shading artifacts when the scene is rendered under novel illumination. To solve these problems, we propose SOL-NeRF, which decomposes outdoor scenes with the help of a hybrid lighting representation and a signed distance field geometry reconstruction. We use a single Spherical Gaussian (SG) lobe to approximate the sunlight and a first-order Spherical Harmonics (SH) mixture to approximate the sky light. This hybrid representation is specifically designed for outdoor settings and models outdoor lighting compactly, ensuring robustness and efficiency. The shadow of the direct sunlight is obtained by casting rays against the mesh extracted from the signed distance field, and the remaining shadow is approximated by Ambient Occlusion (AO). Additionally, a sunlight color prior and a relaxed Manhattan-world assumption are applied to further boost decomposition and relighting performance. When the lighting condition changes, our method produces consistent relighting results with correct shadow effects. Experiments on our hybrid lighting scheme and on the entire decomposition pipeline show that our method achieves better reconstruction, decomposition, and relighting performance than previous methods, both quantitatively and qualitatively.


Figure 1: Given a set of input images of an outdoor scene, our SOL-NeRF pipeline decomposes them into geometry and material properties, which enables rendering the input scene from a novel viewpoint and relighting it with a different illumination.

INTRODUCTION
Neural Radiance Fields (NeRF) [Mildenhall et al. 2020], which are learned through differentiable rendering, have become increasingly important for inverse rendering tasks. Several pioneering works [Oechsle et al. 2021; Wang et al. 2021; Yariv et al. 2021] learn to reconstruct the geometry of the input scene by combining NeRF-style volume rendering with implicit surface representations. While they achieve good-quality geometry and appearance on relatively small scenes under a fixed lighting condition, their performance drops on larger scenes with varying illumination, and none of them can relight the input scene. This is because they learn the appearance with a single MLP (Multi-Layer Perceptron) that entangles material, lighting and shadow.
Several methods have been proposed to reconstruct outdoor scenes under varying lighting conditions. NeRF-W [Martin-Brualla et al. 2021] splits the original NeRF network into two branches, a static one and a transient one, where the latter is controlled by a per-frame appearance embedding. This strategy successfully peels off the lighting effect. However, the illumination is still represented implicitly rather than through an explicit decomposition of material and lighting. Therefore, the scene can only be relit by interpolating between the latent codes of two lighting conditions. NeuralRecon-W [Sun et al. 2022] also uses latent embeddings to model the different appearances across images, while focusing on geometry reconstruction. Its surface-based sampling strategy efficiently constrains the samples to lie around the surface, yielding satisfactory reconstruction results. Nonetheless, NeuralRecon-W also cannot relight the scene because of its implicit lighting representation.
Unlike the above methods, NeRF-OSR [Rudnev et al. 2022] explicitly decomposes the lighting and material of outdoor scenes. It extends the original NeRF by modeling the lighting with a second-order Spherical Harmonics (SH) mixture, whose coefficients are predicted from a learnable embedding for each image. The material is predicted by an albedo network, while shadows are estimated by a shadow network that takes the SH coefficients as input. NeRF-OSR shows promising decomposition and relighting results, but the learned shadow network easily overfits the input images. This is because the network merely outputs a scalar value between 0 and 1 that darkens the unshadowed result in image space, rather than explicitly computing shadow effects from lighting and geometry. Very recently, FEGR [Wang et al. 2023] proposed to model scene properties with hash grid encoding [Müller et al. 2022] while using explicit ray tracing to obtain higher-order effects such as shadows. It models lighting with a network that predicts the lighting intensity from a given direction. However, this network may predict inaccurate illumination because it learns a single lighting model that mixes high-intensity sunlight with low-intensity sky light. As a result, the learned sun may have an irregular shape and the intensity distribution of the sky can be inaccurate.
Aiming at high-quality scene decomposition and relighting, we propose a novel method called SOL-NeRF (Sunlight Modeling for Outdoor Scene Decomposition and ReLighting), which uses a hybrid lighting representation enhanced by geometry and lighting priors (see Fig. 1). To address the complexity of outdoor lighting, whose intensity varies strongly across the sky, we model sunlight and sky light with a Spherical Gaussian (SG) and Spherical Harmonics (SH), respectively. The hybrid representation also allows more accurate and reliable estimation of shadow effects: explicit ray tracing for the SG component and learned Ambient Occlusion (AO) for the SH component. To enable high-quality and efficient geometry reconstruction, we employ the octree-accelerated sampling strategy of [Sun et al. 2022] and convert signed distance values to opacity values for volume rendering using the formulation in [Wang et al. 2021]. The learned Signed Distance Field (SDF) also provides the inside-outside information needed for shadow calculation. Moreover, we use priors to resolve ambiguities and improve scene decomposition. Specifically, we analytically link the color of the sunlight to the elevation of the sun on the sky dome, making the learned lighting more accurate. A relaxed Manhattan-world assumption adapted from [Furukawa et al. 2009] is used to regularize the surface normal distribution, thus improving the reconstructed scene geometry.
Overall, our contributions can be summarized as follows:
• We propose SOL-NeRF, a pipeline that decomposes outdoor scenes into geometry, reflectance and lighting. It can efficiently extract high-quality geometry and reflectance from a set of images of outdoor scenes under varying illumination.

RELATED WORK
Inverse rendering [Ramamoorthi and Hanrahan 2001b; Sato et al. 1997] is a fundamental problem in computer graphics. It aims to recover the geometry, material, and lighting from observed images. Early works [Gao et al. 2020; Yu et al. 2020] often rely on intrinsic images or geometry proxies to decompose a scene. Recently, with the emergence of implicit representations and neural radiance fields (NeRF), a number of works have proposed to learn the underlying geometry, material and lighting from given observations. NeRD [Boss et al. 2021a] is a pioneering work that decomposes material and geometry under given lighting conditions. It takes direct lighting and one-bounce indirect lighting into consideration and models visibility with a neural network. Later, PhySG [Zhang et al. 2021a] works under an unknown lighting condition and uses a signed distance field (SDF) and Spherical Gaussian (SG) functions as its geometry and lighting representations, respectively, but assumes that the whole scene shares the same roughness and specular components. Building on PhySG, InvRender [Zhang et al. 2022] introduces another set of SG functions to model indirect illumination. Boss et al. [2021a] also use SG as their lighting representation and adopt a two-stage training strategy that first optimizes a sampling network and then a decomposition network. To better predict material, Neural-PIL [Boss et al. 2021b] builds a smooth material autoencoder to predict the material parameters. NeRFactor [Zhang et al. 2021b] also trains a learned BRDF autoencoder and further separates shadows with a visibility network. NeROIC [Kuang et al. 2022] separates the static appearance and the transient appearance with two branches, similar to NeRF-W [Martin-Brualla et al. 2021], and uses Spherical Harmonics (SH) functions to approximate lighting.
To model high-frequency lighting beyond the capability of SG and SH, NvDiffRec [Munkberg et al. 2022] and NvDiffRecMC [Hasselgren et al. 2022] propose to represent it with a high-resolution environment image and use DMTet [Shen et al. 2021] as their geometry representation. These methods mainly focus on decomposing scenes containing a few objects under synthetic or real illumination, and only a small number of methods target outdoor scenes. NeRF-OSR [Rudnev et al. 2022] is the first work that attempts decomposition and relighting of outdoor scenes with NeRF.
NeuLighting [Li et al. 2022a] directly predicts the scene properties from latent codes extracted from the input images.FEGR [Wang et al. 2023] further improves the decomposition quality by calculating ray intersections with the explicit surface extracted from an SDF network.

Scene-Level Reconstruction with Neural Rendering.
With the popularity of implicit representations [Chen and Zhang 2019; Mescheder et al. 2019; Park et al. 2019] and neural surface rendering [Thies et al. 2019; Yariv et al. 2020], a few methods [Guo et al. 2022; Sun et al. 2021] began to address scene-level reconstruction by jointly optimizing an implicit geometric representation and an appearance network. Later, neural volume rendering became mainstream, and several works established the relationship between implicit surface representations and volume rendering. NeuS [Wang et al. 2021] and VolSDF [Yariv et al. 2021] transform signed distance values into density values for volume rendering. UNISURF [Oechsle et al. 2021] replaces the alpha values in volume rendering with occupancy values. MonoSDF [Yu et al. 2022] and NeuRIS [Wang et al. 2022] use normal priors from a 2D normal estimation network to guide geometry reconstruction. Unlike these methods, which focus on objects and indoor scenes, NeuralRecon-W [Sun et al. 2022] works on outdoor scenes and improves the sampling strategy with an octree built from the point cloud produced by Structure-from-Motion (SfM) [Schönberger and Frahm 2016; Schönberger et al. 2016]. FEGR [Wang et al. 2023] also uses the NeuS formulation and accelerates reconstruction with a multi-resolution hash table [Müller et al. 2022].

Outdoor Lighting Estimation.
Outdoor lighting estimation is commonly studied in two settings: single-view and multi-view. Given a single-view observation, previous methods [Hold-Geoffroy et al. 2019, 2017; Song and Funkhouser 2019; Zhang et al. 2019; Zhu et al. 2021] propose feed-forward networks to predict the sun's location and the sky intensity. However, single-view lighting estimation is extremely ill-posed, which makes it hard to predict an accurate sun location and sky intensity. In the multi-view setting, Duchene et al. [2015] use a progressive method to predict both the scene material and the lighting, while Philip et al. [2019] directly use a multi-view dataset and a target lighting to relight outdoor scenes. NeRF-W [Martin-Brualla et al. 2021] models the lighting implicitly with a learnable feature vector for each input image. NeRF-OSR [Rudnev et al. 2022] explicitly approximates outdoor lighting with SH functions. FEGR [Wang et al. 2023] instead uses a neural network to predict the lighting intensity from a given direction. However, this lighting formulation mixes the high-intensity sunlight with the low-intensity sky light, and thus may predict a sun with an irregular shape and a sky with an inaccurate intensity distribution.

METHOD
We propose SOL-NeRF, a method based on an implicit scene representation that decomposes scene properties from multi-view images. Fig. 2 gives an overview of the method. The rest of this section elaborates on its key components and is organized as follows. First, we introduce the preliminaries, including the formulation of neural rendering and NeuS [Wang et al. 2021], the rendering equation, and the sampling strategy adopted from NeuralRecon-W [Sun et al. 2022] (Sec. 3.1). Second, we present our hybrid lighting representation and the corresponding rendering method (Sec. 3.2). Third, we discuss how to cast shadows under the hybrid lighting representation (Sec. 3.3). Finally, we introduce additional priors that improve the decomposition and relighting performance of our method (Sec. 3.4).

Preliminaries
However, the density field cannot accurately express the surface geometry. To tackle this problem, NeuS [Wang et al. 2021] models geometry with a signed distance field (SDF) $f$ and transforms the signed distance value into a density value with
$$\rho(t) = \max\left(\frac{-\frac{d\Phi_s}{dt}\big(f(\mathbf{r}(t))\big)}{\Phi_s\big(f(\mathbf{r}(t))\big)},\ 0\right),$$
where $\Phi_s(x) = (1 + e^{-sx})^{-1}$ and $s$ is a trainable parameter that controls the deviation of the distribution.
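As a minimal sketch of this conversion, assuming the SDF value $f(\mathbf{r}(t))$ and its derivative along the ray are already available (the function names are illustrative and not taken from any SOL-NeRF code), the chain rule $\frac{d\Phi_s}{dt} = s\,\Phi_s(1-\Phi_s)\frac{df}{dt}$ reduces the density to $\max(-s(1-\Phi_s(f))\frac{df}{dt}, 0)$:

```python
import math


def phi_s(x: float, s: float) -> float:
    """Logistic CDF Phi_s(x) = 1 / (1 + exp(-s * x)), computed without overflow."""
    z = s * x
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


def neus_density(sdf_value: float, sdf_dt: float, s: float) -> float:
    """Opaque density rho(t) = max(-(dPhi_s/dt) / Phi_s, 0).

    sdf_value: f(r(t)) at the sample point.
    sdf_dt:    df(r(t))/dt, the SDF derivative along the ray direction.
    Since Phi_s' = s * Phi_s * (1 - Phi_s), the ratio simplifies to
    -s * (1 - Phi_s(f)) * df/dt.
    """
    phi = phi_s(sdf_value, s)
    return max(-s * (1.0 - phi) * sdf_dt, 0.0)


# A ray hitting a plane at depth d has f(r(t)) = d - t, so df/dt = -1.
s = 64.0
print(neus_density(0.0, -1.0, s))  # on the surface: s / 2
print(neus_density(0.0, 1.0, s))   # ray exiting the surface: 0.0
```

For a ray entering a planar surface, the density equals $s/2$ on the surface, approaches $s$ inside the solid, and vanishes where the ray exits ($df/dt > 0$).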
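3.1.2 The Rendering Equation. Rather than directly using the output of a color network as in NeRF, we perform shading based on the rendering equation [Kajiya 1986] at every sample point $\mathbf{x}_i$. We omit the specular lobe of the BRDF and keep only the diffuse lobe under direct lighting to compute the color of the sample point:
$$\mathbf{c}_i = \frac{\mathbf{a}_i}{\pi}\int_{\Omega} L_i(\mathbf{x}_i, \omega_i)\, V(\mathbf{x}_i, \omega_i)\, \max(\omega_i \cdot \mathbf{N}, 0)\, d\omega_i, \qquad (1)$$
where $\mathbf{a}_i$ and $\mathbf{N}$ denote the albedo and surface normal at $\mathbf{x}_i$, $V(\mathbf{x}_i, \omega_i)$ denotes the visibility at $\mathbf{x}_i$ from direction $\omega_i$, and $L_i(\mathbf{x}_i, \omega_i)$ denotes the incoming radiance. $L_i$ is approximated by the hybrid SG and SH lighting (Sec. 3.2).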
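The diffuse integral can be estimated with Monte Carlo sampling. The sketch below assumes the $\mathbf{a}_i/\pi$ Lambertian normalization, a scalar (grayscale) radiance function and cosine-weighted direction sampling, under which each sample contributes $L_i V$ and the estimate reduces to the albedo times the mean of $L_i V$. It is illustrative only; the SG and SH lighting used by the method admits closed-form shading.

```python
import math
import random
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _tangent_frame(n: Vec3) -> Tuple[Vec3, Vec3]:
    """Two unit vectors orthogonal to the unit normal n and to each other."""
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = _cross(helper, n)
    norm = math.sqrt(sum(x * x for x in t))
    t = (t[0] / norm, t[1] / norm, t[2] / norm)
    return t, _cross(n, t)


def diffuse_shade(albedo: Vec3, normal: Vec3,
                  radiance: Callable[[Vec3], float],
                  visibility: Callable[[Vec3], float],
                  num_samples: int = 256, seed: int = 0) -> Vec3:
    """Estimate (a / pi) * integral of L(w) V(w) max(w . N, 0) dw.

    Directions are drawn with pdf cos(theta) / pi, so each sample contributes
    L(w) * V(w) and the estimate is albedo * mean(L * V).
    """
    rng = random.Random(seed)
    t, b = _tangent_frame(normal)
    total = 0.0
    for _ in range(num_samples):
        u1, u2 = rng.random(), rng.random()
        r, phi = math.sqrt(u1), 2.0 * math.pi * u2
        x, y, z = r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1)
        w = tuple(x * t[k] + y * b[k] + z * normal[k] for k in range(3))
        total += radiance(w) * visibility(w)
    mean = total / num_samples
    return (albedo[0] * mean, albedo[1] * mean, albedo[2] * mean)
```

With constant unit radiance and no occlusion the estimate returns the albedo exactly, because the clamped cosine integrates to $\pi$ and cancels the $1/\pi$ of the BRDF.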

Hybrid Lighting Representation
Our method is dedicated to outdoor scenes, which are mostly captured in the daytime under cloudy or sunny skies. Hence we propose a lighting representation specifically designed to model daylight, which also handles scenes without direct sunlight as a degenerate case. The upper part of Fig. 2 illustrates our lighting formulation. The basic idea is to separate daylight into two components: 1) direct sunlight, a highly concentrated and bright 'local' light source; and 2) sky light, a more uniformly distributed 'global' light source on the sky dome. Both components in fact originate from the sun: sky light is simply scattered sunlight, whose color can be calculated with Rayleigh and Mie scattering [Tyndall 1869].
However, directly calculating the whole sky and its lighting effect at every sample point with the Rayleigh and Mie scattering equations is expensive. Instead, we model the two components with two lighting representations: 1) a Spherical Gaussian (SG) illumination model for direct sunlight, and 2) a Spherical Harmonics (SH) illumination model for sky light.

3.2.1 The Formulation and Rendering of SG and SH. Spherical Gaussians are Gaussian-like functions defined on the sphere:
$$G(\mathbf{v}; \boldsymbol{\xi}, \lambda, \mathbf{l}) = \mathbf{l}\, e^{\lambda(\boldsymbol{\xi}\cdot\mathbf{v} - 1)},$$
where $\mathbf{v}$ is the function input representing the light direction, the lobe axis $\boldsymbol{\xi}$ and the sharpness $\lambda$ control the center and deviation of the lobe, and $\mathbf{l}$ controls its peak value. We use 7 learnable parameters (3 for the peak intensity $\mathbf{l}$, 3 for the direction $\boldsymbol{\xi}$, and 1 for the deviation $\lambda$) to represent the SG lobe, and we employ the same SG rendering equation as [Wang et al. 2009]. Spherical Harmonics [Ramamoorthi and Hanrahan 2001a] use a set of basis functions $Y_{\ell m}(\mathbf{v})$, with $-\ell \le m \le \ell$, to decompose a lighting distribution on the sphere. We use the first-order SH mixture ($\ell = 0, 1$), which reduces the number of parameters needed to represent sky light and is beneficial for learning the near-uniform sky light. The first-order SH has 4 basis functions, so 12 parameters are needed for the RGB SH lighting.
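A small sketch of evaluating the two lighting components, assuming unit-length $\boldsymbol{\xi}$ and $\mathbf{v}$, RGB-valued $\mathbf{l}$ and $c_{\ell m}$, and one common real-SH ordering and normalization (these conventions are assumptions, not specified in the text). The parameter counts match the 7 SG and 12 SH parameters stated above.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

SG_PARAMS = 3 + 3 + 1  # peak RGB l, lobe axis xi, sharpness lambda
SH_PARAMS = 4 * 3      # 4 first-order basis functions x RGB


def eval_sg(v: Vec3, xi: Vec3, lam: float, peak: Vec3) -> Vec3:
    """G(v; xi, lambda, l) = l * exp(lambda * (xi . v - 1)) for unit v and xi."""
    cos_angle = sum(a * b for a, b in zip(xi, v))
    falloff = math.exp(lam * (cos_angle - 1.0))
    return (peak[0] * falloff, peak[1] * falloff, peak[2] * falloff)


def sh_basis(v: Vec3) -> Tuple[float, float, float, float]:
    """Real SH basis for l = 0, 1, ordered (Y_00, Y_1-1, Y_10, Y_11) (assumed)."""
    x, y, z = v
    c0 = 0.5 * math.sqrt(1.0 / math.pi)    # ~0.282095
    c1 = math.sqrt(3.0 / (4.0 * math.pi))  # ~0.488603
    return (c0, c1 * y, c1 * z, c1 * x)


def eval_sh(v: Vec3, coeffs: Sequence[Vec3]) -> Vec3:
    """Sky radiance sum_{l,m} c_lm * Y_lm(v), with 4 RGB coefficients."""
    basis = sh_basis(v)
    return tuple(sum(c[k] * y for c, y in zip(coeffs, basis)) for k in range(3))
```

The SG reaches its peak $\mathbf{l}$ when $\mathbf{v} = \boldsymbol{\xi}$ and decays as $e^{-\lambda(1-\cos\gamma)}$ with the angle $\gamma$ to the lobe axis, so a large $\lambda$ gives the compact, bright lobe suited to the sun.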
3.2.2 Motivation of the Hybrid Representation. Modeling sunlight and sky light separately with SG and SH has several advantages: 1) Rendering with SG and SH illumination is memory-efficient and fast, because only 7 SG and 12 SH parameters are required and the shading computation is closed-form.
2) It separates the high-intensity sunlight and the relatively low-intensity sky light to reduce the learning difficulty of the complex outdoor illumination.3) Casting shadows under this representation is efficient since only one ray-mesh intersection test is needed (see Sec. 3.3).The explicit shadow calculation can produce more realistic relighting results compared with the implicitly learned shadow in NeRF-OSR [Rudnev et al. 2022] that may overfit input images (see Sec. 4).
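The closed-form SH shading in point 1) can be illustrated with the irradiance formula of Ramamoorthi and Hanrahan, in which convolution with the clamped cosine scales band $\ell = 0$ by $\pi$ and band $\ell = 1$ by $2\pi/3$. This sketch covers the SH part only, under the same assumed conventions (real SH basis, $\mathbf{a}/\pi$ diffuse BRDF, no occlusion); the closed-form SG shading of [Wang et al. 2009] is not reproduced.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Clamped-cosine convolution factors per SH band: A_0 = pi, A_1 = 2*pi/3
A_HAT = (math.pi, 2.0 * math.pi / 3.0)
BAND = (0, 1, 1, 1)  # band index of each basis function below


def sh_basis(v: Vec3) -> Tuple[float, float, float, float]:
    x, y, z = v
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return (c0, c1 * y, c1 * z, c1 * x)


def sh_irradiance(normal: Vec3, coeffs: Sequence[Vec3]) -> Vec3:
    """Closed-form irradiance E(N) = sum_{l,m} A_l * c_lm * Y_lm(N)."""
    basis = sh_basis(normal)
    return tuple(
        sum(A_HAT[band] * c[k] * y for c, y, band in zip(coeffs, basis, BAND))
        for k in range(3))


def sh_diffuse(albedo: Vec3, normal: Vec3, coeffs: Sequence[Vec3]) -> Vec3:
    """Diffuse color (a / pi) * E(N) under the SH sky, without occlusion."""
    irradiance = sh_irradiance(normal, coeffs)
    return tuple(a * e / math.pi for a, e in zip(albedo, irradiance))


# A uniform sky of unit radiance: only c_00 = 1 / Y_00 is non-zero.
uniform = [(1.0 / sh_basis((0.0, 0.0, 1.0))[0],) * 3] + [(0.0, 0.0, 0.0)] * 3
print(sh_diffuse((0.5, 0.5, 0.5), (0.0, 0.0, 1.0), uniform))  # ~(0.5, 0.5, 0.5)
```

Shading thus costs a handful of multiply-adds per point, independent of how the sky was fitted.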

Shadow Calculation
As a direct effect of the interaction between light and geometry, shadow is largely influenced by the lighting.We calculate the shadow according to the SG and SH components of the lighting, respectively.We show the shadow calculation process in Fig. 3.

Ray Tracing for SG Shadow. The SG illumination is intended to simulate direct sunlight. Because sunlight is highly concentrated, it is nearly a directional light source. Consequently, the shadow cast by the sunlight can be obtained simply by tracing the light path backward, from the shading point toward the sun along the SG lobe axis $\boldsymbol{\xi}$. To accelerate ray tracing, we construct an octree over the mesh extracted from the learned SDF $f$.
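A minimal sketch of the binary sun visibility, assuming the extracted mesh is given as a plain list of vertex triples. It uses the standard Möller-Trumbore ray-triangle test and loops over all triangles instead of using the octree described above; the small offset along the normal, which avoids self-intersection, is an assumed value.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin: Vec3, direction: Vec3, tri: Triangle,
                      eps: float = 1e-9) -> bool:
    """Moller-Trumbore: True if origin + t * direction hits tri for some t > eps."""
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:  # ray parallel to the triangle plane
        return False
    inv_det = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return False
    q = _cross(s, e1)
    v = _dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return False
    return _dot(e2, q) * inv_det > eps  # hit must lie in front of the origin


def sun_visibility(x: Vec3, normal: Vec3, xi: Vec3,
                   triangles: Sequence[Triangle], offset: float = 1e-4) -> float:
    """0.0 if the ray x + t * xi hits the mesh, else 1.0 (brute force, no octree)."""
    origin = (x[0] + offset * normal[0],
              x[1] + offset * normal[1],
              x[2] + offset * normal[2])
    for tri in triangles:
        if ray_hits_triangle(origin, xi, tri):
            return 0.0
    return 1.0
```

Only one such query is needed per shading point because the sun is treated as a single directional light.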

Learned Ambient Occlusion for SH Shadow.
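Experiments in Fig. 11 show that the learned $A_\theta$ is smoother than the original sampled $A(\cdot)$ and improves reconstruction results. Having introduced both the lighting representation and the shadow calculation, we can rewrite Eq. 1 as
$$\mathbf{c}_i = \frac{\mathbf{a}_i}{\pi}\left( V_{SG}(\mathbf{x}_i, \boldsymbol{\xi}) \int_{\Omega} G(\omega; \boldsymbol{\xi}, \lambda, \mathbf{l})\, \max(\omega\cdot\mathbf{N}, 0)\, d\omega + A_\theta(\mathbf{x}_i) \int_{\Omega} \sum_{\ell=0}^{1}\sum_{m=-\ell}^{\ell} c_{\ell m} Y_{\ell m}(\omega)\, \max(\omega\cdot\mathbf{N}, 0)\, d\omega \right),$$
where $V_{SG}(\mathbf{x}_i, \boldsymbol{\xi})$ is the result of the intersection test between the reconstructed mesh and the ray $\mathbf{x}_i + t\boldsymbol{\xi}$; it equals 0 when the ray intersects the mesh and 1 otherwise. $c_{\ell m}$ is the mixture weight of the SH basis function $Y_{\ell m}$, and $\mathbf{l}$ is the color of the sun.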

Priors
In addition to the proposed lighting representation, we exploit two priors to further improve the performance of SOL-NeRF: a sunlight color prior and a relaxed Manhattan-world geometry prior.

Sunlight Color Prior. The proposed hybrid lighting representation is under-constrained with respect to the sunlight color. In the real world, the color of sunlight is closely related to the sun elevation angle and the altitude (see Fig. 4). Various models have been proposed to describe the sunlight and sky color with respect to these parameters [Hosek and Wilkie 2012; Nishita et al. 1993; Preetham et al. 1999]. We adopt the model proposed in [Nishita et al. 1993], since it is purely based on physical calculations. However, computing the color at rendering time would make training impractically slow. Instead, we fit the R, G, B channels of the light with an analytical function $\mathbf{a}_\theta = g(\theta)$, where $g$ is a piecewise polynomial function and $\theta$ is the elevation angle of the sun. This model only considers a clear sky with a fixed density of air and water droplets. In practice, many factors (e.g., location-dependent air and water droplet density, clouds and mist, and errors of the imaging system) make the model inaccurate. To compensate, we define the final color of the sun SG as $\mathbf{l} = \mathbf{c} \odot \mathbf{a}_\theta + \mathbf{b}$, where $\mathbf{c}$ and $\mathbf{b}$ are learnable parameters initialized to 1 and 0, respectively. For more details of the function $g$, please refer to the supplementary material.
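The structure of this prior can be sketched as a piecewise polynomial in the sun elevation $\theta$ that yields a base color $\mathbf{a}_\theta$, followed by the learnable affine correction $\mathbf{l} = \mathbf{c}\odot\mathbf{a}_\theta + \mathbf{b}$. The breakpoints and coefficients below are hypothetical placeholders, chosen only to give a reddish sun near the horizon and a near-white sun at the zenith; the fitted coefficients are not given in the text and are not reproduced here.

```python
import bisect
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# HYPOTHETICAL placeholder fit, not the paper's coefficients.
# Elevation breakpoints in radians; each piece stores, per RGB channel,
# polynomial coefficients in theta, highest degree first.
BREAKS = [0.0, 0.2, 1.5708]
COEFFS: List[List[List[float]]] = [
    [[0.0, 1.0], [1.5, 0.4], [2.0, 0.1]],     # low sun: reddish
    [[0.0, 1.0], [0.2, 0.66], [0.35, 0.43]],  # higher sun: towards white
]


def horner(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial given highest-degree-first coefficients."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def base_sun_color(theta: float) -> Vec3:
    """Pick the polynomial piece containing theta and evaluate each channel."""
    piece = bisect.bisect_right(BREAKS, theta) - 1
    piece = min(max(piece, 0), len(COEFFS) - 1)  # clamp outside the fitted range
    r, g, b = (horner(ch, theta) for ch in COEFFS[piece])
    return (r, g, b)


def sun_color(theta: float, c: Vec3 = (1.0, 1.0, 1.0),
              b: Vec3 = (0.0, 0.0, 0.0)) -> Vec3:
    """l = c (elementwise) * a_theta + b; c and b start at ones and zeros."""
    a = base_sun_color(theta)
    return (c[0] * a[0] + b[0], c[1] * a[1] + b[1], c[2] * a[2] + b[2])
```

At initialization the correction is the identity, so the sun color starts exactly on the physically derived curve and only the six affine parameters can move it away.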

Relaxed Manhattan World Prior. Man-made buildings often have surfaces aligned with three orthogonal directions. This assumption, called the Manhattan-world assumption [Furukawa et al. 2009], is widely used in 3D reconstruction. Since the targets of our outdoor scenes are often buildings, we apply this assumption in our pipeline.
However, many non-Manhattan-style buildings do not fully obey this assumption. We therefore apply a relaxed version: we assume that the scene has one upright direction (the ground normal) and that most surface normals are either parallel or perpendicular to this direction. This is natural, as a building usually sits on the ground and its facade normal is perpendicular to the ground normal. We formulate this prior as a loss $\mathcal{L}_M$ on the surface normals, where $\mathbf{N}$ and $\mathbf{U}$ are the normalized surface normal and upright direction, respectively.
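The following is one plausible penalty with the behaviour described above, assumed for illustration rather than taken from the paper: $\min(|\mathbf{N}\cdot\mathbf{U}|,\ 1-|\mathbf{N}\cdot\mathbf{U}|)$ is zero when a normal is parallel or perpendicular to the upright direction and largest when $|\mathbf{N}\cdot\mathbf{U}| = 0.5$.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def relaxed_manhattan_penalty(normal: Vec3, up: Vec3) -> float:
    """Assumed form min(|N.U|, 1 - |N.U|) for unit vectors N and U.

    Zero when N is parallel (|N.U| = 1) or perpendicular (N.U = 0) to U,
    and largest (0.5) when |N.U| = 0.5.
    """
    d = abs(sum(n * u for n, u in zip(normal, up)))
    return min(d, 1.0 - d)


def relaxed_manhattan_loss(normals: Sequence[Vec3], up: Vec3) -> float:
    """Mean penalty over a batch of surface normals."""
    return sum(relaxed_manhattan_penalty(n, up) for n in normals) / len(normals)
```

Taking the absolute value treats downward- and upward-facing surfaces alike; a smooth alternative such as $|\mathbf{N}\cdot\mathbf{U}|(1-|\mathbf{N}\cdot\mathbf{U}|)$ would satisfy the same description.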
Having introduced all the components, we formulate the overall loss function of our pipeline. The color loss $\mathcal{L}_C$ minimizes the difference between the rendered pixel color $\hat{C}(r)$ and the ground-truth pixel color $C_{gt}(r)$ of the $r$-th ray, averaged over the $M$ rays sampled in a training batch. $\mathcal{L}_{eik} = \left|\,\|\nabla f\|_2 - 1\,\right|$ is the eikonal loss, which encourages smooth surface reconstruction. The foreground mask loss $\mathcal{L}_{mask}$ is the same as in NeuS [Wang et al. 2021] and NeuralRecon-W [Sun et al. 2022]. For detailed hyperparameter settings, please refer to the supplementary material.

EXPERIMENTS
We evaluate on two datasets: Sites 1, 2, 3 of the OSR dataset from NeRF-OSR [Rudnev et al. 2022], and a synthetic dataset (denoted synthetic) with three scenes (Syn 1, 2, 3) collected from Blendswap.com.
We evaluate novel view synthesis on the synthetic dataset and relighting on both datasets. The ablation studies are conducted on the synthetic dataset. For scene decomposition (including novel view synthesis) and relighting quality, we report PSNR, SSIM [Wang et al. 2004], and MSE (Mean Squared Error) between the rendered novel views and the ground-truth images, and between the decomposed and ground-truth albedo (ground-truth albedo is only available for our synthetic dataset). For geometry, we report the MAE (Mean Absolute Error) between the rendered and ground-truth normals. We additionally evaluate the proposed hybrid lighting representation with eight novel HDR (High Dynamic Range) environment maps collected from the Blender software and hdrmaps.com.
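For reference, the image and geometry metrics can be computed as sketched below, assuming images are flattened to lists of values in $[0, 1]$ and normals are 3-vectors. SSIM is omitted because it requires windowed statistics; these helpers are illustrative and are not the evaluation code behind the tables.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mse(pred: Sequence[float], gt: Sequence[float]) -> float:
    return sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)


def psnr(pred: Sequence[float], gt: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR in dB; infinite for identical images."""
    err = mse(pred, gt)
    return math.inf if err == 0.0 else 10.0 * math.log10(max_val ** 2 / err)


def normal_mae_deg(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """Mean angular error in degrees between corresponding normals."""
    total = 0.0
    for n, g in zip(pred, gt):
        dot = sum(a * b for a, b in zip(n, g))
        norms = math.sqrt(sum(a * a for a in n)) * math.sqrt(sum(b * b for b in g))
        cos_angle = max(-1.0, min(1.0, dot / norms))  # guard acos against rounding
        total += math.degrees(math.acos(cos_angle))
    return total / len(pred)
```

For example, a uniform error of 0.1 on every pixel gives an MSE of 0.01 and a PSNR of 20 dB.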

Scene Decomposition
In Fig. 6 we show decomposition results for one scene (Site 3) from the OSR dataset and two scenes (Syn 1, 3) from the synthetic dataset. Our method recovers better geometry, as indicated by the normal quality. On the synthetic scenes, our decomposed albedos are less contaminated by shadows and more accurate in color, while those produced by NeRF-OSR contain shadow residue and appear reddish. The quantitative evaluation in Table 1 shows these differences more clearly. Our method performs similarly to NeRF-OSR on novel view synthesis, which is a compound task under known lighting, while it outperforms NeRF-OSR on the decomposed albedo and normals, which also benefits scene relighting as presented below.

Scene Relighting
In Fig. 9, we show relighting results for two scenes (Sites 1, 2) of the OSR dataset and one scene (Syn 2) of the synthetic dataset. For each scene, the ground-truth image and two relighting results of that view are displayed, together with the calculated or predicted shadow. Fig. 9 shows that the shadow predicted by NeRF-OSR does not effectively reflect the change of lighting condition, but looks more like a 'monochrome' version of the albedo. The reason is that the shadow prediction network only sees 200 to 300 lighting conditions and therefore overfits these lighting settings. In addition, the inaccurate and noisy normals produced by NeRF-OSR negatively affect the relighting results, making the surface look bumpy. Our pipeline solves these problems with the hybrid SG and SH lighting and its shadow calculation. The quantitative results in Table 2 show the performance gain of our method over NeRF-OSR.

Hybrid Lighting Evaluation
Since the SG and SH hybrid lighting representation is newly proposed, we need to evaluate how well it can approximate outdoor lighting.For the lighting evaluation experiment, we collected six equirectangular environment maps, representing different conditions of sky lighting.For each environment map, we optimize our hybrid lighting representation to minimize the error between the ground truth map and the rendered map of our lighting.We then put the ground truth lighting and our optimized lighting into Blender [Community 2023], using the Cycles engine to get the rendered results of a diffuse scene under these two lighting settings.The SH lighting is directly rendered into an environment map, and the SG lighting is implemented using an explicit 'Sun' light source.
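As a simplified illustration of fitting lighting to an environment map, the sketch below fits only the first-order SH part, by numerically projecting an equirectangular map onto the 4 basis functions; for an orthonormal basis this projection is the least-squares fit in the continuous limit. This is a swapped-in technique for illustration: the SG sun lobe would additionally require nonlinear optimization, which is not shown. The map is passed as a hypothetical single-channel `radiance(direction)` callable (apply it per RGB channel), and the grid resolution is an arbitrary choice.

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


def sh_basis(v: Vec3) -> Tuple[float, float, float, float]:
    x, y, z = v
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return (c0, c1 * y, c1 * z, c1 * x)


def project_env_to_sh(radiance: Callable[[Vec3], float],
                      height: int = 64, width: int = 128) -> List[float]:
    """c_lm ~= sum over equirectangular pixels of L(w) * Y_lm(w) * dOmega.

    Midpoint rule on a (theta, phi) grid; theta is measured from +z and
    dOmega = sin(theta) * dtheta * dphi.
    """
    coeffs = [0.0, 0.0, 0.0, 0.0]
    d_theta, d_phi = math.pi / height, 2.0 * math.pi / width
    for i in range(height):
        theta = (i + 0.5) * d_theta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        for j in range(width):
            phi = (j + 0.5) * d_phi
            w = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
            weight = radiance(w) * sin_t * d_theta * d_phi
            for k, y in enumerate(sh_basis(w)):
                coeffs[k] += weight * y
    return coeffs
```

A constant unit map projects to $c_{00} \approx 2\sqrt{\pi} \approx 3.545$ with all first-order coefficients near zero, which is the expected result for a perfectly uniform sky.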
On average, the PSNR between the rendered results of our method and the ground truth reaches 32.829 dB. We show the results of this experiment in Fig. 10: the SG and SH hybrid lighting effectively approximates various environment maps, with minor errors in the rendered results both visually and quantitatively.
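The reported average can be reproduced from the per-map PSNRs listed in the caption of Fig. 10, where the first value (49.316) is excluded as an outlier:

```python
import statistics

# Per-map PSNRs (dB) from the caption of Fig. 10
per_image_psnr = [49.316, 37.336, 32.507, 32.467, 32.077, 29.758]
kept = per_image_psnr[1:]  # drop the first value, treated as an outlier
mean_psnr = statistics.mean(kept)
print(round(mean_psnr, 3))  # 32.829
```

Including the outlier would raise the mean to about 35.577 dB.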

Ablation Studies
In this subsection, we ablate several important design choices in our framework.
4.5.1 Sunlight Prior.The sunlight prior connects the elevation and color of the sun, which reduces the degree of freedom of the learnable lighting and its learning difficulty.We show the ablation results in the first two rows of Fig. 5 and Table 3.The absence of this prior can cause inaccurate albedo decomposition since we do not limit the light color.This effect is observed in other methods that model the sunlight without any constraints, like NeRF-OSR [Rudnev et al. 2022] and FEGR [Wang et al. 2023].Moreover, the training process can be sensitive to the initial values of SG and SH parameters and may fail to reconstruct the input scene (the second row of Fig. 5).
4.5.2 Relaxed Manhattan-World Assumption. The relaxed Manhattan-world assumption regularizes the learned geometry and helps reduce the bumpy surface artifacts caused by the positional encoding. Qualitative comparisons with and without the Manhattan-world assumption are shown in the third row of Fig. 5. The reconstructed normals are smoother when the Manhattan-world assumption is applied, while the learned normals may be bumpy without this regularization. We also report the MAE between the reconstructed normals and the ground-truth normals in Table 4.

4.5.3 Shadow Calculation Strategy.
As mentioned in Sec. 3.3, the shadow is approximated by explicitly casting rays against the mesh extracted from the SDF network, together with ambient occlusion. An alternative strategy is to predict the shadow from the lighting condition, similar to NeRF-OSR. Specifically, we set up another MLP network that predicts shadows from the SG and SH parameters in our framework. We compare this strategy (Implicit S.) with our method (Explicit S.) on the scene decomposition task in the fourth row of Fig. 5 and on the relighting task in Fig. 8. The implicit shadow strategy fails to decompose the shadow correctly and to render the input scene under novel lighting conditions.
The AO network is encouraged to be consistent with the sampled ambient occlusion in Eq. 2. Here we ablate the two ambient occlusion strategies (sampled and predicted) in Fig. 11. Directly calculating ambient occlusion introduces noise in the rendered results, since it is computationally expensive to sample many points around each sample point, whereas the ambient occlusion predicted by our network is smooth and noise-free.

CONCLUSION
In this paper, we propose SOL-NeRF for outdoor scene decomposition and relighting based on neural radiance fields. We use a signed distance field as our geometry representation. Instead of modeling the sky as a whole like previous methods, we separate the learning of sunlight and sky light and introduce a hybrid lighting representation in which the sun is modeled as a Spherical Gaussian function and the sky is approximated by first-order Spherical Harmonics. Based on the proposed geometry and lighting representations, the shadow is approximated by casting rays against the mesh extracted from the signed distance field, together with learned ambient occlusion. Our method also benefits from the relaxed Manhattan-world geometry prior and the sunlight color prior, which reduce the difficulty of the ill-posed inverse rendering problem and stabilize training. Nevertheless, our method has the following limitations. First, it works better on diffuse scenes since we do not take specular reflection into account; decomposition results can be less faithful when specular reflection exists, as shown in Fig. 7. Second, our lighting representation models distant, environment-map-like illumination and cannot handle local emitters such as streetlights. In the future, we would like to decompose more complicated lighting effects, such as specular reflections and emitters, similar to inverse rendering methods like [Li et al. 2022b; Wu et al. 2023; Zhu et al. 2023].

Figure 2: The overview of the SOL-NeRF pipeline. Given a set of images under multiple lighting conditions, we model the scene's geometry with a signed distance field (SDF) and apply an adaptive sampling strategy in the neural volume rendering process. To decompose geometry, material, shadow and lighting, we predict the diffuse albedo $\mathbf{a}$ with an MLP network, and the normal is derived from the gradient of the SDF network. Our lighting is composed of a Spherical Gaussian (SG) function and first-order Spherical Harmonics (SH) functions. The SG function is responsible for high-intensity lighting like the sun, while the SH functions represent relatively low-intensity lighting like the sky light. We consider both the shadow cast by the directional SG light and ambient occlusion. To enhance decomposition quality, we introduce priors for the sunlight color and geometry. Overall, SOL-NeRF enables realistic reconstruction and relighting of the input scene under novel lighting conditions.
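3.1.1 Neural Rendering Based on an Implicit Representation. NeRF uses a density field $\sigma$ to model geometry. At rendering time, the renderer generates a ray per pixel and takes samples at distances $t_i$ along the ray. The final color of the pixel is obtained through the discrete integration $\hat{C} = \sum_{i=1}^{N} T_i \alpha_i \mathbf{c}_i$, where $N$ is the number of sampled points along the ray, $T_i = \prod_{j=1}^{i-1}(1 - \alpha_j)$ is the accumulated transmittance, and $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$ with $\delta_i = t_{i+1} - t_i$.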
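The discrete integration can be sketched in plain Python as follows. This is a minimal illustration of standard NeRF compositing, not the paper's implementation; giving the last sample a very large interval is a common convention assumed here. The hypothetical `expected_depth` helper computes the volume-rendered depth $\sum_i T_i\alpha_i t_i$, which is how a surface point along a ray can be estimated.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def composite(sigmas: Sequence[float], ts: Sequence[float],
              colors: Sequence[Color]) -> Tuple[Color, List[float]]:
    """Discrete volume rendering C = sum_i T_i * alpha_i * c_i.

    alpha_i = 1 - exp(-sigma_i * delta_i), delta_i = t_{i+1} - t_i,
    T_i = prod_{j<i} (1 - alpha_j). Returns the color and the weights T_i * alpha_i.
    """
    weights: List[float] = []
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_1 = 1: nothing lies in front of the first sample
    for i, (sigma, c) in enumerate(zip(sigmas, colors)):
        # Assumed convention: the last sample gets a very large interval.
        delta = ts[i + 1] - ts[i] if i + 1 < len(ts) else 1e10
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        weights.append(w)
        for k in range(3):
            out[k] += w * c[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2]), weights


def expected_depth(weights: Sequence[float], ts: Sequence[float]) -> float:
    """Volume-rendered depth sum_i T_i * alpha_i * t_i."""
    return sum(w * t for w, t in zip(weights, ts))


color, weights = composite([math.log(2.0), 1e6], [0.0, 1.0],
                           [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
print(color, expected_depth(weights, [0.0, 1.0]))  # ~(0.5, 0.0, 0.5) ~0.5
```

In the example, the first sample is half transparent and the second fully opaque, so each contributes half of the final color.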

Figure 3: The calculation of shadow under the proposed hybrid lighting. The final shadow consists of both the SG shadow and the shadow caused by ambient occlusion. Specifically, we calculate the SG shadow by casting a ray along the SG lighting direction against the extracted mesh. The ambient occlusion value of a surface point $\mathbf{p}_i$ is determined by the ratio between the number of points outside the surface and the total number of sample points $\{\mathbf{p}'_{i,k}\}$ in a hemisphere centered at $\mathbf{p}_i$. Note that ambient occlusion is calculated by directly querying the SDF, while the traced shadow is based on the extracted mesh.

Figure 4: The sunlight color is related to the sun elevation $\theta$. From the learned sun elevation $\theta$, the atmosphere thickness and the air density, the optical depth is calculated. The color of the sunlight that reaches the surface is then obtained with the wavelength-dependent Rayleigh and Mie scattering laws [Nishita et al. 1993; Tyndall 1869]. In this way, every $\theta$ is associated with a color. This function is approximated by a polynomial that can be evaluated efficiently at training time.

Figure 5: Qualitative comparison of decomposition results between the full pipeline (Full) and the baselines including without sunlight prior (w/o S.P.), without Manhattan-world prior (w/o M.P.), and predicting shadow with an MLP network implicitly (Implicit S.).
4.5.4 Ambient Occlusion Calculation Strategy. As mentioned in Sec. 3.3, we use an MLP network $A_\theta$ to predict ambient occlusion.

Table 4: Quantitative comparison of reconstructed normals between our method (w/ M.P.) and the baseline without the Manhattan-world assumption (w/o M.P.), using the MAE metric on the Syn 3 scene.
Metric               w/o M.P.   w/ M.P.
Normal MAE (°) ↓     19.564     16.205

Figure 6: Decomposition results of NeRF-OSR and our method. For each scene, we show different decomposed components (normal, albedo and shadow) and the reconstructed image.

Figure 7: Failure case: our method is incapable of decomposing the scene correctly when strong specular reflections exist.

Figure 8: Qualitative comparisons of relighting results under two novel lighting conditions between our explicit shadow calculation (Explicit S.) and the baseline (Implicit S.) that predicts shadows with an implicit MLP network.
Figure 9: Relighting results of NeRF-OSR and our method on Sites 1, 2 of the OSR dataset and Syn 2 of the synthetic dataset.

Figure 10: We use our hybrid lighting representation to fit six environment maps. In each pair, we compare the images rendered under the ground-truth environment map and under the fitted environment map. The average PSNR between the image pairs is 32.829 dB. The per-image PSNRs are 49.316, 37.336, 32.507, 32.467, 32.077, and 29.758 dB; the first sample (49.316) is regarded as an outlier and omitted when calculating the average.
Figure 11: Qualitative comparison of ambient occlusion and reconstruction results between the explicit ambient occlusion calculation baseline (Sampled A.O.) and our method, which uses an MLP network to predict the A.O. (Pred. A.O.). We additionally include sampled A.O. with different numbers of samples for reference. Please zoom in for more details.
The SH illumination approximates the near-uniform sky light. If we further assume that the sky light is uniformly distributed, we obtain exactly the so-called 'ambient light' of real-time rendering. We therefore adapt Screen Space Ambient Occlusion (SSAO) techniques [Mittring 2007; Shanmugam and Arikan 2007] to our SDF-based geometry representation. For every ray, we obtain the surface point $\mathbf{p}_i = \mathbf{o}_i + \hat{t}_i \mathbf{d}_i$ with the corresponding normal $\mathbf{N}_i$, where $\mathbf{o}_i$ and $\mathbf{d}_i$ denote the origin and direction of the $i$-th ray, and $\hat{t}_i = \sum_{j=1}^{N} T_j \alpha_j t_j$ denotes the depth estimated by volume rendering. We sample $K$ points $\mathbf{p}'_{i,1}, \mathbf{p}'_{i,2}, \ldots, \mathbf{p}'_{i,K}$ within the hemisphere centered at $\mathbf{p}_i$ with upper direction $\mathbf{N}_i$. The SDF value of each $\mathbf{p}'_{i,k}$ is queried, and the ambient occlusion shadow is calculated as
$$A(\mathbf{p}_i) = \frac{1}{K}\sum_{k=1}^{K} \mathbb{1}\big[f(\mathbf{p}'_{i,k}) > 0\big], \qquad (2)$$
where $\mathbb{1}[\cdot]$ is an indicator function and $f(\cdot)$ is the learned SDF. However, the sampled ambient occlusion $A(\cdot)$ is noisy, since it is an approximation based on random sampling. We therefore define an MLP $A_\theta$ as a smooth, learnable estimator of ambient occlusion, which is optimized during end-to-end training with a data loss and a regularization loss.
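The sampled ambient occlusion can be sketched as below. Assumptions: points are drawn uniformly inside a ball of radius `radius` and mirrored onto the side of the normal, and the SDF is positive outside the surface, so the return value is the fraction of samples in free space. The `radius`, `num_samples` and `seed` parameters are hypothetical choices, and the learned AO network is not shown.

```python
import random
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def sample_in_hemisphere(center: Vec3, normal: Vec3, radius: float,
                         rng: random.Random) -> Vec3:
    """Uniform point in the half-ball of `radius` around `center` facing `normal`."""
    while True:  # rejection sampling in the unit ball
        d = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if d[0] ** 2 + d[1] ** 2 + d[2] ** 2 <= 1.0:
            break
    if d[0] * normal[0] + d[1] * normal[1] + d[2] * normal[2] < 0.0:
        d = (-d[0], -d[1], -d[2])  # mirror into the half-ball facing the normal
    return (center[0] + radius * d[0],
            center[1] + radius * d[1],
            center[2] + radius * d[2])


def sampled_ao(p: Vec3, normal: Vec3, sdf: Callable[[Vec3], float],
               num_samples: int = 64, radius: float = 0.1, seed: int = 0) -> float:
    """A(p) = (1/K) * sum_k 1[sdf(p'_k) > 0]: fraction of samples in free space."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(num_samples):
        if sdf(sample_in_hemisphere(p, normal, radius, rng)) > 0.0:
            outside += 1
    return outside / num_samples
```

On an open plane every sample lies in free space and the value is 1; at the foot of a wall roughly half the samples fall inside the solid. With few samples the estimate is visibly noisy, which is what motivates learning a smooth estimator.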

Table 1: Quantitative comparison of reconstruction results using SSIM, PSNR and MSE metrics on the synthetic dataset.

Table 2: Quantitative comparison of relighting results on the real NeRF-OSR dataset and the synthetic dataset using SSIM, PSNR and LPIPS metrics. Results are averaged over ten different viewpoints with five different environment maps.