Extended Path Space Manifolds for Physically Based Differentiable Rendering

Physically based differentiable rendering has become an increasingly important topic in recent years. A common pipeline computes local color derivatives of light paths or pixels with respect to arbitrary scene parameters, and enables optimizing or recovering the scene parameters through iterative gradient descent by minimizing the difference between rendered and target images. However, existing approaches cannot robustly handle complex illumination effects including reflections, refractions, caustics, shadows, and highlights, especially when the initial and target locations of such illumination effects are not close to each other in the image space. To address this problem, we propose a novel data structure named extended path space manifolds. The manifolds are defined in the combined space of path vertices and scene parameters. By enforcing geometric constraints, the path vertices could be implicitly and uniquely determined by perturbed scene parameters. This enables the manifold to track specific illumination effects and the corresponding paths, i.e., specular paths will still be specular paths after scene parameters are perturbed. Besides, the path derivatives with respect to scene parameters could be computed by solving small linear systems. We further propose a physically based differentiable rendering method built upon the theoretical results of extended path space manifolds. By incorporating the path derivatives computed from the manifolds and an optimal transport based loss function, our method is demonstrated to be more effective and robust than state-of-the-art approaches in inverse rendering applications involving complex illumination effects.


Initial
Our derivatives Ours Ours+hybrid -1 1 Figure 1: Optimizing the 2D translation vectors of eight specimen objects via physically based dierentiable rendering.Note that the specimen objects are placed inside a glass box and are viewed through nested reection and refraction.Given the target image and initial scene parameters, state-of-the-art methods including Path Replay Backpropagation with reparameterization [Vicini et al. 2021] (short as PRB), PRB with a multi-scale scheme (short as PRB.mul.res), and Plateau-reduced Dierentiable Path Tracing [Fischer and Ritschel 2022] (short as PRDPT) fail to correctly recover the scene parameters.In contrast, our method successfully recovers the positions of all specimen objects due to the eectiveness of extended path space manifolds.Our hybrid optimization scheme (ours+hybrid) is able to further rene the results.We also show the close-up views of the color derivatives of PRB and our geometric derivatives with respect to a single scene parameter, respectively.Note that the two derivatives are inherently of dierent types -they are not computing the same quantities.The optimization is performed using a rendering resolution of 128 ⇥ 128 and 32 spps, while the images displayed in the gure are re-rendered in a higher resolution of 512 ⇥ 512 and 8192 spps.The scene is modied from [Bitterli 2016].
the dierence between rendered and target images.However, existing approaches cannot robustly handle complex illumination eects including reections, refractions, caustics, shadows, and highlights, especially when the initial and target locations of such illumination eects are not close to each other in the image space.
To address this problem, , we propose a novel data structure named extended path space manifolds.The manifolds are dened in the combined space of path vertices and scene parameters.By enforcing geometric constraints, the path vertices could be implicitly and uniquely determined by perturbed scene parameters.This enables the manifold to track specic illumination eects and the corresponding paths, i.e., specular paths will still be specular paths after scene parameters are perturbed.Besides, the path derivatives with respect to scene parameters could be computed by solving small linear systems.[Vicini et al. 2021] (short as PRB), PRB with a multi-scale scheme (short as PRB.mul.res), and Plateau-reduced Differentiable Path Tracing [Fischer and Ritschel 2022] (short as PRDPT) fail to correctly recover the scene parameters.In contrast, our method successfully recovers the positions of all specimen objects due to the effectiveness of extended path space manifolds.Our hybrid optimization scheme (ours+hybrid) is able to further refine the results.We also show the close-up views of the color derivatives of PRB and our geometric derivatives with respect to a single scene parameter, respectively.Note that the two derivatives are inherently of different types -they are not computing the same quantities.The optimization is performed using a rendering resolution of 128 × 128 and 32 spps, while the images displayed in the figure are re-rendered in a higher resolution of 512 × 512 and 8192 spps.The scene is modified from [Bitterli 2016].

ABSTRACT
Physically based differentiable rendering has become an increasingly important topic in recent years.A common pipeline computes local color derivatives of light paths or pixels with respect to arbitrary scene parameters, and enables optimizing or recovering the scene parameters through iterative gradient descent by minimizing the difference between rendered and target images.However, existing approaches cannot robustly handle complex illumination effects including reflections, refractions, caustics, shadows, and highlights, especially when the initial and target locations of such illumination effects are not close to each other in the image space.
To address this problem, we propose a novel data structure named extended path space manifolds.The manifolds are defined in the combined space of path vertices and scene parameters.By enforcing geometric constraints, the path vertices could be implicitly and uniquely determined by perturbed scene parameters.This enables the manifold to track specific illumination effects and the corresponding paths, i.e., specular paths will still be specular paths after scene parameters are perturbed.Besides, the path derivatives

INTRODUCTION
Physically based forward rendering has been a central research topic in computer graphics for decades, with a focus on robust and accurate light transport simulation in virtual scenes with specified shapes, illumination and materials.Recently, significant progress has been made on its inverse problem with advances in theoretical frameworks, sampling algorithms and system pipelines of differentiable rendering, which offers the capability of evaluating derivatives of the rendered image with respect to arbitrary scene parameters and being used to facilitate gradient-based optimization problems [Bangaru et al. 2020;Li et al. 2018;Loubet et al. 2019;Nimier-David et al. 2019;Vicini et al. 2021Vicini et al. , 2022;;Zeltner et al. 2021;Zhang et al. 2020].
Physically based differentiable rendering pipelines generally calculate the local color derivatives of light paths or pixels with respect to arbitrary scene parameters.These computations facilitate the optimization or reconstruction of these arbitrary scene parameters through the iterative process of gradient descent which minimizes the discrepancy between the rendered image and the target image.However, current methods often struggle with robustly handling complex illumination effects including reflections, refractions, caustics, shadows, and highlights, especially when the initial and target locations of such illumination effects are not close to each other in the image space.
On the other hand, significant efforts have been dedicated to exploring methods for efficiently rendering these complex light transport effects within the scope of forward rendering, especially the methods involving the exploration of path space manifolds (PSMs) [Jakob and Marschner 2012;Jakob 2013;Kaplanyan et al. 2014;Veach and Guibas 1997;Zeltner et al. 2020].In this work, drawing inspiration from these approaches, we bridge the gap by introducing the mathematical formulation of extended path space manifolds (EPSMs) in the context of physically based differentiable rendering, which are defined in the combined space of path vertices and scene parameters.
Concretely, our contributions include: • We present extended path space manifolds.By enforcing geometric constraints, the manifolds implicitly define a mapping from scene parameters to path vertices.We further derive path derivatives that measure how path vertices change with respect to scene parameters under constraints.
• Built upon the extended path space manifolds, we present a physically based differentiable rendering method.Experiments show it is more effective and robust than state-ofthe-art methods in inverse rendering applications involving complex illumination effects, including specular and glossy reflections, refractions, highlights, caustics, and shadows.Such an example is given in Fig. 1.

RELATED WORK 2.1 Path Space Manifolds
Physically based light transport algorithms are built on top of the path integral formulation [Veach 1998] using (Markov chain) Monte Carlo techniques.Veach and Guibas [1997] partitions the path space into submanifolds and designs several light path perturbation strategies for efficient sampling of difficult light paths, such as lens perturbation, caustic perturbation and multi-chain mutations.
Manifold exploration [Jakob and Marschner 2012; Jakob 2013] further improves the efficiency of MLT algorithms with manifold walks on the path space manifolds (PSMs), addressing the challenges posed by specular and near-specular light paths, which often lead to slow convergence rates in traditional light transport simulations.To navigate these complexities, manifold exploration leverages the inherent structure of these paths as manifolds in path space.A simple equation-solving iteration allows for efficient exploration on these path manifolds, resulting in an effective method to perturb specular paths using available geometric constraints in the path tracer.Kaplanyan et al. [2014] proposed to mutate paths by explicitly modeling the ray differentials [Igehy 1999] in half vector space manifold, yielding better rendering performance on glossy scenes.Zeltner et al. [2020] further improved the specular manifold constraints for rendering high-frequency caustics and glints.
Unlike aforementioned PSMs that are primarily tailored for forward rendering, we present extended path space manifolds (EPSMs) for physically based differentiable rendering, which differ mainly in two ways -First, PSMs are defined on the space of paths, while our EPSMs are defined on the combined space of paths and optimizable scene parameters.Second, PSMs compute the derivatives of paths with respect to the positions of endpoints, while our EPSMs compute the derivatives of paths with respect to scene parameters.Similar to PSMs where we could implicitly determine the positions of all specular vertices from the positions of diffuse endpoints, in EPSMs, we could also uniquely find the updated positions of all path vertices when scene parameters are slightly changed.

Differentiable Rendering
Inverse rendering typically requires both the development of an advanced forward parametric model and the computation of its corresponding derivatives.This process is often approached via analysis-by-synthesis techniques [Gkioulekas et al. 2013;Khungurn et al. 2015;Zhao et al. 2016].Recently, there has been a surge in interest for fully-differentiable forward rendering techniques, known as differentiable rendering.It enables practical inverse rendering applications, such as object capture and material estimation [Cai et al. 2022;Deng et al. 2022;Gao et al. 2019;Guo et al. 2020;Luan et al. 2021;Lyu et al. 2020;Munkberg et al. 2021;Shi et al. 2020].
The pioneer general-purpose differentiable rendering frameworks, including OpenDR [Loper and Black 2014] and Neural 3D Mesh Renderer [Kato et al. 2018], leveraged analytical differentiation with approximate forward models.A variety of differentiable rasterization techniques have been developed to efficiently render primitives within a scene [Laine et al. 2020;Liu et al. 2019;Ravi et al. 2020].Although capable of handling primary visibility, these frameworks encountered difficulties with complex illumination effects.
Physically based differentiable rendering has been focused on differentiating a path tracer to handle global illumination through light transport simulation [Bangaru et al. 2020;Li et al. 2018;Loubet et al. 2019;Vicini et al. 2021Vicini et al. , 2022;;Yan et al. 2022;Zeltner et al. 2021;Zhang et al. 2020Zhang et al. , 2021]].Generally, physically based differentiable rendering involves estimating two main components: (i) the interior integrals derived from differentiating the integrands associated with the forward-rendering models, and (ii) the boundary integrals determined over the discontinuities present in those integrands.Previously, the estimation of interior integrals primarily leveraged path sampling methods originally conceived for forward rendering, while reparameterization techniques [Bangaru et al. 2020;Loubet et al. 2019] apply suitable changes of variables to the integrands to avoid computing boundary integrals.Recently, Zeltner et al. [2021] and Vicini et al. [2021] investigated how reparameterization techniques and different sampling strategies such as "attached sampling" and "detached sampling" influence the performance of Monte Carlo estimations.Some of the directional and positional derivatives computed in these attached sampling methods are indeed somewhat similar to our proposed manifold derivatives, while our method focuses on computing geometric derivatives (i.e., the change of path geometries w.r.t.scene parameters) instead of per-pixel color derivatives (i.e., how per-pixel color contributions change w.r.t.scene parameters).More precisely, contemporary differentiable rendering methods that compute per-pixel image derivatives often encounter limitations in inverse rendering tasks.In particular, they are ineffective in aiding global and long-range optimization (when initial and target objects/shadows/caustics are not close to each other, i.e., are not overlapping in the image space) during inverse rendering.
Recently, Xing et al. [2022] linked screen-space pixels to their corresponding visible 3D points and derived 5D RGBXY derivatives, assessing how color and screen position change with scene parameters.However, it is based on differentiable rasterization and cannot handle inverse rendering applications involving complex global illumination effects such as reflections and caustics.Fischer and Ritschel [2022] convolved the rendering function with a kernel that blurs the scene parameter space.The method could capture long-range relationships to some extent, however, its gradients have relatively high variance and will become less effective when the number of scene parameters grows larger.

BACKGROUND: PATH SPACE MANIFOLDS
Path space manifolds (short as PSMs), or specular manifolds, are first proposed by Jakob and Marschner [2012] to address the problem of rendering scenes with difficult specular light transport.A general length- light path could be represented as x = x 0 x 1 • • • x  where the two endpoints are the positions of the eye x 0 and a point x  on a light source, and the middle bouncing vertices x  (1 ≤  ≤  − 1) could be either diffuse or specular.
Without loss of generality, let's consider a simpler case of a path x  , and we assume that the two endpoints x 1 and x  are diffuse and all other vertices are specular.While the dimension of the path space is relatively large (i.e., 2), the path in fact lies in a lower dimensional subspace (i.e., 4).Since all middle specular vertices need to satisfy the law of reflection or Snell's law, to formulate it, Jakob and Marschner [2012] introduced a half-vector constraint function to each specular vertex x  , i.e., constraining the half-vector to be parallel to surface normal: where  (x  ) is a 2 × 3 matrix that represents the local tangent vectors, and the half vector function ℎ(•) is defined as [Walter et al. 2007]: (2) and   denote the index of refraction of the two sides, respectively.
By putting together the half-vector constraints on all specular vertices, we could get a stacked constraint function C : R 2 → R 2( −2) , expressed as: and the PSMs are defined as the set of paths that satisfy the constraints: The Implicit Function Theorem [Spivak 1965] tells that the whole path x is a function of two endpoints x 1 and x  in a neighborhood of a current path.In other words, all middle specular vertices x  (2 ≤  ≤  − 1) could be implicitly determined by the positions of the two diffuse endpoints.Besides, the partial derivatives of the path x with respect to the two endpoints could be computed by solving a linear system derived from the Jacobian matrix of C.

EXTENDED PATH SPACE MANIFOLDS
Inspired by existing works of PSMs, we present extended path space manifolds (EPSMs) in order to handle differentiable rendering problems involving difficult light paths.Currently, we focus on global illumination with surface interactions, while volumetric effects are left for future works.

Definition of EPSMs
Given a length- light path x = x 0 x 1 • • • x  which starts from the position of the eye x 0 , follows with multiple diffuse or specular bounces inside the scene, and ends at a point x  on a light source, we define an extended path by associating it with scene parameters of interest  = [ 1 , • • • ,   ], simply as a combined vector of path Half Vector Fixed Position Fixed Direction Colinear Specular/Glossy Diffuse Occluder vertices and scene parameters: (x,  ).So that the extended path space is an Euclidean space (i.e., R 2(+1)+ ) defined as: Analogous to PSMs, we also introduce constraints to the extended path space so that its actual dimension could be reduced.Specifically, we always introduce  + 1 2D vector valued constraint functions, i.e., the same number as path vertices.The constraint functions could be stacked as C : R 2(+1)+ → R 2(+1) , in the form: (6) where c  denotes the -th constraint function (1 ≤  ≤  + 1).Therefore, the extended path space manifolds (EPSMs) could be defined as the set of extended paths satisfying the above constraints: The EPSMs essentially give a mapping from scene parameters  ∈ R  to the path x ∈ R 2(+1) , i.e., the positions of all path vertices could be implicitly determined by scene parameters  .The Implicit Function Theorem ensures that the mapping exists in the neighborhood of a current extended path.
A motivating example of EPSMs is given in Fig 2 .It shows an extended path (x,  ), where x = x 0 x 1 x 2 x 3 x 4 starts from the eye, then sequentially bounces at two mirrors and a diffuse surface, and ends at a light source.The associated scene parameters  = [ 1 ,  2 ] indicate the rotating angles of the two mirrors respectively.Halfvector constraints are enforced on the specular vertices and fixed position constraints are enforced on other vertices (will be explained in Sec.4.2).As shown in the figure, if we slightly rotate the mirrors, the path will be perturbed and uniquely determined, i.e., x 1 and x 2 will be moved accordingly to satisfy the law of reflection, and x 3 will keep fixed since it is a diffuse vertex.

Constraint Functions
As mentioned in Sec.4.1, in order to define the EPSMs, we need to enforce  + 1 constraint functions on the extended paths.Below, we introduce the 4 types of constraint functions we have used in EPSMs.Note that all constraint functions are 2D vector valued.4.2.1 Half-vector constraint.It constrains the half-vector of incoming and outgoing ray directions in the local frame of a specific vertex x  to be unchanged when scene parameter changes: where  (•) is a 2 × 3 matrix representing the local tangent vectors, and ℎ(•) is the half vector.Different from the one used in Jakob and Marschner [2012] which strictly constrains the half-vector to be parallel to surface normal, we constraint the half-vector in the local frame to be fixed during scene parameter perturbation.This makes the constraint function applicable to both specular and glossy surfaces.
4.2.2Fixed position constraint.This constraint enforces the position of a path vertex x  to be locally unchanged.Specifically, we constrain the barycentric weights of a vertex x  with respect to its belonging surface triangle to be unchanged: It is usually used to constrain the outgoing ray from the light source (i.e., − −−−−− → x  x −1 ) when dealing with caustic effects.

Colinear constraint.
The colinear direction constraint is used to constrain two neighboring ray directions of a path vertex x  to be always colinear: where × denotes the cross product operator.The colinear constraint is only used for shadow rays (will be explained in Sec.4.3.3).

Construction of EPSMs
EPSMs are built by enforcing constraints to the space of extended paths (x,  ).To effectively handle different rendering effects, we have designed 3 different types of EPSMs, as illustrated in Fig. 4. Different EPSMs use different combinations of constraint functions, but the number of applied constraint functions is always  + 1, the same as the number of path vertices.This ensures that the positions of all path vertices we explain how each type of EPSM is constructed.

General EPSMs.
We apply the following constraints to an extended path ) to construct a general EPSM: (1) enforcing the half-vector constraint (Sec.4.2.1) on specular vertices; (2) enforcing the fixed position constraint (Sec.4.2.2) on all diffuse vertices and the two endpoints x 0 and x  .
The general EPSM uses a combination of constraint functions similar to Jakob and Marschner [2012].It is the most general type of EPSM and could be used in handling a wide range of rendering effects including reflections, refractions, and highlights.Note that it could also handle glossy or semi-glossy surfaces, diffuse indirect illumination and direct lighting as well.
Compared to general EPSMs, caustic EPSMs additionally enforce the fixed direction constraint on the light ray while removing the fixed position constraint on the diffuse vertex x 1 .This enables a caustic EPSM to effectively track the caustic pattern cast on diffuse surfaces, which will move accordingly when light sources or specular objects on the path move.
4.3.3Shadow EPSMs.Shadow EPSMs are specifically designed for handling shadows in direct illumination.In forward rendering, we will trace shadow rays from shading points towards light sources.If the shadow ray hits an occluder before arriving at the light source, we consider that the shading point is inside shadow.As shown in Fig. 4 (c), we construct a shadow EPSM for each shadow ray that hits an occluder.Since it only considers direct illumination, the path is short and consists of only 4 vertices (i.e., the eye point x 0 , shading vertex x 1 , the occluder vertex x 2 , and the light source point x 3 ).Note that we only record the first hit point on the shadow ray as the occluder vertex.We apply the following constraints to the shadow EPSMs: (1) enforcing the fixed position constraint (Sec.4.2.2) on the two endpoints x 0 and x 3 and the occluder vertex x 2 ; (2) enforcing the colinear constraint (Sec.4.2.4) on rays − −− → x 1 x 2 and − −− → x 2 x 3 since the shadow ray (x 1 → x 2 → x 3 ) needs to be a straight line by definition.
Note that no constraints are applied to the shading vertex x 1 so that the shadow EPSMs could effectively capture moving shadows.

Computation of Path Derivatives
In this subsection, we show how to compute the path derivatives, i.e., how path geometries x change with respect to scene parameters  under the constraints of an EPSM: The Implicit Function Theorem [Spivak 1965] guarantees the existence of the derivatives in the neighborhood of a current extended path (x,  ).The derivatives could be computed through implicit partial differentiation of the stacked constraint function  ) in Eq. 6: dC(x,  ) Hence, the path derivatives x/ could be obtained through solving the following linear system: where C/x is a (2+2)×(2+2) matrix, and C/ is a (2+2)× matrix.The two matrices are actually two sub-matrices of the sparse Jacobian matrix ∇C.

OUR DIFFERENTIABLE RENDERING METHOD
In this section, we introduce our physically based differentiable rendering method based on the theoretical results of the EPSMs.In contrast, our method could handle such long-range optimization in a more robust way.First, our proposed EPSMs are rather suitable for tracking those illumination effects.For example, considering a specular path (i.e.ES + DL) causing a reflection, the updated path implicitly determined by perturbed scene parameters will still be a specular path.Second, different from previous works which consider color derivatives at fixed pixel locations, our manifold derivatives are essentially geometric derivatives, i.e., computing how path geometries change with respect to scene parameters, which are denser and potentially lead to more stable optimization (see visualization in Fig. 1).Finally, our employed optimal transport could help find long-range matching and could be directly connected and combined with our manifold derivatives.

Method pipeline.
Let's consider a typical setting of inverse rendering: given initial scene parameters and target image(s) from one or multiple viewpoints, we aim to recover desired scene parameters through iterative gradient descent by minimizing a predefined loss function between the rendered and target images.At each iteration, we perform the following steps: (1) Forward rendering.We use Monte Carlo path tracing for rendering images and we record all sampled light paths.(2) Construction of EPSMs.For each sampled light path, we associate it with scene parameters of interest to obtain an extended path and build an EPSM for it.(3) Path-pixel matching.We utilize optimal transport to obtain a pixel-to-pixel mapping between the rendered images and the target reference images.For each pair of matched pixels, we record all sampled paths in the rendered pixel as corresponding to the target pixel.(4) Optimization of scene parameters.We define a loss function according to the path-pixel correspondences, then perform backpropagation to minimize the loss function using the path derivatives computed from EPSMs.We will further describe the details in Sec.5.2.

Method Details
5.2.1 Forward rendering.We use Monte Carlo path tracing for forward rendering to produce rendered images.We record part of sampled light paths and also record shadow paths in direct illumination.The recorded paths will be used to build EPSMs and the rendered images will be used to find path-pixel matching.

Construction of EPSMs.
For each recorded path, we associate it with scene parameters of interest to obtain an extended path and then construct an EPSM of a specific type for the extended path.Specifically, we would like to construct shadow EPSMs and caustic EPSMs for shadow paths and for caustic paths, respectively, and construct general EPSMs for other types of paths.While it is easy to automatically determine shadow paths, it is non-trivial to automatically distinguish caustic paths (i.e., caustic paths must be in the form of ES * DS + L, but ES * DS + L-paths are not necessarily caustic paths).Hence, we manually specify whether or not the rendered scene is a caustic scene.For non-caustic scenes, we construct general EPSMs for all paths except shadow paths.For caustic scenes, we construct caustic EPSMs for ES * DS + L-paths and general EPSMs for other paths.
Note that the number of associated scene parameters could be different for different paths.Let's imagine a scene with two mirrors where the optimizable scene parameters are their rotating angles.Paths that only hit one mirror just need to be associated with a single scene parameter (i.e., the rotating angle of the hit mirror).Paths that hit neither of two mirrors could be directly discarded and will not be used in constructing EPSMs.
Furthermore, the general EPSMs could be potentially simplified.Since our loss function (Eq.16, will be explained later) only depends on the first two vertices (x 0 and x 1 ) of a path and the fixed position constraint applied to diffuse vertices cuts the relationship between vertices before and after a diffuse vertex, we could simplify a general EPSM by removing the path vertices after the first diffuse vertex.Fig. 3 (a) shows such an example: the general EPSM

5.2.3
Path-pixel matching.In this step, first, we follow the work of Xing et al. [2022] to use optimal transport to obtain a one-to-one pixel matching between rendered and target images.We choose optimal transport since it could capture long-range relationships.We define the unit transportation loss between a pixel p i in the rendered image and a pixel t j in the target image as follows: where I r (•), I t (•) denote pixel color values of the rendered image and the target image, respectively, and the balancing weights are set as  1 =  2 = 0.5.Note that the unit transportation loss considers both color and positional differences between pixels.
The optimal transport algorithm will find the optimal one-to-one pixel matching with the minimal sum of unit transportation losses between matched pixels.To achieve a trade-off between accuracy and efficiency, we also follow Xing et al. [2022] to use the Sinkhorn divergences [Cuturi 2013;Feydy et al. 2019] with parameter  = 0.01 to compute an approximated optimal transport.
After that, we build correspondences between sampled paths and pixels in the target image.For each pair of matched pixels, i.e. a pixel p i in the rendered image and a pixel t j in the target image, we simply set all sampled paths at pixel p i as corresponding to pixel t j .

Loss function and optimization.
As shown in Fig. 3 (b), given path-pixel correspondences, in order to drive a path to move towards the corresponding pixel location, we define the loss function  of each path x as follows: where  (x 0 , x 1 ,  ) denotes the ray-plane intersection point of ray − −− → x 0 x 1 and the image plane, t denotes the corresponding pixel position in the target image.The total loss is simply the sum of the losses in Eq. 16 over all paths.The formulation of our loss is similar to the refraction loss in [Lyu et al. 2020].
The derivative of the loss function  with respect to scene parameters  could be simply obtained through the chain rule: where x/ is the path derivative with respect to scene parameters under the constraints of an EPSM, which is computed by solving the small linear system in Eq. 14.Furthermore, since the loss function (Eq.16) only depends on the first two vertices (x 0 and x 1 ) of the path, we only need to compute the derivatives involving the first two vertices: By using the above loss derivatives, the scene parameters  could be optimized through iterative backpropagation.
5.2.5 Implementation details.We implement our method entirely on GPU using Mitsuba 3 [Jakob et al. 2022b,a] and PyTorch [Paszke et al. 2019].Thanks to the interoperability of the two frameworks, mixed computations, data transmission, and data synchronization between them can be easily achieved.Specifically, forward rendering and backpropagation are implemented using Mitsuba 3. Optimal transport based matching, solving linear systems for path derivatives (Eq.14) are implemented using PyTorch.The Jacobian matrix of the constraint functions ∇C are computed through autodifferentiation with a mixed use of PyTorch and Dr.Jit [Jakob et al. 2022a].Taking the fixed position constraint (Eq.9) as an example, the derivatives of the barycentric weights with respect to vertex positions /v  (1 ≤  ≤ 3) are computed in PyTorch, while the derivatives of vertex positions with respect to scene parameters v  / (1 ≤  ≤ 3) are computed in Dr.Jit.They are combined to obtain / through the chain rule.

EXPERIMENTS
All experiments are performed on a PC with an NVIDIA RTX 3090 GPU (24G memory).In our experiments, the paths are recorded using a rendering resolution of 128 × 128 and 32 samples per pixel (spps).For better quality of matching, we render images with a resolution 512 × 512 and 64 spps and downsample the rendered images to 128 × 128 to perform optimal transport.For all examples, we run our EPSM based method for 500 iterations.Typically, one iteration takes about 4.3 -7.2 seconds.The main bottleneck lies in the computation of the path derivatives which costs about 70% time budget, since it requires solving a small linear system for each path.We use the Adam optimizer [Kingma and Ba 2014] for backpropagation.

Scene Configurations
In order to demonstrate the robustness and effectiveness of our method, we test our method on inverse rendering applications over four representative scenes.The tested scenes cover various types of complex illumination effects including reflections, refractions, caustics, shadows, and glossy highlights.
The Bathroom scene in Fig. 1 contains eight specimen objects inside a glass box, which are viewed through reflection paths via the mirror followed by refraction paths via the glass.We aim at recovering the 2D translation vectors of the specimen objects through tracking the reflected (and refracted) images.
The Shadow scene in Fig. 5 (a) contains an area light, a floor, and 400 spheres.The light is put above the spheres and casts shadows onto the floor.The camera is put below the spheres and towards the floor so that the camera can see shadows.The goal is to recover the 2D translation vectors of all occluder spheres by their shadows.
The CornellBox scene in Fig. 5 (b) presents a challenging mix of intricate and colorful caustics originating from a glass ball, testing the robustness of caustics path derivatives estimation with respect to the rotation angles of six area lights.
The Highlight scene in Fig. 5 (c) serves to assess the ability to differentiate through glossy and near-specular light paths.Starting with five parallel glossy planes reflecting vibrant, out-of-view emitters, we would like to simultaneously optimize the plane rotations and horizontal translation offsets of the emitters.

Results and Analysis
6.2.1 Visual comparisons.We compare the results of our method with three baselines, including Path Replay Backpropagation [Vicini et al. 2021] with reparameterization [Bangaru et al. 2020] (short as PRB), PRB with a multi-scale scheme (short as PRB.mul.res) and Plateau-reduced Differentiable Path Tracing [Fischer and Ritschel 2022] (short as PRDPT).In the multi-scale scheme (PRB.mul.res),we always render images with a resolution of 512 × 512, while using progressively downsampled rendered images with a resolution from 8 × 8 to 512 × 512 for computing derivatives.For each scene, we run each method for 500 iterations.The results are given in Fig. 1 and Fig. 5.
However, neither PRB nor PRDPT converges to the correct results on all these scenes.While PRB is effective in optimizing various types of scene parameters under complex illumination, it usually requires that the initial rendered image is sufficiently aligned with the target image since it relies on local color derivatives, which limits PRB in handling rendering effects with long-range relationships.The multi-scale scheme could alleviate this problem (see Fig. 5 (a), the 4th column), but still cannot solve it robustly.While PRDPT is able to capture long-range relationships to some extent, however, due to the increase of sampling variance in a higher dimensional space, it will quickly become less effective when the number of optimizable parameters grows larger.
In contrast, our method successfully produces nice convergence to the target images on all the scenes.The results demonstrate the robustness of our method in handling a range of complex illumination effects and the superior effectiveness of EPSM-based derivatives over baseline approaches.
6.2.2 Hybrid optimization scheme.Noticing that in scenes Bathroom (Fig. 1) and Shadow (Fig. 5 (a)), our optimized results are close to the target images but still have some subtle differences.This is because the approximated optimal transport using Sinkhorn divergences may produce inaccurate pixel matching when the rendered and target images are already aligned well [Xing et al. 2022], which may lead to less accurate loss derivatives and lower convergence rate in the last iterations of optimization.To address this issue, similar to [Xing et al. 2022], we could optionally employ a hybrid optimization scheme to refine the results.This is done by simply running the optimization for 100 iterations using PRB after running our method for 500 iterations.As shown in Fig. 1 and Fig. 5 (a), the hybrid scheme generates better aligned results.6.2.3 Error curves.Fig. 6 offers the error curves from the four inverse rendering experiments discussed above.For the Bathroom, CornellBox, and Highlight scenes, we opt for parameter loss, as the target parameters are available.For the Shadow scene, featuring 400 spheres, we compute the image RMSE loss between the rendered image and the target image.Our method converges quickly on all the scenes.6.2.4 Visualization of derivatives.In Fig. 1 (the middle column), We visualize the derivatives of PRB [Vicini et al. 2021] and our derivatives with respect to a single scene parameter (i.e., the vertical movement of one specimen object), respectively.The derivatives of PRB compute how pixel colors change with scene parameters, which are sparse and only have non-zero values near object boundary.In contrast, we compute a different type of derivatives -how path geometries change with scene parameters.Our derivatives are dense inside the object and could lead to more stable optimization.6.2.5 Additional results.Fig. 7 showcases more inverse rendering results.The tasks incorporate multiple types of optimizing goalslight translation vector through caustics, object translation through nested reflections, camera pose estimation, translation vectors of three light sources via glossy highlights, normal map of a refractive glass slab via the refracted image, and a 72-parameter human model (Skinned Multi-Person Linear, SMPL) [Loper et al. 2015] from its curved shadow.These tasks further demonstrate the robustness of our method in handling various scenes involving complex illumination effects and different types of optimizable scene parameters.
We will show the progressive optimization process for all examples in the supplemental video.

CONCLUSION
In this paper, we introduced a novel approach to physically based differentiable rendering.Our key contribution lies in the formulation of extended path space manifolds (EPSMs), designed to navigate the complex non-local and long-range relationships that often characterize intricate illumination effects in a scene.By enforcing geometric constraints of EPSMs that implicitly enable a mapping from scene parameters to path vertices, our approach significantly enhances the robustness and efficacy of the optimization processes involved in differentiable rendering.Through various experiments, our method demonstrated marked improvements over existing techniques, particularly in handling complex illumination effects.
While our method shows promise in inverse rendering applications, it still faces limitations in computation time and memory consumption.The timing bottleneck lies in the computation of path derivatives which requires solving small linear systems.Currently we use PyTorch routine torch.linalg.inv.Since the Jacobian matrix ∇C is sparse, a possible way for acceleration would be using iterative methods instead of direct methods to solve linear systems.As for the usage of GPU memory, currently we store all sampled light paths for the simplicity of implementation.In the future, we could switch to a batch computation mode and only store light paths in a batch to reduce memory consumption.
Furthermore, our method cannot handle "shadow in mirror" effect since it does not fit well with any EPSMs we have defined.To handle it, we need to extend the definition of shadow EPSMs, where the shadow path could contain zero or more specular vertices between the eye point and the shading point.Finally, the effectiveness of our method depends heavily on the quality of matching.More sophisticated matching algorithms beyond optimal transport are also worth investigating.

Figure 1 :
Figure 1: Optimizing the 2D translation vectors of eight specimen objects via physically based differentiable rendering.Note that the specimen objects are placed inside a glass box and are viewed through nested reflection and refraction.Given the target image and initial scene parameters, state-of-the-art methods including Path Replay Backpropagation with reparameterization [Vicini et al. 2021] (short as PRB), PRB with a multi-scale scheme (short as PRB.mul.res), and Plateau-reduced Differentiable PathTracing[Fischer and  Ritschel 2022] (short as PRDPT) fail to correctly recover the scene parameters.In contrast, our method successfully recovers the positions of all specimen objects due to the effectiveness of extended path space manifolds.Our hybrid optimization scheme (ours+hybrid) is able to further refine the results.We also show the close-up views of the color derivatives of PRB and our geometric derivatives with respect to a single scene parameter, respectively.Note that the two derivatives are inherently of different types -they are not computing the same quantities.The optimization is performed using a rendering resolution of 128 × 128 and 32 spps, while the images displayed in the figure are re-rendered in a higher resolution of 512 × 512 and 8192 spps.The scene is modified from[Bitterli 2016].

Figure 2 :
Figure 2: A motivating example of extended path space manifolds.
where v k ( ) (1 ≤  ≤ 3) denotes the three vertices of the surface triangle, and  (x, a, b, c) computes the barycentric coordinates of point x with respect to a triangle △abc.The fixed position constraint is usually applied to diffuse vertices and the two endpoints, i.e., the eye point and the point on the light source.4.2.3Fixed direction constraint.This constraint enforces a ray direction − −−−− → x  x  −1 to be locally unchanged in the local frame of its neighboring vertex x  :

Figure 5 :Figure 6 :
Figure 5: Visual comparisons between our method, Path Replay Backpropagation with reparameterization [Bangaru et al. 2020; Vicini et al. 2021] (PRB), PRB with a multi-scale scheme (PRB.mul.res), and Plateau-reduced Differentiable Path Tracing [Fischer and Ritschel 2022] (PRDPT).The Shadow scene also shows a result of our hybrid optimization scheme (ours+hybrid).The optimization of all scenes is performed using a rendering resolution of 128 × 128 and 32 spps, while the images displayed in the figure are re-rendered in a higher resolution of 512 × 512 and 8192 spps.(a) The Shadow scene contains an area light, a floor, and 400 spheres.The area light is put above the spheres and casts shadows onto the floor.The camera is put below the spheres and towards the floor so that the camera can see shadows on the floor.The optimizable scene parameters are the 2D translation vectors of all occluder spheres; Initially the spheres are placed to make a circle and the goal is to cast desired shadows like text 'SIGGRAPH ASIA'.(b) The CornellBox scene contains a refractive glass ball and six area light sources with different colors around the ball.Each area light generates a caustic pattern on the wall.The optimizable scene parameters are the rotating angles of the six light sources.(c) The Highlight scene contains five parallel glossy planes with microfacet GGX [Walter et al. 2007] of different roughness values, and five out-of-view area lights with different colors.The optimizable parameters are the rotation angles of the planes and the 1D horizontal translation offsets of the lights.

Figure 7 :
Figure 7: Additional results of our method (without hybrid scheme).The optimization of all scenes is performed using a rendering resolution of 128 × 128 and 32 spps, while the images displayed in the figure are re-rendered in a higher resolution of 512 × 512 and 8192 spps.Scenes Egg and Bedroom are modified from [Bitterli 2016].(a) Scene Egg.Optimizing the 1D translation offset of the light source through caustics.(b) Scene Bunny.Optimizing the 2D translation vector of the bunny object through nested reflections.(c) Scene Bedroom.Optimizing the camera pose.(d) Scene Glossy Ball.Optimizing the 2D translation vectors of three light sources via glossy highlights.(e) Scene Glass Slab.Optimizing the normal map with a 32 × 32 resolution of a refractive glass slab via refraction.(f) Scene Human.Optimizing 72 parameters of an SMPL human model[Loper et al. 2015] from its curved shadows on a staircase.