Amortizing Samples in Physics-Based Inverse Rendering Using ReSTIR

Recently, great progress has been made in physics-based differentiable rendering. Existing differentiable rendering techniques typically focus on static scenes, but during inverse rendering (a key application of differentiable rendering) the scene is updated dynamically by each gradient step. In this paper, we take a first step toward leveraging temporal data in the context of inverse direct illumination. By adopting reservoir-based spatiotemporal importance resampling (ReSTIR), we introduce new Monte Carlo estimators for both the interior and boundary components of differential direct-illumination integrals. We also integrate ReSTIR with antithetic sampling to further improve its effectiveness. At equal frame time, our methods produce gradient estimates with up to 100× lower relative error than baseline methods. Additionally, we propose an inverse-rendering pipeline that incorporates these estimators and provides reconstructions with up to 20× lower error.


INTRODUCTION
Physics-based forward rendering synthesizes photorealistic images of fully described 3D scenes. Given its importance, decades of forward-rendering research have led to numerous techniques capable of accurately reproducing complex lighting effects, including soft shadows, environmental illumination, and interreflection.
Inverse rendering instead infers parameters such as object shapes and reflectance from images of the scene. Due to the complexity of forward rendering, analytical inversion is generally infeasible. Typically, inverse rendering relies on numerical optimization to find scene parameters that minimize differences (according to some rendering loss) between target and forward-rendered images.
Efficient inverse-rendering optimization using gradient-based methods (e.g., stochastic gradient descent or Adam [Kingma and Ba 2014]) requires differentiating physics-based forward rendering with respect to arbitrary scene parameters, a process known as physics-based differentiable rendering.
Unfortunately, most differentiable rendering techniques focus on static scenes. However, during inverse-rendering optimization, the scene is updated dynamically by each gradient step. Exploiting the temporal consistency between steps to improve efficiency has remained largely unexplored.
In this paper, we take a first step toward leveraging temporal data in the context of physics-based inverse direct illumination. Specifically, we adopt reservoir-based spatiotemporal importance resampling (ReSTIR) [Bitterli et al. 2020], a powerful technique for real-time direct lighting in dynamic scenes, to significantly improve the efficiency of differentiable direct illumination under complex lighting conditions.
Concretely, we make the following contributions:
• We introduce new ReSTIR-based Monte Carlo estimators for both the interior and boundary components of differential direct-illumination integrals (§4 and §5).
• We discuss the interplay between ReSTIR and pixel-level antithetic sampling [Yu et al. 2022] and introduce a simple technique to improve the effectiveness of antithetic sampling (§6).
• We propose an efficient inverse-rendering pipeline that utilizes our ReSTIR estimators (§7). Additionally, we discuss how multi-view configurations can be supported efficiently by (i) preserving reservoirs across multiple iterations and (ii) using slightly biased gradient estimates with respect to object shapes.

RELATED WORKS
We review works in forward and differentiable rendering of surfaces.
Forward rendering of dynamic scenes. Forward rendering has played a major role in graphics for the past 50 years, with significant bodies of work on predictive and film rendering (e.g., Pharr et al. [2016]) and on interactive techniques targeting gaming and visualization (e.g., Akenine-Möller et al. [2018]). Common simplifications reduce cost by assuming scenes remain static or have fixed animations.
Until recently, forward rendering for truly dynamic scenes was often quite biased, with image appeal trumping predictive convergence as rasterization efficiency pushed renderers toward approximations such as ambient occlusion [Bavoil and Sainz 2009], image-space ray tracing [McGuire and Mara 2014], probe-based global illumination [McGuire et al. 2017], and precomputed lightmaps [Seyb et al. 2020] rather than more accurate light transport.
While researchers [Laine et al. 2020; Liu et al. 2019] have demonstrated differentiable rasterization, it remains limited to fairly simple scenes involving primary visibility.
Recent ray tracing hardware [Burgess 2020] offers an opportunity to use unbiased, physics-based light transport while dynamically changing a scene each frame, but current per-pixel ray budgets have limited use in many renderers to improve shadows [Heitz et al. 2018], reflections [Deligiannis and Schmid 2019], and diffuse global illumination [Majercik et al. 2019].
Resampling and ReSTIR. Recent work builds on importance resampling [Talbot et al. 2005] to amortize ray costs, aiming to minimize rendering overhead in dynamic scenes by spatiotemporally reusing sampled rays [Bitterli et al. 2020]. This reservoir-based spatiotemporal importance resampling, a.k.a. ReSTIR, observes that spatially and temporally neighboring integrals have similar forms. With careful reweighting, samples can be reused despite being drawn from different integration domains [Lin et al. 2022]. This is a form of filtering, similar to post-process denoisers [Schied et al. 2017], but applied to the sampling pdfs. While it assumes signals change smoothly, edge-stopping functions and similar image-processing tricks [Durand and Dorsey 2002] can be applied to better handle sharper discontinuities.
While we build on Bitterli et al.'s [2020] early formulation to efficiently sample complex many-light dynamic scenes, ReSTIR can also reuse complex paths including diffuse global illumination [Ouyang et al. 2021], volumetric media [Lin et al. 2021], and more complex multi-bounce light transport [Lin et al. 2022].
Physics-based differentiable rendering. A main challenge in developing general-purpose differentiable renderers has been differentiating with respect to scene geometry, which generally requires calculating additional boundary integrals. To address this, Li et al. [2018] introduced a Monte Carlo edge-sampling method giving unbiased estimates of boundary integrals, but it requires detecting object silhouettes, which can be expensive. Later, reparameterization-based methods [Loubet et al. 2019; Bangaru et al. 2020] were introduced to avoid computing boundary integrals. Further, by differentiating Veach's path integrals [1997], Zhang et al. [2020] formulated differential path integrals, extending Monte Carlo differentiable rendering beyond unidirectional path tracing.

PRELIMINARIES

Direct Illumination
Under a path-integral formulation [Veach 1997; Pauly et al. 2000] of the rendering equation for surface-only direct illumination, the intensity I of a pixel is given by

I = ∫_Ω f(x̄) dμ(x̄),  (1)

where Ω = M³ (with M the union of all surfaces) is the one-bounce path space containing light paths x̄ = (x_0, x_1, x_2) with x_0 on a light source and x_2 on the detector; μ is the area-product measure given by dμ(x̄) = ∏_{i=0}^2 dA(x_i); and f is the measurement contribution function defined as

f(x̄) = L_e(x_0→x_1) f_s(x_0→x_1→x_2) W_e(x_1→x_2) G(x_0↔x_1) G(x_1↔x_2).  (2)

Here L_e and W_e are the source emission and detector response functions, respectively; f_s is the bidirectional scattering distribution function (BSDF); and G denotes the (visibility-aware) geometric term.
The pixel intensity I in Eq. (1) is often estimated by (i) tracing a camera ray from x_2 that intersects the scene at a surface point x_1 ∈ M, and (ii) sampling a point x_0 (which we term a light vertex) on the surface of a light source. This gives a Monte Carlo estimator:

⟨I⟩ = f(x_0, x_1, x_2) / (p(x_1, x_2) p(x_0 | x_1)),  (3)

for p(x_1, x_2) the joint probability density of sampling x_1, x_2 and p(x_0 | x_1) the conditional probability density of drawing x_0.
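This estimator can be sketched in a few lines of Python. Here the camera subpath (x_1, x_2) is held fixed so only the light vertex x_0 is random; the function names and toy 1D setting are illustrative, not part of the paper's implementation:

```python
import random

def direct_illumination_estimate(f, sample_light, light_pdf, n, seed=0):
    """Average of n one-sample estimates: f(x0) / p(x0), with the camera
    subpath held fixed so only the light vertex x0 is random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x0 = sample_light(rng)          # draw a light vertex
        total += f(x0) / light_pdf(x0)  # contribution over its density
    return total / n
```

For instance, with f(x) = x sampled uniformly on [0, 1], the average converges to the integral value 0.5 as n grows.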
In practice, the camera ray from x_2 to x_1 can be sampled using standard ray tracing. In the rest of this paper, we assume a perspective pinhole camera model. In this case, x_2 is fixed at the center of projection, and x_1 is obtained by tracing a ray from x_2 through a randomly selected point on the image plane.
With x_1 and x_2 obtained, importance sampling the light vertex x_0 remains challenging, especially under complex illumination and visibility. Bitterli et al. [2020] recently introduced a technique that efficiently samples x_0. We briefly revisit this work in §3.2 and build on it later.

Reservoir-based Spatiotemporal Importance Resampling (ReSTIR) for Direct Illumination

Resampled importance sampling. Monte Carlo estimation of an integral

I = ∫ h(x) dx  (4)

requires generating samples x with probability density proportional to some target function p̂ (e.g., p̂ = h when h is nonnegative). But for complicated p̂, analytically sampling from this density can be difficult. Instead, resampled importance sampling (RIS) [Talbot et al. 2005] samples from a simple randomized distribution approximating p̂. First, RIS draws M independent candidates {x_1, ..., x_M} from some other distribution p_c. Then, one sample x is selected from the M candidates with (discrete) probability

P(x = x_i) = w_i / ∑_{j=1}^M w_j, where w_i = p̂(x_i) / p_c(x_i),  (5)

for i = 1, 2, ..., M. This gives a one-sample RIS estimator of Eq. (4):

⟨I⟩ = (h(x) / p̂(x)) (1/M) ∑_{i=1}^M w_i.  (6)

Naïve implementations of Eq. (6) generate and store all M candidates x_1, ..., x_M before selecting the final sample x, which can be inefficient. Bitterli et al. [2020] proposed a streaming version using weighted reservoir sampling (WRS) [Chao 1982], offering significantly better efficiency.
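A minimal sketch of streaming RIS built on weighted reservoir sampling, assuming a one-sample reservoir over scalar samples (names are illustrative; the actual algorithm operates on light vertices):

```python
import random

class Reservoir:
    """Single-sample reservoir for weighted reservoir sampling (WRS)."""
    def __init__(self):
        self.y = None       # currently selected sample
        self.w_sum = 0.0    # running sum of resampling weights
        self.M = 0          # number of candidates processed
        self.W = 0.0        # contribution weight, set after streaming

    def update(self, x, w, rng):
        # Standard WRS step: keep candidate x with probability w / w_sum.
        self.w_sum += w
        self.M += 1
        if self.w_sum > 0.0 and rng.random() < w / self.w_sum:
            self.y = x

def streaming_ris(p_hat, sample_candidate, pdf_candidate, M, rng):
    """Streaming one-sample RIS: draw M candidates from pdf_candidate and keep
    one, resampled proportionally to p_hat / pdf_candidate."""
    r = Reservoir()
    for _ in range(M):
        x = sample_candidate(rng)
        r.update(x, p_hat(x) / pdf_candidate(x), rng)
    # Contribution weight: (1 / p_hat(y)) * (w_sum / M), matching Eq. (6).
    if r.y is not None and p_hat(r.y) > 0.0:
        r.W = r.w_sum / (r.M * p_hat(r.y))
    return r
```

With p̂ = p_c, every resampling weight equals 1, so the contribution weight W reduces to exactly 1, a convenient sanity check.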
Reservoir-based spatiotemporal importance resampling. To estimate direct illumination using Eq. (3), after generating the camera ray x_2 → x_1, RIS (Algorithm 1) can be used to select the light vertex x_0 by setting the target function p̂ to the measurement contribution of the path (x_0, x_1, x_2). However, under complex illumination conditions, RIS needs many candidates (i.e., large M) to achieve high efficiency, which is typically impractical.
To address this problem, Bitterli et al. [2020] introduced reservoir-based spatiotemporal importance resampling (ReSTIR), which amortizes sampling costs by reusing spatial and temporal candidates via iterative applications of RIS.
Specifically, when rendering a pixel, ReSTIR first generates a reservoir using RIS (Algorithm 1). Then, the reservoir is merged with reservoirs from various neighboring pixels (i.e., spatial reuse) as well as from prior frames (i.e., temporal reuse). To ensure unbiasedness, Bitterli et al. [2020] proposed recomputing the contribution weight r.W after determining the sample r.y, as expressed in Algorithm 2.¹ This is because the target function p̂_q(·) for a pixel q generally differs from the target functions p̂_{q′}(·) of its spatial/temporal neighbors q′. Further, precautions must be taken when evaluating p̂_{q′}(r.y) (Line 9 of Algorithm 2) when q′ is from the previous frame (i.e., via temporal reuse) and the scene geometry varies between the previous and current frames. Specifically, the evaluation should use the scene geometry of the previous frame with the light sample r.y moved accordingly (by, for example, using motion vectors).
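The merging step can be sketched as follows, assuming each reservoir is summarized by a (y, W, M) triple. This sketch follows the simpler biased variant; the unbiased Algorithm 2 additionally recomputes the weight by counting the neighbors whose own target function is nonzero at the selected sample:

```python
import random

def combine_reservoirs(neighbors, p_hat_q, rng):
    """Merge reservoirs (each a (y, W, M) triple) from the current pixel q and
    its spatial/temporal neighbors into one reservoir targeting p_hat_q."""
    y_sel, w_sum, M_total = None, 0.0, 0
    for y, W, M in neighbors:
        # Each neighbor's sample acts as one candidate whose weight restores
        # the current pixel's target function: p_hat_q(y) * W * M.
        w = p_hat_q(y) * W * M if y is not None else 0.0
        w_sum += w
        M_total += M
        if w_sum > 0.0 and rng.random() < w / w_sum:
            y_sel = y
    p = p_hat_q(y_sel) if y_sel is not None else 0.0
    W_out = w_sum / (M_total * p) if p > 0.0 else 0.0
    return y_sel, W_out, M_total
```

Merging two unit-weight reservoirs under a constant target function again yields a unit contribution weight, with candidate counts summed.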
Algorithm 2 has recently been generalized by Lin et al. [2022] to allow reusing samples from a wider range of domains (including full path samples from varying domains).

Differential Direct Illumination
In physics-based differentiable rendering, Zhang et al. [2020; 2021b] recently introduced a formulation of differential path integrals as well as several associated Monte Carlo estimators. Below, we briefly describe how these apply to direct illumination.
When the scene geometry M evolves with parameters θ ∈ R^{n_θ} (for integer n_θ ≥ 1), Zhang et al. proposed a material-form reparameterization to facilitate differentiating Eq. (1) with respect to θ. This parameterization introduces a motion X such that, for any θ, X(·, θ) is a differentiable one-to-one mapping from some predetermined reference configuration B (independent of θ) to the scene geometry M(θ). To distinguish points in the ordinary geometry from those in the reference configuration, we term any x ∈ M(θ) a spatial point and any p ∈ B a material point.

¹Our Algorithm 2 is identical to Algorithm 6 in Bitterli et al.'s [2020] work. Please refer to their paper for a proof of unbiasedness.

ALGORITHM 2: Combining reservoirs from spatial and temporal neighbors [Bitterli et al. 2020]
Applying the change of variables x̄ = X(p̄, θ) to Eq. (1) yields its material-form variant:

I = ∫_Ω f̂(p̄) dμ(p̄),  (7)

where Ω := B³ is termed the material path space, and

f̂(p̄) := f(X(p̄, θ)) ‖dμ(x̄)/dμ(p̄)‖,  (8)

with ‖dμ(x̄)/dμ(p̄)‖ = ∏_{i=0}^2 ‖dA(x_i)/dA(p_i)‖ the Jacobian determinant capturing this change of variables.
Zhang et al. showed that differentiating Eq. (7) with respect to θ yields

dI/dθ = ∫_Ω (d f̂(p̄)/dθ) dμ(p̄) + ∫_{∂Ω} Δf̂(p̄) (n(p_K)^⊤ (dp_K/dθ)) dμ̇(p̄),  (9)

where the interior component (the first term) comes from differentiating the integrand f̂ of Eq. (7).
The boundary integral (the second term) instead covers the material boundary path space ∂Ω, is unique to differentiable rendering, and usually depends on the scene parameters θ. For direct lighting, ∂Ω comprises material boundary paths p̄ = (p_0, p_1, p_2) with exactly one boundary segment p_K p_{K+1} (K ∈ {0, 1}) such that the corresponding spatial point x_K = X(p_K, θ) is a jump discontinuity of the mutual visibility function V(x_K ↔ x_{K+1}) when fixing x_{K+1} = X(p_{K+1}, θ). Further, n(p_K)^⊤ ∈ R^{1×3} is the unit normal of the visibility boundary at p_K, and dp_K/dθ ∈ R^{3×n_θ} is the boundary gradient. The measure μ̇ over ∂Ω is given by dμ̇(p̄) = dℓ(p_K) ∏_{k≠K} dA(p_k), with ℓ the curve-length measure.
Choice of reference configuration. In practice, to estimate gradients at θ = θ_0, we use B = M(θ_0) as the reference configuration for convenience. In this case, the material path space Ω coincides with the ordinary path space, and the mapping X(·, θ_0) reduces to the identity. In other words, a material point p is essentially a detached copy of its spatial representation x = X(p, θ) at θ = θ_0.
We note that while the Jacobian ‖dμ(x̄)/dμ(p̄)‖ in Eq. (8) equals one at θ = θ_0, its gradient with respect to θ generally remains nonzero.

Inverse Rendering and Radiative Backpropagation
Inverse-rendering optimizations. Optimization-based inverse rendering, or analysis-by-synthesis, infers scene parameters θ ∈ R^{n_θ} by minimizing a predetermined rendering loss L (e.g., L1 or L2) between rendered images I ∈ R^{n_I} with n_I pixels and user-specified targets I_0:

θ* = argmin_θ L(I(θ), I_0).  (10)

In practice, additional regularization improves robustness; we omit such details as they are largely orthogonal to our work. Minimizing Eq. (10) with gradient-based methods such as stochastic gradient descent (SGD) or Adam [Kingma and Ba 2014] requires differentiating the rendering loss L with respect to the parameters θ. According to the chain rule, the gradient d_θ L := dL/dθ satisfies

d_θ L = (∂_I L)^⊤ (d_θ I),  (11)

where ∂_I L := ∂L/∂I on the right comes from differentiable evaluation of the loss L, and d_θ I := dI/dθ is given by Eq. (9).
Radiative backpropagation. When the numbers of parameters n_θ and pixels n_I are large, storing the rendered image I and the full computational graph (for reverse-mode automatic differentiation [Griewank and Walther 2008]) can lead to performance and storage problems. To this end, previous methods [Nimier-David et al. 2020; Vicini et al. 2021] apply the chain rule in Eq. (11) at the path-integral level. We do this by left-multiplying both sides of Eq. (9) by the gradient ∂_I L, yielding:

d_θ L = ∫_Ω (∂_I L)[j] (d f̂_j(p̄)/dθ) dμ(p̄) + ∫_{∂Ω} (∂_I L)[j] Δf̂_j(p̄) (n(p_K)^⊤ (dp_K/dθ)) dμ̇(p̄),  (12)

where j denotes the index of the pixel to which the path p̄ contributes, and (∂_I L)[j] ∈ R indicates the value of ∂_I L at pixel j (I and ∂_I L have the same shape). Also, f̂_j(p̄) denotes the material measurement contribution of the path p̄ to pixel j, and Δf̂_j(p̄) indicates the difference in f̂_j across visibility boundaries (as per §3.3).
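The interior term then reduces to a simple per-path accumulation. A sketch, assuming each path sample carries its pixel index j and a pdf-normalized estimate of d f̂_j/dθ (all names are illustrative):

```python
def radiative_backprop(path_samples, dL_dI):
    """Monte Carlo accumulation of the interior term of the parameter
    gradient. Each sample is a pair (j, df_dtheta): the pixel index it
    contributes to and a list estimating d f_j / d theta (already divided by
    the path's sampling pdf). No global AD graph is stored."""
    grad = None
    for j, df_dtheta in path_samples:
        term = [dL_dI[j] * t for t in df_dtheta]  # chain rule per path
        grad = term if grad is None else [g + t for g, t in zip(grad, term)]
    return [g / len(path_samples) for g in grad]
```

Note how the loss gradient at each pixel simply scales that path's parameter gradient before averaging, mirroring the left-multiplication above.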
In §4 and §5, we derive our ReSTIR algorithms that efficiently estimate the interior and boundary integrals on the right-hand side of Eq. (12) for optimization-based inverse rendering.

RESTIR FOR INTERIOR INTEGRALS
We now describe our ReSTIR-based method for efficiently estimating the (vector-valued) interior component of Eq. (12):

∫_Ω (∂_I L)[j] (d f̂_j(p̄)/dθ) dμ(p̄).  (13)

When computing gradients at a fixed θ = θ_0, as discussed in §3.3, we set the reference configuration to B = M(θ_0). Note that during inverse-rendering optimization, θ_0 varies at each gradient step.
Overview. We introduce our differentiable RIS estimator in §4.1, followed by the full ReSTIR version (our main contribution in this section) in §4.2. Lastly, we discuss the relation between our method and forward-rendering ReSTIR, as well as several other technical aspects, in §4.3.

Differentiable Streaming RIS
Estimating Eq. (13) via Monte Carlo integration requires sampling material light paths p̄ = (p_0, p_1, p_2). Since the material path space Ω coincides with the ordinary one at θ = θ_0, we can sample p̄ by repurposing methods developed for forward rendering.
We first construct a camera ray from p_2 and compute the closest intersection p_1 between this ray and the scene (specifically, the reference configuration B). With p_1 and p_2 determined, we apply streaming RIS (Algorithm 1) to sample p_0 on a light source by letting

p̂(p_0; p_1) = L_e(p_0→p_1) f_s(p_0→p_1→p_2) G(p_0↔p_1),  (14)

with p_c(p_0) provided by standard light sampling (i.e., proportional to the emitted radiance L_e(p_0→p_1)). Using the resulting reservoir r from M candidate light samples, we obtain a one-sample differential RIS estimator:

(∂_I L)[j] (d f̂_j(p̄)/dθ) r.W,  (15)

where p̄ = (r.y, p_1, p_2) and r.y is the reservoir's stored sample.
In practice, we efficiently compute Eq. (15) by applying reverse-mode automatic differentiation (AD) to

detach((∂_I L)[j] r.W) f̂_j(p̄),  (16)

where quantities surrounded by the detach() function are treated as constants (i.e., independent of θ) by AD.
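The effect of detach() can be illustrated with a toy forward-mode AD type: the detached factor contributes its value but no derivative. This is a self-contained sketch, not the paper's AD system:

```python
class Dual:
    """Minimal forward-mode AD value: val + eps * dot."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def detach(x):
    """Treat x as a constant: keep its value, drop its derivative."""
    return Dual(x.val, 0.0) if isinstance(x, Dual) else Dual(x)

# Differentiate f(theta) * detach(W): only f contributes a derivative; the
# resampling weight W is held fixed even though it depends on theta.
theta = Dual(2.0, 1.0)   # value 2, d theta / d theta = 1
f = theta * theta        # toy contribution f(theta) = theta^2, df/dtheta = 4
W = theta * 3.0          # weight depends on theta, but must be detached
contrib = f * detach(W)  # value 4 * 6 = 24, derivative 4 * 6 = 24
```

Without detach(), the product rule would add the spurious term f · dW/dθ = 4 · 3 = 12 to the derivative.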

Spatiotemporal Reuse
Building on our differential streaming RIS estimator in Eq. (15), we introduce a ReSTIR algorithm that reuses spatial and temporal candidates. Note that, unlike forward rendering, where temporal reuse applies to neighboring frames, our technique reuses temporal candidates across consecutive inverse-rendering gradient steps. As outlined in Algorithm 4, our method consists of path-sampling and gradient-computation stages, with only the latter requiring differentiable computation. We detail both stages below.
Path sampling. This stage samples a set of material light paths p̄ = (p_0, p_1, p_2) that are later used to compute gradients. As discussed in §4.1, we sample each path by: (i) constructing a camera ray from p_2 on the detector and intersecting the scene at p_1 (Line 6 of Algorithm 4); and (ii) drawing p_0 on a light source using streaming RIS (Line 7).
To improve RIS effectiveness, we reuse reservoirs from the previous gradient step (Line 24). To this end, we record the shading point p_1 (where RIS occurs) in each reservoir as r.x (Line 8). After obtaining a reservoir r using RIS, we find neighboring reservoirs N such that each r′ ∈ N has its shading point r′.x near r.x (Line 9). We accelerate this nearest-neighbor search using point Kd-trees [Wald 2022]. After finding the neighbors N, we combine them with r using Algorithm 3 (Line 10), which is identical to Algorithm 2 except that the target function p̂(·; x′) is defined with respect to a shading point x′ (as opposed to a pixel). The light sample r.y after combining reservoirs completes the path sample p̄ = (r.y, p_1, p_2).
Updating reservoirs. We note that, as the previous parameters θ_prev generally differ from the current values θ_cur, reservoirs must be updated (Line 3) before reuse. Specifically, any reservoir r from the prior step stores its light sample r.y and shading point r.x relative to the prior reference configuration B_prev = M(θ_prev), whereas the current step must instead use the reference configuration B_cur = M(θ_cur).
To keep r.y and r.x up to date, we focus on the common inverse-rendering setting where the scene geometries M(θ_prev) and M(θ_cur) are triangle meshes with identical topology (but possibly differing vertex positions). We record the indices of the triangles containing r.y and r.x as well as the corresponding barycentric coordinates. Then, during updateInteriorReservoirs() on Line 3, we recompute r.y and r.x for each reservoir from the stored triangle indices, barycentric coordinates, and now-current vertex positions (see Figure 2). Lastly, after updating all reservoirs, we rebuild the acceleration structure used for efficient nearest-neighbor search (Line 9).

ALGORITHM 4: Our streaming RIS with spatiotemporal reuse for the interior integral in Eq. (13)
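The per-reservoir update reduces to re-evaluating a barycentric combination against the current vertex buffer; a sketch with illustrative data layouts:

```python
def update_reservoir_point(tri_index, bary, faces, vertices):
    """Recompute a reservoir's stored surface point from its recorded triangle
    index and barycentric coordinates, using the current vertex positions
    (mesh topology is assumed fixed across gradient steps)."""
    i0, i1, i2 = faces[tri_index]
    u, v = bary
    w = 1.0 - u - v  # third barycentric coordinate
    p0, p1, p2 = vertices[i0], vertices[i1], vertices[i2]
    return tuple(w * a + u * b + v * c for a, b, c in zip(p0, p1, p2))
```

When a vertex moves between gradient steps, the reconstructed point moves with the triangle while its barycentric coordinates stay fixed.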
Gradient computation. After path sampling, our gradient-computation stage evaluates path contributions using Eq. (15), with the gradient d f̂_j(p̄)/dθ obtained via reverse-mode automatic differentiation. As the differentiation of f̂_j is local (i.e., occurs per path), no global computation graph is needed, similar to previous radiative backpropagation methods [Nimier-David et al. 2020; Vicini et al. 2021].

Discussion
Relation to forward-rendering ReSTIR. Although our ReSTIR-based technique described in §4.2 is closely related to forward-rendering ReSTIR [Bitterli et al. 2020; Lin et al. 2022], there are several technical differences. Notably, previous ReSTIR methods store reservoirs in screen space. Our method, on the other hand, stores reservoirs at shading points r.x over the reference surface B. Doing so not only allows our stored reservoirs to evolve naturally with the scene geometry, but also enables efficient multi-view rendering with mini-batching, which we discuss in §7.
Visibility awareness. The original ReSTIR by Bitterli et al. [2020] offers an option that neglects visibility from the light vertex p_0 to the shading point p_1 for faster forward rendering:

p̂(p_0; p_1) = L_e(p_0→p_1) f_s(p_0→p_1→p_2) G_0(p_0↔p_1),  (17)

with G_0 the geometric term G with the visibility V(p_0 ↔ p_1) omitted. Instead, we consider the full visibility in Eq. (14), which has been demonstrated necessary for ensuring unbiasedness [Bitterli et al. 2020; Lin et al. 2022]. Further, this makes the sampling of p_0 more occlusion-aware, improving result quality.
Rejecting reservoirs. Occasionally, a prior reservoir r′ (returned by the PickNeighbors() function) near the current one r can have a very different surface normal; that is, n(r′.x) · n(r.x) ≪ 1, with n(x) the surface normal at x. In this case, we follow Bitterli et al. and exclude from the SpatiotemporalReuse() call any reservoir r′ with n(r′.x) · n(r.x) below some user-specified threshold.
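This normal-based rejection test is a one-liner; the threshold value below is an illustrative default, not the paper's setting:

```python
def reject_neighbor(n_current, n_neighbor, threshold=0.9):
    """Return True if a neighbor reservoir should be excluded from reuse
    because the dot product of its surface normal with the current one,
    n(r'.x) . n(r.x), falls below a user-specified threshold."""
    dot = sum(a * b for a, b in zip(n_current, n_neighbor))
    return dot < threshold
```

Perpendicular normals (dot product 0) are rejected, while identical normals (dot product 1) pass.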

RESTIR FOR BOUNDARY INTEGRALS
We now show how to apply ReSTIR to estimate the boundary component of Eq. (12) from §3.3:

∫_{∂Ω} (∂_I L)[j] Δf̂_j(p̄) (n(p_K)^⊤ (dp_K/dθ)) dμ̇(p̄).  (18)

Since this boundary integral is unique to differentiable rendering, our technique is, to our knowledge, the first to utilize ReSTIR for the efficient estimation of such integrals.
Two types of boundary paths. Recall that a material boundary path p̄ is identical to an ordinary path except that it has exactly one boundary segment. For direct illumination, as shown in Figure 3, there are two types of boundary paths depending on the location of the boundary segment p_K p_{K+1}. Any path with K = 1 has the endpoint x_2 on the detector and is called a primary boundary path; any with K = 0 has the endpoint x_0 on a light and is called a secondary boundary path (see Figure 3).
Previously, contributions from primary boundary paths could be estimated relatively easily, but the secondary ones remained more challenging. In §5.1 and §5.2, we design new ReSTIR-based estimators to sample both types of boundary paths.

Sampling Primary Boundary Paths
A primary boundary path has its boundary segment on the camera ray (see Figure 3-a). Similar to sampling ordinary paths (§4.1), we generate a primary boundary path by (i) constructing a camera ray defining p_1 and p_2, and (ii) drawing p_0 on a light source. In the first step, we efficiently pick the camera ray using Monte Carlo edge sampling [Li et al. 2018], with object silhouettes projected onto the image plane during preprocessing.
With p_1 and p_2 determined, we sample the light vertex p_0 using streaming RIS (Algorithm 1) with the same p̂ and p_c functions as for the interior integral (§4). This gives an RIS-based estimator, Eq. (19), obtained from the boundary integrand of Eq. (18) with r.W in place of the light-sampling density, where p̄ = (r.y, p_1, p_2).
In practice, we efficiently compute Eq. (19) by applying reverse-mode automatic differentiation to its detached form, Eq. (20), analogous to Eq. (16); here "·" denotes a dot product, and p_1 outside the detach() function is determined by differentiable ray intersection between the camera ray and the scene.
Spatiotemporal reuse. The similarity in sampling p_0 for ordinary and primary boundary paths allows us to improve the effectiveness of our boundary estimator in Eq. (19) by reusing the reservoirs generated when estimating the interior component (§4).
Specifically, Algorithm 5 outlines how we reuse these reservoirs from the prior gradient step (updated via updateReservoirs() in Algorithm 4). Typically, only a few samples are needed to estimate the primary boundary component, so we opt not to store the new reservoirs (obtained after Line 10 of Algorithm 5), simplifying the overall pipeline.

Sampling Secondary Boundary Paths
Unlike primary boundary paths, secondary boundary paths (illustrated in Figure 3-b) are more challenging to sample, as they require knowledge of object silhouettes with respect to arbitrary shading points x_1.
Multi-directional sampling. Previously, Li et al. [2018] handled secondary boundary paths with explicit silhouette detection, which can be expensive. Zhang et al. [2020] instead introduced multi-directional sampling, which (S.1) draws a light vertex p_0 given a point p_B on a triangle edge, and (S.2) traces a ray from p_B away from p_0 to obtain the shading point p_1. This sampling process gives an ordinary Monte Carlo estimator, Eq. (21), whose integrand, Eq. (22), involves the Jacobian determinant J_B for changing from the joint probability density p(p_0, p_1) to p(p_B, p_0). Refer to Zhang et al. [2020] for more details, including the derivation of J_B.
Streaming RIS. The first step (S.1) of multi-directional sampling draws the light vertex p_0 given p_B on a triangle edge. For scenes with complex illumination and visibility, sampling p_0 via streaming RIS (Algorithm 1) improves efficiency. To this end, given p_B, we set the target function p̂ based on the secondary-boundary integrand f̂^snd_bnd defined in Eq. (22), and use standard light sampling for the candidate density p_c. Here we neglect the gradient term n(p_0)^⊤ (dp_0/dθ), as it is difficult to obtain during the path-sampling stage. This leads to our RIS estimator, Eq. (25), which essentially replaces p(p_0 | p_B) in the ordinary Monte Carlo estimator of Eq. (21) with 1/r.W, as p_0 is now drawn with RIS. With automatic differentiation, Eq. (25) can be computed via a detached form, Eq. (26), analogous to Eq. (16).
Spatiotemporal reuse. To improve our estimator's efficiency, we introduce a new spatiotemporal reuse scheme for secondary boundary paths. As described in Algorithm 6, we record p_B in each reservoir r (Line 9) and use this point for nearest-neighbor searches (Line 10), rather than the shading point p_1 used in our other computations. After merging r with nearby reservoirs from the previous gradient step, we trace a ray to obtain p_1 (S.2), completing the path sample p̄ = (r.y, p_1, p_2).
As in Algorithms 4 and 5, the sampled paths are then evaluated using automatic differentiation (Line 24 of Algorithm 6) via Eq. (26).
Importance sampling of p_B. Before drawing p_0 on a light, Zhang et al.'s [2020] multi-directional sampling picks p_B from a triangle edge (Line 7). A naïve implementation could (i) select an edge e_i with probability P_i proportional to the edge length ‖e_i‖, and (ii) draw p_B uniformly on e_i. But this loses efficiency in complex scenes, where it frequently returns samples p_B not visible to any light.
To address this problem, we leverage our reservoirs to guide edge sampling as follows. For an edge e_i containing N_i reservoirs (i.e., N_i reservoirs r with r.p_B lying on e_i), we set its (discrete) sampling probability P_i proportional to the average weight of those N_i reservoirs modulated by the edge length ‖e_i‖:

P_i ∝ (‖e_i‖ / N_i) ∑_{r : r.p_B ∈ e_i} r.W.  (27)

Importance sampling edges using Eq. (27) significantly improves performance, as we demonstrate in §8.1. Various guiding methods in primary-sample space [Zhang et al. 2020; Yan et al. 2022] could also be applied here; our edge importance sampling is a by-product of ReSTIR sample reuse and is orthogonal to these methods.
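A sketch of the guided edge pmf of Eq. (27); how edges without reservoirs are weighted is an assumption of this sketch (here they fall back to a unit average weight):

```python
def edge_sampling_pmf(edges):
    """Discrete pmf over boundary edges: each edge's probability is
    proportional to the average weight of the reservoirs lying on it,
    modulated by the edge length. `edges` is a list of
    (length, [reservoir weights]) pairs."""
    scores = []
    for length, weights in edges:
        avg_w = sum(weights) / len(weights) if weights else 1.0
        scores.append(avg_w * length)
    total = sum(scores)
    return [s / total for s in scores]
```

A short edge carrying high-weight reservoirs can thus be sampled more often than a longer edge that contributed little in the previous step.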

RESTIR AND ANTITHETIC SAMPLING
When differentiating pixel reconstruction filters with respect to object geometries using PSDR [Zhang et al. 2020], Yu et al. [2022] demonstrated that pixel-level antithetic sampling is crucial for ensuring fast convergence.
As illustrated in Figure 4, antithetic sampling shoots two or four camera rays whose image-plane intersections exhibit point symmetry with respect to the pixel center. Additionally, these camera rays must be correlated so that the derivatives of their contributions cancel.
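A sketch of one possible four-ray layout: reflecting a sub-pixel sample (u, v) across the pixel center and both axes yields positions that are mutually symmetric about (0.5, 0.5). The exact layout used by Yu et al. [2022] may differ:

```python
def antithetic_subpixel_points(u, v):
    """Four sub-pixel positions in [0,1)^2 that are mutually symmetric about
    the pixel center (0.5, 0.5): the original sample plus its reflections
    in each axis and through the center."""
    return [(u, v), (1.0 - u, v), (u, 1.0 - v), (1.0 - u, 1.0 - v)]
```

By construction, the four positions average to the pixel center, which is what makes the antithetic derivative contributions cancel in expectation.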
Fig. 4. Pixel-level antithetic sampling: To differentiate pixel reconstruction filters using the PSDR formulation [Zhang et al. 2020], Yu et al. [2022] showed that pixel-level antithetic sampling is crucial for efficiently estimating derivatives with respect to object shape. (a) When computing interior integrals (§4), for each camera-ray intersection at x⊥, Yu et al. [2022] proposed generating three correlated rays through x⊥_1, x⊥_2, and x⊥_3, placed symmetrically around the pixel center. (b) When handling pixel boundaries (§6), which emerge from pixel-filter discontinuities and can be handled identically to primary boundaries (§5.1), the correlated rays are sampled on the pixel boundary.
Previously, multi-path correlation was typically enforced by forcing paths to use identical pseudo-random sequences, reusing the same random seeds when sampling each path.
Unfortunately, this is insufficient when using ReSTIR. As shown in Figure 5-a, even with identical random numbers, the nearest-neighbor searches at two antithetic shading points can return different sets of neighboring reservoirs, weakening the correlation among the antithetic paths.
As we show in §8, weakened correlation among antithetic paths increases variance. To address this, we force all antithetic paths to reuse one reservoir set. Precisely, let r^(i) be the reservoir generated using streaming RIS (Line 9) for the i-th correlated path, and N^(i) the corresponding nearest-neighbor reservoirs. We randomly select one set of reservoirs N* ∈ {N^(i) : i = 0, 1, ...} and use it for spatiotemporal reuse (Line 10) across all correlated paths, i.e., N^(i) := N* for all i. We illustrate this process in Figure 5-b.
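The shared-reservoir-set trick amounts to one random choice applied to all correlated paths; a sketch with illustrative types:

```python
import random

def shared_neighbor_sets(neighbor_sets, rng):
    """Force all antithetic paths to reuse one reservoir set: pick N* uniformly
    among the per-path neighbor sets N^(i) and assign it to every path."""
    n_star = neighbor_sets[rng.randrange(len(neighbor_sets))]
    return [n_star for _ in neighbor_sets]
```

After this step, every correlated path performs spatiotemporal reuse against the same neighbor reservoirs, restoring the correlation that independent searches would break.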

Fig. 5. We randomly select one antithetic shading point and force the others to use the same reservoir set. Here, one shading point is chosen, so both antithetic shading points reuse the same neighbor set N = {r_1, r_2}. This better correlates the antithetic paths, improving sampling efficiency. The scheme works for both interior and pixel-boundary paths, which emerge when the pixel reconstruction filter is discontinuous (e.g., box) and behave identically to primary boundaries.
After each gradient step, we compute the full gradient d_θ L (Line 6) and update the scene parameters (Line 15) via simple gradient descent (i.e., θ ← θ − γ d_θ L for step size γ > 0) or more advanced methods like Adam [Kingma and Ba 2014].
Burn-in stage. To reduce variance even when starting an inverse-rendering optimization, we add an optional "burn-in" stage, during which we perform ReSTIR-based sampling without backpropagating gradients to gather temporal information for our method. In this stage, we repeatedly render the initial configuration (without calculating d_θ L or applying gradient descent), allowing ReSTIR to establish reservoirs throughout the scene. This also makes selecting learning rates easier. We note that, as differentiable evaluation (which takes most of the time) is not needed during burn-in, these iterations are typically much faster than those during optimization.
In practice, we perform about 32 burn-in iterations in our inverse-rendering experiments.
Supporting multi-view configurations.Thus far, our derivations assume only one target image I 0 and one rendered image I for inverse-rendering optimizations.In practice, such single-image configurations are insufficient to solve many real-world inverse rendering problems.Instead, multi-view configurations observing the same scene under multiple viewing conditions are common.Since these configurations keep scene (and lighting) fixed across multiple views, light samples stored in reservoirs (i.e., r. for each reservoir r) remain valid.This allows us to use one set of "resvr" and "edgeResvr" buffers to store reservoirs from all views, largely reusing our single-image pipeline for multi-view optimizations.
However, when mini-batching over camera views (i.e., using a random subset of views for each gradient step), reservoirs from the prior step may offer little help if their shading points are not visible in the current views.
To address this problem, we observe that, at each iteration, regions of the scene invisible to all randomly selected camera views will not be updated by gradient descent because they generally receive no gradient. Based on this observation, after each iteration, we only replace previous reservoirs visible to at least one randomly selected view with newly generated reservoirs; we preserve all occluded reservoirs for the next iteration.
We detail this process in Algorithm 8. For each camera view, our method applies the same process (Lines 10-13) to estimate the gradient as the single-view variant expressed in Algorithm 7. Then, we preserve all previous reservoirs contained in resvr^(i−1) and edgeResvr^(i−1) whose shading points are occluded to all camera views in the current mini-batch by "forwarding" them to resvr^(i) and edgeResvr^(i) (Lines 15-24) for future reuse.
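The reservoir-forwarding step (Lines 15-24 of Algorithm 8) can be sketched as follows; `prev_resvr`, `new_resvr`, and the `visible` predicate are illustrative stand-ins for our resvr/edgeResvr buffers and visibility queries:

```python
def forward_occluded_reservoirs(prev_resvr, new_resvr, batch_views, visible):
    """Preserve previous reservoirs occluded to every view in the mini-batch.

    Reservoirs whose shading points are invisible to all selected views
    received no gradient this step, so their stored samples remain the
    best available information and are carried forward unchanged.
    """
    merged = dict(new_resvr)
    for point_id, reservoir in prev_resvr.items():
        if not any(visible(point_id, view) for view in batch_views):
            merged[point_id] = reservoir
    return merged
```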
Additionally, when solving for object shapes using multi-view images, combining reservoirs using Algorithm 2 would require storing copies (e.g., one per view) of scene geometries (and corresponding acceleration structures), which can be expensive. In practice, for better performance, we introduce a small amount of bias by checking visibilities (required by Line 9 of Algorithm 2) using only scene geometry from the current iteration. To minimize the amount of bias introduced, we use small learning rates for these optimizations.
Computational overhead. Overall, when applied to inverse rendering, our technique introduces the following computational overhead (compared with streaming RIS): (i) the burn-in stage mentioned above; (ii) nearest-neighbor searches (e.g., Line 9 of Algorithm 4); and (iii) combining reservoirs (e.g., Line 10 of Algorithm 4).
In our experiments, at equal sample counts, these overheads combined take no more than 20% of the total optimization time. When comparing inverse-rendering results, we allow baseline methods to use higher sample counts so that all methods use approximately equal total optimization times.

RESULTS
Experiment setup. We apply ReSTIR [Bitterli et al. 2020] to the PSDR [Zhang et al. 2020] formulation to improve the Monte Carlo sampling efficiency of PSDR in differentiable and inverse rendering. To show our technique's practical use, we compare differentiable- and inverse-rendering performance against two baseline methods:
B.1 PT: The first baseline is a standard Monte Carlo method generating one light sample per camera ray. This is widely adopted by physics-based differentiable renderers including Mitsuba 3 [Jakob et al. 2022] and PSDR [Zhang et al. 2020].
B.2 RIS: Our second baseline uses streaming RIS (Algorithm 1). This is the most relevant baseline as it outperforms PT (B.1) under complex lighting and differs from our technique only in performing no spatiotemporal reuse.
We implement our method and the two baselines using the PSDR formulation on a GPU-based wavefront differentiable renderer. All results are generated on a workstation with an AMD Ryzen 9 5900X 12-core CPU, 32GB of RAM, and an NVIDIA RTX 3090 GPU.
All our inverse-rendering results use unbiased gradient estimates except those using multi-view input images to optimize object geometries (as discussed at the end of §7).

Evaluation and Ablation
Differentiable-rendering comparisons.We validate our technique and compare it to the two baselines (B.1 and B.2) via differentiable rendering in Figure 6.
The first example in this figure uses the "Knot" with translating geometry, and we compute derivatives with respect to the knot's translation. As illustrated at the top, the lighting includes one bright spotlight and two dim fill lights. The second example shows a rotating "Key" lit by one small and one occluded large area light. We compute derivatives with respect to the key's rotation angle. In both scenes, the baselines often draw samples from bright lights that make little contribution, leading to high variance. Our method reuses valuable samples from prior frames, quickly adapting to this setting and providing significantly cleaner gradient estimates that closely match the reference.
Per-component ablation. We recall that, under the PSDR formulation [Zhang et al. 2020], a full derivative generally equals the sum of interior (§4), primary-boundary (§5.1), and secondary-boundary (§5.2) components. To evaluate our technique's effectiveness across these components, we conduct an ablation in Figure 7 using the same Knot as Figure 6. Without spatiotemporal reuse, RIS (B.2) suffers from high variance (see Figure 7-a). Using our method on only the interior component, derivative estimates on the knot surface become significantly better (see Figure 7-b1). By also applying our technique to primary boundaries, derivatives along the silhouette of the knot improve (see Figure 7-b2). Further applying our method to secondary boundaries reduces variance around shadow boundaries (see Figure 7-b3). Lastly, importance sampling edge vertices achieves further variance reduction (see Figure 7-b4).
Antithetic sampling. As discussed in §6, standard antithetic sampling of reconstruction filters, which is crucial for estimating derivatives with respect to object geometry, can be insufficient when using ReSTIR. We demonstrate this in Figure 8 using a "Pig" scene with a standard box filter, containing a diffuse pig lit by an area light. We estimate derivatives with respect to the vertical displacement of the pig. At equal time, without pixel-level antithetic sampling, the derivative estimates are extremely noisy (Figure 8-c). The antithetic sampling proposed by Yu et al. [2022] greatly reduces variance but still suffers from high variance in regions where gradients are smooth (Figure 8-d). By forcing antithetic shading points to reuse identical sets of reservoirs, our method offers estimates with significantly improved quality (Figure 8-e).
Forward-only vs. full ReSTIR. According to the chain rule in Eq. (11), the gradient dθL of the rendering loss L is the product of ∂IL (the gradient of L with respect to the forward render I) and dθI (the derivative of the forward render I with respect to the scene parameters θ).
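This chain-rule structure can be illustrated with a toy differentiable "renderer": a NumPy sketch where the render is linear, I(θ) = Aθ, and L is a squared error. A, θ, and I0 are made-up values for illustration, not our actual scenes:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])      # toy Jacobian dI/dtheta of a linear "renderer"
theta = np.array([0.5, -1.0])   # toy scene parameters
I0 = np.zeros(3)                # toy target image

I = A @ theta                   # forward render I(theta)
dL_dI = 2.0 * (I - I0)          # gradient of L(I) = ||I - I0||^2 w.r.t. I
d_theta_L = dL_dI @ A           # chain rule: d_theta(L) = dL/dI . dI/dtheta
# d_theta_L == [0.0, -9.0]
```

Variance in either factor propagates into the product, so a noisy estimate of dθI alone is enough to make the final gradient dθL noisy.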
For inverse rendering, one can apply forward-rendering ReSTIR only to improve I (and its gradient ∂IL) while leaving dθI to methods like RIS (B.2). We compare against this baseline in Figures 9 and 10 as follows.
Figure 9 shows a "Tree" lit by one small area light and one occluded large light. We compare forward- and differentiable-rendering results (with derivatives relative to the tree's rotation around its vertical axis) of the cast shadow using three configurations:
• RIS + RIS uses RIS for estimating both I and dθI;
• Ours + RIS uses ReSTIR for estimating I and RIS for dθI;
• Ours + Ours is our full method, using ReSTIR for both I and dθI.
Both "Ours + RIS" and "Ours + Ours" apply ReSTIR for forward rendering, so they enjoy low-variance estimates of I and the gradient ∂IL. But "Ours + RIS" does not apply ReSTIR for the derivative dθI, producing much the same noise as the "RIS + RIS" baseline. Thus this approach's final gradient dθL, given by the chain rule, also suffers from high variance. We note that, while dθI is not explicitly computed by radiative backpropagation methods [Nimier-David et al. 2020; Vicini et al. 2021], the variance in this term's estimates still affects the final gradient dθL.
To further show the impact of this variance on dθL, we compare inverse-rendering performance on the Tree in Figure 10, where we optimize the tree's rotation angle by looking only at its cast shadow. We reuse the optimization setup (e.g., optimizer and learning rates) and adjust light candidate counts so all methods take roughly equal iteration time.
As demonstrated in Figure 10, both "RIS + RIS" and "Ours + RIS" fail to converge to the ground truth due to high variance in the gradient dθL. Our full method, "Ours + Ours", however, allows the optimization to converge smoothly. (Another major benefit of having a low-variance forward rendering I is lower bias in the gradient ∂IL when ∂IL is nonlinear with respect to I.)

Inverse-Rendering Comparisons
We now show further inverse-rendering comparisons with our baselines (B.1 and B.2). We adjust sample counts so all methods take roughly equal optimization time (including our burn-in from §7).
Please see Table 1 for performance statistics and the supplemental material for animated versions of these results.
Texture optimization. We show texture reconstruction in Figures 1, 11, and 12. As there is no differentiation with respect to geometry, we apply ReSTIR only to forward rendering and the interior component (§4).
Figure 1 contains an "Oxalis" scene where a painting is lit by two bright spotlights and a dim fill light. Initialized with a constant gray image, we optimize the spatially varying albedo of the painting. The baselines perform reasonably where the painting is directly lit by the bright spotlights but very poorly in darker regions lit only by the fill light: their reconstructions severely overestimate albedos inside the spotlights and underestimate those outside. With spatiotemporal reuse, our method performs significantly better than the baselines in darker regions, yielding a far less biased reconstruction.
Figure 11 uses a "Painting" scene modeled after the Veach Ajar scene, with a new fill light in the room. Initialized with a constant gray image, we optimize the spatially varying albedo of the painting. Figure 12 shows a "Kitty" under environmental illumination. It is backlit (i.e., most light comes from behind), causing the baselines to produce high variance on the front. Our method produces much cleaner estimates on the front, giving a less biased and more detailed reconstruction (demonstrated by the re-renderings in the bottom row).
Shape optimization. Lastly, we show shape reconstruction in Figures 13 and 14. In both, we use one or multiple images of an object under complex illumination and optimize the object's shape (expressed as a triangle mesh). As geometric derivatives are computed, we apply our method to both interior (§4) and boundary (§5) components.
Figure 13 uses an "Octagon" lit by 17 spotlights, casting shadows on the walls (shown in "Config."). Using 17 images of the cast shadows, we reconstruct the geometry starting from a sphere.
Figure 14 uses a "World Map" scene where a relief on a wall is lit by a large area light partially occluded by a window (shown in "Config."). Initialized with a flat rectangle, we optimize the relief shape using six input images (only one of which is shown).
For both examples, because of the complex lighting, both PT (B.1) and RIS (B.2) produce noisy forward renderings and gradient estimates, causing inaccurate reconstructions with visible artifacts (e.g., along the left edge of the map). Our method, in contrast, is robust enough to produce clean reconstructions.

DISCUSSION AND CONCLUSION
Limitations and future work. Our technique builds on Bitterli et al.'s [2020] early formulation, which supports only direct lighting. Extending our technique with more advanced reuse [Lin et al. 2021, 2022] to support multi-bounce light transport is an important topic for future research. Also, when using multiple input images, our technique supports only multi-view configurations observing a static scene under different viewing conditions. Extending our work to settings that allow, for example, illumination conditions and object poses to change across images would benefit many inverse-rendering applications.
Lastly, other forms of amortized sampling beyond ReSTIR's spatiotemporal reuse are worth exploring.

Conclusion.
In this paper, we introduced a technique that exploits temporal consistency in the context of physics-based inverse direct illumination. Specifically, by adopting the reservoir-based spatiotemporal importance resampling (ReSTIR) framework, we developed new Monte Carlo methods to efficiently estimate both interior and boundary components of differential direct-illumination integrals under complex illumination conditions. Incorporating these ReSTIR-based estimators, we further proposed a new pipeline for physics-based inverse direct illumination.
We validated our technique by comparing derivatives estimated with our methods against unbiased baselines. Additionally, we demonstrated the effectiveness of our method using several differentiable-rendering and inverse-rendering experiments.

Fig. 1 .
Fig. 1. Our new technique reuses temporal data in the context of physics-based inverse direct illumination. During inverse rendering, when optimizing a scene iteratively using gradient-based methods such as stochastic gradient descent or Adam [Kingma and Ba 2014], we reuse light samples spatially and temporally (across iterations), offering significantly cleaner forward renderings and gradient estimates than baseline methods without reuse. This example uses the "Oxalis" painting lit by several bright spotlights and a dim fill light. We optimize the spatially varying albedo (initialized using the gray texture shown) of the painting. (c, d) The baseline methods PT (B.1) and RIS (B.2) produce high variance, especially in dark areas lit only by the fill light, causing highly biased results. (b) Our method, on the other hand, yields significantly more accurate reconstructions.

Fig. 2 .
Fig. 2. Updating a reservoir's light sample from the previous gradient step uses the stored triangle ID and barycentric coordinates along with the new vertex positions of the current geometry.

Fig. 3 .
Fig. 3. Primary and secondary boundary paths, with boundary segments shown in red. We estimate contributions of these paths separately (§5.1 and §5.2).
Boundary sampling

Fig. 5 .
Fig. 5. Correlating reservoirs: (a) Independent spatiotemporal reuse at antithetic shading points x and x* may reuse different reservoirs. Here, R = {r1, r2} will be used at x and R* = {r2, r3} at x*. This weakens antithetic path correlation, reducing antithetic sampling's effectiveness. (b) We randomly select one antithetic shading point and force the others to use the same reservoir set. Here x is chosen, so both x and x* reuse R = {r1, r2}. This better correlates antithetic paths, improving sampling efficiency.

Fig. 6 .
Fig. 6. Differentiable-rendering comparisons with PT (B.1) and RIS (B.2). Both examples use dynamic scenes (comprising 250 frames) where the Knot translates horizontally (top) and the Key rotates around the vertical axis (bottom). The derivatives are computed with respect to the translation for the Knot and the rotation angle for the Key. The derivative renderings shown are for the last frame. At equal per-frame time, our method (c) produces significantly cleaner derivative estimates than the baselines (d, e).

Fig. 7 .
Fig. 7. Our technique for individual components: Using the same "Knot" scene from Figure 6, we demonstrate the effectiveness of our method on individual components of the derivative by applying it to: (b1) the interior component only; (b2) the interior and primary-boundary components; (b3) all components; and (b4) all components with importance sampling of edge points. The derivative images use the same color map as Figure 6.

Fig. 8 .
Fig. 8. Antithetic sampling: When differentiating with respect to object geometries using PSDR [Zhang et al. 2020], it is crucial to apply antithetic sampling at the pixel level [Yu et al. 2022], as shown in (c) and (d). Further, when ReSTIR is used, our method, which forces all antithetic shading points to reuse the same set of reservoirs, offers further variance reduction, as shown in (e). The derivative images use the same color map as Figure 6.

Fig. 9 .
Fig. 9. Forward-only vs. full ReSTIR: This example contains a "Tree" lit by one small and one occluded large area light. Using a predefined loss L(I) := ∥I − I0∥₂² with some fixed I0, we compare estimates of the gradients ∂IL and dθI, whose product gives the gradient dθL of the loss L. Without applying ReSTIR (c), both estimates are noisy. When applying ReSTIR only for forward rendering (d), one gets better estimates of I, improving the gradient ∂IL. Applying ReSTIR fully (e) produces low-variance estimates of both gradients.
Fig. 10. Forward-only vs. full ReSTIR: Using the same "Tree" scene as Figure 9, we compare inverse-rendering results where we optimize the rotation angle of the tree-like object by looking only at its cast shadow. Our full method ("Ours + Ours") allows the optimization to smoothly converge to the ground truth while the other methods fail due to high-variance estimates of the gradient dθL.

Fig. 11 .
Fig. 11. Inverse-rendering comparison using the "Painting" setup modeled after the Veach Ajar scene with an added fill light in the room. We optimize the painting's spatially varying albedo (initialized using a gray texture). (c, d) The baseline methods (B.1 and B.2) suffer from extremely high variance, thus failing to reconstruct the albedo map. (b) Our method, on the other hand, is robust enough to reconstruct the target albedo map (a) nicely. The albedo error on the top right is used for evaluation only.

Fig. 12 .
Fig. 12. Inverse-rendering comparison using the diffuse "Kitty" under environmental illumination. We optimize the spatially varying albedo (initialized with a constant texture) of the Kitty. (c, d) The baseline methods (B.1 and B.2) produce high variance on the front of the Kitty, causing over-blurring and color shifts in the reconstructions. (b) Our method, on the other hand, provides significantly more accurate results.

Table 1 .
Inverse-rendering configurations and performance statistics. In this table, "Trg." indicates the number of target images; "Bat." is the size of mini-batches (i.e., the number of images rendered per iteration); "Param." denotes the number of scene parameters being optimized; "Iter." is the number of iterations; and "DR" indicates the time spent per iteration on differentiable rendering using our method.