Nerfstudio: A Modular Framework for Neural Radiance Field Development

Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. In order to streamline the development and deployment of NeRF research, we propose a modular PyTorch framework, Nerfstudio. Our framework includes plug-and-play components for implementing NeRF-based methods, which make it easy for researchers and practitioners to incorporate NeRF into their projects. Additionally, the modular design enables support for extensive real-time visualization tools, streamlined pipelines for importing captured in-the-wild data, and tools for exporting to video, point cloud and mesh representations. The modularity of Nerfstudio enables the development of Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality, while also remaining flexible to future modifications. To promote community-driven development, all associated code and data are made publicly available with open-source licensing at https://nerf.studio.


INTRODUCTION
Neural Radiance Fields (NeRFs) [Mildenhall et al. 2021] are gaining popularity for their ability to create 3D reconstructions in real-world settings, with rapid research in the area pushing the field forward. Since the introduction of NeRFs in 2020, there has been an influx of papers focusing on advancements to the core method, including few-image training [Wang et al. 2021b; Yu et al. 2021], explicit features for editing [Liu et al. 2020; Wang et al. 2022; Zhang et al. 2022], surface representations for high-quality 3D mesh exports [Oechsle et al. 2021; Wang et al. 2021a; Yariv et al. 2021], speed improvements for real-time rendering and training [Fridovich-Keil et al. 2022; Müller et al. 2022; Sun et al. 2022], 3D object generation [Poole et al. 2023], and more [Xie et al. 2022].
These research innovations have driven interest across a wide variety of disciplines in both academia and industry. Roboticists have explored using NeRFs for manipulation, motion planning, simulation, and mapping [Adamkiewicz et al. 2021; Byravan et al. 2022; Driess et al. 2022; Kerr et al. 2022; Simeonov et al. 2022; Zhu et al. 2022]. NeRFs are also explored for tomography applications [Rückert et al. 2022], as well as perceiving people in videos [Pavlakos et al. 2022]. Visual effects and gaming studios are exploring the technology for production and digital asset creation. News outlets capture NeRF portraits to tell stories in new formats [Watson et al. 2022]. The potential applications are vast, and even startups are emerging to focus on deploying this technology.
Despite the growing use of NeRFs, support for development is still rudimentary. Due to the influx of papers and lack of code consolidation, tracking progress is difficult. Many papers implement features in their own siloed repository. This complicates the process of transferring features and research contributions across different implementations. Additionally, few tools exist to easily run NeRFs on real-world data collected by users. To address these challenges, we present Nerfstudio (Fig. 1), a modular framework that consolidates NeRF research innovations and makes them easier to use in real-world applications.
Furthermore, while NeRFs solve an inherently visual task, there is a lack of comprehensive and extensible tools for visualizing and interacting with NeRFs trained on real-world data. Despite the availability of several NeRF repositories, existing implementations are often focused on achieving state-of-the-art results on metrics such as PSNR, SSIM, and LPIPS. These evaluations are typically based on held-out images along the capture trajectory that are similar to the training images. This often makes them misleading indicators of performance for many real-world applications, where data is captured in unstructured environments and novel views are rendered with large baselines. Qualitative evaluations have historically been a challenge due to the computational demands of NeRF, which often resulted in rendering times of up to multiple seconds per image. Recent developments such as Instant-NGP [Müller et al. 2022] significantly reduce computational overhead, enabling real-time training and rendering. However, Instant-NGP relies significantly on GPU acceleration with custom CUDA kernels, making development and quick prototyping a challenge. We present a framework that enables interactive visualizations while also being flexible and model-agnostic.
Nerfstudio is an extensible and versatile framework for neural radiance field development. Our design goals are the following:
(1) Consolidating various NeRF techniques into reusable, modular components.
(2) Enabling real-time visualization of NeRF scenes with a rich suite of controls.
(3) Providing an end-to-end, easy-to-use workflow for creating NeRFs from user-captured data.
For modularity, we devise an organization among components across various NeRFs that allows abstracting away method-specific implementations. Our real-time visualizer is designed to work with any model during training or testing. Furthermore, the visualizer is hosted on the web, making it accessible without requiring a local GPU machine. The modular nature of our framework facilitates the integration of various data input formats, thereby simplifying the workflow for incorporating user-captured real-world scenes. We provide support for images and videos with various camera types, as well as other mobile capture applications (Polycam, Record3D, KIRI Engine) and outputs from popular photogrammetry software like RealityCapture and Metashape. In particular, integration with these applications enables users to bypass structure-from-motion tools like COLMAP [Schönberger and Frahm 2016], which can be time-consuming. Furthermore, we provide support for multiple export formats, including video, depth maps, point clouds, and meshes.
The modularity of Nerfstudio enables developing Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality. We show that this method is comparable to other state-of-the-art methods such as MipNeRF-360 [Barron et al. 2022] while achieving an order of magnitude speedup. We also conduct an ablation study that demonstrates its flexibility on a new dataset consisting of 10 in-the-wild scenes. Our findings highlight the limitations of commonly used NeRF metrics and the importance of a real-time viewer for qualitative assessments. The potential of our framework as a consolidated codebase for NeRF research is reflected in the traction thus far, with extensions such as SDFStudio [Yu et al. 2022]. Furthermore, Nerfstudio is an open-source project with active improvements from both academic and industry contributors.

RELATED WORKS

Frameworks and tools
Software frameworks have played a crucial role in consolidating and driving the advancement of various fields. In deep learning, Caffe [Jia et al. 2014], TensorFlow [Abadi et al. 2016], and PyTorch [Paszke et al. 2019] provide readily usable machine learning functionalities. Similarly, frameworks such as PyTorch3D [Ravi et al. 2020] and Kornia [Riba et al. 2020] provide reusable components for 3D computer vision tasks. Other examples of frameworks include Mitsuba3 [Jakob et al. 2022], Halide [Ragan-Kelley et al. 2013], Taichi [Hu et al. 2019], and Reyes [Cook et al. 1987] for graphics, Phototourism [Snavely et al. 2006] and COLMAP [Schönberger and Frahm 2016; Schönberger et al. 2016a,b] for photogrammetry and visualization, and AverageExplorer [Zhu et al. 2014] for data collection. Despite the diversity of topics covered, each of these frameworks originated from the need to provide reusability and reproducibility to a rapidly expanding field. In light of the fast-paced growth of NeRFs in both academia and industry, Nerfstudio aims to streamline advancements in neural rendering by offering a flexible and comprehensive framework for development.

Figure 3: Nerfstudio Dataset. Our Nerfstudio Dataset contains 10 scenes: 4 phone captures with pinhole lenses and 6 mirrorless camera captures with a fisheye lens. We focus our efforts on real-world data, and these scenes can help benchmark progress.

Neural rendering frameworks
Concurrent efforts such as NeRF-Factory [Jeong et al. 2022], NerfAcc [Li et al. 2022], MultiNeRF [Mildenhall et al. 2022b], and Kaolin-Wisp [Takikawa et al. 2022] all make significant efforts in advancing the usability of NeRFs. While NeRF-Factory consolidates multiple prior works into a single repository, it places less emphasis on reusable modules shared across these prior works and focuses more on benchmarking. NerfAcc prioritizes pythonic modularity, but focuses primarily on the lower-level components rather than the entire pipeline. Kaolin-Wisp and MultiNeRF each consolidate multiple paper implementations into a single repository. None of these repositories is as comprehensive as Nerfstudio in delivering our three design goals: modularity, real-time visualization, and end-to-end usability for user-captured data. Furthermore, Nerfstudio is released under an Apache 2.0 license, which allows for its use by both researchers and companies.

FRAMEWORK DESIGN
The goals of Nerfstudio are to provide (1) modularity, (2) real-time visualization for development, and (3) ease of use with real data. In designing the framework, we consider trade-offs against designs that optimize for faster rendering or higher quality results on synthetic scenes. For instance, we prefer an implementation that allows for a modularized pythonic non-CUDA method (i.e., where CUDA functionality is exposed via a PyTorch API) over one that supports a faster, non-modularized CUDA method (where CUDA code is written directly). Additionally, our design choices lead to simpler interfacing with an extensive visualization ecosystem, which supports real-time rendering during both testing and training with custom camera paths. Finally, we focus on delivering results for real-world data rather than synthetic scenes to address audiences outside research, including those in industry and non-technical users. With these three goals, the design of Nerfstudio promotes collaboration by providing a consolidated platform on which people can request or contribute new features. The long-term goal is for Nerfstudio to continue improving through community-driven contributions.

Modularity
We propose an organization of components that is both intuitive and abstract, enabling the implementation of existing and novel NeRFs by swapping reusable components. Fig. 1 shows a subset of the component types and implementations we currently have available in Nerfstudio.

Visualization for development
The Nerfstudio real-time viewer offers an interactive and intuitive way to visualize Neural Radiance Fields (NeRFs) during both training and testing phases. To ensure ease of use, the visualizer is simple to install, works seamlessly across both local and remote GPU compute environments, supports different models, and offers a user interface for creating and rendering custom camera paths, shown in Fig. 6 (a).
Our real-time visualization interface is particularly useful for qualitatively evaluating a model, allowing for more informed decisions during method development. While metrics such as PSNR can provide some insight, they do not offer a comprehensive understanding of performance, especially for views that are far away from the capture trajectory. Qualitative evaluation with an interactive viewer addresses these limitations and allows developers to gain a more holistic understanding of the model performance.

Easy workflow for user-captured data
While we offer support for synthetic datasets (Blender [Mildenhall et al. 2021], D-NeRF [Pumarola et al. 2021]), in Nerfstudio we focus primarily on "real-world data": images or videos from a physical phone or camera. To this end, we present a new Nerfstudio Dataset (shown in Fig. 3) composed of real-world scenes casually captured with mobile phones and a mirrorless camera. Our motivation is to provide a framework compatible with a diverse array of applications, which requires supporting real data. For instance, a few use cases for Nerfstudio outside of research include VFX, gaming, and non-technical filmmakers who create 3D and video art. To support this wide range of expertise in NeRFs, we ensure our codebase is easily installable and deployable.

CORE COMPONENTS
The proposed framework of Nerfstudio, illustrated in Fig. 2, is based on the conceptual grouping of NeRF methods into a series of basic building blocks. Nerfstudio takes a set of posed images and optimizes for a 3D representation of the scene, which is defined by radiance (color), density (structure), and possibly other quantities (semantics, normals, features, etc.). We ingest these inputs into the framework, which consists of a DataManager and a Model, where the DataManager is responsible for (1) parsing image formats via a DataParser and (2) generating rays as RayBundles. These rays are then passed into a Model, which will query Fields and render quantities. Finally, the whole Pipeline is supervised end-to-end with a loss.
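
The data flow described above can be sketched in a few lines of plain Python. This is only an illustration of how the pieces hand data to one another, not the framework's implementation: the class names (ToyDataManager, ToyField, ToyModel), the simplified method signatures, and the hand-written density function are all hypothetical, and the compositing loop is the standard NeRF volume rendering rule rather than anything specific to Nerfstudio.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class RayBundle:
    origin: Vec3
    direction: Vec3  # assumed to be unit length


class ToyDataManager:
    """Hypothetical stand-in: serves (ray, ground-truth color) pairs."""

    def __init__(self, records: List[Tuple[RayBundle, Vec3]]):
        self.records = records
        self.cursor = 0

    def next_train(self) -> Tuple[RayBundle, Vec3]:
        ray, rgb = self.records[self.cursor % len(self.records)]
        self.cursor += 1
        return ray, rgb


class ToyField:
    """Hypothetical field: a fixed function instead of a learned network."""

    def query(self, point: Vec3) -> Tuple[float, Vec3]:
        density = 5.0 if point[2] > 1.0 else 0.0  # a "wall" behind z = 1
        return density, (0.2, 0.6, 0.9)


class ToyModel:
    def __init__(self, field: ToyField, num_samples: int = 16, far: float = 2.0):
        self.field = field
        self.num_samples = num_samples
        self.far = far

    def get_outputs(self, ray: RayBundle) -> Dict[str, Vec3]:
        # Sample the ray at bin midpoints, query the field, and composite.
        delta = self.far / self.num_samples
        transmittance = 1.0
        rgb = [0.0, 0.0, 0.0]
        for i in range(self.num_samples):
            t = (i + 0.5) * delta
            point = tuple(o + t * d for o, d in zip(ray.origin, ray.direction))
            density, color = self.field.query(point)
            alpha = 1.0 - math.exp(-density * delta)
            weight = transmittance * alpha
            rgb = [acc + weight * c for acc, c in zip(rgb, color)]
            transmittance *= 1.0 - alpha
        return {"rgb": tuple(rgb)}

    def get_loss_dict(self, outputs: Dict[str, Vec3], gt_rgb: Vec3) -> Dict[str, float]:
        mse = sum((p - g) ** 2 for p, g in zip(outputs["rgb"], gt_rgb)) / 3.0
        return {"rgb_loss": mse}


def train_step(datamanager: ToyDataManager, model: ToyModel) -> Dict[str, float]:
    """One pass through the pipeline: data -> rays -> model -> losses."""
    ray, gt_rgb = datamanager.next_train()
    outputs = model.get_outputs(ray)
    return model.get_loss_dict(outputs, gt_rgb)
```

In a real system the loss dictionary would be backpropagated through the Model and Fields; the sketch stops at computing it so that the hand-off between components stays visible.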

DataManagers and DataParsers
The first step of the Pipeline is the DataManager, which is responsible for turning posed images into RayBundles, which are slices of 3D space that start at a camera origin. Within the DataManager, the DataParser first loads the input images and camera data. The DataParser is designed to be compatible with arbitrary data formats such as COLMAP. Previous research codebases primarily utilize COLMAP with helper scripts [Müller et al. 2022]; however, COLMAP can be challenging to install and use for non-technical users. To make the framework more accessible to a wider range of users, including scientists, artists, photographers, hobbyists, and journalists, we have implemented DataParsers for mobile apps (Record3D, Polycam, KIRI Engine) and 3D tools such as Metashape and RealityCapture. Once the images are properly loaded and formatted, the DataManager iterates through the data, generating RayBundles and ground truth supervision. It can also optimize camera poses during training.

RayBundles, RaySamples, and Frustums
NeRFs operate on regions of 3D space, which can be parametrized in many different ways. We have adopted a more generic representation of 3D space through the use of Frustums for both point-based and volume-based samples. The RayBundles, which are primitives that represent a slice through 3D space, are parameterized with an origin, direction, and other meta-information such as camera indices and time. By specifying the interval bin spacing, the RayBundles generate RaySamples, which represent sampled chunks of 3D space along each ray. These chunks, represented as Frustums, can be encoded either as point samples [Mildenhall et al. 2021] or as Gaussians with mean and covariance [Barron et al. 2021], which have been shown to help with anti-aliasing. This abstraction allows for flexibility in representation, as the user can decide which representation to use with a simple function call. A visualization of this abstraction can be found in Fig. 4.
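
As a rough sketch of this abstraction, the snippet below turns a ray and a list of bin edges into frustum-like chunks and converts each chunk into a point sample at its midpoint. The dataclasses and method signatures are simplified, hypothetical versions written for illustration; they are not a copy of the framework's classes, and the Gaussian (mean and covariance) encoding is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Frustum:
    """A chunk of space along a ray between distances start and end."""
    origin: Vec3
    direction: Vec3
    start: float
    end: float

    def get_positions(self) -> Vec3:
        # Point-sample encoding: the midpoint of the chunk along the ray.
        mid = 0.5 * (self.start + self.end)
        return tuple(o + mid * d for o, d in zip(self.origin, self.direction))


@dataclass
class RayBundle:
    origin: Vec3
    direction: Vec3
    camera_index: int = 0  # example of per-ray meta-information

    def get_ray_samples(self, bin_edges: List[float]) -> List[Frustum]:
        # Consecutive bin edges define one frustum each.
        return [
            Frustum(self.origin, self.direction, start, end)
            for start, end in zip(bin_edges[:-1], bin_edges[1:])
        ]
```

For example, a ray from the origin along +z with bin edges [0.0, 1.0, 3.0] yields two chunks whose point samples sit at z = 0.5 and z = 2.0.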

Models and Fields
The RayBundles are sent as input to Models, which sample them into RaySamples. The RaySamples are consumed by Fields to turn regions of space (i.e., Frustums) into quantities such as color or density. The Nerfstudio framework contains various implementations of models and fields. We have implemented various feature encoding schemes including Fourier features, hash encodings [Müller et al. 2022], spherical harmonics, and matrix decompositions [Chen et al. 2022]. Field components include fused MLPs, voxel grids, surface normal MLPs [Verbin et al. 2022], activation functions, spatial distortions [Barron et al. 2022], and temporal distortions [Pumarola et al. 2021].
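
To make one of these encodings concrete, here is a minimal sketch of a Fourier feature encoding. It follows the sine/cosine schedule with frequencies 2^k π used in the original NeRF positional encoding; the function name and the default number of frequencies are our own choices for illustration, and the framework's implementation may differ in ordering, scaling, and batching.

```python
import math
from typing import List, Sequence


def fourier_encode(x: Sequence[float], num_frequencies: int = 4) -> List[float]:
    """Encode each coordinate with sin/cos at frequencies 2^k * pi."""
    features: List[float] = []
    for k in range(num_frequencies):
        freq = (2.0 ** k) * math.pi
        for value in x:
            features.append(math.sin(freq * value))
            features.append(math.cos(freq * value))
    return features
```

A 3D input with 4 frequencies produces 3 × 4 × 2 = 24 features, which would then be fed to an MLP in place of the raw coordinates.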

Real-time Web Viewer
We draw inspiration from the real-time viewer presented in Instant NGP [Müller et al. 2022], which facilitates real-time rendering during training. However, the viewer in Instant NGP is designed to work on local compute, which can be cumbersome to set up in remote settings. To address this issue, we have developed a ReactJS-based web viewer packaged as a publicly hosted website.
The viewer is designed to be accessible to a wide range of users, including those utilizing both local and remote GPUs. The process of utilizing remote compute is streamlined, requiring only the forwarding of a port locally via SSH. Once training begins, the web interface renders the NeRF in real-time as training progresses (see Fig. 10). Users can pan, zoom, and rotate around the scene as the optimization runs or while evaluating a trained model. The design of the viewer is illustrated in Fig. 5.

When the camera moves quickly, the rendering resolution is reduced to maintain the frame rate and prevent lag in the user experience. We can also reduce the time spent on training and allocate more resources for rendering in the viewer. Some of the features of our viewer include:
• Switching between various model outputs (e.g., rgb, depth, normals, semantics).
• Creating custom camera paths composed of keyframes with position and focal length interpolation (Fig. 6).
• Visualizing the captured training images in 3D.
• Crop and export options for point clouds and meshes.
• Mouse and keyboard controls to easily navigate in the scene.
The viewer played an instrumental role in providing qualitative assessments that informed design choices in our default method Nerfacto. Other projects have integrated our viewer into their own codebases, including ArcNerf [Yue Luo 2022] and SDFStudio [Yu et al. 2022].
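
The resolution trade-off mentioned above (a smaller render when the camera moves quickly) can be illustrated with a small budget calculation. The sketch below is our own simplification, not the viewer's actual logic: the function name, its parameters, and the idea of deriving a per-frame ray budget from a measured throughput and a target frame rate are all assumptions made for illustration.

```python
import math
from typing import Tuple


def choose_render_resolution(
    rays_per_second: float,
    target_fps: float,
    aspect_ratio: float,
    max_height: int,
) -> Tuple[int, int]:
    """Pick (width, height) so that one frame fits the per-frame ray budget."""
    ray_budget = rays_per_second / target_fps
    # width * height ~= ray_budget and width ~= aspect_ratio * height
    height = int(math.sqrt(ray_budget / aspect_ratio))
    height = max(1, min(height, max_height))
    width = max(1, int(round(height * aspect_ratio)))
    return width, height


# Hypothetical policy: ask for a high frame rate while the camera moves,
# and a low one when it is still, so that the image sharpens at rest.
moving = choose_render_resolution(1_000_000, 30.0, 16 / 9, 1080)
static = choose_render_resolution(1_000_000, 2.0, 16 / 9, 1080)
```

With the same throughput, the moving-camera request yields a much smaller image than the static one, which mirrors the behavior described in the text.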

Geometry Export
Many creators and artists have workflows that require exporting to point clouds or meshes for further processing and incorporation in downstream tools such as game engines. Hence, our framework accommodates various export methods and facilitates the easy addition of new export methods. Fig. 6 (b) illustrates our export interface, as well as some of the supported formats, including point clouds, a truncated signed distance function (TSDF) to mesh, and Poisson surface reconstruction [Kazhdan et al. 2006]. We apply texture to the mesh by densely sampling the texture image, utilizing barycentric interpolation to determine corresponding 3D point locations, and rendering short rays near the surface along the normals to obtain RGB values.
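
The texturing step can be sketched for a single texel and a single triangle as follows. The barycentric formula is the standard one for 2D triangles; everything else is an assumption made for illustration, including the function names, the choice to start the ray slightly above the surface and point it back along the negated normal, and the offset value.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def barycentric_weights(p: Vec2, a: Vec2, b: Vec2, c: Vec2) -> Vec3:
    """Barycentric coordinates of 2D point p in triangle (a, b, c)."""
    denom = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / denom
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / denom
    return w_a, w_b, 1.0 - w_a - w_b


def texel_to_ray(
    uv: Vec2,
    tri_uv: Sequence[Vec2],
    tri_xyz: Sequence[Vec3],
    normal: Vec3,
    offset: float = 0.01,  # hypothetical stand-off distance from the surface
) -> Tuple[Vec3, Vec3]:
    """Map a texel to a short ray that looks at its 3D surface point."""
    w = barycentric_weights(uv, tri_uv[0], tri_uv[1], tri_uv[2])
    # Interpolate the triangle's 3D vertices with the same weights.
    point = tuple(sum(w[k] * tri_xyz[k][axis] for k in range(3)) for axis in range(3))
    origin = tuple(point[i] + offset * normal[i] for i in range(3))
    direction = tuple(-n for n in normal)  # look back toward the surface
    return origin, direction
```

Rendering the radiance field along each such ray would give the RGB value to store in the texture image at that texel.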

NERFACTO METHOD
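We leverage our modular design to integrate ideas from multiple research papers into our default and recommended method, Nerfacto. This method is heavily influenced by the structure of MipNeRF-360 [Barron et al. 2022], but certain parts of the original design are replaced to improve performance. We reference papers such as NeRF-- [Wang et al. 2021c], Instant-NGP [Müller et al. 2022], NeRF-W [Martin-Brualla et al. 2021], and Ref-NeRF [Verbin et al. 2022] in Nerfacto. Fig. 7 illustrates how these papers are used.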

Ray generation and sampling
The Nerfacto method first optimizes camera views using an optimized SE(3) transformation [Lin et al. 2021; Tancik et al. 2022; Wang et al. 2021c]. These camera views are then used to generate RayBundles. To improve the efficiency and effectiveness of the sampling process, we employ a piece-wise sampler. This sampler samples uniformly up to a fixed distance from the camera, followed by samples that are distributed such that the step size increases with each sample. This allows efficient sampling of distant objects while still maintaining a dense set of samples for nearby objects. These samples are then fed into a proposal network sampler, proposed in the MipNeRF-360 method [Barron et al. 2022]. The proposal sampler consolidates the sample locations into regions of the scene that contribute most to the final render, typically the first surface intersection. This importance sampling greatly improves reconstruction quality. Furthermore, we use a small fused MLP with a hash encoding [Müller et al. 2022] for the scene's density function, as it has been found to have sufficient accuracy and is computationally efficient. To further reduce the number of samples along rays, the proposal network sampler can contain multiple density fields. These density fields iteratively reduce the number of samples. Empirically, using two density fields works well. In our base Nerfacto configuration, we generate 256 samples from the piece-wise sampler, which are resampled into 96 samples in the first iteration of the proposal sampler, followed by 48 samples in the second.
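
The two sampling stages can be sketched in plain Python under some stated assumptions. For the piece-wise sampler, we assume that the far half of the bins is spaced uniformly in inverse distance (disparity), which is one way to obtain step sizes that grow with each sample; the split distance of 1.0 and the far plane of 1000.0 are placeholder values. For the proposal stage, we show a deterministic inverse-CDF resampling of bin edges from per-bin weights; in the actual method those weights would come from the proposal density fields, whereas here they are supplied by the caller.

```python
from typing import List, Sequence


def piecewise_sample_edges(
    num_samples: int, near: float = 0.0, split: float = 1.0, far: float = 1000.0
) -> List[float]:
    """Return num_samples + 1 bin edges: uniform up to split, then growing steps."""
    if num_samples < 2:
        raise ValueError("need at least two samples")
    half = num_samples // 2
    rest = num_samples - half
    # Near half: equally spaced edges in [near, split).
    edges = [near + (split - near) * i / half for i in range(half)]
    # Far half (assumption): equally spaced in 1/t between split and far.
    inv_split, inv_far = 1.0 / split, 1.0 / far
    for i in range(rest + 1):
        s = i / rest
        edges.append(1.0 / (inv_split + s * (inv_far - inv_split)))
    return edges


def resample_edges(
    edges: Sequence[float], weights: Sequence[float], num_out: int
) -> List[float]:
    """Place num_out + 1 new edges at equal steps of the weight CDF."""
    total = sum(weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    new_edges: List[float] = []
    j = 0
    for i in range(num_out + 1):
        u = i / num_out
        # Advance to the bin whose CDF interval contains u.
        while j < len(weights) - 1 and cdf[j + 1] <= u:
            j += 1
        span = cdf[j + 1] - cdf[j]
        frac = 0.0 if span == 0.0 else min(1.0, (u - cdf[j]) / span)
        new_edges.append(edges[j] + frac * (edges[j + 1] - edges[j]))
    return new_edges
```

Chaining the two functions reproduces the sample counts in the text: 256 initial bins, resampled to 96 with weights from a first density estimate, then to 48 with weights from a second. With weights concentrated in one bin, all new edges fall inside that bin, which is the intended focusing effect.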

Scene contraction and NeRF field
Many real-world scenes are unbounded, meaning they could extend indefinitely. This poses a challenge for processing, as input samples could have position values that vary across many orders of magnitude. To overcome this issue, we utilize scene contraction, which compresses the infinite space into a fixed-size bounding box. Our method of contraction is based on the one proposed in MipNeRF-360 [Barron et al. 2022], but we use L∞ norm contraction instead of L2 norm, which contracts to a cube rather than a sphere. The cube better aligns with voxel-based hash encodings. Fig. 8 illustrates how L∞ contraction maps samples into the range with minimum values of (-2, -2, -2) and maximum values of (2, 2, 2). These samples can then be used with the hash encoding introduced by Instant-NGP, which is available via the tiny-cuda-nn [Müller 2021] Python bindings.
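
For concreteness, the contraction from MipNeRF-360 leaves points with norm at most 1 unchanged and maps any other point x to (2 − 1/‖x‖)(x/‖x‖). The sketch below implements this rule with a pluggable norm so that the L∞ and L2 variants can be compared; the function names are ours and the code operates on a single point rather than on batched tensors.

```python
import math
from typing import Callable, Sequence, Tuple


def linf_norm(x: Sequence[float]) -> float:
    return max(abs(v) for v in x)


def l2_norm(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x))


def contract(
    x: Sequence[float], norm: Callable[[Sequence[float]], float] = linf_norm
) -> Tuple[float, ...]:
    """Identity inside the unit ball of the norm; squash the rest into radius 2."""
    n = norm(x)
    if n <= 1.0:
        return tuple(x)
    scale = (2.0 - 1.0 / n) / n
    return tuple(scale * v for v in x)
```

With the L∞ norm, the point (10, 0, -5) maps to (1.9, 0, -0.95), and arbitrarily distant points approach the faces of the cube with corners (-2, -2, -2) and (2, 2, 2). With the L2 norm the same rule maps into a sphere of radius 2 instead.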

Figure 8: Scene contraction. (Panels: no scene contraction, L2 contraction, and L∞ contraction.) Here we show cameras contained in an inner sphere with Gaussian samples along rays. Scene contraction warps the unbounded samples into bounded space before querying a NeRF field. We use L∞ contraction rather than MipNeRF-360's L2 contraction to better accommodate the geometry/capacity of the hash grid.

Nerfacto's field incorporates per-image appearance embeddings to account for differences in exposure among training cameras [Martin-Brualla et al. 2021]. Additionally, we use techniques from Ref-NeRF [Verbin et al. 2022] to compute and predict normals. Nerfacto is implemented using PyTorch, which allows for easy customization and eliminates the need for complex and custom CUDA code. We will incorporate new papers into Nerfacto as the field progresses.

NERFSTUDIO DATASET
Our "Nerfstudio Dataset" includes 10 in-the-wild captures obtained using either a mobile phone or a mirrorless camera with a fisheye lens. We processed the data using either COLMAP or the Polycam app to obtain camera poses and intrinsic parameters. Our goal is to provide researchers with more 360° real-world captures that are not limited to forward-facing scenes [Mildenhall et al. 2019]. Our dataset is similar to MipNeRF-360 [Barron et al. 2022] but does not focus on a central object and includes captures with varying degrees of quality. We have used this dataset to select the default settings for our proposed NeRF-based method, Nerfacto, and we encourage other researchers to similarly employ real-world data in the development and evaluation of NeRF methods.

EXPERIMENTS
We benchmark Nerfacto against a state-of-the-art method MipNeRF-360 and emphasize the modularity of our repository by conducting ablation studies.Furthermore, we highlight the limitations of commonly used evaluation metrics such as PSNR, SSIM, and LPIPS when applied to subsampled evaluation images.

Mip-NeRF 360 Dataset Comparison
Here we compare Nerfacto with numbers reported in the MipNeRF-360 [Barron et al. 2022] paper.We evaluate on their 7 publicly available scenes.We train our method for up to 70K iterations (∼30 minutes) on an NVIDIA RTX A5000, but we also report results at 5K iterations (∼2 minutes).
7.1.1 Evaluation protocol. The evaluation protocol followed is similar to that of MipNeRF-360, but we process their data using our COLMAP pipeline to recover poses. The original images were downsampled by a factor of 4. We used 7/8 of the images for training, and the remaining 1/8 of the images, evenly spaced, were used for evaluation. Note that this protocol does not include camera pose optimization, as it is not an option implemented in MipNeRF-360.
7.1.2 Findings. Table 1 presents the averages of the results across the 7 captures in the MipNeRF-360 dataset. The complete table can be found in the supplementary material. In as little as 5K iterations (∼2 minutes), our Nerfacto method achieves reasonable quality, in contrast to MipNeRF-360, which takes several hours on a TPU with 32 cores. Training for up to 70K iterations (∼30 minutes) further improves quality. While Nerfacto falls short of the metric results obtained by MipNeRF-360, we prioritize efficiency and general usability over optimizing quantitative metrics on this particular benchmark. Fig. 10 shows qualitative results on the "garden" scene in our viewer after only a few minutes. It is worth emphasizing that our Nerfacto method is optimized for qualitative novel-view quality by using the web viewer, rather than solely relying on common metrics. For further illustration, we refer the reader to the supplementary material, where we provide rendered videos from our Nerfacto method.
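
A split of this kind can be sketched as below. The function name is ours, and the choice of holding out indices 0, 8, 16, ... (rather than some other offset) is an assumption; the text only states that the evaluation images are evenly spaced and make up 1/8 of the capture.

```python
from typing import List, Tuple


def split_train_eval(num_images: int, eval_every: int = 8) -> Tuple[List[int], List[int]]:
    """Hold out every eval_every-th image for evaluation; train on the rest."""
    eval_ids = [i for i in range(num_images) if i % eval_every == 0]
    train_ids = [i for i in range(num_images) if i % eval_every != 0]
    return train_ids, eval_ids
```

For a capture of 16 images this holds out images 0 and 8 and trains on the remaining 14, which matches the 7/8 to 1/8 ratio.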

Nerfacto Component Ablations
Given the modularity of our codebase, we can easily conduct ablation studies on our method Nerfacto, a unified approach that combines important components from various papers to achieve a fast, high-quality method. We experiment with disabling the pose optimization, appearance embeddings, and scene contraction, with variations of the proposal networks, and more. The modularity of our codebase allows for easy implementation of these modifications through the use of different flags with the command line interface.

Evaluation protocol.
In our ablation study, we utilize the Nerfstudio Dataset for evaluation. Due to the complexity of the appearance embeddings and pose optimization modules, we adopt a test-time optimization procedure for the evaluation. Specifically, we employ Adam optimizers to optimize the evaluation camera poses. Once the camera poses are fixed, we randomly select either the left or right side of the evaluation image and optimize the appearance code as done in Martin-Brualla et al. [2021]. Finally, with the optimized camera pose and appearance embedding, we compute PSNR, SSIM, and LPIPS. For these experiments, 10% of the images chosen at equal intervals were used for evaluation. This is similar to MipNeRF-360 [Barron et al. 2022], but instead of 1 in every 8 consecutive frames, we use 1 in every p * N, where p is the percentage and N is the number of images in the captured dataset. We use p = 0.1. We will release this evaluation protocol so future work can run similar experiments.

Table 2 presents the average results of our ablation studies. The complete table for all 10 scenes can be found in the supplementary material. This study highlights the challenge in extracting meaningful insights from quantitative metrics alone (Table 2), due to the fact that held-out evaluation images are close to the training images. For instance, disabling the appearance embeddings ("w/o app") leads to an improvement in PSNR and SSIM. However, Fig. 9 illustrates that the "w/o app" method results in the production of blurry "floater" artifacts. These artifacts correspond with the training camera locations because the model overfits to small discrepancies in lighting conditions in the training data by placing these artifacts directly in front of the training cameras (bottom row, bottom left crop). Furthermore, ablations such as "1 prop network" result in subtle changes in the metrics but are more evident in visualizations of the novel views. The use of "1 prop network" as opposed to "Nerfacto (default)" with 2 prop networks leads to aliasing artifacts, as can be seen around the small tree branches (bottom row, middle crop). While these artifacts are visible to the eye, especially in the interactive viewer, such temporal discontinuity caused by aliasing is not captured by the quantitative metrics. Furthermore, scene contraction is necessary to correctly recover far objects (top row, right crop). Our conclusions are even more evident in video renderings where the camera moves, compared to still image renderings. Overall, the real-time viewer proves to be useful for viewing out-of-distribution renders. The crops in Fig. 9 aid in illustrating where certain methods excel over others, regardless of the metrics on the evaluation images. Developing more appropriate evaluation metrics is an important avenue for future research.

OPEN-SOURCE CONTRIBUTIONS
One of the key strengths of our proposed framework is its versatility and ease of use, as demonstrated by our open-source contributions. Our GitHub repository has grown to include over 60 contributors and over 3K stars, reflecting a strong and active community. Additionally, two new libraries, SDFStudio [Yu et al. 2022] and ArcNerf [Yue Luo 2022], have been built on top of our framework. Since the release of Nerfstudio in October 2022, our contributors have enhanced and expanded Nerfstudio by addressing various GitHub issues and feature requests, including improved camera paths, Colab support, additional camera models, and reconstruction of dynamic objects. In the future, we plan on supporting 3D generative pipelines, NeRF compositing, and more.

CONCLUSION AND FUTURE WORK
We draw upon existing techniques and propose a framework that supports a more modularized approach to NeRF development, allows for real-time visualization, and is readily usable with real-world data. We emphasize the importance of utilizing the interactive real-time viewer during training to compensate for imperfect quantitative metrics in model design decisions. We hope the consolidation brought about by this new framework will facilitate the development of NeRF-based methods, thereby accelerating advances in the neural rendering community. Future research directions include the development of more appropriate evaluation metrics and integration of the framework with other fields such as computer vision, computer graphics, and machine learning.

Figure 4: Sample representations. (Top) We define a frustum as a cone with a start and end. This region of space can be converted into Gaussians (bottom left) or point samples (bottom right) depending on the field input format.

Figure 5: Web viewer design. A machine with a GPU (left) starts a NeRF training session. When a user navigates to the hosted web viewer (right), the viewer client will establish WebSocket and WebRTC connections with the training session.

4.4.1 Implementation. Real-time training visualization utilizes WebSockets and WebRTC to establish a connection between the NeRF training session and the web client. This approach eliminates the need to install local screens and other GUI software. Upon opening the web viewer, a WebSocket connection is established with the training session, which subsequently populates the scene with training images as illustrated in Fig. 5 (right). The web viewer continuously streams the viewport camera pose to the training session during the training process. The training session utilizes this camera pose to render images and transmits them via a WebRTC video stream. Additionally, the viewer camera controls and UI are implemented using ThreeJS, allowing us to overlay 3D assets such as images, splines, and cropping boxes in front of the NeRF renderings. For instance, the viewer displays training images at their capture locations, letting users intuitively compare performance at seen and novel viewpoints.

4.4.2 Viewer features. Our viewer is compatible with different models of varying rendering speeds. We accomplish this by balancing the computation of training and viewer rendering on a single

], but certain parts of the original design are replaced to improve performance. We reference papers such as NeRF-- [Wang et al. 2021c], Instant-NGP [Müller et al. 2022], NeRF-W [Martin-Brualla et al. 2021], and Ref-NeRF [Verbin et al. 2022] in Nerfacto. Fig. 7 illustrates how these papers are used.

Figure 6: Exporting videos and geometry.We make exporting videos (a) and geometry (b) easy with real-data captures.The left side shows the interactive camera trajectory editor, which allows animatable poses, FOVs, and speed, to eventually render videos of NeRF's outputs.On the right we show the cropping interface in the viewer and resulting export formats including point clouds, TSDFs, and textured meshes.

Figure 7: Nerfacto method. Diagram of the Nerfacto method. It combines features from many papers (bottom left). The method will evolve over time as new papers and features are added to the Nerfstudio codebase.

Figure 9: Nerfstudio ablation qualitative examples. Here we show renderings from different Nerfacto ablation variants. (Top) is the "Egypt" capture and (bottom) is the "aspen" capture from the Nerfstudio Dataset. These novel views are far from the training images to get a sense of how well these methods perform qualitatively. We zoom in on crops to highlight differences in the rendered images.

Figure 10: Real-time viewer use. (a) Training Nerfacto on the MipNeRF-360 garden scene. Good quality can be achieved after a few minutes. Pausing the training increases the rendered resolution. (b) Visualizing different model outputs with the viewer. (c) Viewer controls and settings available in the viewer.
provide reusable components for 3D computer vision tasks.Other examples of frameworks include Mitsuba3 [Jakob et al. 2022], Halide [Ragan-Kelley et al. 2013], but does

Table 2: Average metrics for ablations on the Nerfstudio Dataset. We remove and change various components of the Nerfacto method and report { PSNR, SSIM, LPIPS } on the Nerfstudio Dataset. Further details on the experiments can be found in the supplementary material.

Table 2 presents the average results of our ablation studies. The complete table for all 10 scenes can be found in the supplementary material. This study highlights the challenge in extracting meaningful insights from quantitative metrics alone (Table

ACKNOWLEDGMENTS
This project was supported in part by the Bakar Fellows Program and the BAIR/BDD sponsors. We want to thank the many open-source contributors who have helped create Nerfstudio, including Cyrus Vachha and Rohan Mathur from UC Berkeley and the follow-

Table 4: Ablations of Nerfacto method on Nerfstudio Dataset. We remove and change various components of the Nerfacto method and report { PSNR, SSIM, LPIPS } on the Nerfstudio Dataset. "synthetic on real" is our synthetic settings adjusted to work on a real-world capture (i.e., a bounded scene, no appearance embeddings, and no scene contraction). For the "1 prop network" experiment, we use only the first proposal network from Nerfacto and only do one iteration of the proposal sampling instead of two.