Abstract
To achieve the highest possible performance, the ray traversal and intersection routines at the core of every high-performance ray tracer are usually hand-coded, heavily optimized, and implemented separately for each hardware platform, even though they share most of their algorithmic core. The resulting implementations heavily mix algorithmic aspects with hardware and implementation details, making the code non-portable and difficult to change and maintain.
In this paper, we present a new approach that lets the programmer define, in a functional language, a set of conceptual, high-level abstractions that a special compiler optimizes away in order to maximize performance. Using this abstraction mechanism, we separate a generic ray traversal and intersection algorithm from the low-level aspects that are specific to the target hardware. We demonstrate that our code is not only significantly more flexible, simpler to write, and more concise, but also that the compiled results perform as well as state-of-the-art implementations on all of the tested CPU and GPU platforms.
RaTrace: simple and efficient abstractions for BVH ray traversal algorithms
GPCE 2017: Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences