Abstract
Bounding volume hierarchies (BVHs) are the most widely used acceleration structures for ray tracing due to their high construction and traversal performance. However, the bounding planes shared between parent and child bounding boxes are an inherent storage redundancy that limits further performance improvement, because these redundant planes must still be read from memory. Dual-split trees can represent the same space partitioning as BVHs in a more compact form, using less memory by eliminating the redundancies of the BVH representation. This reduction in memory storage and data movement translates to faster ray traversal and better energy efficiency. Yet the performance benefits of dual-split trees are undermined by the processing required to extract the necessary information from their compact representation. This processing involves bit manipulations and branching instructions, which are inefficient in software. We introduce hardware acceleration for dual-split trees and show that their performance advantages over BVHs are amplified in a hardware ray tracing context that can take advantage of such acceleration. We describe how the operations needed for decoding dual-split tree nodes can be implemented in hardware and present path tracing experiments on a number of scenes of different sizes. In our experiments, we observed up to a 31% reduction in render time and 38% energy savings using dual-split trees compared to binary BVHs representing identical space partitioning.
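The parent/child plane redundancy can be made concrete with a small count. The sketch below is a minimal illustration, assuming tight axis-aligned boxes in which a child plane that coincides with a parent plane is stored as a bit-identical float. It counts how many of the 12 child planes stored in a binary BVH node are not already implied by the parent box. The `AABB` type and the `shared_planes` and `unique_planes` helpers are hypothetical names introduced for this example; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class AABB:
    lo: Vec3  # minimum corner (x, y, z)
    hi: Vec3  # maximum corner (x, y, z)


def shared_planes(parent: AABB, child: AABB) -> int:
    """Count the child's six planes that coincide with the parent's plane
    on the same side (min or max) of the same axis.

    Exact float comparison is intentional: in a tight BVH a shared plane
    is the same stored value, not an approximation of it.
    """
    same_lo = sum(c == p for c, p in zip(child.lo, parent.lo))
    same_hi = sum(c == p for c, p in zip(child.hi, parent.hi))
    return same_lo + same_hi


def unique_planes(parent: AABB, left: AABB, right: AABB) -> int:
    """Planes stored for the two children that the parent box does not
    already determine, i.e. the non-redundant part of a binary BVH node."""
    stored = 2 * 6  # two child boxes, six planes each
    return stored - shared_planes(parent, left) - shared_planes(parent, right)


if __name__ == "__main__":
    parent = AABB((0.0, 0.0, 0.0), (10.0, 10.0, 10.0))

    # Clean split along x: only the planes x = 4 and x = 6 are new.
    left = AABB((0.0, 0.0, 0.0), (4.0, 10.0, 10.0))
    right = AABB((6.0, 0.0, 0.0), (10.0, 10.0, 10.0))
    print(unique_planes(parent, left, right))  # 2

    # Left child also tightened in y: the plane y = 7 is a third new plane.
    left_tight = AABB((0.0, 0.0, 0.0), (4.0, 7.0, 10.0))
    print(unique_planes(parent, left_tight, right))  # 3
```

In the clean split, 10 of the 12 stored planes repeat the parent box, and the two that remain are exactly what a node with two axis-aligned split planes carries. When a child is also tightened on another axis, the extra plane does not fit in a single two-plane node; in a dual-split tree it would presumably be expressed by an additional node that marks the empty region.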
Hardware-Accelerated Dual-Split Trees