PLOCTree: A Fast, High-Quality Hardware BVH Builder

Published: 24 August 2018
Abstract

In the near future, GPUs are expected to have hardware support for real-time ray tracing in order to, e.g., help render complex lighting effects in video games and enable photorealistic augmented reality. One challenge in real-time ray tracing is dynamic scene support, that is, rebuilding or updating the spatial data structures used to accelerate rendering whenever the scene geometry changes. This paper proposes PLOCTree, an accelerator for tree construction based on the Parallel Locally-Ordered Clustering (PLOC) algorithm. Tree construction is highly memory-intensive, so for the hardware implementation the algorithm is rewritten into a bandwidth-economical form that converts most of the external memory traffic of the original software-based GPU implementation into streaming on-chip data traffic. As a result, the proposed unit is 3.9 times faster and uses 7.7 times less memory bandwidth than the GPU implementation. Compared to state-of-the-art hardware builders, PLOCTree gives a superior performance-quality tradeoff: it is nearly as fast as a state-of-the-art low-quality linear builder, while producing trees with Surface Area Heuristic (SAH) cost similar to that of a comparatively expensive binned SAH sweep builder.
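
For readers unfamiliar with PLOC, the following is a minimal software sketch of the clustering loop as described in the PLOC literature (Meister and Bittner). It is not the hardware design proposed in this paper and does not model its bandwidth-economical streaming form. It assumes the input leaves are already sorted along a Morton curve, uses the surface area of the merged box as the distance measure, and merges only mutual nearest neighbours found within a window of `radius` positions. The names `Node`, `ploc_iteration`, and `build` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                      # minimum corner of the bounding box
    hi: Vec3                      # maximum corner of the bounding box
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def merge_box(a: Node, b: Node) -> Tuple[Vec3, Vec3]:
    """Bounding box enclosing both clusters."""
    lo = tuple(min(x, y) for x, y in zip(a.lo, b.lo))
    hi = tuple(max(x, y) for x, y in zip(a.hi, b.hi))
    return lo, hi


def area(lo: Vec3, hi: Vec3) -> float:
    """Surface area of an axis-aligned box."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def ploc_iteration(clusters: List[Node], radius: int) -> List[Node]:
    """One pass: windowed nearest-neighbour search, then mutual merges."""
    n = len(clusters)
    nearest = []
    for i in range(n):
        best, best_d = -1, float("inf")
        # Only look at clusters within `radius` positions in the sorted order.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            if j == i:
                continue
            d = area(*merge_box(clusters[i], clusters[j]))
            if d < best_d:        # strict '<' keeps the lowest index on ties
                best, best_d = j, d
        nearest.append(best)

    out = []
    for i in range(n):
        j = nearest[i]
        if j >= 0 and nearest[j] == i:
            if i < j:             # emit each mutual pair once, at the lower index
                lo, hi = merge_box(clusters[i], clusters[j])
                out.append(Node(lo, hi, clusters[i], clusters[j]))
        else:
            out.append(clusters[i])   # unmatched clusters carry over unchanged
    return out


def build(leaves: List[Node], radius: int = 2) -> Node:
    """Repeat iterations until a single root remains (needs radius >= 1)."""
    clusters = list(leaves)
    while len(clusters) > 1:
        clusters = ploc_iteration(clusters, radius)
    return clusters[0]
```

In this sketch each iteration reads every cluster several times during the neighbour search, which hints at why a direct implementation is memory-intensive; a larger `radius` widens the search and generally trades build time for tree quality.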

