research-article

A hardware unit for fast SAH-optimised BVH construction

Published: 21 July 2013

Abstract

Ray-tracing algorithms are known for producing highly realistic images, but at a significant computational cost. For this reason, a large body of research exists on various techniques for accelerating these costly algorithms. One approach to achieving superior performance which has received comparatively little attention is the design of specialised ray-tracing hardware. The research that does exist on this topic has consistently demonstrated that significant performance and efficiency gains can be achieved with dedicated microarchitectures. However, previous work on hardware ray-tracing has focused almost entirely on the traversal and intersection aspects of the pipeline. As a result, the critical aspect of the management and construction of acceleration data-structures remains largely absent from the hardware literature.

We propose that a specialised microarchitecture for this purpose could achieve considerable performance and efficiency improvements over programmable platforms. To this end, we have developed the first dedicated microarchitecture for the construction of binned SAH BVHs. Cycle-accurate simulations show that our design achieves significant improvements in raw performance and in the bandwidth required for construction, as well as large efficiency gains in terms of performance per clock and die area compared to manycore implementations. We conclude that such a design would be useful in the context of a heterogeneous graphics processor, and may help future graphics processor designs to reduce predicted technology-imposed utilisation limits.
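For readers unfamiliar with the term, the core step of a binned SAH BVH build can be sketched in software as follows. This is a generic illustration of the binning technique, not the microarchitecture described in the paper. The function names, the box representation as `(min_xyz, max_xyz)` tuples, and the default of 16 bins are all assumptions made for the example, and the cost omits the constant traversal and intersection terms of the full SAH.

```python
import math

# Hypothetical box representation: (min_xyz, max_xyz). EMPTY encloses nothing.
EMPTY = ((math.inf,) * 3, (-math.inf,) * 3)


def union(a, b):
    """Smallest box enclosing boxes a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def area(box):
    """Surface area of a box; an empty box has area 0."""
    d = [hi - lo for lo, hi in zip(*box)]
    if min(d) < 0:
        return 0.0
    return 2.0 * (d[0] * d[1] + d[1] * d[2] + d[2] * d[0])


def best_binned_split(boxes, axis, num_bins=16):
    """Return (cost, bin_index) of the cheapest split plane on one axis.

    Primitives in bins [0, bin_index) go left, the rest go right.
    Cost is the relative SAH term area(L)*count(L) + area(R)*count(R).
    Returns None when all centroids coincide on this axis.
    """
    centroids = [0.5 * (b[0][axis] + b[1][axis]) for b in boxes]
    cmin, cmax = min(centroids), max(centroids)
    if cmax == cmin:
        return None
    scale = num_bins / (cmax - cmin)

    # Binning pass: each primitive grows the box of the bin its centroid hits.
    bin_box = [EMPTY] * num_bins
    bin_count = [0] * num_bins
    for b, c in zip(boxes, centroids):
        i = min(int((c - cmin) * scale), num_bins - 1)
        bin_box[i] = union(bin_box[i], b)
        bin_count[i] += 1

    # Right-to-left sweep: area and count of everything at or beyond bin i.
    right_area = [0.0] * num_bins
    right_count = [0] * num_bins
    acc, n = EMPTY, 0
    for i in range(num_bins - 1, 0, -1):
        acc = union(acc, bin_box[i])
        n += bin_count[i]
        right_area[i], right_count[i] = area(acc), n

    # Left-to-right sweep: evaluate each candidate plane between bins.
    best = None
    acc, n = EMPTY, 0
    for i in range(1, num_bins):
        acc = union(acc, bin_box[i - 1])
        n += bin_count[i - 1]
        cost = area(acc) * n + right_area[i] * right_count[i]
        if n and right_count[i] and (best is None or cost < best[0]):
            best = (cost, i)
    return best


# Two well-separated pairs of unit cubes along the x axis.
cubes = [((x, 0.0, 0.0), (x + 1.0, 1.0, 1.0)) for x in (0.0, 1.0, 10.0, 11.0)]
print(best_binned_split(cubes, axis=0, num_bins=4))
```

A full builder would run this over all three axes, partition the primitives at the winning plane, and recurse on each side. The fixed number of bins is what keeps the per-node work linear in the primitive count.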

Supplemental Material

tp175.mp4

        • Published in

          ACM Transactions on Graphics  Volume 32, Issue 4
          July 2013
          1215 pages
          ISSN:0730-0301
          EISSN:1557-7368
          DOI:10.1145/2461912

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States
