Abstract
Ray-tracing algorithms are known for producing highly realistic images, but at a significant computational cost. For this reason, a large body of research exists on techniques for accelerating them. One approach that has received comparatively little attention is the design of specialised ray-tracing hardware. The research that does exist on this topic has consistently demonstrated that dedicated microarchitectures can achieve significant performance and efficiency gains. However, previous work on hardware ray-tracing has focused almost entirely on the traversal and intersection stages of the pipeline. As a result, the critical problem of constructing and managing acceleration data structures remains largely absent from the hardware literature.
We propose that a specialised microarchitecture for this purpose could achieve considerable performance and efficiency improvements over programmable platforms. To this end, we have developed the first dedicated microarchitecture for the construction of binned surface area heuristic (SAH) bounding volume hierarchies (BVHs). Cycle-accurate simulations show that, compared to manycore implementations, our design achieves significant improvements in raw performance and in the bandwidth required for construction, as well as large efficiency gains in performance per clock and per unit of die area. We conclude that such a design would be useful in the context of a heterogeneous graphics processor, and may help future graphics processor designs to lessen the impact of predicted technology-imposed utilisation limits.
Index Terms
A hardware unit for fast SAH-optimised BVH construction