ABSTRACT
The wide availability of commodity graphics processors has made real-time graphics an intrinsic component of the human/computer interface. These graphics cores accelerate the z-buffer algorithm and provide a highly interactive experience at a relatively low cost. However, many applications in entertainment, science, and industry require high quality lighting effects such as accurate shadows, reflection, and refraction. These effects can be difficult to achieve with z-buffer algorithms but are straightforward to implement using ray tracing. Although ray tracing is computationally more complex, the algorithm exhibits excellent scaling and parallelism properties. Nevertheless, ray tracing memory access patterns are difficult to predict, so the promised parallel speedup is hard to achieve.
This paper presents a novel approach to ray tracing based on stream filtering and introduces StreamRay, a multicore, wide-SIMD microarchitecture that delivers interactive frame rates of 15-32 frames/second for scenes of high geometric complexity and exhibits high utilization for SIMD widths ranging from 8 to 16 elements. StreamRay consists of two main components: the ray engine, which is responsible for stream assembly and whose address generation units produce the addresses needed to form large SIMD vectors, and the filter engine, which implements the ray tracing operations with programmable accelerators. Results demonstrate that separating address and data processing reduces data movement and resource contention. Performance improves by 56% while simultaneously providing 11.63% power savings per accelerator core compared to a design that does not use separate resources for address and data computations.
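The split between stream assembly and filtering can be illustrated with a small software sketch. This is not the StreamRay hardware or the authors' code. It is a minimal Python model, under the assumption that a "ray" is reduced to a hypothetical `(t_near, t_far)` pair and that a filter is any boolean test applied per SIMD lane. The names `assemble_vectors`, `filter_stream`, and `hits_beyond` are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical ray record: only a parametric interval, enough to show filtering.
Ray = Tuple[float, float]


def assemble_vectors(active_ids: Sequence[int], simd_width: int) -> List[List[int]]:
    """Ray-engine role (assumed): pack active ray indices into SIMD-width groups.

    Only indices (addresses) are handled here; no ray data is touched.
    """
    return [list(active_ids[i:i + simd_width])
            for i in range(0, len(active_ids), simd_width)]


def filter_stream(rays: Sequence[Ray],
                  active_ids: Sequence[int],
                  test: Callable[[Ray], bool],
                  simd_width: int = 8) -> Tuple[List[int], float]:
    """Filter-engine role (assumed): apply `test` to every lane of every vector.

    Returns the indices of rays that pass, plus SIMD utilization, defined
    here as occupied lanes divided by issued lanes.
    """
    survivors: List[int] = []
    lanes_used = 0
    lanes_issued = 0
    for vector in assemble_vectors(active_ids, simd_width):
        lanes_issued += simd_width      # a full-width vector is always issued
        lanes_used += len(vector)       # the last vector may be partly empty
        for ray_id in vector:           # each iteration stands in for one lane
            if test(rays[ray_id]):
                survivors.append(ray_id)
    utilization = lanes_used / lanes_issued if lanes_issued else 0.0
    return survivors, utilization


def hits_beyond(threshold: float) -> Callable[[Ray], bool]:
    """Build a toy filter: keep rays whose far distance reaches `threshold`."""
    return lambda ray: ray[1] >= threshold


if __name__ == "__main__":
    rays = [(0.0, float(i)) for i in range(20)]
    ids, util = filter_stream(rays, range(20), hits_beyond(10.0), simd_width=8)
    print(ids, util)
```

Because each pass emits a compacted list of surviving indices, the next pass again starts from densely packed vectors. Only the final vector of a stream can be partially filled, which is one way to picture why utilization can stay high at wider SIMD widths.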
StreamRay: a stream filtering architecture for coherent ray tracing. ASPLOS 2009.