Abstract
We present RayCore, a mobile ray-tracing hardware architecture. RayCore brings high-quality rendering effects, such as reflection, refraction, and shadows, to mobile devices by performing real-time Whitted ray tracing. It consists of two major components: ray-tracing units (RTUs), built on a unified traversal and intersection pipeline, and a tree-building unit (TBU) for dynamic scenes. The overall architecture offers considerable benefits in die area, memory access, and power consumption. Using FPGA and ASIC evaluations, we report its performance on several benchmarks. The results show high performance per unit area and per unit energy, making the architecture well suited to mobile devices.
Supplemental Material
Supplemental movie and image files for RayCore: A Ray-Tracing Hardware Architecture for Mobile Devices