ABSTRACT
Recently, with the increasing demand for photorealistic graphics and rapid advances in desktop CPUs and GPUs, real-time ray tracing has attracted considerable attention. Unfortunately, ray tracing in the current mobile environment is very difficult because mobile GPUs lack sufficient computing power, memory bandwidth, and flexibility. In this paper, we present a novel mobile GPU architecture called SGRT (Samsung reconfigurable GPU based on Ray Tracing), which combines a fast, compact hardware accelerator with a flexible programmable shader. SGRT has two key features: 1) an area-efficient parallel pipelined traversal unit, and 2) flexible, high-performance kernels for shading and ray generation. Simulation results show that SGRT is potentially a versatile graphics solution for future application processors, as it provides real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers. Our system is implemented on an FPGA platform, and mobile ray tracing is successfully demonstrated.
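To make the division of labor more concrete, the sketch below shows in software the kind of work split between a ray-generation kernel and a traversal unit. It is not SGRT's pipeline. It assumes a binary bounding volume hierarchy (BVH) stored as a flat node array, triangle primitives tested with the Möller–Trumbore algorithm, and a simple pinhole camera. The names `Node`, `generate_primary_ray`, `hit_aabb`, `hit_triangle`, and `traverse` are hypothetical and used only for illustration.

```python
import math
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

@dataclass
class Node:
    """Hypothetical flat BVH node: an AABB plus either two children or a triangle list."""
    lo: Vec3                 # AABB min corner
    hi: Vec3                 # AABB max corner
    left: int = -1           # child index; -1 marks a leaf
    right: int = -1
    tris: list = field(default_factory=list)  # triangle indices (leaves only)

def generate_primary_ray(x, y, width, height, fov_deg=60.0):
    """Ray generation (assumed pinhole camera at the origin looking down -z)."""
    scale = math.tan(math.radians(fov_deg) * 0.5)
    px = (2.0 * (x + 0.5) / width - 1.0) * (width / height) * scale
    py = (1.0 - 2.0 * (y + 0.5) / height) * scale
    return (0.0, 0.0, 0.0), normalize((px, py, -1.0))

def hit_aabb(orig, inv_dir, lo, hi, t_max):
    """Slab test: True if the ray overlaps the box somewhere in [0, t_max]."""
    t0, t1 = 0.0, t_max
    for a in range(3):
        ta = (lo[a] - orig[a]) * inv_dir[a]
        tb = (hi[a] - orig[a]) * inv_dir[a]
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return False
    return True

def hit_triangle(orig, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle test; returns hit distance t or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None

def traverse(nodes, tris, orig, d):
    """Closest-hit BVH traversal with an explicit stack (the 'traversal' workload)."""
    # A large finite value replaces 1/0 so that 0 * inv never produces NaN.
    inv = tuple(1.0 / c if c != 0.0 else 1e30 for c in d)
    closest_t, closest_tri = math.inf, -1
    stack = [0]                          # start at the root node
    while stack:
        node = nodes[stack.pop()]
        if not hit_aabb(orig, inv, node.lo, node.hi, closest_t):
            continue                     # prune: box missed or beyond the current hit
        if node.left < 0:                # leaf: intersect its triangles
            for i in node.tris:
                t = hit_triangle(orig, d, *tris[i])
                if t is not None and t < closest_t:
                    closest_t, closest_tri = t, i
        else:                            # inner node: visit left child first
            stack.append(node.right)
            stack.append(node.left)
    return closest_tri, closest_t

# Tiny demo scene: two triangles at z = -5, each in its own leaf.
tris = [((-1.0, -1.0, -5.0), (1.0, -1.0, -5.0), (0.0, 1.0, -5.0)),
        ((4.0, 4.0, -5.0), (6.0, 4.0, -5.0), (5.0, 6.0, -5.0))]
nodes = [Node((-1, -1, -5), (6, 6, -5), left=1, right=2),
         Node((-1, -1, -5), (1, 1, -5), tris=[0]),
         Node((4, 4, -5), (6, 6, -5), tris=[1])]
orig, d = generate_primary_ray(32, 32, 64, 64)
hit = traverse(nodes, tris, orig, d)     # centre pixel hits triangle 0 at t of about 5.0
```

The stack-based loop is data-dependent and branch-heavy. Each ray visits a different sequence of nodes, and the bounding-box and triangle tests dominate the work. That is the usual argument for moving traversal into fixed-function hardware while keeping ray generation and shading programmable, as the abstract describes.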
SGRT: a mobile GPU architecture for real-time ray tracing