DOI: 10.1145/2492045.2492057
research-article

SGRT: a mobile GPU architecture for real-time ray tracing

Published: 19 July 2013

ABSTRACT

With the increasing demand for photorealistic graphics and the rapid advances in desktop CPUs and GPUs, real-time ray tracing has recently attracted considerable attention. Unfortunately, ray tracing in the current mobile environment is very difficult because mobile GPUs lack adequate computing power, memory bandwidth, and flexibility. In this paper, we present a novel mobile GPU architecture called SGRT (Samsung reconfigurable GPU based on Ray Tracing), which combines a fast, compact hardware accelerator with a flexible programmable shader. SGRT has two key features: 1) an area-efficient parallel pipelined traversal unit; and 2) flexible, high-performance kernels for shading and ray generation. Simulation results show that SGRT is potentially a versatile graphics solution for future application processors, as it provides real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers. Our system is implemented on an FPGA platform, and mobile ray tracing is successfully demonstrated.

      Published in

        HPG '13: Proceedings of the 5th High-Performance Graphics Conference
        July 2013
        149 pages
        ISBN: 9781450321358
        DOI: 10.1145/2492045

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        HPG '13 paper acceptance rate: 15 of 44 submissions, 34%
        Overall acceptance rate: 15 of 44 submissions, 34%
