
Memory Considerations for Low Energy Ray Tracing

Published: 01 February 2015

Abstract

We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing. First, we use a streaming data model and configure part of the L2 cache as a ray stream memory, which enables efficient data processing through ray reordering. This increases L1 hit rates and substantially reduces off-chip memory energy by improving off-chip memory access patterns. To evaluate this model, we augment our architectural simulator with a detailed memory system simulation that includes accurate control, timing, and power models for memory controllers and off-chip dynamic random-access memory (DRAM). These details change the results significantly compared with previous simulations that used a simpler model of off-chip memory, indicating that this level of memory system detail is important for realistic simulations involving external memory. Second, we employ reconfigurable special-purpose pipelines that are constructed dynamically under program control. These pipelines use shared execution units that can be configured to support the common compute kernels at the foundation of the ray tracing algorithm, which reduces the overhead incurred by on-chip memory and register accesses. Together, these two synergistic features yield a ray tracing architecture that, compared with a more traditional approach, reduces energy by optimizing both on-chip and off-chip memory activity.
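The abstract does not state how rays are ordered in the ray stream memory or which DRAM parameters the simulator uses. As a rough illustration of why reordering can improve off-chip access patterns, the Python sketch below assumes each ray is tagged with a hypothetical `block_id` naming the scene-data block it will fetch next, groups rays by that ID, and counts DRAM row-buffer hits with a toy single-bank, open-page model. The names `Ray`, `reorder_rays`, `block_address`, and `count_row_buffer_hits`, and the 2 KB row and block sizes, are illustrative assumptions, not the authors' design or simulator.

```python
from collections import defaultdict
from dataclasses import dataclass

ROW_BYTES = 2048    # assumed DRAM row (page) size in bytes
BLOCK_BYTES = 2048  # assumed size of one scene-data block; one block per row here


@dataclass
class Ray:
    ray_id: int
    block_id: int  # hypothetical: scene-data block (e.g. a BVH subtree) this ray fetches next


def reorder_rays(rays):
    """Group rays that need the same data block so they are processed back to back.

    Blocks are emitted in the order they were first requested, and rays keep
    their arrival order within a block.
    """
    bins = defaultdict(list)  # insertion-ordered (Python 3.7+)
    for ray in rays:
        bins[ray.block_id].append(ray)
    return [ray for group in bins.values() for ray in group]


def block_address(block_id):
    """Map a block ID to a base byte address (hypothetical contiguous layout)."""
    return block_id * BLOCK_BYTES


def count_row_buffer_hits(addresses):
    """Count accesses that find their row already open (single bank, open-page policy)."""
    open_row = None
    hits = 0
    for addr in addresses:
        row = addr // ROW_BYTES
        if row == open_row:
            hits += 1
        open_row = row  # a miss activates the new row; a hit leaves it open
    return hits


if __name__ == "__main__":
    # Eight rays whose data requests interleave three blocks.
    rays = [Ray(i, b) for i, b in enumerate([0, 1, 0, 1, 2, 0, 2, 1])]
    before = count_row_buffer_hits(block_address(r.block_id) for r in rays)
    after = count_row_buffer_hits(block_address(r.block_id) for r in reorder_rays(rays))
    print(before, after)  # prints "0 5"
```

In this toy example, grouping turns an interleaved stream with no row-buffer hits into runs that reuse the open row. A realistic memory controller model, such as the one the abstract describes, also has to account for multiple banks, request scheduling, refresh, and per-command energy, all of which this counter ignores.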

