Abstract
We propose two hardware mechanisms to reduce the energy consumed by ray tracing on massively parallel graphics processors. First, we use a streaming data model and configure part of the L2 cache as a ray stream memory, which enables efficient data processing through ray reordering. This increases L1 hit rates and substantially reduces off-chip memory energy through better management of off-chip memory access patterns. To evaluate this model, we augment our architectural simulator with a detailed memory system simulation that includes accurate control, timing and power models for memory controllers and off-chip dynamic random-access memory (DRAM). These details change the results significantly compared with previous simulations that used a simpler model of off-chip memory, indicating that this type of memory system simulation is important for realistic simulations that involve external memory. Second, we employ reconfigurable special-purpose pipelines that are constructed dynamically under program control. These pipelines use shared execution units that can be configured to support the common compute kernels that are the foundation of the ray tracing algorithm. This reduces the overhead incurred by on-chip memory and register accesses. Together, these two synergistic features yield a ray tracing architecture that reduces energy by optimizing both on-chip and off-chip memory activity when compared to a more traditional approach.