Abstract
In this paper, we question the premise that graphics hardware uses a post-transform cache to avoid redundant vertex shader invocations. A large body of existing work on optimizing indexed triangle sets for rendering speed is based upon this widely accepted assumption. We conclusively show that this assumption does not hold on modern graphics hardware. We design and conduct experiments that demonstrate the behavior of current hardware from all major vendors to be inconsistent with the presence of a common post-transform cache. Our results strongly suggest that modern hardware instead relies on a batch-based approach, most likely for reasons of scalability. A more thorough investigation based on these initial experiments allows us to partially uncover the actual strategies implemented on graphics processors today. We reevaluate existing mesh optimization algorithms in light of these new findings and present a new mesh optimization algorithm designed from the ground up to target architectures that rely on batch-based vertex reuse. In an extensive evaluation, we measure and compare the real-world performance of various optimization algorithms on modern hardware. Our results show that some established algorithms still perform well. However, if the batching strategy of the target architecture is known, our approach can significantly outperform these previous state-of-the-art methods.
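The difference between the two reuse models named in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method or any vendor's actual strategy: the function names, the FIFO replacement policy, and the limits `cache_size`, `max_batch_indices`, and `max_batch_vertices` are hypothetical values chosen for illustration only. It counts vertex shader invocations for an indexed triangle list under an idealized post-transform cache and under a simple batching rule in which a vertex can only be reused within its own batch.

```python
from collections import deque


def fifo_cache_invocations(indices, cache_size=16):
    """Count invocations under an idealized FIFO post-transform cache.

    Assumption: a vertex is shaded only if its index is not currently
    among the last `cache_size` shaded indices.
    """
    cache = deque(maxlen=cache_size)
    invocations = 0
    for idx in indices:
        if idx not in cache:
            invocations += 1
            cache.append(idx)  # the oldest entry is dropped when full
    return invocations


def batch_invocations(indices, max_batch_indices=96, max_batch_vertices=32):
    """Count invocations when reuse is only possible within one batch.

    Assumption: the index stream is cut into consecutive batches, one
    whole triangle at a time. A batch is closed when adding the next
    triangle would exceed either limit. Every distinct vertex in a
    batch is shaded exactly once for that batch.
    """
    invocations = 0
    batch = set()
    batch_indices = 0
    for t in range(0, len(indices) - 2, 3):
        tri = indices[t:t + 3]
        new_vertices = set(tri) - batch
        if (batch_indices + 3 > max_batch_indices
                or len(batch) + len(new_vertices) > max_batch_vertices):
            invocations += len(batch)  # close the current batch
            batch = set()
            batch_indices = 0
            new_vertices = set(tri)
        batch |= new_vertices
        batch_indices += 3
    return invocations + len(batch)


# Eight triangles laid out like a strip: (0,1,2), (1,2,3), ..., (7,8,9).
strip = [v for i in range(8) for v in (i, i + 1, i + 2)]
print(fifo_cache_invocations(strip, cache_size=16))   # 10
print(batch_invocations(strip, max_batch_indices=12)) # 12
```

In this toy case the cache model shades each of the 10 vertices once, while the batch model shades vertices 4 and 5 twice because they are referenced on both sides of a batch boundary. This is the kind of effect an optimizer that targets batch-based reuse would try to minimize.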
Supplemental Material
Supplemental movie, appendix, image, and software files for "Revisiting The Vertex Cache: Understanding and Optimizing Vertex Processing on the modern GPU".