research-article

Revisiting The Vertex Cache: Understanding and Optimizing Vertex Processing on the modern GPU

Published: 24 August 2018

Abstract

In this paper, we question the premise that graphics hardware uses a post-transform cache to avoid redundant vertex shader invocations. A large body of existing work on optimizing indexed triangle sets for rendering speed is based upon this widely-accepted assumption. We conclusively show that this assumption does not hold up on modern graphics hardware. We design and conduct experiments that demonstrate the behavior of current hardware of all major vendors to be inconsistent with the presence of a common post-transform cache. Our results strongly suggest that modern hardware rather relies on a batch-based approach, most likely for reasons of scalability. A more thorough investigation based on these initial experiments allows us to partially uncover the actual strategies implemented on graphics processors today. We reevaluate existing mesh optimization algorithms in light of these new findings and present a new mesh optimization algorithm designed from the ground up to target architectures that rely on batch-based vertex reuse. In an extensive evaluation, we measure and compare the real-world performance of various optimization algorithms on modern hardware. Our results show that some established algorithms still perform well. However, if the batching strategy of the target architecture is known, our approach can significantly outperform these previous state-of-the-art methods.
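
To make the distinction between the two reuse models concrete, the following is a minimal sketch that counts vertex shader invocations for an indexed triangle list under each model. The function names, the FIFO replacement policy, the batch-closing rule, and every parameter value (`cache_size`, `max_unique`, `max_indices`) are illustrative assumptions, not the behavior or limits the paper reports for any specific GPU.

```python
from collections import deque


def fifo_cache_invocations(indices, cache_size=16):
    """Count invocations assuming a FIFO post-transform cache (hypothetical size)."""
    cache = deque(maxlen=cache_size)  # oldest entry is evicted automatically
    invocations = 0
    for idx in indices:
        if idx not in cache:
            invocations += 1
            cache.append(idx)
    return invocations


def batch_invocations(indices, max_unique=32, max_indices=96):
    """Count invocations assuming reuse is only detected within one batch.

    Assumed rule: a batch is closed before a triangle that would push it past
    max_indices indices or past max_unique distinct vertices. Each distinct
    vertex in a batch is shaded once; nothing is reused across batches.
    """
    invocations = 0
    batch = set()
    count = 0
    usable = len(indices) - len(indices) % 3  # ignore a trailing partial triangle
    for t in range(0, usable, 3):
        tri = indices[t:t + 3]
        new_vertices = set(tri) - batch
        if count + 3 > max_indices or len(batch) + len(new_vertices) > max_unique:
            invocations += len(batch)
            batch = set()
            count = 0
        batch.update(tri)
        count += 3
    return invocations + len(batch)


def acmr(invocations, indices):
    """Average number of vertex shader invocations per triangle."""
    return invocations / (len(indices) // 3)


if __name__ == "__main__":
    quad = [0, 1, 2, 2, 1, 3]  # two triangles sharing an edge
    print(acmr(fifo_cache_invocations(quad), quad))
    print(acmr(batch_invocations(quad), quad))
    print(acmr(batch_invocations(quad, max_indices=3), quad))
```

Under these assumptions, the same index order can score differently in the two models: a vertex that would still sit in a cache is shaded again as soon as a batch boundary falls between its two uses, which is why an ordering tuned for a cache model is not automatically optimal for a batch-based one.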

Published in

Proceedings of the ACM on Computer Graphics and Interactive Techniques, Volume 1, Issue 2
August 2018
223 pages
EISSN: 2577-6193
DOI: 10.1145/3273023

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed
