research-article

On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing

Published: 24 August 2018

Abstract

Due to its flexibility, compute mode is becoming more and more attractive as a way to implement many of the algorithms that are part of a state-of-the-art rendering pipeline. A key problem commonly encountered in graphics applications is streaming vertex and geometry processing. In a typical triangle mesh, the same vertex is referenced six times on average. To avoid redundant computation during rendering, a post-transform cache is traditionally employed to reuse vertex processing results. However, such a vertex cache generally cannot be implemented efficiently in software and does not scale well as parallelism increases. We explore alternative strategies for reusing per-vertex results on-the-fly during massively-parallel software geometry processing. Given an input stream divided into batches, we analyze the effectiveness of sorting, hashing, and intra-thread-group communication for identifying and exploiting local reuse potential. We design and present four vertex reuse strategies tailored to modern GPU architectures. We demonstrate that, in a variety of applications, these strategies not only achieve effective reuse of vertex processing results, but can also boost performance by up to 2-3× compared to a naïve approach. Curiously, our experiments also show that our batch-based approaches exhibit behavior similar to the OpenGL implementation on current graphics hardware.
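
To make the idea of batch-local reuse concrete, the following is a minimal CPU-side sketch. It is not one of the four strategies the article presents. It assumes an indexed triangle list split into fixed-size batches and uses a plain sort-and-deduplicate step to find vertices shared within each batch. The names `dedup_batch`, `shading_cost`, and `grid_indices` are hypothetical helpers introduced only for this illustration.

```python
def dedup_batch(indices):
    """Sort and deduplicate the vertex indices of one batch.

    Returns the unique vertex indices (the vertices that would need to be
    processed once) and the batch's indices remapped to slots in that list.
    """
    unique = sorted(set(indices))
    slot = {v: i for i, v in enumerate(unique)}
    return unique, [slot[v] for v in indices]


def shading_cost(index_buffer, batch_triangles):
    """Count vertex-processing invocations when reuse is limited to a batch."""
    step = 3 * batch_triangles  # three indices per triangle
    invocations = 0
    for start in range(0, len(index_buffer), step):
        unique, _ = dedup_batch(index_buffer[start:start + step])
        invocations += len(unique)
    return invocations


def grid_indices(n):
    """Index buffer for an n x n grid of quads, two triangles per quad."""
    w = n + 1  # vertices per row
    idx = []
    for y in range(n):
        for x in range(n):
            a = y * w + x
            b, c, d = a + 1, a + w, a + w + 1
            idx += [a, b, c, b, d, c]
    return idx


if __name__ == "__main__":
    mesh = grid_indices(16)
    for batch in (1, 8, 32, 128):
        cost = shading_cost(mesh, batch)
        print(f"batch of {batch:3d} triangles: {cost} invocations "
              f"for {len(mesh)} index references")
```

With a batch size of one triangle, nothing is reused and every index reference triggers vertex processing, which corresponds to the naïve approach. Larger batches let more references resolve to an already processed vertex, at the cost of more work to identify duplicates inside the batch.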

      • Published in

        Proceedings of the ACM on Computer Graphics and Interactive Techniques  Volume 1, Issue 2
        August 2018
        223 pages
        EISSN: 2577-6193
        DOI: 10.1145/3273023

        Copyright © 2018 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed
