Compacted CPU/GPU Data Compression via Modified Virtual Address Translation

Published: 26 August 2020

Abstract

We propose a method to reduce the footprint of compressed data by using modified virtual address translation to permit random access to the data. This extends our prior work on using page translation to perform automatic decompression and deswizzling upon access to fixed-rate lossy or lossless compressed data.

Our compaction method allows a virtual address space the size of the uncompressed data to be used to efficiently access variable-size blocks of compressed data. Compression and decompression take place between the first and second level caches, which allows fast access to uncompressed data in the first level cache and provides data compaction at all other levels of the memory hierarchy. This improves performance and reduces power relative to compressed but uncompacted data.
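The idea of addressing variable-size compressed blocks through an address space the size of the uncompressed data can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism described in the paper: the block size `BLOCK_SIZE`, the class `CompactedStore`, its per-block offset table, and the use of `zlib` as the compressor are all hypothetical choices made for illustration.

```python
import zlib

BLOCK_SIZE = 256  # hypothetical uncompressed block size in bytes (assumption)


class CompactedStore:
    """Toy model: compressed blocks are packed back to back, but reads use
    addresses in a virtual space the size of the uncompressed data."""

    def __init__(self, data: bytes):
        self.length = len(data)
        self.offsets = []  # start of each compressed block in the packed buffer
        self.sizes = []    # compressed size of each block
        packed = bytearray()
        for start in range(0, len(data), BLOCK_SIZE):
            comp = zlib.compress(data[start:start + BLOCK_SIZE])
            self.offsets.append(len(packed))
            self.sizes.append(len(comp))
            packed += comp
        self.packed = bytes(packed)

    def read(self, vaddr: int) -> int:
        """Return the byte stored at an uncompressed (virtual) address."""
        if not 0 <= vaddr < self.length:
            raise IndexError(vaddr)
        # Split the virtual address into a block index and an offset within it.
        block, within = divmod(vaddr, BLOCK_SIZE)
        off, size = self.offsets[block], self.sizes[block]
        # Decompress only the block that holds the requested byte.
        return zlib.decompress(self.packed[off:off + size])[within]

    def footprint_ratio(self) -> float:
        """Uncompressed size divided by compacted size."""
        return self.length / len(self.packed)


data = bytes(4096) + bytes(range(256))
store = CompactedStore(data)
```

In this model a caller only ever sees uncompressed addresses, while the backing buffer holds just the compressed bytes. In the paper, the corresponding translation and decompression are performed by hardware between the first and second level caches rather than by a lookup table in software.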

An important property of our method is that compression, decompression, and reallocation are automatically managed by the new hardware without operating system intervention and without storing compression data in the page tables. As a result, although some changes are required in the page manager, it does not need to know the specific compression algorithm and can use a single memory allocation unit size.
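One way to picture a page manager that uses a single allocation unit size and does not know the compression algorithm is the toy model below. It is a sketch under assumptions, not the paper's design: the unit size `UNIT`, the names `UnitAllocator` and `Block`, and the use of `zlib` are hypothetical. The allocator only hands out and reclaims fixed-size units; the block object stands in for the hardware that decides how many units a compressed block needs and reallocates when a write changes its compressed size.

```python
import zlib

UNIT = 64  # hypothetical allocation unit size in bytes (assumption)


class UnitAllocator:
    """Toy page manager: hands out fixed-size units and never inspects
    how the data stored in them is compressed."""

    def __init__(self):
        self.free = []      # recycled unit indices
        self.next_unit = 0  # next never-used unit index

    def alloc(self, n: int) -> list:
        units = []
        for _ in range(n):
            if self.free:
                units.append(self.free.pop())
            else:
                units.append(self.next_unit)
                self.next_unit += 1
        return units

    def release(self, units: list) -> None:
        self.free.extend(units)


class Block:
    """Stand-in for the compression hardware managing one data block."""

    def __init__(self, allocator: UnitAllocator, data: bytes):
        self.allocator = allocator
        self.units = []
        self.write(data)

    def write(self, data: bytes) -> None:
        self.payload = zlib.compress(data)
        need = -(-len(self.payload) // UNIT)  # ceiling division
        if need > len(self.units):
            # Compressed size grew: request additional units.
            self.units += self.allocator.alloc(need - len(self.units))
        elif need < len(self.units):
            # Compressed size shrank: return the surplus units.
            self.allocator.release(self.units[need:])
            self.units = self.units[:need]

    def read(self) -> bytes:
        return zlib.decompress(self.payload)


allocator = UnitAllocator()
block = Block(allocator, bytes(1024))   # highly compressible contents
units_before = len(block.units)
block.write(bytes(range(256)) * 4)      # less compressible contents
units_after = len(block.units)
```

The point of the separation is that `UnitAllocator` could be reused unchanged with any compressor, since it only deals in unit counts.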

We tested our method with two sample CPU algorithms. When performing depth buffer occlusion tests, our method reduces the memory footprint by 3.1x. When rendering into textures, our method reduces the footprint by 1.69x before rendering and 1.63x after. In both cases, the power and cycle time are better than for uncompacted compressed data, and significantly better than for accessing uncompressed data.

Supplemental Material

3406177.mp4

Presentation Video
