Abstract
We propose a method to reduce the footprint of compressed data by using modified virtual address translation to permit random access to the data. This extends our prior work on using page translation to perform automatic decompression and deswizzling upon accesses to fixed-rate lossy or lossless compressed data.
Our compaction method allows a virtual address space the size of the uncompressed data to be used to efficiently access variable-size blocks of compressed data. Compression and decompression take place between the first- and second-level caches, which allows fast access to uncompressed data in the first-level cache and provides data compaction at all other levels of the memory hierarchy. This improves performance and reduces power relative to compressed but uncompacted data.
An important property of our method is that compression, decompression, and reallocation are automatically managed by the new hardware without operating system intervention and without storing compression data in the page tables. As a result, although some changes are required in the page manager, it does not need to know the specific compression algorithm and can use a single memory allocation unit size.
We tested our method with two sample CPU algorithms. When performing depth-buffer occlusion tests, our method reduces the memory footprint by 3.1x. When rendering into textures, our method reduces the footprint by 1.69x before rendering and 1.63x after. In both cases, power and cycle time are better than for uncompacted compressed data, and significantly better than for uncompressed data.
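To make the compaction idea concrete, the following is a minimal software sketch, not the hardware design the abstract describes. It assumes a hypothetical 256-byte uncompressed block size, uses zlib purely as a stand-in for the compression scheme, and uses an invented `CompactedBuffer` class with a plain offset table in place of modified page translation. It shows only the addressing principle: callers use addresses in the uncompressed range, while the stored blocks have variable size and are packed back to back.

```python
import zlib

BLOCK_BYTES = 256  # hypothetical uncompressed block size (an assumption)


class CompactedBuffer:
    """Toy model: uncompressed addresses map onto packed, variable-size blocks."""

    def __init__(self, data: bytes):
        self.length = len(data)
        self.offsets = [0]  # offsets[i] = start of compressed block i in the store
        chunks = []
        for start in range(0, len(data), BLOCK_BYTES):
            # zlib is a stand-in; the abstract does not name a compression algorithm.
            packed = zlib.compress(data[start:start + BLOCK_BYTES])
            chunks.append(packed)
            self.offsets.append(self.offsets[-1] + len(packed))
        self.store = b"".join(chunks)  # blocks stored back to back, no padding

    def read(self, addr: int) -> int:
        """Return the byte at an address in the uncompressed address range."""
        if not 0 <= addr < self.length:
            raise IndexError(addr)
        block, within = divmod(addr, BLOCK_BYTES)
        packed = self.store[self.offsets[block]:self.offsets[block + 1]]
        return zlib.decompress(packed)[within]

    def footprint(self) -> int:
        """Bytes occupied by the packed compressed blocks."""
        return len(self.store)


data = bytes(4096) + bytes(range(256)) * 4
buf = CompactedBuffer(data)
print(len(data), buf.footprint(), buf.read(4097))
```

In this sketch every read decompresses a whole block; in the method described above, that work happens in hardware between the cache levels, so that the first-level cache holds uncompressed data.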
Index Terms
Compacted CPU/GPU Data Compression via Modified Virtual Address Translation