Editorial Notes
The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected Version of Record was published on March 28, 2022. For reference purposes, the VoR may still be accessed via the Supplemental Material section on this citation page.
Abstract
In this article, we introduce L2C, a hybrid lossy/lossless compression scheme applicable to both the memory subsystem and the I/O traffic of a processor chip. L2C combines general-purpose lossless compression with state-of-the-art lossy compression to achieve compression ratios of up to 16:1 and to improve the utilization of the chip's bandwidth resources. Compressing memory traffic lowers memory access time, which improves system performance and energy efficiency. Compressing I/O traffic offers several benefits for resource-constrained systems, including more efficient storage and networking. We evaluate L2C as a memory compressor in simulation with a set of approximation-tolerant applications. L2C improves baseline execution time by an average of 50% and total system energy consumption by 16%. Compared to the current state-of-the-art lossy and lossless memory compression approaches, L2C improves execution time by 9% and 26%, respectively, and reduces system energy costs by 3% and 5%, respectively. We evaluate I/O compression efficacy using a set of real-life datasets. L2C achieves compression ratios of up to 10.4:1 for a single dataset and about 4:1 on average, while introducing no more than 0.4% error.
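The general idea of pairing a lossy stage with a lossless one can be illustrated with a minimal sketch. This is not the L2C design: the abstract does not describe the algorithms at this level, so the selection policy (an `approximable` flag), the uniform quantizer, the use of `zlib` as the lossless stage, and all function names below are assumptions made purely for illustration.

```python
import struct
import zlib


def compress_block(values, approximable, error_bound=0.01):
    """Compress a list of floats and return (mode, payload).

    Hypothetical policy: data marked approximable is quantized to the
    nearest multiple of 2 * error_bound (lossy) and then deflated;
    all other data is deflated bit-exactly (lossless).
    """
    if approximable:
        step = 2.0 * error_bound
        # Rounding to the nearest step keeps each value within error_bound.
        codes = [round(v / step) for v in values]
        raw = struct.pack(f"<{len(codes)}i", *codes)
        return "lossy", zlib.compress(raw)
    raw = struct.pack(f"<{len(values)}d", *values)
    return "lossless", zlib.compress(raw)


def decompress_block(mode, payload, count, error_bound=0.01):
    """Invert compress_block for a block of `count` values."""
    raw = zlib.decompress(payload)
    if mode == "lossy":
        step = 2.0 * error_bound
        return [c * step for c in struct.unpack(f"<{count}i", raw)]
    return list(struct.unpack(f"<{count}d", raw))


def compression_ratio(count, payload):
    """Ratio of the uncompressed size (8-byte doubles) to the payload size."""
    return (8 * count) / len(payload)
```

In this sketch the lossy path trades a bounded absolute error for a more repetitive integer stream that the lossless stage can shrink further, while the lossless path reproduces its input exactly. The error metric used in the article's evaluation may differ from the absolute bound assumed here.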
Supplemental Material
Version of Record for "L2C: Combining Lossy and Lossless Compression on Memory and I/O" by Eldstål-Ahrens et al., ACM Transactions on Embedded Computing Systems, Volume 21, No. 1 (TECS 21:1).