Abstract
In this work, we examine how transparent block-level compression in the I/O path can improve both the space efficiency and the performance of online storage. We present ZBD, a block-layer driver that transparently compresses and decompresses data as it flows between the file system and storage devices. Our system supports variable-size blocks, metadata caching, and persistence, as well as block allocation and cleanup. ZBD aims to maintain high performance by mitigating compression and decompression overheads, which can have a significant impact on performance, through explicit work scheduling that leverages modern multicore CPUs. We present two case studies for compression. First, we examine how our approach can increase the capacity of SSD-based caches, thus increasing their cost-effectiveness. Then, we examine how ZBD can improve the efficiency of online disk-based storage systems.
We evaluate our approach in the Linux kernel on a commodity server with multicore CPUs, using PostMark, SPECsfs2008, TPC-C, and TPC-H. Preliminary results show that transparent online block-level compression is a viable option for improving effective storage capacity; it can improve I/O performance by up to 80% by reducing I/O traffic and seek distance, and it degrades performance, by up to 34%, only when single-thread I/O latency is critical. In particular, for SSD-based caching, our results indicate that, in line with current technology trends, compressed caching trades off CPU utilization for performance and enhances SSD efficiency as a storage cache by up to 99%.
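As a rough illustration of the mechanisms named in the abstract (variable-size compressed blocks, a logical-to-physical mapping, block allocation, and cleanup), the following Python sketch may help. It is not ZBD's implementation: ZBD is a Linux block-layer driver, whereas everything here, including the class name `CompressedBlockStore`, the 4 KB logical block size, the use of zlib, and the append-only in-memory log, is an assumption chosen for brevity.

```python
import zlib

BLOCK_SIZE = 4096  # assumed logical block size; the abstract does not specify one


class CompressedBlockStore:
    """Toy in-memory stand-in for a compressing block layer (illustrative only)."""

    def __init__(self):
        self.log = bytearray()  # backing "device": variable-size extents, appended in order
        self.extent_map = {}    # logical block number -> (offset, length) in the log
        self.live_bytes = 0     # bytes in the log still referenced by extent_map

    def write_block(self, lbn, data):
        """Compress one fixed-size logical block and append it as a variable-size extent."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("expected exactly one logical block")
        payload = zlib.compress(data)
        old = self.extent_map.get(lbn)
        if old is not None:
            self.live_bytes -= old[1]  # the overwritten extent becomes garbage
        self.extent_map[lbn] = (len(self.log), len(payload))
        self.log += payload
        self.live_bytes += len(payload)

    def read_block(self, lbn):
        """Look up the extent for a logical block and decompress it."""
        offset, length = self.extent_map[lbn]
        return zlib.decompress(bytes(self.log[offset:offset + length]))

    def cleanup(self):
        """Compact the log by copying live extents and dropping garbage."""
        new_log = bytearray()
        new_map = {}
        for lbn, (offset, length) in self.extent_map.items():
            new_map[lbn] = (len(new_log), length)
            new_log += self.log[offset:offset + length]
        self.log, self.extent_map = new_log, new_map
```

Callers see only fixed-size logical blocks through `write_block` and `read_block`, which is what makes the compression transparent to the layer above. Overwrites leave dead extents in the log until `cleanup` reclaims them. A real driver would also have to persist the mapping, handle incompressible data, and schedule compression work across cores, none of which this sketch attempts.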
Index Terms
Transparent Online Storage Compression at the Block-Level