Abstract
Variable-length coding is widely used for efficient data compression. Typically, the compressor splits the original data into blocks and compresses each block with variable-length codes, producing variable-length compressed blocks. Although the compressor can easily exploit ample block-level parallelism, it is much more difficult to extract such coarse-grain parallelism from the decompressor, because a block boundary cannot be located until decompression of the previous block is completed. This paper presents novel algorithms that efficiently predict block boundaries, together with a runtime system, called SDM, that enables efficient block-level parallel decompression. The SDM execution model features speculative pipelining with three stages: Scanner, Decompressor, and Merger. The scanner stage employs a high-confidence prediction algorithm that finds compressed block boundaries without fully decompressing individual blocks. This information is communicated to the parallel decompressor stage, in which multiple blocks are decompressed in parallel. The merger stage then merges the decompressed blocks in order to produce the final output. The SDM runtime is specialized to execute this pipeline correctly and efficiently on resource-constrained embedded platforms. With SDM, we effectively parallelize three production-grade variable-length decompression algorithms (zlib, bzip2, and H.264), with maximum speedups of 2.50× and 8.53× (and geometric mean speedups of 1.96× and 4.04×) on 4-core and 36-core embedded platforms, respectively.
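The scanner, decompressor, and merger structure can be illustrated with a small toy sketch. This is not the paper's SDM runtime or its prediction algorithm. It assumes a hypothetical container made of concatenated zlib streams, guesses block boundaries by searching for the default zlib header bytes, and checks each guess at merge time. All function names are illustrative.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: every block was written by zlib.compress() at the default
# level, so each block starts with the two header bytes 0x78 0x9c.
MAGIC = b"\x78\x9c"

def compress_blocks(data, block_size):
    """Build the toy container: independently compressed blocks, concatenated."""
    return b"".join(zlib.compress(data[i:i + block_size])
                    for i in range(0, len(data), block_size))

def scan(stream):
    """Scanner: guess block starts by pattern matching, without decompressing.
    The same bytes may occur inside compressed data, so guesses can be wrong."""
    starts = []
    pos = stream.find(MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = stream.find(MAGIC, pos + 1)
    return starts

def try_decompress(stream, start):
    """Decompressor: decode one block from a guessed start.
    Returns (output, end_offset), or None if the guess was not a valid block."""
    d = zlib.decompressobj()
    try:
        out = d.decompress(stream[start:])
    except zlib.error:
        return None
    if not d.eof:
        return None
    # Bytes after the end of this block are left in unused_data.
    return out, len(stream) - len(d.unused_data)

def speculative_decompress(stream, workers=4):
    starts = scan(stream)
    # Decode every guessed block in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        decoded = pool.map(lambda s: try_decompress(stream, s), starts)
        results = dict(zip(starts, decoded))
    # Merger: walk the stream in order and accept only results whose start
    # equals the verified end of the previous block.
    out, pos = [], 0
    while pos < len(stream):
        res = results.get(pos)
        if res is None:
            # Misspeculation: fall back to decoding from the verified offset.
            res = try_decompress(stream, pos)
            if res is None:
                raise ValueError(f"corrupt block at offset {pos}")
        data, pos = res
        out.append(data)
    return b"".join(out)
```

Wrong guesses cost only wasted work: the merger discards any result that does not line up with a verified boundary, so the output matches a sequential decode.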
Practical speculative parallelization of variable-length decompression algorithms
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems