research-article

A Prefetching Scheme Exploiting both Data Layout and Access History on Disk

Published: 01 August 2013

Abstract

Prefetching is an important technique for improving effective hard disk performance. A prefetcher seeks to accurately predict which data will be requested and to load it before the corresponding requests arrive. Current disk prefetch policies in major operating systems track access patterns at the level of the file abstraction. While this is useful for exploiting application-level access patterns, file-level prefetching cannot realize the full performance improvement achievable by prefetching, for two reasons. First, certain prefetch opportunities can only be detected by knowing the data layout on disk, such as the contiguous layout of file metadata or of data from multiple files. Second, nonsequential access of disk data (requiring disk head movement) is much slower than sequential access, so the performance penalty for mis-prefetching a randomly located block is correspondingly greater than that for a sequential block.

To overcome the inherent limitations of prefetching at logical file level, we propose to perform prefetching directly at the level of disk layout, and in a portable way. Our technique, called DiskSeen, is intended to be supplementary to, and to work synergistically with, any present file-level prefetch policies. DiskSeen tracks the locations and access times of disk blocks and, based on analysis of their temporal and spatial relationships, seeks to improve the sequentiality of disk accesses and overall prefetching performance. It also implements a mechanism to minimize mis-prefetching, on a per-application basis, to mitigate the corresponding performance penalty.
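The idea of tracking block locations and access times can be illustrated with a toy sketch. The code below is not the DiskSeen algorithm or its kernel implementation; it is a minimal illustration under stated assumptions: blocks are identified by a logical block number (LBN), time is a simple access counter, and the names `BlockHistory`, `window`, and `depth` are hypothetical. It suggests a prefetch only when the two preceding blocks on disk were accessed recently, which combines a spatial test (adjacent LBNs) with a temporal one (recent timestamps).

```python
from typing import Dict, List


class BlockHistory:
    """Toy block-level access history keyed by logical block number (LBN).

    Illustrative only: `window` and `depth` are assumed parameters,
    not values or names taken from the paper.
    """

    def __init__(self, window: int = 8, depth: int = 4) -> None:
        self.window = window  # max age, in accesses, for a block to count as recent
        self.depth = depth    # how many blocks ahead to suggest
        self.clock = 0        # logical time: one tick per recorded access
        self.last_access: Dict[int, int] = {}

    def _recent(self, lbn: int) -> bool:
        stamp = self.last_access.get(lbn)
        return stamp is not None and self.clock - stamp <= self.window

    def access(self, lbn: int) -> List[int]:
        """Record an access and return LBNs worth prefetching (possibly none)."""
        self.clock += 1
        # Spatial and temporal test: were both preceding blocks touched recently?
        sequential = self._recent(lbn - 1) and self._recent(lbn - 2)
        self.last_access[lbn] = self.clock
        if not sequential:
            return []
        # Skip blocks that were themselves accessed recently.
        return [b for b in range(lbn + 1, lbn + 1 + self.depth)
                if not self._recent(b)]
```

Because the history is keyed by disk location rather than by file, two interleaved streams (for example, accesses to LBNs 100, 900, 101, 901, 102) can each still be recognized as sequential on disk, whereas an isolated access to a distant block yields no suggestion.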

Our implementation of the DiskSeen scheme in the Linux 2.6 kernel shows that it can significantly improve the effectiveness of prefetching, reducing execution times by 20%–60% for microbenchmarks and real applications such as grep, CVS, and TPC-H. Even for workloads specifically designed to expose its weaknesses, DiskSeen incurs only minor performance loss.


      • Published in

        ACM Transactions on Storage  Volume 9, Issue 3
        August 2013
        97 pages
        ISSN:1553-3077
        EISSN:1553-3093
        DOI:10.1145/2501620
        Copyright © 2013 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 August 2013
        • Revised: 1 April 2013
        • Accepted: 1 April 2013
        • Received: 1 September 2012
        Qualifiers

        • research-article
        • Research
        • Refereed
