DOI: 10.1145/1807085.1807124
research-article

Cache-oblivious hashing

Published: 06 June 2010

ABSTRACT

The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key takes only an expected average of t_q = 1 + 1/2^{Ω(b)} disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is specifically tuned to that blocking. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases.

We first show that linear probing, a classical collision resolution strategy for hash tables, can easily be made cache-oblivious, but it only achieves t_q = 1 + O(α/b). Then we demonstrate that it is possible to obtain t_q = 1 + 1/2^{Ω(b)}, thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Both conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is t_q = 1 + O(α/b), which is exactly what linear probing achieves.
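The setting described above can be illustrated with a minimal Python sketch. It is not the paper's construction: it is plain linear probing in which the table never sees the block size, and the number of blocks a search touches is measured afterwards for whatever b is chosen. The names `LinearProbingTable`, `blocks_touched`, `average_blocks`, and the `offset` parameter are hypothetical, and the salted built-in `hash` is only a stand-in for the truly random hash function assumed in the abstract.

```python
import random


class LinearProbingTable:
    """Open-addressing table with linear probing; it never looks at b."""

    def __init__(self, capacity, seed=0):
        self.slots = [None] * capacity
        # Assumption: a salted built-in hash stands in for a truly random function.
        self._salt = random.Random(seed).getrandbits(64)

    def _home(self, key):
        return hash((self._salt, key)) % len(self.slots)

    def insert(self, key):
        # Assumes the table is never completely full, so the scan terminates.
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def probe_sequence(self, key):
        """Slot indices inspected by a search for key, in order."""
        i = self._home(key)
        visited = [i]
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) % len(self.slots)
            visited.append(i)
        return visited


def blocks_touched(visited, b, offset=0):
    """Count distinct size-b blocks covering the visited slots.

    offset shifts the block boundaries; offset=0 models blocks that start
    at addresses divisible by b (condition (b) in the abstract).
    """
    return len({(i + offset) // b for i in visited})


def average_blocks(table, keys, b, offset=0):
    """Average number of blocks touched per successful search."""
    total = sum(blocks_touched(table.probe_sequence(k), b, offset) for k in keys)
    return total / len(keys)
```

For example, after inserting 512 keys into `LinearProbingTable(1024)` (load factor 0.5), calling `average_blocks(table, keys, b)` for several values of b shows how the same layout behaves under different blockings without any retuning. A nonzero `offset` models blocks that are not aligned to multiples of b.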


Published in

PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2010
350 pages
ISBN: 9781450300339
DOI: 10.1145/1807085

Copyright © 2010 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
