DOI: 10.1145/1559795.1559815
research-article

Similarity caching

Published: 29 June 2009

ABSTRACT

We introduce the similarity caching problem, a variant of classical caching in which an algorithm can return an element from the cache that is similar, but not necessarily identical, to the query element. We are motivated by buffer management questions in approximate nearest-neighbor applications, especially in the context of caching targeted advertisements on the web. Formally, we assume the queries lie in a metric space, with distance function d(.,.). A query p is considered a cache hit if there is a point q in the cache that is sufficiently close to p, i.e., for a threshold radius r, we have d(p,q) ≤ r. The goal is then to minimize the number of cache misses, vis-à-vis the optimal algorithm. As with classical caching, we use the competitive ratio to measure the performance of different algorithms.
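The hit rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SimilarityCache`, the use of Euclidean distance via `math.dist`, and the least-recently-used eviction policy are all choices made here for the example and are not the algorithms analyzed in the paper.

```python
from collections import OrderedDict
from math import dist  # Euclidean distance between two point tuples (Python 3.8+)


class SimilarityCache:
    """Toy similarity cache: a query p hits if some cached q has d(p, q) <= r.

    Assumptions (not from the paper): points are hashable tuples, the metric
    defaults to Euclidean distance, and eviction is least-recently-used.
    """

    def __init__(self, capacity, radius, distance=dist):
        self.capacity = capacity      # maximum number of cached points (k)
        self.radius = radius          # threshold radius r
        self.distance = distance      # metric d(., .)
        self._points = OrderedDict()  # cached points, oldest use first
        self.misses = 0

    def query(self, p):
        """Return a cached point within the radius of p, or cache p on a miss."""
        hit = next(
            (q for q in self._points if self.distance(p, q) <= self.radius),
            None,
        )
        if hit is not None:
            self._points.move_to_end(hit)  # mark q as most recently used
            return hit
        # Cache miss: count it, evict the least recently used point if full,
        # then store the query point itself.
        self.misses += 1
        if len(self._points) >= self.capacity:
            self._points.popitem(last=False)
        self._points[p] = None
        return p


cache = SimilarityCache(capacity=2, radius=1.0)
first = cache.query((0.0, 0.0))    # miss: the cache is empty
second = cache.query((0.5, 0.0))   # hit: within distance 1.0 of (0.0, 0.0)
```

The linear scan over cached points keeps the sketch short; a real system would presumably use a nearest-neighbor index instead. Setting `radius=0` reduces the hit rule to exact matching, which is the classical caching case.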

While similarity caching is a strict generalization of classical caching, we show that unless the algorithm is allowed extra power (either in the size of the cache or the threshold r) over the optimal offline algorithm, the problem is intractable. We then proceed to quantify the hardness as a function of the complexity of the underlying metric space. We show that the problem becomes easier as we proceed from general metric spaces to those of bounded doubling dimension, and to Euclidean metrics. Finally, we investigate several extensions of the problem: dependence of the threshold r on the query and a smoother trade-off between the cache-miss cost and the query-query similarity.
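One common way to write down the "extra power" mentioned above is a resource-augmented competitive ratio. The notation here is assumed for illustration and is not taken from the paper: k and r are the cache size and threshold of the optimal offline algorithm OPT, k' ≥ k and r' ≥ r are those of the online algorithm ALG, cost counts cache misses, and b is a constant independent of the request sequence.

```latex
\[
  \mathrm{cost}_{\mathrm{ALG}(k',\,r')}(\sigma)
  \;\le\;
  c \cdot \mathrm{cost}_{\mathrm{OPT}(k,\,r)}(\sigma) + b
  \qquad \text{for every request sequence } \sigma .
\]
```

Under this reading, ALG is called c-competitive, and the case k' = k, r' = r is the setting without extra power that the abstract describes as intractable.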


Published in

PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2009
298 pages
ISBN: 9781605585536
DOI: 10.1145/1559795
General Chair: Jan Paredaens
Program Chair: Jianwen Su

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 476 of 1,835 submissions, 26%
