skip to main content
10.1145/1989284.1989290acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Pan-private algorithms via statistics on sketches

Published:13 June 2011Publication History

ABSTRACT

Consider fully dynamic data, where we track data as it gets inserted and deleted. There are well developed notions of private data analyses with dynamic data, for example, using differential privacy. We want to go beyond privacy, and consider privacy together with security, formulated recently as pan-privacy by Dwork et al. (ICS 2010). Informally, pan-privacy preserves differential privacy while computing desired statistics on the data, even if the internal memory of the algorithm is compromised (say, by a malicious break-in or insider curiosity or by fiat by the government or law).

We study pan-private algorithms for basic analyses, like estimating distinct count, moments, and heavy hitter count, with fully dynamic data. We present the first known pan-private algorithms for these problems in the fully dynamic model. Our algorithms rely on sketching techniques popular in streaming: in some cases, we add suitable noise to a previously known sketch, using a novel approach of calibrating noise to the underlying problem structure and the projection matrix of the sketch; in other cases, we maintain certain statistics on sketches; in yet others, we define novel sketches. We also present the first known lower bounds explicitly for pan privacy, showing our results to be nearly optimal for these problems. Our lower bounds are stronger than those implied by differential privacy or dynamic data streaming alone and hold even if unbounded memory and/or unbounded processing time are allowed. The lower bounds use a noisy decoding argument and exploit a connection between pan-private algorithms and data sanitization.

References

  1. G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). TKDE, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Dwork. Differential privacy in new settings. In SODA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Dwork, F. Mcsherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of lp decoding. In STOC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Dwork, M. Naor, T. Pitassi, G. Rothblum, and S. Yekhanin. Pan-Private streaming algorithms. In ICS, 2010.Google ScholarGoogle Scholar
  8. C. Dwork, T. Pitassi, M. Naor, and G. Rothblum. Differential privacy under continual observation. In STOC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private combinatorial optimization. In SODA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. In Journal of ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Kane, J. Nelson, and D. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. McGregor, I. Mironov, T. Pitassi, O. Reingold, K. Talwar, and S. Vadhan. Limits of two-party differential privacy. In FOCS, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. F. Mcsherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. I. Mironov, O. Pandey, O. Reingold, and S. Vadhan. Computational differential privacy. In CRYPTO, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhauser, 2010. In progress, Chapter 1 online at academic2.american.edu/~jpnolan.Google ScholarGoogle Scholar
  16. S. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of American Statistical Association, 1965.Google ScholarGoogle Scholar

Index Terms

  1. Pan-private algorithms via statistics on sketches

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
            June 2011
            332 pages
            ISBN:9781450306607
            DOI:10.1145/1989284

            Copyright © 2011 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 13 June 2011

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate476of1,835submissions,26%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader
          About Cookies On This Site

          We use cookies to ensure that we give you the best experience on our website.

          Learn more

          Got it!