ABSTRACT
Consider fully dynamic data, where we track data as it gets inserted and deleted. There are well developed notions of private data analyses with dynamic data, for example, using differential privacy. We want to go beyond privacy, and consider privacy together with security, formulated recently as pan-privacy by Dwork et al. (ICS 2010). Informally, pan-privacy preserves differential privacy while computing desired statistics on the data, even if the internal memory of the algorithm is compromised (say, by a malicious break-in or insider curiosity or by fiat by the government or law).
We study pan-private algorithms for basic analyses, like estimating distinct count, moments, and heavy hitter count, with fully dynamic data. We present the first known pan-private algorithms for these problems in the fully dynamic model. Our algorithms rely on sketching techniques popular in streaming: in some cases, we add suitable noise to a previously known sketch, using a novel approach of calibrating noise to the underlying problem structure and the projection matrix of the sketch; in other cases, we maintain certain statistics on sketches; in yet others, we define novel sketches. We also present the first known lower bounds explicitly for pan privacy, showing our results to be nearly optimal for these problems. Our lower bounds are stronger than those implied by differential privacy or dynamic data streaming alone and hold even if unbounded memory and/or unbounded processing time are allowed. The lower bounds use a noisy decoding argument and exploit a connection between pan-private algorithms and data sanitization.
- G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). TKDE, 2003. Google Scholar
Digital Library
- G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms, 2005. Google Scholar
Digital Library
- I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, 2003. Google Scholar
Digital Library
- C. Dwork. Differential privacy in new settings. In SODA, 2010. Google Scholar
Digital Library
- C. Dwork, F. Mcsherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006. Google Scholar
Digital Library
- C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of lp decoding. In STOC, 2007. Google Scholar
Digital Library
- C. Dwork, M. Naor, T. Pitassi, G. Rothblum, and S. Yekhanin. Pan-Private streaming algorithms. In ICS, 2010.Google Scholar
- C. Dwork, T. Pitassi, M. Naor, and G. Rothblum. Differential privacy under continual observation. In STOC, 2010. Google Scholar
Digital Library
- A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private combinatorial optimization. In SODA, 2010. Google Scholar
Digital Library
- P. Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. In Journal of ACM, 2006. Google Scholar
Digital Library
- D. Kane, J. Nelson, and D. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, 2010. Google Scholar
Digital Library
- A. McGregor, I. Mironov, T. Pitassi, O. Reingold, K. Talwar, and S. Vadhan. Limits of two-party differential privacy. In FOCS, 2010. Google Scholar
Digital Library
- F. Mcsherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007. Google Scholar
Digital Library
- I. Mironov, O. Pandey, O. Reingold, and S. Vadhan. Computational differential privacy. In CRYPTO, 2009. Google Scholar
Digital Library
- J. P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhauser, 2010. In progress, Chapter 1 online at academic2.american.edu/~jpnolan.Google Scholar
- S. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of American Statistical Association, 1965.Google Scholar
Index Terms
Pan-private algorithms via statistics on sketches
Recommendations
Differentially Private Ensemble Classifiers for Data Streams
WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data MiningLearning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a ...
Improving the Utility of Differentially Private Data Releases via k-Anonymity
TRUSTCOM '13: Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and CommunicationsA common view in some data anonymization literature is to oppose the "old'' k-anonymity model to the "new'' differential privacy model, which offers more robust privacy guarantees. However, the utility of the masked results provided by differential ...
A differentially private algorithm for location data release
The rise of mobile technologies in recent years has led to large volumes of location information, which are valuable resources for knowledge discovery such as travel patterns mining and traffic analysis. However, location dataset has been confronted ...






Comments