ABSTRACT
The contingency table is a workhorse of official statistics, the format of reported data for the US Census, the Bureau of Labor Statistics, and the Internal Revenue Service. In settings such as these, privacy is mandated not only ethically but frequently legally as well. Consequently, there is an extensive and diverse literature dedicated to the problems of statistical disclosure control in contingency table release. However, all current techniques for reporting contingency tables fall short on at least one of privacy, accuracy, and consistency (among multiple released tables). We propose a solution that provides strong guarantees for all three desiderata simultaneously.
Our approach can be viewed as a special case of a more general approach for producing synthetic data: any privacy-preserving mechanism for contingency table release begins with raw data and produces a (possibly inconsistent) privacy-preserving set of marginals. From these tables alone, and hence without weakening privacy, we find and output the "nearest" consistent set of marginals. Interestingly, this set is no farther from the privacy-preserving marginals than are the tables of the raw data, and consequently the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself.
The privacy mechanism of [20] gives the strongest known privacy guarantees, with very little error. Combined with the techniques of the current paper, we therefore obtain excellent privacy, accuracy, and consistency among the tables. Moreover, our techniques are surprisingly efficient. Our techniques apply equally well to the logical cousin of the contingency table, the OLAP cube.
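The consistency step described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a single two-way table released as its row and column sums, uses Laplace noise as a stand-in privacy mechanism, measures "nearest" in the L2 norm, and ignores non-negativity and integrality. All function names are hypothetical. Under these assumptions, the marginals are consistent exactly when the row sums and column sums have the same total, so the nearest consistent set is an orthogonal projection onto one hyperplane.

```python
import math
import random


def marginals(table):
    """Row sums followed by column sums of a 2-D table, as one flat list."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows + cols


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def project_consistent(noisy, n_rows):
    """L2-project (rows + cols) onto the hyperplane sum(rows) == sum(cols).

    Assumption: non-negativity and integrality are ignored, so this
    hyperplane is the whole consistent set for two one-way marginals.
    """
    rows, cols = noisy[:n_rows], noisy[n_rows:]
    gap = sum(rows) - sum(cols)
    step = gap / len(noisy)  # spread the disagreement evenly over all cells
    return [r - step for r in rows] + [c + step for c in cols]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)            # fixed seed so the run is repeatable
raw = [[12, 7], [3, 18]]          # hypothetical raw counts
true = marginals(raw)
noisy = [m + laplace(2.0, rng) for m in true]
fixed = project_consistent(noisy, len(raw))

print("noisy totals disagree by", sum(noisy[:2]) - sum(noisy[2:]))
print("distance noisy -> consistent:", l2(noisy, fixed))
print("distance noisy -> true:      ", l2(noisy, true))
```

Because the true marginals are themselves consistent, the nearest consistent point can never be farther from the noisy marginals than the true marginals are, which is the observation the abstract relies on.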
- Special Issue on Statistical Disclosure Control, volume 14(4) of Journal of Official Statistics. 1998.
- D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In PODS. ACM, 2001.
- R. Agrawal and R. Srikant. Privacy-preserving data mining. In W. Chen, J. F. Naughton, and P. A. Bernstein, editors, SIGMOD Conference, pages 439--450. ACM, 2000.
- R. Agrawal, R. Srikant, and D. Thomas. Privacy preserving OLAP. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 251--262, New York, NY, USA, 2005. ACM Press.
- M. Bacharach. Matrix rounding problems. Management Science, 9:732--742, 1966.
- A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In C. Li, editor, PODS, pages 128--138. ACM, 2005.
- J. Castro. Quadratic interior-point methods in statistical disclosure control. Computational Management Science, 2:107--121, 2005.
- J. Castro. Minimum-distance controlled perturbation methods for large-scale tabular data protection. European Journal of Operational Research, 171:39--52, 2006.
- S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee. Toward privacy in public databases. In J. Kilian, editor, TCC, volume 3378 of Lecture Notes in Computer Science, pages 363--385. Springer, 2005.
- S. Chawla, C. Dwork, F. McSherry, and K. Talwar. On privacy-preserving histograms. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), Arlington, Virginia, 2005. AUAI Press.
- L. Cox, J. Kelly, and R. Patil. Balancing quality and confidentiality in multivariate tabular data. Privacy in Statistical Databases, 3080:87--98, 2004.
- T. Dalenius. Towards a methodology for statistical disclosure control. Statistisk tidskrift, 3:213--225, 1977.
- R. A. Dandekar and L. Cox. Synthetic tabular data: An alternative to complementary cell suppression, 2002. Manuscript, Energy Information Administration, US Department of Energy.
- I. Dinur and K. Nissim. Revealing information while preserving privacy. In Milo [26], pages 202--210.
- A. Dobra and S. Fienberg. Bounding entries in multi-way contingency tables given a set of marginal totals, 2002. Proceedings of Conference on Foundation of Statistical Inference and its Applications.
- J. Domingo-Ferrer and V. Torra. A critique of the sensitivity rules usually employed for statistical table protection. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):545--556, 2002.
- G. Duncan. Confidentiality and statistical disclosure limitation. In N. Smelser and P. Baltes, editors, International Encyclopedia of the Social and Behavioral Sciences. Elsevier, 2001.
- C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1--12. Springer, 2006.
- C. Dwork, D. Lee, and F. McSherry. Privacy preserving histogram case study, 2007. Manuscript.
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In S. Halevi and T. Rabin, editors, TCC, volume 3876 of Lecture Notes in Computer Science, pages 265--284. Springer, 2006.
- C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proceedings of the 39th annual Symposium on the Theory of Computation, 2007.
- C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In M. K. Franklin, editor, CRYPTO, volume 3152 of Lecture Notes in Computer Science, pages 528--544. Springer, 2004.
- A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Milo [26], pages 211--222.
- I. Fellegi. On the question of statistical confidentiality. Journal of the American Statistical Association, pages 7--18, 1972.
- M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization (Algorithms and Combinatorics). Springer, December 1994.
- T. Milo, editor. Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9--12, 2003, San Diego, CA, USA. ACM, 2003.
- J. Kelly, A. Assad, and B. Golden. The controlled rounding problem: Relaxations and complexity issues. OR Spektrum, 12:129--138, 1990.
- T. W. Körner. Fourier Analysis. Cambridge University Press, Cambridge, UK, 1988.
- J. A. D. Loera and S. Onn. All rational polytopes are transportation polytopes and all polytopal integer sets are contingency tables. Proceedings of the 10th Ann. Math. Prog. Soc. Symp. Integ. Prog. Combin. Optim., LNCS, 3064:338--351, 2004.
- J. A. D. Loera and S. Onn. The complexity of three-way statistical tables. SIAM J. Comput., 33:819--836, 2004.
- J. A. D. Loera and S. Onn. Markov bases of three-way tables are arbitrarily complicated. J. Symb. Comput., 41(2):173--181, 2006.
- A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In L. Liu, A. Reuter, K.-Y. Whang, and J. Zhang, editors, ICDE, page 24. IEEE Computer Society, 2006.
- D. A. Robertson and R. Ethier. Cell suppression: Experience and theory. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 8--20. Springer, 2002.
- D. Rubin. Discussion: Statistical disclosure limitation. Journal of Official Statistics, 9:461--469, 1993.
- P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression.
- L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571--588, 2002.
- L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002.
Index Terms
Privacy, accuracy, and consistency too: a holistic solution to contingency table release