DOI: 10.1145/1265530.1265569
Article

Privacy, accuracy, and consistency too: a holistic solution to contingency table release

Published: 11 June 2007

ABSTRACT

The contingency table is a workhorse of official statistics, the format of reported data for the US Census, Bureau of Labor Statistics, and the Internal Revenue Service. In many settings such as these, privacy is not only ethically mandated but frequently legally mandated as well. Consequently, there is an extensive and diverse literature dedicated to the problems of statistical disclosure control in contingency table release. However, all current techniques for reporting contingency tables fall short on at least one of privacy, accuracy, and consistency (among multiple released tables). We propose a solution that provides strong guarantees for all three desiderata simultaneously.

Our approach can be viewed as a special case of a more general approach for producing synthetic data: any privacy-preserving mechanism for contingency table release begins with raw data and produces a (possibly inconsistent) privacy-preserving set of marginals. From these tables alone, and hence without weakening privacy, we find and output the "nearest" consistent set of marginals. Interestingly, this set is no farther from the privacy-preserving tables than the tables of the raw data are, and consequently the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself.
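
The distance claim can be unpacked with a short triangle-inequality argument. The symbols below are not from the abstract and are assumed only for illustration: $m$ denotes the marginals of the raw data, $\tilde m$ the privacy-preserving (possibly inconsistent) marginals, $\hat m$ the nearest consistent set, $\mathcal{C}$ the set of all consistent sets of marginals, and $\|\cdot\|$ whichever norm defines "nearest".

```latex
% Assumed notation: m = true marginals, \tilde m = noisy marginals,
% \hat m = nearest consistent marginals, \mathcal{C} = consistent sets.
\begin{align*}
  \hat m &= \arg\min_{c \in \mathcal{C}} \|c - \tilde m\|, \\
  \|\hat m - \tilde m\| &\le \|m - \tilde m\|
      && \text{since } m \in \mathcal{C}, \\
  \|\hat m - m\| &\le \|\hat m - \tilde m\| + \|\tilde m - m\|
      \le 2\,\|\tilde m - m\|
      && \text{by the triangle inequality.}
\end{align*}
```

Under these assumptions, imposing consistency adds at most as much error as the privacy mechanism introduced in the first place, which is how the abstract's statement reads.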

The privacy mechanism of [20] gives the strongest known privacy guarantees, with very little error. Combined with the techniques of the current paper, we therefore obtain excellent privacy, accuracy, and consistency among the tables. Moreover, our techniques are surprisingly efficient. Our techniques apply equally well to the logical cousin of the contingency table, the OLAP cube.
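
As a rough illustration of the pipeline described above, the following sketch perturbs the two one-way marginals of a small two-way table and then moves them to the nearest consistent pair. It is a toy under stated assumptions, not the paper's algorithm: the noise is Laplace with an arbitrary scale (not calibrated to any privacy parameter), "nearest" is taken in the Euclidean sense, the only consistency constraint is that both marginals sum to the same total, and non-negativity and integrality are ignored. All function names are hypothetical.

```python
import math
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample, as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def marginals(table):
    """Row sums and column sums of a 2-D table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def nearest_consistent(rows, cols):
    """Euclidean projection onto {sum(rows) == sum(cols)}.

    Assumption: this single equality is the only consistency constraint;
    non-negativity and integrality are not enforced in this sketch.
    """
    gap = (sum(rows) - sum(cols)) / (len(rows) + len(cols))
    return [r - gap for r in rows], [c + gap for c in cols]


def dist(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(0)      # fixed seed so the run is repeatable
table = [[30, 12], [8, 50]]  # made-up raw counts
true_rows, true_cols = marginals(table)

scale = 2.0  # illustrative noise scale, not a calibrated privacy setting
noisy_rows = [r + laplace(scale, rng) for r in true_rows]
noisy_cols = [c + laplace(scale, rng) for c in true_cols]

fixed_rows, fixed_cols = nearest_consistent(noisy_rows, noisy_cols)

true_all = true_rows + true_cols
noisy_all = noisy_rows + noisy_cols
fixed_all = fixed_rows + fixed_cols

print("totals after repair:", sum(fixed_rows), sum(fixed_cols))
print("distance noisy -> repaired:", dist(noisy_all, fixed_all))
print("distance noisy -> true:    ", dist(noisy_all, true_all))
```

In this toy setting the repaired marginals are never farther from the noisy ones than the true marginals are, because the true marginals themselves satisfy the constraint. A realistic release would involve many overlapping marginals and additional constraints, which this sketch does not attempt to cover.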

References

  1. Special Issue on Statistical Disclosure Control, volume 14(4) of Journal of Official Statistics. 1998.
  2. D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In PODS. ACM, 2001.
  3. R. Agrawal and R. Srikant. Privacy-preserving data mining. In W. Chen, J. F. Naughton, and P. A. Bernstein, editors, SIGMOD Conference, pages 439--450. ACM, 2000.
  4. R. Agrawal, R. Srikant, and D. Thomas. Privacy preserving OLAP. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 251--262, New York, NY, USA, 2005. ACM Press.
  5. M. Bacharach. Matrix rounding problems. Management Science, 9:732--742, 1966.
  6. A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In C. Li, editor, PODS, pages 128--138. ACM, 2005.
  7. J. Castro. Quadratic interior-point methods in statistical disclosure control. Computational Management Science, 2:107--121, 2005.
  8. J. Castro. Minimum-distance controlled perturbation methods for large-scale tabular data protection. European Journal of Operational Research, 171:39--52, 2006.
  9. S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee. Toward privacy in public databases. In J. Kilian, editor, TCC, volume 3378 of Lecture Notes in Computer Science, pages 363--385. Springer, 2005.
  10. S. Chawla, C. Dwork, F. McSherry, and K. Talwar. On privacy-preserving histograms. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), Arlington, Virginia, 2005. AUAI Press.
  11. L. Cox, J. Kelly, and R. Patil. Balancing quality and confidentiality in multivariate tabular data. Privacy in Statistical Databases, 3080:87--98, 2004.
  12. T. Dalenius. Towards a methodology for statistical disclosure control. Statistisk Tidskrift, 3:213--225, 1977.
  13. R. A. Dandekar and L. Cox. Synthetic tabular data: An alternative to complementary cell suppression, 2002. Manuscript, Energy Information Administration, US Department of Energy.
  14. I. Dinur and K. Nissim. Revealing information while preserving privacy. In Milo [26], pages 202--210.
  15. A. Dobra and S. Fienberg. Bounding entries in multi-way contingency tables given a set of marginal totals, 2002. Proceedings of Conference on Foundation of Statistical Inference and its Applications.
  16. J. Domingo-Ferrer and V. Torra. A critique of the sensitivity rules usually employed for statistical table protection. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):545--556, 2002.
  17. G. Duncan. Confidentiality and statistical disclosure limitation. In N. Smelser and P. Baltes, editors, International Encyclopedia of the Social and Behavioral Sciences. Elsevier, 2001.
  18. C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1--12. Springer, 2006.
  19. C. Dwork, D. Lee, and F. McSherry. Privacy preserving histogram case study, 2007. Manuscript.
  20. C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In S. Halevi and T. Rabin, editors, TCC, volume 3876 of Lecture Notes in Computer Science, pages 265--284. Springer, 2006.
  21. C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proceedings of the 39th Annual Symposium on the Theory of Computation, 2007.
  22. C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In M. K. Franklin, editor, CRYPTO, volume 3152 of Lecture Notes in Computer Science, pages 528--544. Springer, 2004.
  23. A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Milo [26], pages 211--222.
  24. I. Fellegi. On the question of statistical confidentiality. Journal of the American Statistical Association, pages 7--18, 1972.
  25. M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization (Algorithms and Combinatorics). Springer, December 1994.
  26. T. Milo, editor. Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9--12, 2003, San Diego, CA, USA. ACM, 2003.
  27. J. Kelly, A. Assad, and B. Golden. The controlled rounding problem: Relaxations and complexity issues. OR Spektrum, 12:129--138, 1990.
  28. T. W. Körner. Fourier Analysis. Cambridge University Press, Cambridge, UK, 1988.
  29. J. A. D. Loera and S. Onn. All rational polytopes are transportation polytopes and all polytopal integer sets are contingency tables. Proceedings of the 10th Ann. Math. Prog. Soc. Symp. Integ. Prog. Combin. Optim., LNCS, 3064:338--351, 2004.
  30. J. A. D. Loera and S. Onn. The complexity of three-way statistical tables. SIAM J. Comput., 33:819--836, 2004.
  31. J. A. D. Loera and S. Onn. Markov bases of three-way tables are arbitrarily complicated. J. Symb. Comput., 41(2):173--181, 2006.
  32. A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In L. Liu, A. Reuter, K.-Y. Whang, and J. Zhang, editors, ICDE, page 24. IEEE Computer Society, 2006.
  33. D. A. Robertson and R. Ethier. Cell suppression: Experience and theory. In J. Domingo-Ferrer, editor, Inference Control in Statistical Databases, volume 2316 of Lecture Notes in Computer Science, pages 8--20. Springer, 2002.
  34. D. Rubin. Discussion: Statistical disclosure limitation. Journal of Official Statistics, 9:461--469, 1993.
  35. P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression.
  36. L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571--588, 2002.
  37. L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002.

    • Published in

      PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
      June 2007
      328 pages
      ISBN: 9781595936851
      DOI: 10.1145/1265530

      Copyright © 2007 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 476 of 1,835 submissions, 26%
