research-article

Differentially Private k-Nearest Neighbor Missing Data Imputation

Published: 09 April 2022

Abstract

Using smooth-sensitivity techniques, we develop a differentially private method for \( k \)-nearest neighbor missing data imputation. This requires bounding the number of data-incomplete tuples whose data-complete “donor” can change when a single record is added to or deleted from the dataset. Because a single individual can affect many imputed values, our mechanisms necessarily add more noise than mechanisms that ignore missing data, but we show empirically that this cost is significantly outweighed by the bias reduction from imputing missing data.
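To make the donor-change idea concrete, the following is a minimal, non-private sketch of \( k \)-nearest neighbor imputation together with a count of how many incomplete tuples get a different donor set when one record is deleted. It is an illustration only, not the paper's mechanism: the function names (`donor_ids`, `knn_impute`, `donors_changed_by_removal`), the Euclidean distance, the index-based tie-breaking, and the use of the donors' mean as the imputed value are all assumptions made for this example.

```python
from math import dist
from typing import List, Optional, Sequence, Tuple

# A row is (observed features, value); value is None when it is missing.
Row = Tuple[Sequence[float], Optional[float]]


def donor_ids(rows: List[Row], target: int, k: int,
              skip: Optional[int] = None) -> List[int]:
    """Indices of the k complete rows nearest to rows[target].

    `skip` simulates deleting one row. Ties are broken by row index
    (an assumption of this sketch).
    """
    candidates = [i for i, (_, v) in enumerate(rows)
                  if v is not None and i != skip]
    candidates.sort(key=lambda i: (dist(rows[target][0], rows[i][0]), i))
    return candidates[:k]


def knn_impute(rows: List[Row], k: int = 1) -> List[Row]:
    """Fill each missing value with the mean of its k nearest donors."""
    out = []
    for t, (feats, value) in enumerate(rows):
        if value is None:
            ids = donor_ids(rows, t, k)
            if ids:  # leave the value missing if there are no donors
                value = sum(rows[i][1] for i in ids) / len(ids)
        out.append((feats, value))
    return out


def donors_changed_by_removal(rows: List[Row], removed: int,
                              k: int = 1) -> int:
    """Count incomplete rows whose donor set changes if rows[removed] is deleted."""
    changed = 0
    for t, (_, value) in enumerate(rows):
        if value is None and t != removed:
            before = set(donor_ids(rows, t, k))
            after = set(donor_ids(rows, t, k, skip=removed))
            if before != after:
                changed += 1
    return changed


# Toy data: two complete rows (indices 0 and 4) and three incomplete rows.
rows: List[Row] = [
    ((0.0,), 10.0),
    ((1.0,), None),
    ((1.2,), None),
    ((0.8,), None),
    ((5.0,), 50.0),
]
```

In this toy dataset, row 0 is the nearest donor for all three incomplete rows, so deleting it changes three donors at once, while deleting row 4 changes none. This is the kind of one-to-many impact that a sensitivity bound for imputation has to account for.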


      • Published in

        ACM Transactions on Privacy and Security, Volume 25, Issue 3
        August 2022
        288 pages
        ISSN: 2471-2566
        EISSN: 2471-2574
        DOI: 10.1145/3530305

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 April 2022
        • Online AM: 29 March 2022
        • Accepted: 1 December 2021
        • Revised: 1 November 2021
        • Received: 1 July 2021
        Qualifiers

        • research-article
        • Refereed
