Abstract
Inspired by the proliferation of data-analysis tasks, recent research in program synthesis has had a strong focus on enabling users to specify data-analysis programs through intuitive specifications, like examples and natural language. However, with the ever-increasing threat to privacy through data analysis, we believe it is imperative to reimagine program synthesis technology in the presence of formal privacy constraints.
In this paper, we study the problem of automatically synthesizing randomized, differentially private programs, where the user can provide the synthesizer with a constraint on the privacy of the desired algorithm. We base our technique on a linear dependent type system that can track the resources consumed by a program, and hence its privacy cost. We develop a novel type-directed synthesis algorithm that constructs randomized differentially private programs. We apply our technique to the problems of synthesizing database-like queries as well as recursive differential privacy mechanisms from the literature.
Supplemental Material
Available for Download
Supplementary material containing proofs of theorems stated in the paper.
- Aws Albarghouthi and Justin Hsu. 2018. Synthesizing coupling proofs of differential privacy. PACMPL 2, POPL (2018), 58:1–58:30. Google Scholar
Digital Library
- Apple. Accessed 11-11-2017. Differential privacy. https://images.apple.com/privacy/docs/Differential_Privacy_Overview.pdf . (Accessed 11-11-2017).Google Scholar
- Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2016. A program logic for union bounds. In The 43rd International Colloquium on Automata, Languages and Programming . Rome, Italy.Google Scholar
- US Census Bureau. Accessed 11-11-2017. On The Map. https://onthemap.ces.census.gov/ . (Accessed 11-11-2017).Google Scholar
- Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, and Shayak Sen. 2017. Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS ’17). ACM, New York, NY, USA, 1193–1210. Google Scholar
Digital Library
- Arthur Azevedo de Amorim, Marco Gaboardi, Emilio Jesús Gallego Arias, and Justin Hsu. 2014. Really Natural Linear Indexed Type Checking. In Proceedings of the 26th 2014 International Symposium on Implementation and Application of Functional Languages, IFL ’14, Boston, MA, USA, October 1-3, 2014. 5:1–5:12. Google Scholar
Digital Library
- Arthur Azevedo de Amorim, Marco Gaboardi, Justin Hsu, Shin-ya Katsumata, and Ikram Cherigui. 2017. A Semantic Account of Metric Preservation. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). 545–556. Google Scholar
Digital Library
- Leonardo Mendonça de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In TACAS.Google Scholar
- Cynthia Dwork and Aaron Roth. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407. Google Scholar
Digital Library
- Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and Removing Disparate Impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, New York, NY, USA, 259–268. Google Scholar
Digital Library
- Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 422–436. Google Scholar
Digital Library
- John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples. In PLDI. Google Scholar
Digital Library
- Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. 2016. Example-Directed Synthesis: A TypeTheoretic Interpretation. In POPL. Google Scholar
Digital Library
- Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan, and Benjamin C. Pierce. 2013. Linear dependent types for differential privacy. In The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’13, Rome, Italy - January 23 - 25, 2013. 357–370. Google Scholar
Digital Library
- Sumit Gulwani, William R. Harris, and Rishabh Singh. 2012. Spreadsheet data manipulation using examples. CACM 8 (2012). Google Scholar
Digital Library
- Anupam Gupta, Aaron Roth, and Jonathan Ullman. 2012. Iterative constructions and private data release. In Theory of cryptography conference. Springer, 339–356. Google Scholar
Digital Library
- Samuel Haney, Ashwin Machanavajjhala, John M Abowd, Matthew Graham, Mark Kutzbach, and Lars Vilhuber. 2017. Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 1339–1354. Google Scholar
Digital Library
- Moritz Hardt, Katrina Ligett, and Frank Mcsherry. 2012. A Simple and Practical Algorithm for Differentially Private Data Release. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2339–2347. http://papers.nips.cc/paper/4548-a-simple-and-practical-algorithm-fordifferentially-private-data-release.pdf Google Scholar
Digital Library
- Lauren Kirchner Jeff Larson, Surya Mattu and Julia Angwin. {n. d.}. How We Analyzed the COMPAS Recidivism Algorithm. ({n. d.}). https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/ Accessed: 2017-11-15.Google Scholar
- Noah M. Johnson, Joseph P. Near, and Dawn Xiaodong Song. 2018. Practical Differential Privacy for SQL Queries Using Elastic Sensitivity. VLDB. http://arxiv.org/abs/1706.09479 Google Scholar
Digital Library
- Vu Le and Sumit Gulwani. 2014. FlashExtract: a framework for data extraction by examples. In PLDI. Google Scholar
Digital Library
- M. Lichman. 2013. UCI Machine Learning Repository. (2013). http://archive.ics.uci.edu/mlGoogle Scholar
- Frank McSherry and Kunal Talwar. 2007. Mechanism design via differential privacy. In Foundations of Computer Science, 2007. FOCS’07. 48th Annual IEEE Symposium on. IEEE, 94–103. Google Scholar
Digital Library
- Frank D McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 19–30. Google Scholar
Digital Library
- Anders Miltner, Kathleen Fisher, Benjamin C. Pierce, David Walker, and Steve Zdancewic. 2018. Synthesizing bijective lenses. PACMPL 2, POPL (2018), 1:1–1:30. Google Scholar
Digital Library
- Arjun Narayan and Andreas Haeberlen. 2012. DJoin: Differentially Private Join Queries over Distributed Databases.. In OSDI. 149–162. Google Scholar
Digital Library
- Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-example-directed program synthesis. In PLDI. Google Scholar
Digital Library
- Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). 522–538. Google Scholar
Digital Library
- Oleksander Polozov and Sumit Gulwani. 2015. FlashMeta: A Framework for Inductive Program Synthesis. In OOPSLA. Google Scholar
Digital Library
- Davide Proserpio, Sharon Goldberg, and Frank McSherry. 2014. Calibrating data to sensitivity in private data analysis: a platform for differentially-private analysis of weighted datasets. Proceedings of the VLDB Endowment 7, 8 (2014), 637–648. Google Scholar
Digital Library
- Patrick Maxim Rondon, Ming Kawaguchi, and Ranjit Jhala. 2008. Liquid types. In PLDI.Google Scholar
- Indrajit Roy, Srinath TV Setty, Ann Kilzer, Vitaly Shmatikov, and Emmett Witchel. 2010. Airavat: Security and privacy for MapReduce.. In NSDI, Vol. 10. 297–312. Google Scholar
Digital Library
- Calvin Smith and Aws Albarghouthi. 2016. MapReduce Program Synthesis. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). 326–340. Google Scholar
Digital Library
- Calvin Smith, Justin Hsu, and Aws Albarghouthi. 2019. Trace Abstraction Modulo Probability. Proc. ACM Program. Lang. 3, POPL, Article 39 (Jan. 2019), 31 pages. Google Scholar
Digital Library
- Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017a. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 452–466. Google Scholar
Digital Library
- Xinyu Wang, Isil Dillig, and Rishabh Singh. 2017b. Synthesis of data completion scripts using finite tree automata. PACMPL 1, OOPSLA (2017), 62:1–62:26. Google Scholar
Digital Library
- Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: Query Synthesis from Natural Language. OOPSLA. Google Scholar
Digital Library
- Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. 2008. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. In OSDI. Google Scholar
Digital Library
- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In NSDI. Google Scholar
Digital Library
- Sai Zhang and Yuyin Sun. 2013. Automatically synthesizing SQL queries from input-output examples. In ASE, Ewen Denney, Tevfik Bultan, and Andreas Zeller (Eds.). IEEE, 224–234. Google Scholar
Digital Library
Index Terms
Synthesizing differentially private programs
Recommendations
DPGen: Automated Program Synthesis for Differential Privacy
CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications SecurityDifferential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then decides where to ...
A differentially private algorithm for location data release
The rise of mobile technologies in recent years has led to large volumes of location information, which are valuable resources for knowledge discovery such as travel patterns mining and traffic analysis. However, location dataset has been confronted ...
Evaluating differentially private decision tree model over model inversion attack
AbstractMachine learning techniques have been widely used and shown remarkable performance in various fields. Along with the widespread utilization of machine learning, concerns about privacy violations have been raised. Recently, as privacy invasion ...






Comments