Differentially-private software frequency profiling under linear constraints

Abstract
Differential privacy has emerged as a leading theoretical framework for privacy-preserving data gathering and analysis. It allows meaningful statistics to be collected for a population without revealing "too much" information about any individual member of the population. For software profiling, this machinery allows profiling data from many users of a deployed software system to be collected and analyzed in a privacy-preserving manner. Such a solution is appealing to many stakeholders, including software users, software developers, infrastructure providers, and government agencies.
We propose an approach for differentially-private collection of frequency vectors from software executions. Frequency information is reported with the addition of random noise drawn from the Laplace distribution. A key observation behind the design of our scheme is that event frequencies are closely correlated due to the static code structure. Differential privacy protections must account for such relationships; otherwise, a seemingly-strong privacy guarantee is actually weaker than it appears. Motivated by this observation, we propose a novel and general differentially-private profiling scheme for the case where correlations between frequencies can be expressed through linear inequalities. Using a linear programming formulation, we show how to determine the magnitude of random noise that should be added to achieve meaningful privacy protections under such linear constraints. Next, we develop an efficient instance of this general machinery for an important subclass of constraints. Instead of LP, our solution uses a reachability analysis of a constraint graph. As an exemplar, we employ this approach to implement differentially-private method frequency profiling for Android apps.
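The core reporting step, absent the constraint-aware calibration described above, can be sketched as the standard Laplace mechanism: each user perturbs every entry of their frequency vector with independent Laplace noise of scale sensitivity/epsilon before reporting it. This is a minimal illustrative sketch; the function names and parameter values are not taken from the paper, and in the actual scheme the noise magnitude would instead be derived from the linear constraints via the LP formulation (or the constraint-graph reachability analysis).

```python
import math
import random

def sample_laplace(scale, rng):
    """Inverse-CDF sampling from a Laplace(0, scale) distribution."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_perturb(freqs, epsilon, sensitivity, rng=None):
    """Add i.i.d. Laplace(sensitivity / epsilon) noise to each frequency.

    `epsilon` is the privacy parameter; `sensitivity` stands in for the
    (constraint-dependent) sensitivity that the paper computes via LP.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [f + sample_laplace(scale, rng) for f in freqs]
```

For example, a user whose true per-method frequencies are `[3, 0, 7]` would report `laplace_perturb([3, 0, 7], epsilon=1.0, sensitivity=1.0)`; the noisy reports are individually meaningless but average out over many users, which is what makes population-level frequency estimates accurate.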
Any differentially-private scheme has to balance two competing aspects: privacy and accuracy. Through an experimental study to characterize these trade-offs, we (1) show that our proposed randomization achieves much higher accuracy compared to related prior work, (2) demonstrate that high accuracy and high privacy protection can be achieved simultaneously, and (3) highlight the importance of linear constraints in the design of the randomization. These promising results provide evidence that our approach is a good candidate for privacy-preserving frequency profiling of deployed software.