research-article
Open Access

Differentially-private software frequency profiling under linear constraints

Published: 13 November 2020

Abstract

Differential privacy has emerged as a leading theoretical framework for privacy-preserving data gathering and analysis. It allows meaningful statistics to be collected for a population without revealing "too much" information about any individual member of the population. For software profiling, this machinery allows profiling data from many users of a deployed software system to be collected and analyzed in a privacy-preserving manner. Such a solution is appealing to many stakeholders, including software users, software developers, infrastructure providers, and government agencies.
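As an illustration of this general machinery (not the specific randomization proposed in the paper), the classic Laplace mechanism releases a count after adding noise whose scale is the query's sensitivity divided by the privacy parameter ε. The helper names below are hypothetical:

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    # Laplace mechanism: noise scale = sensitivity / epsilon gives
    # epsilon-differential privacy for a query with this sensitivity.
    return true_count + laplace_sample(sensitivity / epsilon)
```

Smaller ε means stronger privacy but larger noise; the noise is unbiased, so averaging many users' reports recovers accurate population-level frequencies.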

We propose an approach for differentially-private collection of frequency vectors from software executions. Frequency information is reported with the addition of random noise drawn from the Laplace distribution. A key observation behind the design of our scheme is that event frequencies are closely correlated due to the static code structure. Differential privacy protections must account for such relationships; otherwise, a seemingly-strong privacy guarantee is actually weaker than it appears. Motivated by this observation, we propose a novel and general differentially-private profiling scheme for settings in which correlations between frequencies can be expressed through linear inequalities. Using a linear programming formulation, we show how to determine the magnitude of random noise that should be added to achieve meaningful privacy protections under such linear constraints. Next, we develop an efficient instance of this general machinery for an important subclass of constraints. Instead of LP, our solution uses a reachability analysis of a constraint graph. As an exemplar, we employ this approach to implement differentially-private method frequency profiling for Android apps.
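A minimal sketch of the reachability idea, under the simplifying assumption that each constraint in the subclass has a caller/callee form tying one event's frequency to another's, so it can be stored as a directed edge. The graph, the method names, and the `correlated_events` helper are illustrative, not the paper's implementation:

```python
from collections import deque

def correlated_events(edges, start):
    # Events whose frequencies a single change to `start` can affect:
    # standard BFS reachability over the constraint graph.
    seen = {start}
    work = deque([start])
    while work:
        u = work.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                work.append(v)
    return seen

# Hypothetical call-structure constraints: main() invokes parse() and
# render(); parse() invokes tokenize().
edges = {"main": ["parse", "render"], "parse": ["tokenize"]}
```

Under this assumption, the set of events reachable from a perturbed event captures which reported frequencies are jointly constrained, so noise calibration can be driven by graph traversal rather than by solving a linear program per event.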

Any differentially-private scheme has to balance two competing aspects: privacy and accuracy. Through an experimental study to characterize these trade-offs, we (1) show that our proposed randomization achieves much higher accuracy compared to related prior work, (2) demonstrate that high accuracy and high privacy protection can be achieved simultaneously, and (3) highlight the importance of linear constraints in the design of the randomization. These promising results provide evidence that our approach is a good candidate for privacy-preserving frequency profiling of deployed software.


Supplemental Material

Auxiliary Presentation Video

This is the presentation video for our paper, accepted to the OOPSLA 2020 technical track. In the paper, we propose a novel and general differentially-private software frequency profiling scheme for settings in which correlations between frequencies can be expressed through linear inequalities. Using a linear programming formulation, we show how to determine the magnitude of random noise that should be added to achieve meaningful privacy protections under such linear constraints.



      • Published in

        Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
        November 2020, 3108 pages
        EISSN: 2475-1421
        DOI: 10.1145/3436718

        Copyright © 2020 Owner/Author

        Publisher

        Association for Computing Machinery

        New York, NY, United States
