research-article

Online Algorithm for Differentially Private Genome-wide Association Studies

Published: 05 March 2021

Abstract

Digitization of healthcare records has contributed a large volume of functional scientific data that can help researchers understand the behaviour of many diseases. However, the privacy implications of this data, particularly genomic data, have surfaced recently, because the collection, dissemination, and analysis of human genomic data are highly sensitive. Multiple privacy attacks exploit the uniqueness of the human genome to reveal the presence of a participant, or of a certain group, in a dataset. As a result, current data-sharing policies rule out public dissemination and require precautionary measures before any genomic data release, which hinders timely scientific innovation. In this article, we investigate an approach that releases only statistics computed from genomic data rather than the whole dataset, and we propose a generalized Differentially Private mechanism for Genome-wide Association Studies (GWAS). Our method provides a quantifiable privacy guarantee by adding noise to the intermediate outputs while ensuring satisfactory accuracy of the private results. Furthermore, the proposed method offers multiple adjustable parameters that data owners can set according to their privacy requirements. These parameters act as equalizers that balance the privacy and utility of the GWAS. The method also incorporates an Online Bin Packing technique [1], which further bounds the privacy loss: the loss grows linearly with the number of open bins and scales with the incoming queries. Finally, we implemented our approach and benchmarked it on seven different GWAS. The experimental results demonstrate that for 1,000 arbitrary online queries, our algorithms are more than 80% accurate with reasonable privacy loss and outperform the state-of-the-art approaches on multiple studies (i.e., EigenStrat, LMM, TDT).
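
The abstract names two ingredients without giving details: noise added to released statistics, and online bin packing with a privacy loss that grows linearly in the number of open bins. The sketch below illustrates those two ideas in isolation; it is not the authors' algorithm. It assumes a standard Laplace mechanism on a counting query with sensitivity 1, a textbook first-fit packing rule, and a fixed budget charged per open bin. The names `private_count`, `first_fit`, `total_privacy_loss`, and `epsilon_per_bin` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism for a counting query (sensitivity 1 is an assumption)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def first_fit(costs, capacity=1.0):
    """Online first-fit bin packing.

    Each incoming item goes into the first open bin with enough room;
    a new bin is opened only when no existing bin can hold it.
    Returns the bin index chosen for each item and the number of open bins.
    """
    free = []        # remaining capacity of each open bin
    assignment = []  # bin index chosen for each item, in arrival order
    for cost in costs:
        if cost <= 0 or cost > capacity:
            raise ValueError("each cost must lie in (0, capacity]")
        for i, room in enumerate(free):
            if cost <= room + 1e-12:  # small tolerance for float rounding
                free[i] = room - cost
                assignment.append(i)
                break
        else:
            free.append(capacity - cost)
            assignment.append(len(free) - 1)
    return assignment, len(free)


def total_privacy_loss(open_bins, epsilon_per_bin):
    """Assumed accounting: every open bin is charged the same fixed budget."""
    return open_bins * epsilon_per_bin


rng = random.Random(0)
noisy = private_count(true_count=120, epsilon=0.5, rng=rng)

# Four hypothetical queries with costs expressed as fractions of one bin.
assignment, open_bins = first_fit([0.5, 0.7, 0.5, 0.3])
# assignment == [0, 1, 0, 1], open_bins == 2
loss = total_privacy_loss(open_bins, epsilon_per_bin=0.5)
```

Under this assumed accounting, packing queries tightly keeps the number of open bins, and therefore the total privacy loss, lower than charging every query separately.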

References

  1. Joan Boyar, Shahin Kamali, Kim S. Larsen, and Alejandro López-Ortiz. 2016. Online bin packing with advice. Algorithmica 74, 1 (2016), 507--527.
  2. Robert H. Miller and Ida Sim. 2004. Physicians’ use of electronic medical records: Barriers and solutions. Health Affairs 23, 2 (2004), 116--126.
  3. Guy Paré, Louis Raymond, Ana Ortiz de Guinea, Placide Poba-Nzaou, Marie-Claude Trudel, Josianne Marsan, and Thomas Micheneau. 2015. Electronic health record usage behaviors in primary care medical practices: A survey of family physicians in Canada. Int. J. Med. Inform. 84, 10 (2015), 857--867.
  4. Muhammad Naveed, Erman Ayday, Ellen W. Clayton, Jacques Fellay, Carl A. Gunter, Jean-Pierre Hubaux, Bradley A. Malin, and XiaoFeng Wang. 2015. Privacy in the genomic era. ACM Comput. Surveys 48, 1 (2015), 6.
  5. Md Momin Al Aziz, Md Nazmus Sadat, Dima Alhadidi, Shuang Wang, Xiaoqian Jiang, Cheryl L. Brown, and Noman Mohammed. 2019. Privacy-preserving techniques of genomic data: A survey. Brief. Bioinform. 20, 3 (2019), 887--895. https://doi.org/10.1093/bib/bbx139
  6. Alexandros Mittos, Bradley Malin, and Emiliano De Cristofaro. 2019. Systematizing genome privacy research: A privacy-enhancing technologies perspective. Proc. Privacy Enhanc. Technol. 2019, 1 (2019), 87--107.
  7. Bradley Malin, Kenneth Goodman et al. 2018. Between access and privacy: Challenges in sharing health data. Yearbook Med. Info. 27, 1 (2018), 055--059.
  8. The Personal Information Protection and Electronic Documents Act (PIPEDA). [n.d.]. Retrieved from https://goo.gl/TScuoW.
  9. Peter Kilbridge. 2003. The cost of HIPAA compliance. New England J. Med. 348, 15 (2003), 1423.
  10. Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography. Springer, 265--284.
  11. Cynthia Dwork. 2006. Differential privacy. In Proceedings of the 33rd International Conference on Automata, Languages and Programming, Volume Part II (ICALP’06). 1--12.
  12. J. Hsu, M. Gaboardi, A. Haeberlen, S. Khanna, A. Narayan, B. C. Pierce, and A. Roth. 2014. Differential privacy: An economic method for choosing epsilon. In Proceedings of the IEEE 27th Computer Security Foundations Symposium. 398--410.
  13. Andreas Haeberlen, Benjamin C. Pierce, and Arjun Narayan. 2011. Differential privacy under fire. In Proceedings of the USENIX Security Symposium.
  14. Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. 2014. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In Proceedings of the 23rd USENIX Security Symposium (USENIXSecurity’14). 17--32.
  15. Md Momin Al Aziz, Reza Ghasemi, Md Waliullah, and Noman Mohammed. 2017. Aftermath of Bustamante attack on genomic beacon service. BMC Med. Genom. 10, 2 (2017), 43.
  16. Moritz Hardt and Guy N. Rothblum. 2010. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS’10). IEEE, 61--70.
  17. Fei Yu, Michal Rybar, Caroline Uhler, and Stephen E. Fienberg. 2014. Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases. In Proceedings of the International Conference on Privacy in Statistical Databases. Springer, 170--184.
  18. Shuang Wang, Noman Mohammed, and Rui Chen. 2014. Differentially private genome data dissemination through top-down specialization. BMC Med. Info. Decision Making 14, 1 (2014), S2.
  19. Caroline Uhler, Aleksandra Slavković, and Stephen E. Fienberg. 2013. Privacy-preserving data sharing for genome-wide association studies. J. Privacy Confidential. 5, 1 (2013), 137.
  20. Aaron Johnson and Vitaly Shmatikov. 2013. Privacy-preserving data exploration in genome-wide association studies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1079--1087.
  21. Yuichi Sei and Akihiko Ohsuga. 2017. Privacy-preserving Chi-squared testing for genome SNP databases. In Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’17). IEEE, 3884--3889.
  22. Florian Tramèr, Zhicong Huang, Jean-Pierre Hubaux, and Erman Ayday. 2015. Differential privacy with bounded priors: Reconciling utility and privacy in genome-wide association studies. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1286--1297.
  23. Sean Simmons and Bonnie Berger. 2016. Realizing privacy preserving genome-wide association studies. Bioinformatics 32, 9 (2016), 1293--1300.
  24. Fei Yu, Stephen E. Fienberg, Aleksandra B. Slavković, and Caroline Uhler. 2014. Scalable privacy-preserving data sharing methodology for genome-wide association studies. J. Biomed. Inform. 50 (2014), 133--141.
  25. Sean Simmons, Cenk Sahinalp, and Bonnie Berger. 2016. Enabling privacy-preserving GWASs in heterogeneous human populations. Cell Syst. 3, 1 (2016), 54--61.
  26. Meng Wang, Zhanglong Ji, Shuang Wang, Jihoon Kim, Hai Yang, Xiaoqian Jiang, and Lucila Ohno-Machado. 2017. Mechanisms to protect the privacy of families when using the transmission disequilibrium test in genome-wide association studies. Bioinformatics 33, 23 (2017), 3716--3725.
  27. Md Nazmus Sadat, Md Momin Al Aziz, Noman Mohammed, Feng Chen, Xiaoqian Jiang, and Shuang Wang. 2019. SAFETY: Secure GWAS in federated environment through a hYbrid solution. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 1 (2019), 93--102. DOI:10.1109/TCBB.2018.2829760
  28. Junfeng Fan and Frederik Vercauteren. 2012. Somewhat practical fully homomorphic encryption. IACR Cryptol. ePrint Arch. 2012 (2012), 144.
  29. Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 3–4 (2014), 211--407.
  30. Frank D. McSherry. 2009. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 19--30.
  31. Indrajit Roy, Srinath T. V. Setty, Ann Kilzer, Vitaly Shmatikov, and Emmett Witchel. 2010. Airavat: Security and privacy for MapReduce. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI’10), Vol. 10. 297--312.
  32. Jean Louis Raisaro, Juan Ramón Troncoso-Pastoriza, Mickaël Misbach, João Sá Sousa, Sylvain Pradervand, Edoardo Missiaglia, Olivier Michielin, Bryan Ford, and Jean-Pierre Hubaux. 2018. MedCo: Enabling secure and privacy-preserving exploration of distributed clinical and genomic data. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 4 (2018), 1328--1341.
  33. Jean Louis Raisaro, Gwangbae Choi, Sylvain Pradervand, Raphael Colsenet, Nathalie Jacquemont, Nicolas Rosat, Vincent Mooser, and Jean-Pierre Hubaux. 2018. Protecting privacy and security of genomic data in I2B2 with homomorphic encryption and differential privacy. IEEE/ACM Trans. Comput. Biol. Bioinform. 15, 5 (2018), 1413--1426.
  34. Greg Gibson. 2018. Population genetics and GWAS: A primer. PLoS Biol. 16, 3 (2018), e2005485.
  35. A. J. Paverd, Andrew Martin, and Ian Brown. 2014. Modelling and automatically analysing privacy properties for honest-but-curious adversaries. Technical Report.
  36. Harmonic Series. [n.d.]. Retrieved from https://en.wikipedia.org/wiki/Harmonic_series_(mathematics).
  37. Eric W. Weisstein. [n.d.]. Block-Stacking problem. https://mathworld.wolfram.com/BookStackingProblem.html.
  38. Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2017. The composition theorem for differential privacy. IEEE Trans. Info. Theory 63, 6 (2017), 4037--4049.
  39. Stanley L. Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 309 (1965), 63--69.
  40. Laura Clarke, Xiangqun Zheng-Bradley, Richard Smith, Eugene Kulesha, Chunlin Xiao, Iliana Toneva, Brendan Vaughan, Don Preuss, Rasko Leinonen, Martin Shumway, et al. 2012. The 1,000 genomes project: Data management and community access. Nature Methods 9, 5 (2012), 459.
  41. Differential Privacy GWAS-implementation. [n.d.]. Retrieved from https://github.com/mominbuet/DifferentialPrivacyGWAS.
  42. Lon R. Cardon and Lyle J. Palmer. 2003. Population stratification and spurious allelic association. Lancet 361, 9357 (2003), 598--604.
  43. Nour Almadhoun, Erman Ayday, and Özgür Ulusoy. 2020. Inference attacks against differentially private query results from genomic datasets including dependent tuples. Bioinformatics 36, Supplement 1 (2020), i136--i145.
  44. William S. Bush and Jason H. Moore. 2012. Genome-wide association studies. PLoS Comput. Biol. 8, 12 (2012), e1002822.
  45. Steven S. Seiden. 2002. On the online bin packing problem. J. ACM 49, 5 (2002), 640--671.
  46. M. R. Garey and D. S. Johnson. 1981. Approximation algorithms for Bin packing problems: A survey. In Analysis and Design of Algorithms in Combinatorial Optimization. International Centre for Mechanical Sciences (Courses and Lectures), vol 266, G. Ausiello and M. Lucertini (Eds.). Springer. DOI:https://doi.org/10.1007/978-3-7091-2748-3_8
