research-article

Quantifying Interdependent Risks in Genomic Privacy

Published: 06 February 2017

Abstract

The rapid progress in human-genome sequencing is leading to a high availability of genomic data. These data are notoriously sensitive and stable over time, and they are highly correlated among relatives. In this article, we study the implications of these familial correlations for kin genomic privacy. We formalize the problem and detail efficient reconstruction attacks based on graphical models and belief propagation. With our approach, an attacker can infer the genomes of the relatives of an individual whose genome or phenotype is observed, notably by relying on Mendel’s Laws, on statistical relationships between genomic variants, and on relationships between the genome and the phenotype. We evaluate the effect of these dependencies on privacy with respect to the number of observed variants and the relatives sharing them. We also study how the algorithmic performance evolves when we take these various relationships into account. Furthermore, to quantify the level of genomic privacy that remains after the proposed inference attack, we discuss possible definitions of genomic privacy metrics and compare their values and evolution. Genomic data reveal Mendelian disorders and the likelihood of developing severe diseases, such as Alzheimer’s. We therefore also introduce the quantification of health privacy, specifically, a measure of how well the predisposition to a disease is concealed from an attacker. We evaluate our approach on actual genomic data from a pedigree and show the extent of the threat by combining data gathered from a genome-sharing website and an online social network.
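
To make the kind of inference described in the abstract concrete, the following is a minimal sketch, not the article's method. It handles a single variant in a parent-child pair by exact enumeration, whereas the article relies on graphical models and belief propagation over whole pedigrees and many variants. The sketch assumes a genotype is encoded as the number of minor alleles (0, 1, or 2), that unobserved parents follow a Hardy-Weinberg prior with a given minor allele frequency, and that Shannon entropy of the attacker's posterior serves as one possible privacy metric. All function names are hypothetical.

```python
from math import log2

GENOTYPES = (0, 1, 2)  # number of minor alleles at one variant (assumed encoding)


def hardy_weinberg(maf):
    """Prior genotype distribution under Hardy-Weinberg equilibrium (assumption)."""
    return {0: (1 - maf) ** 2, 1: 2 * maf * (1 - maf), 2: maf ** 2}


def child_given_parents(g_mother, g_father):
    """Mendelian inheritance: each parent transmits one of its two alleles at random."""
    pm = g_mother / 2.0  # probability the mother transmits the minor allele
    pf = g_father / 2.0  # probability the father transmits the minor allele
    return {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }


def parent_posterior(child_genotype, maf):
    """Attacker's posterior over one parent's genotype after observing the child.

    The other parent is unobserved and is marginalized out under the prior.
    """
    prior = hardy_weinberg(maf)
    joint = {}
    for g_parent in GENOTYPES:
        joint[g_parent] = sum(
            prior[g_parent]
            * prior[g_other]
            * child_given_parents(g_parent, g_other)[child_genotype]
            for g_other in GENOTYPES
        )
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


def entropy(dist):
    """Shannon entropy in bits; lower means less uncertainty for the attacker."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    maf = 0.2  # illustrative minor allele frequency, not taken from the article
    prior = hardy_weinberg(maf)
    posterior = parent_posterior(child_genotype=2, maf=maf)
    print("prior entropy    :", entropy(prior))
    print("posterior entropy:", entropy(posterior))
```

In this toy setting, observing a child who is homozygous for the minor allele rules out a parent with no minor allele, so the attacker's uncertainty about the parent drops. That is the interdependence effect the article quantifies at scale.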

Published in

ACM Transactions on Privacy and Security, Volume 20, Issue 1
February 2017
99 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3038258

Copyright © 2017 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 6 February 2017
• Accepted: 1 November 2016
• Revised: 1 June 2016
• Received: 1 December 2015

Qualifiers

• research-article
• Research
• Refereed
