Abstract
The rapid progress in human-genome sequencing is leading to a high availability of genomic data. These data are notoriously sensitive, stable over time, and highly correlated among relatives. In this article, we study the implications of these familial correlations for kin genomic privacy. We formalize the problem and detail efficient reconstruction attacks based on graphical models and belief propagation. With our approach, an attacker can infer the genomes of the relatives of an individual whose genome or phenotype is observed, relying notably on Mendel’s laws, on statistical relationships between genomic variants, and on relationships between the genome and the phenotype. We evaluate the effect of these dependencies on privacy with respect to the number of observed variants and the relatives sharing them. We also study how the algorithmic performance evolves when we take these various relationships into account. Furthermore, to quantify the level of genomic privacy that results from the proposed inference attack, we discuss possible definitions of genomic privacy metrics, and compare their values and evolution. Genomic data reveal Mendelian disorders and the likelihood of developing severe diseases, such as Alzheimer’s. We therefore also introduce the quantification of health privacy, specifically, a measure of how well the predisposition to a disease is concealed from an attacker. We evaluate our approach on actual genomic data from a pedigree and show the extent of the threat by combining data gathered from a genome-sharing website and an online social network.
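To make the core idea concrete, the following is a minimal sketch of how a single observed variant in a child constrains a parent's genotype through Mendel's law of segregation. It is not the belief-propagation attack described in the article: it handles one variant and one parent-child pair by direct Bayesian enumeration. The function names, the Hardy-Weinberg prior, the example minor allele frequency of 0.2, and the use of Shannon entropy as the privacy measure are all assumptions chosen for illustration.

```python
import math

def hardy_weinberg_prior(maf):
    """Prior over genotypes 0, 1, 2 (number of minor alleles), assuming
    Hardy-Weinberg equilibrium with minor allele frequency `maf`."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]

def mendel(child, mother, father):
    """P(child genotype | parental genotypes) under Mendel's law of segregation."""
    pm, pf = mother / 2, father / 2  # chance each parent transmits the minor allele
    return [(1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf][child]

def parent_posterior(child, maf):
    """Posterior over the father's genotype given the child's observed genotype,
    marginalising over the unobserved mother."""
    prior = hardy_weinberg_prior(maf)
    joint = [
        prior[f] * sum(prior[m] * mendel(child, m, f) for m in range(3))
        for f in range(3)
    ]
    total = sum(joint)
    return [x / total for x in joint]

def entropy_bits(dist):
    """Shannon entropy in bits: the attacker's remaining uncertainty."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Hypothetical example: the child is homozygous for the minor allele.
prior = hardy_weinberg_prior(0.2)
posterior = parent_posterior(child=2, maf=0.2)
print("prior entropy:    ", round(entropy_bits(prior), 3))
print("posterior entropy:", round(entropy_bits(posterior), 3))
```

In this example the child's genotype rules out a father with no copy of the minor allele, so the attacker's uncertainty about the father drops even though the father disclosed nothing. A full attack would propagate such constraints across the whole pedigree and across correlated variants.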
Index Terms
Quantifying Interdependent Risks in Genomic Privacy