Abstract
The genome is a unique identifier for human individuals. It also contains highly sensitive information, which creates a high potential for misuse of genomic data (for example, genetic discrimination). In this article, we investigate how genomic privacy can be measured in scenarios where an adversary aims to infer a person’s genomic markers by constructing probability distributions on the values of genetic variations. We evaluate the strength of privacy metrics by requiring that metrics be monotonic with increasing adversary strength, and we uncover serious problems with several existing metrics currently used to measure genomic privacy. We provide suggestions on metric selection, interpretation, and visualization, and illustrate the workflow using case studies for three real-world diseases.
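The monotonicity requirement can be illustrated with a minimal sketch. It assumes a single genetic variation with three possible values (0, 1, 2) and a hypothetical series of adversaries whose probability distributions concentrate more and more on the true value. Shannon entropy is used here only as an example metric; the function names and the example distributions are assumptions for illustration and are not taken from the article.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in dist if p > 0)

def is_monotonic_decreasing(values, tol=1e-12):
    """True if each value is no larger than the one before it."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))

# Hypothetical adversary estimates for one variation (values 0, 1, 2),
# ordered from weakest to strongest adversary; the true value is 1.
adversaries = [
    [1 / 3, 1 / 3, 1 / 3],  # no knowledge: uniform guess
    [0.25, 0.50, 0.25],
    [0.10, 0.80, 0.10],
    [0.00, 1.00, 0.00],     # perfect knowledge
]

# A higher-is-more-private metric should not increase as the adversary gets stronger.
scores = [entropy(d) for d in adversaries]
print(is_monotonic_decreasing(scores))
```

A metric that fails this check for some sequence of strictly stronger adversaries would report more privacy against a better-informed adversary, which is the kind of problem the evaluation looks for.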
Evaluating the Strength of Genomic Privacy Metrics