Exploiting Contextual Information in Attacking Set-Generalized Transactions

Published: 18 September 2017

Abstract

Transactions are records that contain a set of items about individuals. For example, items browsed by a customer when shopping online form a transaction. Today, many activities are carried out on the Internet, resulting in a large amount of transaction data being collected. Such data are often shared and analyzed to improve business and services, but they also contain private information about individuals that must be protected. Techniques have been proposed to sanitize transaction data before their release, and set-based generalization is one such method. In this article, we study how well set-based generalization can protect transactions. We propose methods to attack set-generalized transactions by exploiting contextual information that is available within the released data. Our results show that set-based generalization may not provide adequate protection for transactions, and up to 70% of the items added into the transactions during generalization to obfuscate original data can be detected by our methods with a precision over 80%.
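To make the setting concrete, here is a minimal illustrative sketch. It is not the attack proposed in the article. It assumes that set-based generalization replaces each item with a group of items (its generalized item), and it models the released transaction as the flat union of those groups. The item names, the `GROUPS` mapping, and both function names are hypothetical. Reading "items detected" as recall is also an interpretation on our part.

```python
def generalize(transaction, groups):
    """Replace each item by the full group (generalized item) it belongs to."""
    released = set()
    for item in transaction:
        # Items without a group are released unchanged.
        released |= groups.get(item, frozenset({item}))
    return released


def precision_recall(flagged, added):
    """Score the items an attacker flags as 'added' against the truly added ones."""
    hits = len(flagged & added)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(added) if added else 0.0
    return precision, recall


# Hypothetical mapping from each item to its generalized item (a set of items).
group_a = frozenset({"milk", "wine"})
group_b = frozenset({"bread", "razor"})
GROUPS = {item: group for group in (group_a, group_b) for item in group}

original = {"milk", "bread"}
released = generalize(original, GROUPS)  # contains all four items
added = released - original              # items introduced by generalization

# Hypothetical attacker output: items guessed to have been added.
flagged = {"wine", "bread"}
precision, recall = precision_recall(flagged, added)
```

Under this reading, the result reported in the abstract corresponds to a recall of up to 0.7 over the added items at a precision above 0.8.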

Published in

ACM Transactions on Internet Technology, Volume 17, Issue 4
Special Issue on Provenance of Online Data and Regular Papers
November 2017
165 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3133307
Editor: Munindar P. Singh

Copyright © 2017 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 18 September 2017
• Accepted: 1 May 2017
• Revised: 1 December 2016
• Received: 1 October 2015

Qualifiers

• research-article
• Research
• Refereed