Abstract
Transactions are records that contain a set of items about individuals. For example, the items browsed by a customer while shopping online form a transaction. Many activities are now carried out on the Internet, so large amounts of transaction data are being collected. Such data are often shared and analyzed to improve business and services, but they also contain private information about individuals that must be protected. Techniques have been proposed to sanitize transaction data before release, and set-based generalization is one such method. In this article, we study how well set-based generalization can protect transactions. We propose methods to attack set-generalized transactions by exploiting contextual information that is available within the released data. Our results show that set-based generalization may not provide adequate protection for transactions: our methods detect up to 70% of the items added to transactions during generalization to obfuscate the original data, with a precision of over 80%.
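To make the terminology concrete, the following is a minimal sketch of set-based generalization and of how a detection result such as the one quoted above could be scored. It is not the authors' attack, which the abstract does not describe. The item names, the `GENERALIZATION` mapping, and the helper functions are all hypothetical, and reading "up to 70% of the items ... detected" as recall is an assumption.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical generalization mapping: each original item maps to the
# generalized item (a set of items) that replaces it in the released data.
GENERALIZATION: Dict[str, FrozenSet[str]] = {
    "beer": frozenset({"beer", "wine"}),
    "wine": frozenset({"beer", "wine"}),
    "diapers": frozenset({"diapers", "insulin"}),
    "insulin": frozenset({"diapers", "insulin"}),
    "milk": frozenset({"milk"}),
}


def generalize(transaction: Set[str]) -> Set[FrozenSet[str]]:
    """Replace every item with the generalized item that covers it."""
    return {GENERALIZATION[item] for item in transaction}


def added_items(transaction: Set[str]) -> Set[str]:
    """Items visible in the release that were not in the original transaction."""
    released = set().union(*generalize(transaction))
    return released - transaction


def precision_and_recall(flagged: Set[str], truly_added: Set[str]) -> Tuple[float, float]:
    """Score an attacker's guess about which released items were added."""
    hits = len(flagged & truly_added)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truly_added) if truly_added else 0.0
    return precision, recall


original = {"beer", "diapers", "milk"}
truth = added_items(original)   # {"wine", "insulin"}
guess = {"insulin"}             # hypothetical attacker output
print(precision_and_recall(guess, truth))  # (1.0, 0.5)
```

In this toy case the attacker flags one of the two added items and is right about it, so precision is 1.0 and recall is 0.5.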
Exploiting Contextual Information in Attacking Set-Generalized Transactions