Abstract
With the rapid growth of applications that generate time-stamped sequences (click streams, GPS trajectories, RFID sequences), sequence anonymization has become an important problem whenever such data are to be published or shared. Existing trajectory anonymization techniques disregard the importance of time or the sensitivity of events. This article is, to our knowledge, the first thorough study of time-stamped event sequence anonymization. We propose a novel and tunable generalization framework tailored to event sequences. We generalize time stamps using time intervals, and events using a taxonomy that models the domain semantics. We consider two scenarios: (i) sharing the data with a single receiver (the SSR setting), where the receiver’s background knowledge is confined to a set of time stamps and time generalization suffices, and (ii) sharing the data with colluding receivers (the SCR setting), where time generalization should be combined with event generalization. For both cases, we propose anonymization methods that prevent both user identification and event prediction. To achieve computational efficiency and scalability, we propose optimization techniques for both cases using a utility-based index, compact summaries, fast-to-compute bounds on utility, and a novel taxonomy-aware distance function. Extensive experiments confirm the effectiveness of our approach compared with the state of the art in terms of information loss, range query distortion, and preservation of temporal causality patterns. Furthermore, our experiments demonstrate efficiency and scalability on large-scale real and synthetic datasets.
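As a rough illustration of the two generalization operations named in the abstract, the sketch below replaces the time stamps of a group of sequences with a covering interval and replaces their events with the most specific common ancestor in a taxonomy. It is a minimal sketch under stated assumptions, not the article's algorithm: the toy taxonomy, the function names, and the position-by-position alignment of equal-length sequences are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]  # (time stamp, event label)

# Hypothetical toy taxonomy (child -> parent); the root maps to None.
TAXONOMY: Dict[str, Optional[str]] = {
    "any": None,
    "search": "any",
    "shopping": "any",
    "web_search": "search",
    "image_search": "search",
    "add_to_cart": "shopping",
}


def ancestors(node: str) -> List[str]:
    """Return the path from node up to the root, starting with node itself."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = TAXONOMY[current]
    return path


def lowest_common_ancestor(events: List[str]) -> str:
    """Return the most specific taxonomy node that covers every event."""
    common = ancestors(events[0])
    for event in events[1:]:
        seen = set(ancestors(event))
        # Keep only nodes shared with this event; order stays leaf-to-root.
        common = [node for node in common if node in seen]
    return common[0]


def generalize_group(
    sequences: List[List[Event]],
) -> List[Tuple[Tuple[int, int], str]]:
    """Generalize equal-length sequences position by position (an assumption).

    Each output element is ((earliest, latest), generalized_event).
    """
    generalized = []
    for column in zip(*sequences):
        times = [t for t, _ in column]
        labels = [e for _, e in column]
        interval = (min(times), max(times))
        generalized.append((interval, lowest_common_ancestor(labels)))
    return generalized


user_a = [(10, "web_search"), (25, "add_to_cart")]
user_b = [(12, "image_search"), (31, "add_to_cart")]
group = generalize_group([user_a, user_b])
print(group)
```

In this toy example both users end up with the same published sequence, so a receiver who knows only the original time stamps or events cannot tell the two apart. Wider intervals and higher taxonomy nodes hide more but also lose more information, which is the trade-off a utility measure would have to quantify.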
Index Terms
Efficient Time-Stamped Event Sequence Anonymization