
Efficient Time-Stamped Event Sequence Anonymization

Published: 01 December 2013

Abstract

With the rapid growth of applications that generate time-stamped sequences (click streams, GPS trajectories, RFID sequences), sequence anonymization has become an important problem whenever such data are to be published or shared. Existing trajectory anonymization techniques disregard the importance of time or the sensitivity of events. To our knowledge, this article is the first thorough study of time-stamped event sequence anonymization. We propose a novel and tunable generalization framework tailored to event sequences. We generalize time stamps using time intervals and events using a taxonomy that models the domain semantics. We consider two scenarios: (i) sharing the data with a single receiver (the SSR setting), where the receiver’s background knowledge is confined to a set of time stamps and time generalization suffices, and (ii) sharing the data with colluding receivers (the SCR setting), where time generalization should be combined with event generalization. For both cases, we propose appropriate anonymization methods that prevent both user identification and event prediction. To achieve computational efficiency and scalability, we propose optimization techniques for both cases using a utility-based index, compact summaries, fast-to-compute bounds for utility, and a novel taxonomy-aware distance function. Extensive experiments confirm the effectiveness of our approach compared with the state of the art in terms of information loss, range query distortion, and preservation of temporal causality patterns. Furthermore, our experiments demonstrate efficiency and scalability on large-scale real and synthetic datasets.
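To make the two generalization steps concrete, here is a minimal illustrative sketch in Python. It is not the article's algorithm: the taxonomy, the event labels, the function names, and the assumption that the sequences in a group have equal length and are aligned position by position are all hypothetical choices made for illustration. Each position in a group of sequences is replaced by the smallest time interval covering its time stamps and by the lowest taxonomy node covering its events.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]                    # (time stamp, event label)
Generalized = Tuple[Tuple[int, int], str]  # ((start, end), taxonomy node)

# Hypothetical child -> parent taxonomy, invented for this example.
TAXONOMY: Dict[str, str] = {
    "flu_clinic": "health",
    "pharmacy": "health",
    "bookstore": "shopping",
    "grocery": "shopping",
    "health": "any",
    "shopping": "any",
}


def ancestors(label: str, taxonomy: Dict[str, str]) -> List[str]:
    """Return the path from a label up to the taxonomy root, label included."""
    path = [label]
    while path[-1] in taxonomy:
        path.append(taxonomy[path[-1]])
    return path


def lowest_common_ancestor(labels: List[str], taxonomy: Dict[str, str]) -> str:
    """Return the most specific taxonomy node that covers every label."""
    paths = [ancestors(label, taxonomy) for label in labels]
    common = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    for node in paths[0]:  # walk upward from the first label
        if node in common:
            return node
    raise ValueError("labels share no common ancestor")


def generalize_group(group: List[List[Event]],
                     taxonomy: Dict[str, str]) -> List[Generalized]:
    """Generalize equal-length sequences position by position (assumption)."""
    if len({len(seq) for seq in group}) != 1:
        raise ValueError("this sketch assumes equal-length sequences")
    result = []
    for column in zip(*group):
        times = [t for t, _ in column]
        labels = [e for _, e in column]
        interval = (min(times), max(times))  # time generalization
        node = lowest_common_ancestor(labels, taxonomy)  # event generalization
        result.append((interval, node))
    return result


group = [
    [(10, "flu_clinic"), (25, "bookstore")],
    [(12, "pharmacy"), (31, "grocery")],
]
generalized = generalize_group(group, TAXONOMY)
```

After generalization, both sequences in the group publish the same interval and taxonomy node at each position, so a receiver cannot tell them apart on those attributes. Under the abstract's description, the SSR setting would need only the interval step, while the SCR setting would combine both.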

