research-article
Open Access

A Parallel Algorithm For Anonymizing Large-scale Trajectory Data

Published: 12 March 2020

Abstract

With the proliferation of location-based services enabled by a large number of mobile devices and applications, the quantity of location data, such as trajectories collected by service providers, is gigantic. If these datasets could be published, they would be valuable assets for service providers exploring business opportunities and for studying commuter behavior to improve transport management, which in turn benefits the general public in day-to-day commuting. However, two major concerns considerably limit the availability and usage of these trajectory datasets. The first is the threat to individual privacy, as users’ trajectories may be misused to discover sensitive information, such as home locations, their children’s school locations, or social information like habits or relationships. The other is the ability to analyze exabytes of location data in a timely manner. Although trajectory anonymization approaches have been proposed in the past to mitigate privacy concerns, none of these prior works addresses the scalability issue, since it is a new problem brought about by the significantly increasing adoption of location-based services. In this article, we address these two challenges by designing a novel parallel trajectory anonymization algorithm that achieves scalability, strong privacy protection, and a high utility rate for the anonymized trajectory datasets. We have conducted extensive experiments using MapReduce and Spark on real maps with different topologies, and our results demonstrate both effectiveness and efficiency when compared with centralized approaches.


          • Published in

            ACM/IMS Transactions on Data Science, Volume 1, Issue 1
            February 2020
            159 pages
            ISSN: 2691-1922
            DOI: 10.1145/3388324

            Copyright © 2020 ACM

            Publisher

            Association for Computing Machinery
            New York, NY, United States

            Publication History

            • Published: 12 March 2020
            • Accepted: 1 March 2019
            • Revised: 1 February 2019
            • Received: 1 May 2018

            Qualifiers

            • research-article
            • Research
            • Refereed
