Abstract
In recent years, a tradeoff has emerged between revealing useful knowledge and protecting individuals' private information in identifiable health datasets (i.e., within an Electronic Health Record). Especially in this era of a global pandemic, security and privacy are often overlooked in favor of usability. Privacy-preserving data mining (PPDM) is expected to play an important role in resolving this problem. Nevertheless, mining information from an identifiable health dataset is more complex than traditional PPDM problems, and leaking individuals' private information from such a dataset has become a serious legal issue. In this article, the proposed Ant Colony System to Data Mining algorithm applies a multi-threshold constraint to secure and sanitize patients' records of different lengths, which makes it applicable to real medical situations. The experimental results show that the proposed algorithm not only hides all sensitive information but also keeps useful knowledge available for mining in the sanitized database.
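To make the multi-threshold idea concrete, the sketch below shows one plausible reading of it: the minimum support an itemset needs in order to count as frequent depends on the itemset's length, and a sensitive itemset is considered hidden once its support falls below the threshold for its length. The toy transactions, the threshold values, and the helper names (`support`, `is_frequent`, `exposed`) are all assumptions made for illustration and are not taken from the article. The ant colony search itself is not reproduced; a single transaction is removed by hand to stand in for whatever the search would select.

```python
# Hypothetical toy database: each transaction is a set of item codes.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

# Assumed length-dependent minimum support counts (illustrative values only).
min_support_by_length = {1: 4, 2: 3, 3: 2}


def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def is_frequent(itemset, db, thresholds):
    """An itemset is frequent if its support reaches the threshold for its length."""
    return support(itemset, db) >= thresholds[len(itemset)]


def exposed(sensitive, db, thresholds):
    """Return the sensitive itemsets that are still frequent in `db`."""
    return [s for s in sensitive if is_frequent(s, db, thresholds)]


# Hypothetical sensitive itemsets of two different lengths.
sensitive_itemsets = [frozenset({"a", "b"}), frozenset({"a", "b", "c"})]

# Before sanitization both sensitive itemsets meet their own threshold.
print(exposed(sensitive_itemsets, transactions, min_support_by_length))

# Stand-in for the search step: drop the first transaction by hand.
sanitized = transactions[1:]
print(exposed(sensitive_itemsets, sanitized, min_support_by_length))
```

In this toy case, deleting one transaction lowers the support of both sensitive itemsets below their respective thresholds, while other itemsets such as `{"b", "c"}` can still be mined from the sanitized data. A real sanitizer would also have to measure how much non-sensitive knowledge is lost, which this sketch does not attempt.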
A Multi-Threshold Ant Colony System-based Sanitization Model in Shared Medical Environments