research-article

Deep-Confidentiality: An IoT-Enabled Privacy-Preserving Framework for Unstructured Big Biomedical Data

Published: 10 November 2021

Abstract

With the evolution of the Internet of Things, clinical data is growing exponentially and is increasingly generated by smart technologies. The resulting big biomedical data is confidential because it contains patients' personal information and clinical findings. It is usually stored in the cloud, which makes it convenient to access and share. Data shared for research purposes helps reveal useful, previously unexposed insights; unfortunately, sharing such sensitive data also creates privacy threats. Clinical data is generally available in textual format (e.g., perception reports). Within natural language processing, many studies have sought to mitigate privacy breaches in textual clinical data, but the current studies still have limitations and shortcomings that need to be addressed. This article proposes Deep-Confidentiality, a novel framework for the privacy of textual medical data. The framework improves both Medical Entity Recognition (MER), using deep neural networks, and sanitization compared with current state-of-the-art techniques. A new, generic utility metric is also proposed; it overcomes the shortcomings of the existing utility metric and gives a true representation of sanitized documents relative to the original documents. To assess its effectiveness, the framework is evaluated on the i2b2-2010 NLP challenge dataset, which is considered one of the more complex medical datasets for MER. The proposed framework improves MER by 7.8% in recall, 7% in precision, and 3.8% in F1-score compared with existing deep learning models. It also improves the data utility of sanitized documents by up to 13.79% when the value of k is 3.
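For readers unfamiliar with the three MER scores reported above, the following is a minimal sketch of how precision, recall, and F1-score are commonly computed for entity recognition under exact-match scoring. It is not the authors' evaluation script: the function name `entity_prf`, the `(start, end, type)` span encoding, and the toy annotations are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical encoding: (start token index, end token index, entity type).
Entity = Tuple[int, int, str]


def entity_prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of entity spans."""
    # A prediction counts as correct only if both span and type match a gold entity.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy annotations (illustrative only, not taken from the i2b2-2010 data).
gold = {(0, 1, "problem"), (4, 5, "treatment"), (8, 8, "test")}
predicted = {(0, 1, "problem"), (4, 5, "test"), (8, 8, "test"), (10, 11, "problem")}

precision, recall, f1 = entity_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the four predictions match a gold entity exactly, so precision is 2/4 and recall is 2/3. Because F1 is the harmonic mean of the two, it moves less than either score when only one of them improves.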

      • Published in

        ACM Transactions on Internet Technology  Volume 22, Issue 2
        May 2022
        582 pages
        ISSN:1533-5399
        EISSN:1557-6051
        DOI:10.1145/3490674
        • Editor: Ling Liu

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 November 2021
        • Revised: 1 August 2020
        • Accepted: 1 August 2020
        • Received: 1 June 2020

        Qualifiers

        • research-article
        • Refereed
