
A Novel Real-time Anti-spam Framework

Published: 28 September 2021

Abstract

As one of the most pervasive current modes of communication, email needs to be fast and reliable. However, spammers and attackers use it as a primary channel for illegal activities. Although many approaches to spam detection have been developed and evaluated, they do not provide sufficient accuracy, and this deficiency results in significant economic losses for organizations. In this article, we first propose a framework for creating novel spam filters that uses Keras to combine Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) classification models. We then use this framework to introduce a specific solution applicable to realistic scenarios in which dynamic incoming email data must be handled in real time. This solution takes the form of a real-time content-based spam classifier. We evaluate its performance in terms of accuracy, precision, recall, false-positive rate, and false-negative rate. Our experimental results show that our approach can significantly outperform existing solutions for real-time spam detection.
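The five evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code: the names `Confusion`, `count_outcomes`, and `spam_metrics`, the label convention (1 = spam, 0 = legitimate), and the toy labels are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate mail wrongly flagged as spam
    tn: int  # legitimate mail correctly passed
    fn: int  # spam wrongly passed as legitimate


def count_outcomes(y_true, y_pred, positive=1):
    """Tally the four outcomes; `positive` is the assumed spam label."""
    c = Confusion(0, 0, 0, 0)
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                c.tp += 1
            else:
                c.fp += 1
        else:
            if truth == positive:
                c.fn += 1
            else:
                c.tn += 1
    return c


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the sample.
    return numerator / denominator if denominator else 0.0


def spam_metrics(c):
    """Standard definitions of the five measures listed in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }


# Toy labels, invented for illustration (not data from the article).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
result = spam_metrics(count_outcomes(y_true, y_pred))
```

For a spam filter, the false-positive rate is often the measure to watch most closely, since it counts legitimate messages that would be withheld from the recipient; recall and the false-negative rate always sum to 1 when at least one spam message is present.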

