Abstract
This article proposes an improved Bayesian scheme that focuses on the region in which a Bayesian classifier may fail to identify the correct labels, and it improves classification performance by correcting those errors. The Bayesian method is a probabilistic classifier. It uses Bayes’ theorem to compute the probability that an instance belongs to each class and assigns the class label with the maximum probability. In spam detection, the prediction of a Bayesian classifier can be considered weak when the probabilities obtained for the spam and non-spam classes are close to each other. We therefore define a threshold that separates weak predictions from strong ones. A hybrid strategy based on a two-layer Bayesian approach is presented. Basic Bayesian (BBayes) handles strong predictions, and corrected weak region Bayesian (CWRBayes) handles weak predictions. Both techniques share the same classification mechanism but use different feature selection mechanisms. The proposed methods are implemented and evaluated on two spam e-mail datasets, and the results show that the proposed method outperforms the naïve Bayesian baseline and several other Bayesian variants.
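The abstract describes the two-layer idea only at a high level, so the following is a minimal sketch under explicit assumptions, not the authors' implementation. It assumes that the shared classification mechanism is a multinomial naive Bayes with Laplace smoothing, where P(c | x) is proportional to P(c) multiplied by P(w | c) over the words of x. It also assumes that the weak region is defined as |P(spam | x) − P(ham | x)| < θ, with a hypothetical θ = 0.2. The two feature selection mechanisms are stood in for by two hand-picked vocabularies. The names `train_nb`, `posterior`, `two_layer_classify`, the toy data, and the threshold value are all illustrative.

```python
import math
from collections import Counter

SPAM, HAM = "spam", "ham"


def train_nb(docs, labels, vocab, alpha=1.0):
    """Multinomial naive Bayes restricted to `vocab` (the layer's feature set),
    with Laplace (add-alpha) smoothing. Hypothetical helper for illustration."""
    classes = sorted(set(labels))
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Count only the words that belong to this layer's feature set.
        counts = Counter(w for d in class_docs for w in d if w in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return {"classes": classes, "vocab": vocab,
            "log_prior": log_prior, "log_lik": log_lik}


def posterior(model, doc):
    """P(c | doc) via Bayes' theorem, normalized with log-sum-exp for stability."""
    scores = {}
    for c in model["classes"]:
        s = model["log_prior"][c]
        for w in doc:
            if w in model["vocab"]:  # words outside the feature set are ignored
                s += model["log_lik"][c][w]
        scores[c] = s
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}


def two_layer_classify(doc, bbayes, cwrbayes, threshold=0.2):
    """Assumed routing rule: keep the first-layer (BBayes) label when the
    spam/ham posteriors differ by at least `threshold`; otherwise the
    prediction is weak and the second layer (CWRBayes) decides."""
    p = posterior(bbayes, doc)
    if abs(p[SPAM] - p[HAM]) >= threshold:  # strong prediction
        return max(p, key=p.get), "BBayes"
    q = posterior(cwrbayes, doc)  # weak prediction: re-decide with other features
    return max(q, key=q.get), "CWRBayes"


# Toy training data (illustrative only).
docs = [["win", "cash", "now"], ["cheap", "win", "offer"],
        ["meeting", "tomorrow", "agenda"], ["project", "meeting", "notes"]]
labels = [SPAM, SPAM, HAM, HAM]
# Hypothetical feature sets standing in for the two feature selection mechanisms.
bbayes = train_nb(docs, labels, vocab={"win", "cash", "meeting", "agenda"})
cwrbayes = train_nb(docs, labels, vocab={"cheap", "offer", "project", "notes"})

print(two_layer_classify(["cash", "win"], bbayes, cwrbayes))             # ('spam', 'BBayes')
print(two_layer_classify(["win", "meeting", "offer"], bbayes, cwrbayes))  # ('spam', 'CWRBayes')
```

In the second call, the first layer produces a tie (0.5 versus 0.5) because "win" and "meeting" are equally indicative under its feature set. The message therefore falls in the weak region, and the second layer resolves it by using "offer". Both layers call the same `posterior` function and differ only in the `vocab` passed to `train_nb`. This mirrors the abstract's statement that BBayes and CWRBayes share a classification mechanism but use different feature selection.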
A Weak-Region Enhanced Bayesian Classification for Spam Content-Based Filtering