Abstract
Short Message Service (SMS) is one of the most widely used mobile applications for global communication, for both personal and business purposes. Its widespread use for customer interaction, business updates, and reminders has made "Text Marketing" a billion-dollar industry. Along with valid SMS, a flood of spam messages also arrives, serving various purposes for the sender, and the majority of them are fraudulent. Filtering spam SMS accurately is a crucial and challenging task that benefits users both mentally and economically. The challenges in filtering spam SMS include the small number of characters per message, texts written in informal language, and the lack of a public SMS spam corpus. A major handicap of the existing methods is that they focus solely on the textual features of the SMS, so they cannot adapt dynamically to the increasing number of new keywords and jargon. In this article, we develop an intention-based approach to SMS spam filtering that efficiently handles dynamic keywords by focusing on the semantics of the words. We capture both semantic and textual features of the short-text messages based on 13 pre-defined intention labels. The contextual embeddings of the texts are generated using various pre-trained Natural Language Processing (NLP) models. Finally, intention scores are computed for the pre-defined labels, and a set of supervised learning classifiers is employed to filter each message as spam or ham. Our approaches are evaluated through extensive experiments on the SMS Spam Collection [24] benchmark dataset. Our model achieves an accuracy of 98.07% with precision and recall of ∼0.97, which is better than a few of the existing state-of-the-art alternatives. Although the accuracy of our approach is not the best among existing approaches, the model is highly stable due to its emphasis on extracting contextual features from the text through intention labels.
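The pipeline summarized in the abstract (embed each message, score it against a set of intention labels, then pass the scores to a supervised classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three intention labels and their seed phrases are hypothetical (the abstract does not list the 13 labels), a bag-of-words count vector stands in for the pre-trained contextual embeddings, cosine similarity stands in for the intention-scoring function, and a nearest-centroid rule stands in for the supervised classifiers.

```python
import math
from collections import Counter

# Hypothetical intention labels with seed phrases (not the paper's 13 labels).
INTENT_SEEDS = {
    "prize": "win won prize cash reward free claim",
    "urgency": "urgent now immediately expires today call",
    "social": "hi hello lunch dinner home see later tonight",
}

def embed(text):
    """Toy stand-in for a contextual embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def intention_scores(message):
    """One similarity score per intention label, in INTENT_SEEDS order."""
    vec = embed(message)
    return [cosine(vec, embed(seed)) for seed in INTENT_SEEDS.values()]

def train_centroids(examples):
    """Average the intention-score vectors of the training messages per class."""
    sums, counts = {}, Counter()
    for message, label in examples:
        scores = intention_scores(message)
        acc = sums.setdefault(label, [0.0] * len(scores))
        for i, s in enumerate(scores):
            acc[i] += s
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(message, centroids):
    """Assign the class whose centroid is closest in intention-score space."""
    scores = intention_scores(message)
    return min(centroids, key=lambda label: math.dist(scores, centroids[label]))

# Tiny made-up training set, for illustration only.
TRAIN = [
    ("win a free cash prize now", "spam"),
    ("urgent claim your reward today", "spam"),
    ("hi see you at lunch", "ham"),
    ("hello are you home tonight", "ham"),
]
centroids = train_centroids(TRAIN)
print(classify("claim your free prize immediately", centroids))
print(classify("see you at dinner tonight", centroids))
```

The point of the sketch is the shape of the feature space: the classifier never sees raw keywords, only a short vector of intention scores. With contextual embeddings in place of the toy count vectors, a message using previously unseen wording could still score highly on a known intention, which is the adaptability the abstract argues for.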
References
- [1] 2019. A review of soft techniques for SMS spam classification: Methods, approaches and applications. Eng. Applic. Artif. Intell. 86 (2019), 197–212.
- [2] 2017. A review on mobile SMS spam filtering techniques. IEEE Access 5 (2017), 15650–15666.
- [3] 2012. Mining Text Data. Springer.
- [4] Tiago A. Almeida, José María G. Hidalgo, and Akebo Yamakami. 2011. Contributions to the study of SMS spam filtering: new collection and results. In Proceedings of the 11th ACM Symposium on Document Engineering (DocEng’11). Association for Computing Machinery, New York, NY, USA, 259–262.
- [5] 2009. A survey of learning-based techniques of email spam filtering. Artif. Intell. Rev. 29 (2009), 63–92.
- [6] Gordon V. Cormack, José María Gómez Hidalgo, and Enrique Puertas Sánz. 2007. Spam filtering for short messages. In Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management (CIKM’07). Association for Computing Machinery, New York, NY, USA, 313–320.
- [7] 2007. Feature engineering for mobile (SMS) spam filtering. In SIGIR.
- [8] 2019. Machine learning for email spam filtering: Review, approaches and open research problems. Heliyon 5, 6 (2019), e01802.
- [9] 2012. SMS spam filtering: Methods and data. Exp. Syst. Applic. 39, 10 (2012), 9899–9908.
- [10] 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
- [11] 2020. Text mining. Retrieved from https://www.ibm.com/cloud/learn/text-mining.
- [12] 2020. BERT. Retrieved from https://huggingface.co/transformers/model_doc/bert.html.
- [13] 2020. DistilBERT. Retrieved from https://huggingface.co/transformers/model_doc/distilbert.html.
- [14] 2020. RoBERTa. Retrieved from https://huggingface.co/transformers/model_doc/roberta.html.
- [15] 2020. Summary of the tokenizers. Retrieved from https://huggingface.co/transformers/tokenizer_summary.html.
- [16] José María Gómez Hidalgo, Guillermo Cajigas Bringas, Enrique Puertas Sánz, and Francisco Carrero García. 2006. Content based SMS spam filtering. In Proceedings of the 2006 ACM Symposium on Document Engineering (DocEng’06). Association for Computing Machinery, New York, NY, USA, 107–114.
- [17] 2011. Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann Publishers, Elsevier.
- [18] 2019. The power of mobile messaging. Retrieved from https://info.sopranodesign.com/the-power-of-mobile-communications.
- [19] 2013. The curse of 140 characters: Evaluating the efficacy of SMS spam detection on Android. In SPSM’13.
- [20] 2021. Malicious COVID-19 vaccine SMS that compromises Android phones spreading: Cyber agency. Retrieved from https://telecom.economictimes.indiatimes.com/news/malicious-covid-19-vaccine-sms-that-compromises-android-phones-spreading-cyber-agency/82522677.
- [21] 2018. Convolutional neural network based SMS spam detection. In TELFOR. 1–4.
- [22] 2018. LSTM based short message service (SMS) modeling for spam classification. In ICMLT.
- [23] 2014. Mining of Massive Datasets. Cambridge University Press.
- [24] 2012. SMS spam collection dataset. Retrieved from https://archive.ics.uci.edu/ml/datasets/sms+spam+collection.
- [25] 2020. Deep learning to filter SMS spam. Fut. Gen. Comput. Syst. 102 (2020), 524–533.
- [26] 2022. Spam detection using genetic algorithm optimized LSTM model. In Computer Networks and Inventive Communication Technologies. Springer Singapore, 59–72.
- [27] 2021. SMS spam detection through skip-gram embeddings and shallow networks. 4193–4201.
- [28] 2012. A novel framework for SMS spam filtering. In INISTA. 1–4.
- [29] 2017. Attention is all you need. ArXiv: abs/1706.03762 (2017).
- [30] 2011. A behavior-based SMS antispam system. IBM J. Res. Devel. 54 (1 2011), 1–16.
- [31] 2021. A weighted feature enhanced hidden Markov model for spam SMS filtering. Neurocomputing 444 (2021), 48–58.
- [32] 2011. SMSAssassin: Crowdsourcing driven mobile-based system for SMS spam filtering. Association for Computing Machinery, New York, NY.
- [33] 2016. A method of SMS spam filtering based on AdaBoost algorithm. In WCICA. 2328–2332.
Index Terms
SpotSpam: Intention Analysis–driven SMS Spam Detection Using BERT Embeddings