Abstract
Computer users often struggle to make correct security decisions. While fewer and fewer people are trying or willing to take formal security training, online sources, including news, security blogs, and websites, are continuously making security knowledge more accessible. Analysing cybersecurity texts from this grey literature can provide insights into trending topics, identify current security issues, and show how cyber attacks evolve over time. These insights can in turn support researchers and practitioners in predicting and preparing for such attacks. Comparing different sources may also facilitate the learning process for ordinary users by establishing patterns in the security knowledge gained from each source. Prior studies have neither systematically analysed the wide range of digital sources nor provided any standardisation for analysing trending topics in recent security texts. Moreover, existing topic modelling methods cannot fully identify cybersecurity concepts, and the topics they generate overlap considerably. To address these issues, we propose a semi-automated classification method that generates comprehensive security categories for analysing trending topics. We further compare the 16 identified security categories across different sources based on their popularity and impact. Our analysis reveals several surprising findings: (1) the impact reflected in cybersecurity texts strongly correlates with the monetary loss caused by cybercrimes, (2) security blogs have produced the context of cybersecurity most intensively, and (3) websites deliver security information with little regard for timeliness.
Analysis of Trending Topics and Text-based Channels of Information Delivery in Cybersecurity