DOI: 10.1145/3442381.3449797
research-article

#Twiti: Social Listening for Threat Intelligence

Published: 03 June 2021

ABSTRACT

Twitter is a popular public source for threat hunting. Many security vendors and security professionals use Twitter in practice to collect Indicators of Compromise (IOCs). However, little is known about IOCs on Twitter. Their important characteristics, such as earliness, uniqueness, and accuracy, have never been investigated. Moreover, it is not obvious how to extract IOCs from Twitter with high accuracy. In this paper, we present Twiti, a system that automatically extracts various forms of malware IOCs from Twitter. Based on the collected IOCs, we conduct the first empirical assessment and thorough analysis of malware IOCs on Twitter. Twiti extracts IOCs from tweets identified as containing malware IOC information by leveraging natural language processing and machine learning techniques. Through extensive evaluation, we demonstrate not only that Twiti extracts malware IOCs accurately, but also that the extracted IOCs are unique and early. By analyzing the IOCs in Twiti from various aspects, we find that Twitter captures ongoing malware threats, such as Emotet variants and malware distribution sites, better than other public threat intelligence (TI) feeds. We also find that only a tiny fraction of IOCs on Twitter come from commercial vendor accounts, and that individual Twitter users are the main contributors of the early-detected or exclusive IOCs, which indicates that Twitter can provide many valuable IOCs that are not covered in the commercial domain.


Published in

WWW '21: Proceedings of the Web Conference 2021
April 2021
4054 pages
ISBN: 9781450383127
DOI: 10.1145/3442381

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
