A Look Behind the Curtain: Traffic Classification in an Increasingly Encrypted Web

Published: 22 February 2021

Abstract

Traffic classification is essential in network management for operations ranging from capacity planning, performance monitoring, volumetry, and resource provisioning to anomaly detection and security. Recently, it has become increasingly challenging with the widespread adoption of encryption on the Internet, e.g., as the de facto standard in the HTTP/2 and QUIC protocols. In the current state of encrypted traffic classification using Deep Learning (DL), we identify fundamental issues in the way it is typically approached. For instance, although complex DL models with millions of parameters are being used, these models implement a relatively simple logic based on certain header fields of the TLS handshake, limiting model robustness to future versions of encrypted protocols. Furthermore, encrypted traffic is often treated like any other raw input for DL, while crucial domain-specific considerations are commonly ignored. In this paper, we design a novel feature engineering approach that generalizes well for encrypted web protocols, and develop a neural network architecture based on Stacked Long Short-Term Memory (LSTM) layers and Convolutional Neural Networks (CNN) that works very well with our feature design. We evaluate our approach on a real-world traffic dataset from a major ISP and Mobile Network Operator. We achieve an accuracy of 95% in service classification with less raw traffic and a smaller number of parameters, outperforming a state-of-the-art method with nearly 50% fewer false classifications. We show that our DL model generalizes for different classification objectives and encrypted web protocols. We also evaluate our approach on a public QUIC dataset with finer, application-level labeling granularity, achieving an overall accuracy of 99%.
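
The abstract's claim of "nearly 50% fewer false classifications" is a relative reduction in error rate, not a difference in accuracy. The short sketch below illustrates that arithmetic. The function name and the 90% baseline accuracy are hypothetical placeholders chosen only for illustration; the abstract does not state the baseline's accuracy.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the baseline's misclassifications removed by the new model."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_error - new_error) / baseline_error


# ASSUMPTION: 0.90 is a made-up baseline accuracy, not a figure from the paper.
HYPOTHETICAL_BASELINE_ACCURACY = 0.90
REPORTED_ACCURACY = 0.95  # service-classification accuracy stated in the abstract

reduction = relative_error_reduction(HYPOTHETICAL_BASELINE_ACCURACY, REPORTED_ACCURACY)
print(f"Relative reduction in false classifications: {reduction:.0%}")
```

Under this assumed baseline, a five-point gain in accuracy halves the number of misclassified flows, which is why a relative error figure can look much larger than the corresponding accuracy gain.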
