Abstract
Data identification and classification is a key task for any Internet Service Provider (ISP) or network administrator. As port fluctuation and encryption become more common in P2P applications that wish to avoid identification, new strategies must be developed to detect and classify their flows. This article introduces a method for separating P2P from standard web traffic, based on the activity of the hosts on the network, that can be applied as part of an offline data analysis process. Heuristics are analyzed and a classification system is proposed that focuses on the “long” flows that transfer most of the bytes across a network. The accuracy of the system is then tested using real network traffic from a core Internet router, showing misclassification rates as low as 0.54% of flows in some cases. We then extend the proposed strategy to investigate its relevance to real-time, early classification problems. New proposals are made, and the results of real-time experiments are compared with those obtained in the offline analysis. Classification accuracies in the real-time strategy are shown to be similar to those achieved in offline analysis, with a large portion of the total web and P2P flows correctly identified.
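As a rough illustration of how a flow-level misclassification rate such as the one quoted in the abstract might be computed, the sketch below counts flows whose assigned label differs from a ground-truth label. It is not the article's classifier: the `Flow` record, the label strings, and the `min_bytes` threshold used to pick out "long" flows are all hypothetical, since the abstract does not define them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; field names are assumptions, not from the article."""
    true_label: str       # ground truth, e.g. "p2p" or "web"
    predicted_label: str  # label assigned by some classifier
    n_bytes: int          # total bytes carried by the flow


def select_long_flows(flows: Iterable[Flow], min_bytes: int) -> List[Flow]:
    """Keep flows carrying at least min_bytes (an assumed definition of 'long')."""
    return [f for f in flows if f.n_bytes >= min_bytes]


def misclassification_rate(flows: List[Flow]) -> float:
    """Percentage of flows whose predicted label differs from the ground truth."""
    if not flows:
        raise ValueError("at least one flow is required")
    wrong = sum(1 for f in flows if f.predicted_label != f.true_label)
    return 100.0 * wrong / len(flows)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    sample = [
        Flow("p2p", "p2p", 5_000_000),
        Flow("web", "p2p", 2_000),
        Flow("web", "web", 1_000),
        Flow("p2p", "p2p", 9_000_000),
    ]
    print(misclassification_rate(sample))
    print(misclassification_rate(select_long_flows(sample, min_bytes=1_000_000)))
```

Reporting the rate separately for all flows and for the long flows only shows how the two measures can diverge when most bytes are carried by a few large flows.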
Index Terms
Host-Based P2P Flow Identification and Use in Real-Time