research-article

Properties and Evolution of Internet Traffic Networks from Anonymized Flow Data

Published: 01 March 2011

Abstract

Many projects have tried to analyze the structure and dynamics of application overlay networks on the Internet using packet analysis and network flow data. While such analysis is essential for a variety of network management and security tasks, it is infeasible on many networks: either the volume of data is so large as to make packet inspection intractable, or privacy concerns forbid packet capture and require the dissociation of network flows from users’ actual IP addresses. Our analytical framework permits useful analysis of network usage patterns even under circumstances where the only available source of data is anonymized flow records. Using this data, we are able to uncover distributions and scaling relations in host-to-host networks that bear implications for capacity planning and network application design. We also show how to classify network applications based entirely on topological properties of their overlay networks, yielding a taxonomy that allows us to accurately identify the functions of unknown applications. We repeat this analysis on a more recent dataset, allowing us to demonstrate that the aggregate behavior of users is remarkably stable even as the population changes.
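As a rough illustration of the kind of analysis the abstract describes, the sketch below builds host-to-host graphs from anonymized flow records and computes a few topological descriptors per application. It is not the authors' method: the record layout `(src_id, dst_id, dst_port, bytes)`, the use of destination port as a stand-in for "application", the sample data, and the function names `build_overlays` and `topology_features` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: (src_id, dst_id, dst_port, bytes).
# src_id and dst_id are anonymized host identifiers, not real IP addresses.
flows = [
    ("h1", "h2", 80, 1200),
    ("h3", "h2", 80, 800),
    ("h4", "h2", 80, 400),
    ("h1", "h5", 6881, 5000),
    ("h5", "h3", 6881, 7000),
    ("h3", "h1", 6881, 6500),
]


def build_overlays(records):
    """Group flows by destination port (an assumed proxy for application)
    and return, per port, the set of directed host-to-host edges."""
    overlays = defaultdict(set)
    for src, dst, port, _nbytes in records:
        overlays[port].add((src, dst))
    return overlays


def topology_features(edges):
    """Compute a few simple topological descriptors of one overlay graph."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    hosts = set(out_deg) | set(in_deg)
    return {
        "hosts": len(hosts),
        "edges": len(edges),
        "max_in_degree": max(in_deg.values()),
        "max_out_degree": max(out_deg.values()),
        # Fraction of hosts that appear as both a source and a destination.
        "send_and_receive_fraction": len(set(out_deg) & set(in_deg)) / len(hosts),
    }


features = {
    port: topology_features(edges)
    for port, edges in build_overlays(flows).items()
}

for port, feats in sorted(features.items()):
    print(port, feats)
```

In this toy data the port 80 overlay is a star (every client talks to one server, and no host both sends and receives), while the port 6881 overlay is a ring in which every host does both. Feature vectors of this sort could then be fed to any clustering or classification procedure; which features and which procedure the paper actually uses is not stated in the abstract.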
