Abstract
Many projects have tried to analyze the structure and dynamics of application overlay networks on the Internet using packet analysis and network flow data. While such analysis is essential for a variety of network management and security tasks, it is infeasible on many networks: either the volume of data is so large as to make packet inspection intractable, or privacy concerns forbid packet capture and require the dissociation of network flows from users’ actual IP addresses. Our analytical framework permits useful analysis of network usage patterns even when the only available source of data is anonymized flow records. Using these records, we uncover distributions and scaling relations in host-to-host networks that have implications for capacity planning and network application design. We also show how to classify network applications based entirely on topological properties of their overlay networks, yielding a taxonomy that allows us to accurately identify the functions of unknown applications. We repeat this analysis on a more recent dataset and find that the aggregate behavior of users is remarkably stable even as the population changes.
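As a rough illustration of the kind of analysis the abstract describes, the sketch below builds a host-to-host network from anonymized flow records and tabulates its degree distribution. The record layout (source ID, destination ID, byte count), the function name `degree_distribution`, and the sample data are all assumptions made for this example; they are not taken from the paper, and the paper's actual framework may differ.

```python
from collections import Counter, defaultdict


def degree_distribution(flows):
    """Summarize a host-to-host network built from flow records.

    `flows` is an iterable of (src, dst, nbytes) tuples in which src and
    dst are opaque anonymized host identifiers (a hypothetical layout).
    Returns (degree, strength, distribution):
      degree       - number of distinct peers per host
      strength     - total bytes exchanged per host
      distribution - how many hosts have each degree value
    """
    neighbors = defaultdict(set)
    strength = Counter()
    for src, dst, nbytes in flows:
        if src == dst:
            continue  # ignore self-loops
        # Treat the network as undirected: either endpoint sees the other.
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        strength[src] += nbytes
        strength[dst] += nbytes
    degree = {host: len(peers) for host, peers in neighbors.items()}
    distribution = Counter(degree.values())
    return degree, dict(strength), dict(distribution)


# Made-up sample records; the identifiers carry no address information.
flows = [
    ("h1", "h2", 100),
    ("h1", "h3", 50),
    ("h2", "h3", 25),
    ("h1", "h4", 10),
    ("h1", "h2", 5),
]
degree, strength, distribution = degree_distribution(flows)
```

Because only opaque identifiers are used, a summary of this kind needs no packet payloads and no real IP addresses. On real data the degree counts would typically be examined on logarithmic axes to look for heavy-tailed behavior.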
Properties and Evolution of Internet Traffic Networks from Anonymized Flow Data