Abstract
The Named Data Networking (NDN) architecture retrieves content by name rather than by connecting to specific hosts. It offers benefits such as highly efficient and resilient content distribution, which are well suited to data-intensive distributed computing. This paper presents and discusses our experience in modifying Apache Hadoop, a popular MapReduce framework, to operate on an NDN network. Through this first-of-its-kind implementation, we demonstrate the feasibility of running an existing, large, and complex piece of distributed software commonly found in data centers over NDN. We show advantages such as simplified network code and reduced network traffic, both of which are beneficial in a data center environment. NDN also faces challenges, currently being addressed by the community, that can be magnified under data center traffic. Through detailed evaluation, we show a 16% reduction in overall data transmission between Hadoop nodes when writing data with default replication settings. Preliminary results also show promise for in-network caching of repeated reads in distributed applications. We also show that overall performance is currently lower under NDN, and we identify challenges and opportunities for further NDN improvements.
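The in-network caching of repeated reads mentioned above can be illustrated with a minimal Python sketch, assuming a single forwarder sitting between several readers and one producer. It models only the Content Store lookup that lets a router answer a repeated Interest for the same name without contacting the producer; the PIT, FIB, packet signatures, and the paper's actual Hadoop integration are omitted. The class and function names, the FIFO eviction policy, and the name `/hadoop/blk_1001/seg=0` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Data:
    """Simplified NDN Data packet: content bound to a name (no signature)."""
    name: str
    content: bytes


@dataclass
class Router:
    """Hypothetical NDN forwarder with a Content Store (CS).

    `upstream` stands in for FIB-based forwarding toward the producer;
    it is called only when the Content Store misses.
    """
    upstream: Callable[[str], Optional[Data]]
    capacity: int = 1024
    content_store: Dict[str, Data] = field(default_factory=dict)
    upstream_fetches: int = 0

    def on_interest(self, name: str) -> Optional[Data]:
        # Cache hit: satisfy the Interest locally, generating no upstream traffic.
        if name in self.content_store:
            return self.content_store[name]
        # Cache miss: forward the Interest toward the producer.
        self.upstream_fetches += 1
        data = self.upstream(name)
        # Cache the returned Data so later Interests for the same name hit.
        if data is not None:
            if len(self.content_store) >= self.capacity:
                # Evict the oldest entry (dicts keep insertion order, so this is FIFO).
                self.content_store.pop(next(iter(self.content_store)))
            self.content_store[name] = data
        return data


# Hypothetical producer, e.g. a node serving block segments by name.
blocks = {"/hadoop/blk_1001/seg=0": b"block bytes"}


def producer(name: str) -> Optional[Data]:
    content = blocks.get(name)
    return Data(name, content) if content is not None else None


router = Router(upstream=producer)
for _ in range(3):  # three readers request the same named segment
    pkt = router.on_interest("/hadoop/blk_1001/seg=0")
print(router.upstream_fetches)  # 1
```

Because Data is matched by name rather than by host, any node on the path holding a copy can answer the request; production NDN forwarders implement their own cache replacement policies rather than this FIFO stand-in.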
Hadoop on Named Data Networking: Experience and Results

This work also appears in:
- SIGMETRICS '17 Abstracts: Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems
- Performance Evaluation Review